What’s all this FEC stress stuff?
With the advent of 400G Ethernet, it has become critically important to fully understand and validate Forward Error Correction (FEC) logic and the related software and firmware. 400G Ethernet uses PAM-4 signalling both optically and electrically. Because PAM-4 is susceptible to noise, distortion, and other disturbances, most links will run with a raw (pre-FEC) error floor. With FEC, typical channels and links should run error free (post-FEC) under normal conditions, and the FEC corrected error rate gives important information on the channel margin.
Now that we are operating in what is essentially an ‘always errored’ environment, the FEC encoder and decoder are a critical part of any 400G Ethernet system. Knowing that the FEC is correctly designed and implemented will give you the needed confidence that you are bringing a stable and compliant product to the market.
Fortunately, VIAVI offers a suite of powerful tools to help validate and troubleshoot FEC logic. To help demystify the “how?” and the “why?” of these tools I’ve put together a list of frequently asked questions (plus answers) on how these applications can be best used to help stress and debug 400GE products.
Question: A lot of products offer a table or graphical view of errored symbols per codeword. Isn’t this enough to validate a FEC?
Answer: The table view of errored symbols in a codeword, first developed by the VIAVI team for earlier ONT products, has been widely ‘adopted’ by others in the ecosystem. Although a useful view of the link performance, it is not capable of stressing and validating FEC logic. It merely shows how many errors the link is producing at any time; it cannot probe the complex FEC logic in a DUT. We recommend the FEC errored symbol view always be combined with other tools, such as our advanced error analysis, to gather more insight into the true nature of the errors.
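To make the limitation concrete, here is a minimal sketch of what an errored-symbols-per-codeword view computes. It assumes the transmitted and received codewords are both available as lists of integer symbols; the function names `errored_symbols` and `symbol_error_histogram` are hypothetical and are not part of any ONT interface.

```python
from collections import Counter

def errored_symbols(sent, received):
    """Count symbol positions that differ between two codewords.

    A symbol with several flipped bits still counts as one errored symbol.
    """
    if len(sent) != len(received):
        raise ValueError("codewords must have the same number of symbols")
    return sum(1 for s, r in zip(sent, received) if s != r)

def symbol_error_histogram(codeword_pairs):
    """Map 'errored symbols per codeword' to the number of codewords seen."""
    return Counter(errored_symbols(s, r) for s, r in codeword_pairs)

if __name__ == "__main__":
    # Toy 4-symbol codewords for illustration only (real 400GE codewords
    # have 544 symbols of 10 bits).
    pairs = [
        ([1, 2, 3, 4], [1, 2, 3, 4]),   # clean
        ([1, 2, 3, 4], [1, 0, 3, 4]),   # one errored symbol
        ([1, 2, 3, 4], [0, 2, 3, 0]),   # two errored symbols
    ]
    print(dict(symbol_error_histogram(pairs)))
```

The histogram only records what the link happened to produce. Nothing in it chooses where the errors land, which is why it cannot exercise specific parts of the decoder.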
Question: The ONT offers two FEC test applications. What is the difference?
Answer: FEC logic needs to be validated in two ways. The first way is ‘logical correctness’. To do this, the ONT generates a comprehensive set of error vectors which overlay the FEC structure to ‘walk through’ potentially troublesome error combinations. In 400GE the FEC codeword consists of 544 symbols of 10 bits, meaning a huge potential number of combinations of errored positions must be run through (starting with 1 errored symbol in each of the 544 positions). The ONT algorithm runs through the most appropriate set of error combinations to give solid fault coverage in a reasonable run time. It can then ‘probe’ issues by precisely positioning the errors within the codeword to validate and stress exact gates in the implementation. Such modes are invaluable for engineers implementing the FEC in an FPGA or ASIC.
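To give a feel for the size of that search space, the sketch below counts errored-position combinations for a 544-symbol codeword and generates the simplest walk: one errored symbol in each position. It is not the ONT algorithm, which is not described here. The value `T_CORRECTABLE = 15` does not come from this post; it is the correction capability of the RS(544,514) code that IEEE 802.3 specifies for 400GE. All function names are hypothetical.

```python
from math import comb

N_SYMBOLS = 544        # symbols per codeword (from the text)
BITS_PER_SYMBOL = 10   # bits per symbol (from the text)
T_CORRECTABLE = 15     # assumption: RS(544,514) corrects (544 - 514) / 2 symbols

def position_sets(k, n=N_SYMBOLS):
    """Number of ways to choose k errored symbol positions out of n."""
    return comb(n, k)

def nonzero_error_values(bits=BITS_PER_SYMBOL):
    """Number of distinct non-zero error patterns within one symbol."""
    return (1 << bits) - 1

def single_symbol_walk(n=N_SYMBOLS):
    """Yield each position in turn as a 1-tuple: one errored symbol per codeword."""
    for pos in range(n):
        yield (pos,)

if __name__ == "__main__":
    for k in (1, 2, 3, T_CORRECTABLE):
        print(f"{k:2d} errored symbols: {position_sets(k)} position sets")
    print("single-symbol cases including bit patterns:",
          position_sets(1) * nonzero_error_values())
```

Even two errored symbols already give 147,696 position pairs, and the count grows rapidly towards the correction limit. That is why a tester has to select a representative subset rather than enumerate everything.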
The second way investigates how robust the FEC implementation is against power supply integrity issues. When the FEC is running under error conditions, more and more logic gates are switching. This causes dynamic current spikes, and if the power supply layout (both on the PCB and within the IC) is not robust enough, the FEC may fail. So it is possible for a FEC to be logically correct and pass the standard suite of FEC logical testing, yet produce errors when the tests are repeated in an aggressive dynamic manner (specifically designed to stress the power supply). Such applications can help optimize FPGA floor planning, PCB design, layout and decoupling, and DC-DC converters, especially their dynamic output impedance.
The bottom line? By using these tools, you can go to market with confidence, not only in the logical correctness of the FEC (and remember, this can also test and validate the firmware harness that network elements use to monitor the FEC operation) but also in the implementation of the FEC and its associated power supplies and layout.
Thanks for reading and be sure to come back for the next blog “What’s all this OSNR stuff?” co-written with Matthew Adams.