Making all the right mistakes: PAM-4 and error fingerprinting
When I was growing up in Castleisland, County Kerry, my family was lucky to know the famous journalist Con Houlihan. With his fantastic command of the written word, he was often asked by concerned mothers to advise on the progress of their young students’ written English. Con’s stock reply would normally be ‘Well, they are making all the right mistakes.’ Much has been written over the years on Con’s wit and reason. Here I will do my best to emulate him, with an eye towards today’s world of 400G Ethernet.
Here at VIAVI we have a long legacy of troubleshooting and debugging the NRZ interfaces used on 100G and below. Even at 25G NRZ per lane, the tried and tested tools used in bit error rate testers (BERTs) generally proved enough to troubleshoot systems and components up to 100G. But the world of 400G brought in PAM-4 signalling. This has added a whole new level of complexity to troubleshooting the interface SERDES and modules, which classic BERTs cannot address.
The main challenge now is going beyond ‘knowing’ we have an error. Indeed, the majority of classic BERTs do no more than count errors, but this is far from helpful in PAM-4 links where you tend to always have errors anyway. To address this increased level of complexity, VIAVI has developed a whole suite of tools that go well beyond simple error counting. In fact, the tools really fingerprint and classify the error, providing real insight into the root cause and what elements and aspects may need improvement or adjustment.
With PAM-4 we now see several areas that are essential in any error analysis tool. These include:
- Where the error occurs in the PAM-4 symbol – is it associated with a level (a bit like the errored ‘1’ and ‘0’ counts used in NRZ BERTs) or is it associated with a transition? With PAM-4 signalling, especially in the optical domain, you may see that some transitions are more error-prone due to unequal eye opening (remember, PAM-4 has THREE eyes). In the optical domain, incorrect or non-linear AGC could cause eye height distortion, leading to an increased error rate associated with those transitions.
- Is the error isolated or a burst? If it is an isolated error, how isolated is it? Burst errors can be troublesome for FECs but can also cause CDRs in the SERDES to lose lock or increase jitter. Tools that clearly identify bursts and the cause will help maintain a healthy FEC margin.
- Is the burst ‘really’ a burst or is it actually a bit slip? With the move to more complex DSP-based SERDES, bit slips have become a major issue. Basic test sets just see a slip as an error or error burst. Of course, the root cause of a bit slip is very different, so applications that clearly distinguish bit slips from bursts are a critical tool in the tester.
- What is occurring before the error? Is there a pattern sensitivity? With the addition of the SSPR(Q) patterns we can really stress pattern sensitivity and baseline wander.
- Is there any pattern to the error cadence or bursts? Sometimes errors on the high-speed SERDES are not directly related to the high-speed link. We have seen issues where poorly placed and poorly decoupled DC-DC converters have caused SI issues for the high-speed links – giving errors at the switching rate of the DC-DC converter. The ONT supports a view of the errors in a format that helps investigate issues like this. They are not unusual in the very confined space of a QSFP-DD module, where SI is a challenge for both the fast and slow signals.
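To make the questions above a little more concrete, here is a minimal Python sketch of what fingerprinting an error log could look like. It is purely illustrative and is not how the ONT works internally. It assumes you already have the transmitted and received PAM-4 symbols (values 0 to 3) as two aligned lists of equal length. The function names (`fingerprint`, `looks_like_slip`, `dominant_spacing`) and the thresholds (`max_gap`, `window`, `max_shift`) are hypothetical choices made for this example.

```python
from collections import Counter


def fingerprint(tx, rx, max_gap=8):
    """Tally errors by level and by transition, and group them into bursts.

    tx, rx: aligned, equal-length sequences of PAM-4 symbols (0..3).
    max_gap: assumed threshold; errors at most this many symbols apart
             are counted as part of the same burst.
    """
    if len(tx) != len(rx):
        raise ValueError("tx and rx must have the same length")
    err_idx = [i for i, (t, r) in enumerate(zip(tx, rx)) if t != r]

    # Which transmitted level was in error?
    by_level = Counter(tx[i] for i in err_idx)
    # Which transmitted transition (previous symbol -> errored symbol)?
    by_transition = Counter((tx[i - 1], tx[i]) for i in err_idx if i > 0)

    # Group errors into bursts: [first index, last index, error count].
    bursts = []
    for i in err_idx:
        if bursts and i - bursts[-1][1] <= max_gap:
            bursts[-1][1] = i
            bursts[-1][2] += 1
        else:
            bursts.append([i, i, 1])
    return by_level, by_transition, [tuple(b) for b in bursts]


def looks_like_slip(tx, rx, start, window=64, max_shift=4):
    """Return a non-zero shift s if rx matches tx offset by s symbols
    over `window` symbols from `start`; otherwise return None."""
    for s in range(-max_shift, max_shift + 1):
        if s == 0:
            continue
        lo = start + s
        if lo < 0 or lo + window > len(tx) or start + window > len(rx):
            continue
        if all(rx[start + k] == tx[lo + k] for k in range(window)):
            return s
    return None


def dominant_spacing(bursts):
    """Most common distance between burst start positions, or None.
    A strongly dominant spacing hints at a periodic aggressor."""
    gaps = Counter(b[0] - a[0] for a, b in zip(bursts, bursts[1:]))
    return gaps.most_common(1)[0][0] if gaps else None
```

In this sketch, a lopsided `by_transition` tally would point towards one of the three eyes, a non-`None` result from `looks_like_slip` at the start of a long burst would suggest a slip rather than a true burst, and a stable `dominant_spacing` would be a reason to compare the error period against the switching frequency of nearby power converters.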
Now here’s the combination of wit and reason that you’ve been waiting so patiently for: In the words of my fellow Castleislander Con, using an ONT on the PAM-4 signal helps see if you are making all the right mistakes!