Stop Analyzing the Critical Path

The typical approach to increasing the FMAX of an Intel FPGA design is to look at the top failing setup paths in TimeQuest and try to optimize them. Every FPGA engineer has spent countless hours doing this. However, with the HyperFlex architecture we are now in the “era of retiming,” as I like to call it, where analyzing the top failing paths is misleading and arguably incorrect. I know this may surprise many FPGA engineers, so in this article I’ll explain why.

This explanation assumes that you have a good understanding of retiming.

On previous architectures, the correct approach was to analyze the top 100 or so failing setup paths in TimeQuest and look for patterns in the list or, even better, to compile multiple seeds and look for patterns across compiles. You could then confidently focus your efforts on the paths that showed up most frequently.

Figure 1. Top failing paths in TimeQuest

With the HyperFlex architecture in Stratix 10, you must instead focus your attention on the critical chain. The Stratix 10 High-Performance Design Handbook [1] stresses this, but I want to explain it further because it is a common point of confusion for FPGA designers. There are two main reasons.

First, a reminder of the definitions of the critical path and the critical chain.

The critical path is the single register-to-register path with the longest delay in the design. It is therefore the path that limits FMAX. *

Figure 2. Simple design with a critical path of 3.5ns which limits FMAX to 286 MHz. [2]
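
As a small worked example of the definition above: FMAX is the reciprocal of the critical-path delay, so 3.5 ns gives about 286 MHz and 3 ns gives about 333 MHz. The sketch below picks the worst path from a set of register-to-register delays; the register names and the delays other than 3.5 ns are hypothetical and are not taken from the figure.

```python
def fmax_mhz(critical_path_ns):
    """FMAX in MHz for a worst reg-to-reg delay given in ns (f = 1 / T)."""
    return 1000.0 / critical_path_ns  # 1 / (1 ns) = 1 GHz = 1000 MHz


# Hypothetical reg-to-reg path delays in ns (illustrative names and values).
path_delays_ns = {"regA -> regB": 2.0, "regB -> regC": 3.5, "regC -> regD": 1.5}

# The critical path is simply the path with the largest delay.
critical_path = max(path_delays_ns, key=path_delays_ns.get)
print(critical_path, f"{fmax_mhz(path_delays_ns[critical_path]):.0f} MHz")
print(f"{fmax_mhz(3.0):.0f} MHz")
```

The first line prints `regB -> regC 286 MHz` and the second prints `333 MHz`.
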
The critical chain is the sequence of register-to-register paths that cannot be retimed any further to achieve a higher FMAX for the design. In other words, the Hyper-Retimer has balanced the delays along this chain as well as it can, and this chain is the main bottleneck in the design. To increase FMAX, you must change the design in a way that lets the Hyper-Retimer further balance and reduce the path delays in this chain.

Figure 3. The critical chain in a design where the retimer has balanced the delays as best as it can, resulting in a critical path of 3ns and FMAX of 333 MHz. [2]
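
To make “balanced the delays as well as it can” concrete, here is a toy model, assuming a single feed-forward chain made of indivisible delay elements (the numbers are hypothetical) where a register may sit at any boundary between elements. Retiming only moves registers, so the number of register-to-register paths stays fixed; the chain is critical once no placement lowers its worst path. The exhaustive search is only for illustration and is not how the Hyper-Retimer works internally.

```python
from itertools import combinations


def paths_for(delays_ns, cuts):
    """Reg-to-reg path delays when registers sit at the given element boundaries."""
    bounds = (0, *cuts, len(delays_ns))
    return [sum(delays_ns[a:b]) for a, b in zip(bounds, bounds[1:])]


def balance_chain(delays_ns, num_paths):
    """Try every register placement and keep the one with the smallest worst path.

    Retiming moves registers but does not add them, so num_paths stays fixed.
    """
    best_cuts = min(
        combinations(range(1, len(delays_ns)), num_paths - 1),
        key=lambda cuts: max(paths_for(delays_ns, cuts)),
    )
    return best_cuts, paths_for(delays_ns, best_cuts)


# Hypothetical element delays (ns) along one chain with two reg-to-reg paths.
chain = [0.5, 1.0, 1.5, 1.0, 0.5, 1.5]

before = paths_for(chain, (4,))        # register where the RTL happens to put it
cuts, after = balance_chain(chain, 2)  # register position after balancing
print(before, "->", after)             # [4.0, 2.0] -> [3.0, 3.0]
```

With two paths, 3.0 ns is the floor for this chain: no register placement does better, so only a design change can improve it.
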
It is important to note that one path in the critical chain will be the critical path of the entire design. However, the critical path identified in any particular compile is less relevant than it seems: in another compile/seed with slightly different place-and-route, the Retimer may position the registers differently, so that some other path in the chain becomes the critical path. Chasing critical paths is therefore not fruitful, because they change so easily from compile to compile as the Retimer repositions registers along the critical chain. It is much more effective to look at the chain of paths that the Retimer reports as the performance bottleneck. If you compile multiple seeds (say 10), you will often see the same chain appear consistently, which tells you very clearly where to focus your attention.
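
The multi-seed check described above amounts to a tally. The sketch below counts, over five hypothetical seeds, which chain the Retimer reported as critical and which path TimeQuest listed as worst. The seed results, chain names and register names are invented for illustration, and reading them out of the real Quartus reports is left out.

```python
from collections import Counter

# Hypothetical per-seed results (illustrative, not real Quartus output):
# seed -> (critical chain reported by the Retimer, worst setup path in TimeQuest)
seed_results = {
    1: ("chain_A", "regA0 -> regA1"),
    2: ("chain_A", "regA2 -> regA3"),
    3: ("chain_A", "regA1 -> regA2"),
    4: ("chain_B", "regB0 -> regB1"),
    5: ("chain_A", "regA3 -> regA4"),
}

chain_votes = Counter(chain for chain, _ in seed_results.values())
path_votes = Counter(path for _, path in seed_results.values())

print("most common chain:", chain_votes.most_common(1))  # [('chain_A', 4)]
print("most repeats of any single path:", max(path_votes.values()))  # 1
```

In this made-up data the same chain wins four of five seeds, while no single critical path repeats, which is the pattern the paragraph above describes.
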

At the risk of sounding like a broken record, I’ll repeat: the critical path within the critical chain will vary from seed to seed because of retiming. This is why you shouldn’t focus on the top failing path in TimeQuest; it will keep changing.

Your next step is to find a way to change the design so that the reported critical chain can be retimed more effectively (i.e., add more registers, remove or relax retiming restrictions, restructure the logic to reduce retiming dependencies, etc.). The Retimer and FastForward reports will suggest what to do. Once your changes are in place, the Retimer will work out the optimal way to balance the registers in that chain for you.
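
For the “add more registers” option, a rough estimate of how many extra registers a chain needs for a target period can be sketched under the same toy assumptions as before: a feed-forward chain of indivisible elements with hypothetical delays, where registers can sit at any element boundary. Greedily packing elements into paths gives the fewest paths that meet the target. The function name is hypothetical, and the sketch ignores the latency change that extra registers introduce, which the rest of the design has to tolerate.

```python
def paths_needed(delays_ns, target_ns):
    """Fewest reg-to-reg paths so every path in the chain meets target_ns.

    Greedy packing is optimal for splitting a sequence into contiguous parts.
    Returns None if one element alone exceeds the target: no register
    placement can fix that, so the logic itself would have to change.
    """
    if any(d > target_ns for d in delays_ns):
        return None
    paths, current = 1, 0.0
    for d in delays_ns:
        if current + d > target_ns:
            paths += 1  # start a new path, i.e. one more register in the chain
            current = d
        else:
            current += d
    return paths


# Hypothetical chain that currently has two reg-to-reg paths (6.0 ns in total).
chain = [0.5, 1.0, 1.5, 1.0, 0.5, 1.5]
CURRENT_PATHS = 2

for target_ns in (3.0, 2.5, 1.0):
    n = paths_needed(chain, target_ns)
    verdict = "not reachable by adding registers" if n is None else f"{n - CURRENT_PATHS} extra register(s)"
    print(f"{target_ns} ns ({1000 / target_ns:.0f} MHz): {verdict}")
```

For 333 MHz no extra register is needed, for 400 MHz one is, and 1000 MHz is out of reach because a single 1.5 ns element already exceeds the period.
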

A second, often overlooked reason why it is misleading to look at the top failing paths is that the Retimer does not optimize the second-place critical chain! There’s no point in doing so, since it would consume runtime for no FMAX gain. Once the Retimer hits the FMAX-limiting chain, it stops: by the definition of a critical chain, it cannot reduce and balance the path delays any further, so the design’s FMAX cannot be improved. Because the second-place chain (i.e., the chain that is almost as bad as the critical chain, but not quite) is not optimized by the Retimer, it will likely contain paths that are almost as critical as the design’s ultimate critical path. These paths are therefore likely to show up in the top 100 failing paths in TimeQuest. Yet they might be good candidates for retiming if the Retimer had worked on them. In that case they are not true design bottlenecks and would not show up in the top-100 list if, I’ll say it again, the Retimer had worked on them. But it doesn’t work on them, because there’s no point: the design is still limited by the ultimate critical chain.
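
The effect described above can be reproduced in a toy model. Assume three hypothetical chains, each listing its current register-to-register delays, and idealize the best period a chain could reach as its total delay divided by its number of paths (perfect balancing). Following the behavior described in the paragraph, only the FMAX-limiting chain is balanced. A failing-paths list against a hypothetical 2.5 ns (400 MHz) constraint then shows a chain_B path right behind the critical chain, even though chain_B could meet 2.5 ns if it were retimed.

```python
# Toy model (hypothetical chains and delays in ns; not a real report format).
chains = {
    "chain_A": [4.0, 2.0],
    "chain_B": [2.75, 1.25],
    "chain_C": [1.75, 1.5],
}
TARGET_NS = 2.5  # hypothetical 400 MHz constraint


def best_period(paths):
    """Idealized period if the chain were balanced perfectly (total / paths)."""
    return sum(paths) / len(paths)


# Only the chain that limits FMAX gets balanced; the others are left as-is.
critical = max(chains, key=lambda c: best_period(chains[c]))
retimed = dict(chains)
retimed[critical] = [best_period(chains[critical])] * len(chains[critical])

# What a "top failing paths" list would show afterwards, worst first.
failing = sorted(
    ((delay, name, i)
     for name, paths in retimed.items()
     for i, delay in enumerate(paths)
     if delay > TARGET_NS),
    key=lambda t: t[0],
    reverse=True,
)
for delay, name, i in failing:
    print(f"{name} path {i}: {delay:.2f} ns, slack {TARGET_NS - delay:+.2f} ns")
print(f"chain_B could reach {best_period(chains['chain_B']):.2f} ns if retimed")
```

The chain_B path (2.75 ns, slack -0.25 ns) looks like a near-critical problem in the list, but in this model it is only there because the Retimer stopped at chain_A.
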

If you’re wondering why the Retimer doesn’t work on them anyway, to identify the second- and third-place critical chains and give you more information on what to optimize, the answer is that this “lookahead” is precisely what the FastForward tool does. FastForward tells you what the next critical chain will be if you solve the current one, and it keeps iterating like this until it hits a wall. It’s also a good idea to check whether FastForward hits the same “future” critical chains across seeds.
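
The lookahead loop can be sketched as follows. This is a simplified model and not FastForward’s actual algorithm: each hypothetical chain has the best period it allows in the current design and a made-up `fixable` flag saying whether some design change is assumed to exist for it. Solving a chain is modeled as removing it from consideration, so the next-worst chain becomes critical, and the loop stops at the first chain that cannot be fixed.

```python
# Hypothetical chains: best period in the current design, and whether a fix
# (e.g. an extra pipeline stage) is assumed to exist. Illustrative data only.
chains = {
    "chain_A": {"period_ns": 3.0, "fixable": True},
    "chain_B": {"period_ns": 2.5, "fixable": True},
    "chain_C": {"period_ns": 2.2, "fixable": False},
    "chain_D": {"period_ns": 1.6, "fixable": True},
}


def lookahead(chains):
    """Return [(critical chain, FMAX in MHz)] for each step until a wall is hit."""
    remaining = dict(chains)
    steps = []
    while remaining:
        name = max(remaining, key=lambda c: remaining[c]["period_ns"])
        info = remaining.pop(name)  # "solve" this chain and look at the next one
        steps.append((name, 1000.0 / info["period_ns"]))
        if not info["fixable"]:
            break  # this chain cannot be solved: the wall
    return steps


for step, (name, fmax) in enumerate(lookahead(chains), start=1):
    print(f"step {step}: {name} limits FMAX to {fmax:.1f} MHz")
```

This prints chain_A at 333.3 MHz, chain_B at 400.0 MHz and chain_C at 454.5 MHz, then stops; chain_D is never reached because chain_C is the wall.
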

In summary, on Stratix 10 the right approach is to focus your optimization efforts on the critical chain identified by the Retimer. If you’ve been an FPGA engineer for a long time, this will feel like an unnatural change of habit (it certainly did for me at first).

* Technically, a path starting or ending at an I/O pin could be critical, but to keep things simple we consider only register-to-register paths, since they are the most common case.

References

[1] Intel Stratix 10 High-Performance Design Handbook.

[2] Hutton, Mike. “Understanding How the New Intel® HyperFlex™ FPGA Architecture Enables Next-Generation High-Performance Systems.”