Let’s review a few fundamentals of FPGA circuit performance. FPGA designers will roll their eyes at this, I know, but stick with me. The propagation delay of each register-to-register path in a design is the sum of the delays through the logic and interconnect on that path. The maximum frequency of the design (its FMAX) is dictated by the single path in the design with the longest delay (called the critical path); to a first approximation, FMAX is the reciprocal of that delay. So it doesn’t matter if most paths in the design are short and fast: the FMAX of the whole design is ultimately limited by the one path that’s long.
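As a quick illustration of why the single longest path is all that matters, here is a minimal sketch that computes FMAX from a list of register-to-register delays. The helper `fmax_mhz` and the delay values are hypothetical, and the model assumes clock skew, setup time and clock-to-output time are already folded into each path delay.

```python
from typing import Sequence


def fmax_mhz(path_delays_ns: Sequence[float]) -> float:
    """Hypothetical helper: FMAX (MHz) set by the longest path delay (ns)."""
    if not path_delays_ns:
        raise ValueError("need at least one register-to-register path")
    critical = max(path_delays_ns)  # the critical path
    return 1000.0 / critical        # 1/ns gives GHz; x1000 gives MHz


# Hypothetical delays: three short paths and one long one.
delays = [1.2, 0.8, 2.5, 1.0]
print(fmax_mhz(delays))  # 400.0
```

Shortening any of the three fast paths changes nothing here; only reducing the 2.5 ns path raises FMAX.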
Let’s now consider a simple, hypothetical FPGA design consisting of a straight sequence of register-to-register paths (called a chain in Intel FPGA terminology), shown in Figure 1.
The same idea, that the slowest path sets FMAX and that evening out the path delays raises it, extends beyond the simple chain to designs of any topology.
Conceptually, there are two ways to balance path delays. You could shuffle the logic and routing around so that an equal amount of delay sits between each pair of registers. Or you could approach it from the other direction: leave the logic and routing where they are and move the registers instead. The latter is what the Hyper-Retimer does.
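To make “move the registers, not the logic” concrete, here is a minimal sketch that models a chain as a fixed list of logic/routing segment delays plus a set of register positions. The helper `stage_delays` and all delay values are hypothetical; the point is only that moving registers changes how the total delay is split, not the total itself.

```python
from typing import List


def stage_delays(elements: List[float], registers: List[int]) -> List[float]:
    """Hypothetical helper: delay of each register-to-register path.

    elements:  fixed logic/routing segment delays along the chain (ns).
    registers: positions of the internal registers; a register at i sits
               between elements[i - 1] and elements[i]. The chain's input
               and output registers are implied at 0 and len(elements).
    """
    bounds = [0] + sorted(registers) + [len(elements)]
    return [sum(elements[a:b]) for a, b in zip(bounds, bounds[1:])]


# Hypothetical chain of six 1 ns segments with two internal registers.
elements = [1.0] * 6
before = stage_delays(elements, [1, 5])  # critical path 4 ns
after = stage_delays(elements, [2, 4])   # critical path 2 ns
print(before, sum(before))  # [1.0, 4.0, 1.0] 6.0
print(after, sum(after))    # [2.0, 2.0, 2.0] 6.0
```

The logic never moves, so the total stays at 6 ns, but the critical path halves from 4 ns to 2 ns.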
Now we come to the definition of retiming. Retiming is the practice of relocating registers with the goal of balancing the path delays between them; the Hyper-Retimer does this in an already placed-and-routed design. Because FMAX is set by the longest path, it increases as the path delays get closer to being fully balanced. One by one, the Hyper-Retimer takes the longest, performance-limiting paths in the design and tries to reduce their delay by moving registers so that the delay is spread more evenly across the chain of paths each one belongs to. Every register move is made in such a way that the functionality of the circuit remains exactly the same.
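The text doesn’t describe the Hyper-Retimer’s internal algorithm, so the following is only a textbook-style sketch for the simplest case: a feed-forward chain with fixed segment delays. It keeps the number of registers along the chain constant (so the chain’s latency, and therefore its cycle-level behavior, is unchanged) and uses dynamic programming over all register placements to find the one that minimizes the worst path delay. The function `retime_chain` and the example delays are hypothetical; retiming arbitrary circuits with feedback loops needs a more general formulation.

```python
from functools import lru_cache
from typing import List, Tuple


def retime_chain(elements: List[float], stages: int) -> Tuple[float, List[int]]:
    """Hypothetical sketch: place registers to minimize the worst path delay.

    elements: fixed logic/routing segment delays along a feed-forward chain (ns).
    stages:   number of register-to-register paths; this never changes, so
              the chain's latency is preserved.
    Returns (critical delay, internal register positions).
    """
    n = len(elements)
    if not 1 <= stages <= n:
        raise ValueError("need 1 <= stages <= len(elements)")
    prefix = [0.0]
    for d in elements:
        prefix.append(prefix[-1] + d)

    @lru_cache(maxsize=None)
    def best(start: int, k: int) -> Tuple[float, Tuple[int, ...]]:
        # Optimal split of elements[start:] into k non-empty paths.
        if k == 1:
            return prefix[n] - prefix[start], ()
        result: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
        # Leave at least one segment for each of the remaining k - 1 paths.
        for cut in range(start + 1, n - k + 2):
            first = prefix[cut] - prefix[start]
            rest, cuts = best(cut, k - 1)
            worst = max(first, rest)
            if worst < result[0]:
                result = (worst, (cut,) + cuts)
        return result

    critical, cuts = best(0, stages)
    return critical, list(cuts)


# Hypothetical 6 ns chain split into three register-to-register paths.
elements = [0.5, 1.5, 1.0, 0.5, 0.5, 1.0, 1.0]
print(retime_chain(elements, 3))  # (2.0, [2, 5])
```

Here a perfect 2 ns / 2 ns / 2 ns split exists, so the search finds it; when the segment boundaries don’t allow one, it returns the best achievable split instead.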
For many years, Quartus has performed some retiming in the early stages of the Fitter, but of course, its effectiveness there has been limited because it’s difficult to predict path delays before the design is fully placed-and-routed! With HyperFlex, Quartus now also performs aggressive retiming after place-and-route, which is a very effective point to do it, since all of the path delays are then known with high accuracy.
FMAX improves even when the path delays aren’t made exactly equal, as shown in Figure 2.
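A small sketch of why partial balancing still pays off: when the logic can only be split at certain points (here, a hypothetical 3 ns segment that can’t be divided), an exact 3 ns / 3 ns split is impossible, yet moving the register toward the balance point still shortens the critical path. The helper `sweep_register` and the delay values are illustrative assumptions.

```python
from typing import List, Tuple


def sweep_register(elements: List[float]) -> List[Tuple[int, float, float, float]]:
    """Hypothetical helper: try every position of the middle register
    in a two-path chain; return (position, left ns, right ns, FMAX MHz)."""
    total = sum(elements)
    rows = []
    left = 0.0
    for pos in range(1, len(elements)):
        left += elements[pos - 1]
        right = total - left
        rows.append((pos, left, right, 1000.0 / max(left, right)))
    return rows


# Hypothetical chain: the 3.0 ns segment cannot be split.
for pos, left, right, fmax in sweep_register([1.0, 3.0, 1.0, 1.0]):
    print(f"register after element {pos}: {left:.1f} ns | {right:.1f} ns -> {fmax:.0f} MHz")
# register after element 1: 1.0 ns | 5.0 ns -> 200 MHz
# register after element 2: 4.0 ns | 2.0 ns -> 250 MHz
# register after element 3: 5.0 ns | 1.0 ns -> 200 MHz
```

The best placement (4 ns / 2 ns) is still unbalanced, but it lifts FMAX from 200 MHz to 250 MHz.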