The target audience for this post is digital system design engineers; knowledge of digital design is assumed.

What are clock domains?

When they first became an issue, clock domain crossings (CDCs) were defined by the inability of static timing analysis (STA) tools to determine the timing relationship between two individual signals. To a certain extent, that’s still true. The problem is one of scale. When you had only one clock, the problem did not exist. When you had two or three clocks, the problem was manageable as a small number of exceptions. When you have dozens or hundreds of clocks, CDCs are no longer exceptions; there are simply too many of them. You need an automated tool to find them all, but that tool itself operates according to a set of rules. How do you determine whether a situation reported by the tool is itself an exception to the CDC rules? And if there turn out to be too many of those new exceptions, how do you separate the signal from the noise? The first step is to understand what the problem really is, and how to adjust the tool to cut down the noise. In short, you need to specify your clock domains, and you need to do it correctly to avoid false negatives (real crossings that the tool reports as clean) as well as false positives (noise).

You need to answer the question, what is a clock domain? The answer goes right back to the beginning: two clocks whose timing relationship can be discerned by the STA tool are in the same clock domain. If the STA tool cannot determine that timing relationship, the clocks are in different domains, and signals that pass between them must be synchronized. The problem now is that you need to make that determination before you run STA.

As a rule of thumb, clocks that are derived from the same source and whose frequencies are integer multiples of one another are in the same clock domain. If ClkB is derived from ClkA by a simple divider, then ClkA and ClkB are in the same clock domain. If ClkC is the inverse of ClkB, all three are in the same clock domain.
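The rule of thumb above can be written as a small check. This is a minimal sketch, assuming each clock is described by a hypothetical `Clock` record holding the name of its root source and its nominal frequency; phase inversion is not modeled, and a real CDC tool would work from the design's clock constraints rather than from a table like this.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Clock:
    # Hypothetical clock description: root oscillator name and nominal frequency
    name: str
    source: str
    freq_hz: int

def same_domain(a: Clock, b: Clock) -> bool:
    """Rule of thumb: same root source, and one frequency an integer multiple of the other."""
    if a.source != b.source:
        # Different oscillators drift relative to each other, whatever their nominal rates
        return False
    ratio = Fraction(max(a.freq_hz, b.freq_hz), min(a.freq_hz, b.freq_hz))
    return ratio.denominator == 1

clk_a = Clock("ClkA", source="osc0", freq_hz=200_000_000)
clk_b = Clock("ClkB", source="osc0", freq_hz=50_000_000)   # ClkA divided by 4
clk_c = Clock("ClkC", source="osc0", freq_hz=50_000_000)   # inverse of ClkB, same frequency
clk_x = Clock("ClkX", source="osc1", freq_hz=200_000_000)  # same spec, different oscillator

print(same_domain(clk_a, clk_b), same_domain(clk_b, clk_c), same_domain(clk_a, clk_x))  # True True False
```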

Two clocks from different sources, even if they have the same frequency specification, should not be labeled as being in the same clock domain. Every frequency specification has a tolerance, and if the two clocks do not maintain the same phase relationship over time, then data passing from one clock domain to the other is subject to intermittent metastability and should be synchronized. In this case, the STA tool would not detect a problem if both clocks were specified as having the same frequency. It is up to the designer to know that two clocks that seem to be in the same domain actually are not.
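To put a number on that tolerance: a minimal sketch, assuming two clocks both specified at 100 MHz whose actual frequencies differ by 50 ppm (illustrative values, not taken from any particular oscillator). The relative phase walks a little further every cycle, so every alignment of the two edges, including those that violate setup or hold, keeps recurring.

```python
NOMINAL_HZ = 100e6   # both clocks are specified at 100 MHz
OFFSET_PPM = 50.0    # assumed actual frequency difference between the two oscillators

period_s = 1.0 / NOMINAL_HZ
# Relative phase drift accumulated on every clock cycle
drift_per_cycle_s = period_s * OFFSET_PPM * 1e-6
# Cycles (and wall-clock time) for the phase to slip through one full period,
# i.e. to sweep through every possible alignment of the two clock edges
cycles_per_slip = period_s / drift_per_cycle_s
time_per_slip_s = cycles_per_slip * period_s

print(f"drift per cycle: {drift_per_cycle_s * 1e12:.2f} ps")
print(f"full-period slip every {cycles_per_slip:.0f} cycles ({time_per_slip_s * 1e6:.0f} us)")
```

With these numbers the phase moves by about 0.5 ps per cycle and sweeps through every alignment every 20,000 cycles (200 µs), thousands of times per second.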

An Analogy

Picture two marching bands playing different songs at different tempos approaching the same city street intersection from opposite directions. Each is marching on the right side of the street, and each is turning right, so they will not collide. However, the leftmost column from one band has to turn left and join the other band without breaking stride, breaking formation, or missing a beat. Replace the musicians with data bits from different clock domains, and you have a CDC.


It would be much simpler if there were a left-turn lane of sorts, and better still if there were a traffic cop or marching coach there to ease the transition. That would be a synchronizer.

Types of CDC issues

There are three main kinds of problems encountered when moving between clock domains: metastability, loss of synchronization between related signals due to data reconvergence, and data loss.

Metastability issues

The most obvious problem with moving data between clock domains is metastability. Given two back-to-back flip-flops clocked by asynchronous clocks, the second flip-flop will eventually see a setup or hold time violation. Its output will be unpredictable, and could even remain at a voltage level between the highest valid logic 0 and the lowest valid logic 1 for more than a clock cycle before it settles. This condition is known as metastability, and passing that state on to downstream logic compounds the problem to the point where your system could fail. The simplest way of addressing metastability is to add a second consecutive flip-flop in the receiving clock domain. This gives a metastable output roughly one more receiving clock period to resolve, which multiplies the MTBF by a factor on the order of e^(T/τ), where T is the receiving clock period and τ is the flip-flop’s resolution time constant. This is often loosely described as squaring the MTBF, and it is sufficient for most applications. When transferring a single-bit signal between clock domains, this is almost always the solution of choice.
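The effect of the second flip-flop can be estimated with the commonly used synchronizer MTBF model, MTBF = e^(t_r/τ) / (T0 · f_clk · f_data), where t_r is the time allowed for a metastable output to resolve, τ and T0 are device constants, f_clk is the receiving clock frequency and f_data is the rate of data transitions. The sketch below uses hypothetical parameter values, not figures from any real datasheet, and models the second flip-flop simply as one extra receiving clock period of resolution time.

```python
import math

def mtbf_seconds(t_resolve, tau, t0, f_clk, f_data):
    """Common synchronizer MTBF model: e^(t_resolve/tau) / (T0 * f_clk * f_data)."""
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)

# Hypothetical parameters, chosen only for illustration
TAU = 50e-12        # metastability resolution time constant (s)
T0 = 100e-12        # metastability window constant (s)
F_CLK = 1e9         # receiving clock frequency (Hz)
F_DATA = 100e6      # rate at which the asynchronous input toggles (Hz)
T_CQ = 80e-12       # clock-to-Q delay of the first flop (s)
T_SETUP = 50e-12    # setup time of the capturing flop (s)

period = 1.0 / F_CLK
# One flop: its output must settle within a period, minus clock-to-Q and setup
t_r_one = period - T_CQ - T_SETUP
# Two flops: the second stage gives the first one roughly one more period to resolve
t_r_two = t_r_one + period

one_flop = mtbf_seconds(t_r_one, TAU, T0, F_CLK, F_DATA)
two_flop = mtbf_seconds(t_r_two, TAU, T0, F_CLK, F_DATA)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

print(f"one flop:  MTBF ~ {one_flop:.3g} s")
print(f"two flops: MTBF ~ {two_flop / SECONDS_PER_YEAR:.3g} years")
print(f"improvement factor ~ e^(T/tau) = {two_flop / one_flop:.3g}")
```

With these made-up numbers a single flip-flop fails every few seconds, while the two-flop version lasts on the order of decades.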

Data Reconvergence

Another problem that must be addressed is data reconvergence, in which two data signals that have been independently synchronized between the same two clock domains are combined. This is a problem because synchronization is inherently an arbitration to avoid metastability: a new value will be clocked correctly, without metastability, on one of two successive receiving clock cycles, and there’s no way of knowing which. The two signals can be arbitrated differently and end up being clocked into the receiving domain on different clock cycles, even though correct operation depends on their remaining in step.
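The effect can be seen by enumerating the outcomes. This is a deliberately abstract model, not a gate-level simulation: a two-bit bus changes close to a receiving clock edge, and each bit’s synchronizer independently resolves to either the old or the new value on that first receiving cycle. The function names are illustrative.

```python
from itertools import product

def first_cycle_value(old: int, new: int, took_new: tuple) -> int:
    """Bus value seen on the first receiving cycle after a change.

    took_new[i] is True if bit i's synchronizer resolved to the new value on this
    cycle, False if it resolved to the old value (the new value arrives a cycle later).
    """
    value = 0
    for bit, resolved_new in enumerate(took_new):
        source = new if resolved_new else old
        value |= source & (1 << bit)
    return value

def possible_values(old: int, new: int, width: int) -> set:
    # Every combination of per-bit arbitration outcomes can occur
    return {first_cycle_value(old, new, outcome)
            for outcome in product((False, True), repeat=width)}

# Two bits change at once: the receiver may briefly see 00 or 11, values never sent
print(sorted(possible_values(0b01, 0b10, width=2)))  # [0, 1, 2, 3]
# Only one bit changes: the receiver sees either the old or the new value, nothing else
print(sorted(possible_values(0b01, 0b11, width=2)))  # [1, 3]
```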

The implication is that, for a data bus that crosses clock domains, synchronizing each bit individually will not work reliably. One solution is to generate a single-bit “data valid” flag that indicates when the data is stable. Synchronize that flag across domains, and then use it to enable the clocking of the data bus into the new domain. Because the bus is held stable while the flag passes through the synchronizer, the bus bits themselves do not need synchronizers.
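A minimal behavioral sketch of that scheme, assuming hypothetical 7 ns and 5 ns clock periods and a crude metastability model in which a signal that changed less than an assumed aperture before a clock edge is captured as either its old or new value at random. Only the flag goes through a two-flop synchronizer; the bus is captured on the rising edge of the synchronized flag. All names and timing values are illustrative.

```python
import random

TX_PERIOD = 7.0   # ns, hypothetical transmitting clock period
RX_PERIOD = 5.0   # ns, hypothetical receiving clock period
APERTURE = 0.3    # ns, assumed window: a change this close before an edge resolves randomly
HOLD = 4          # transmitting cycles the flag stays high (then low) for each word

def sample(events, t, rng):
    """Sample a signal, given as time-ordered (time, value) changes, at a clock edge at t.

    If the latest change falls inside the aperture, return the old or the new value
    at random: a crude stand-in for a flip-flop resolving metastability either way.
    """
    old = new = 0
    last_change = float("-inf")
    for when, value in events:
        if when > t:
            break
        old, new, last_change = new, value, when
    if t - last_change < APERTURE:
        return rng.choice((old, new))
    return new

def transfer(words, seed=0):
    rng = random.Random(seed)
    bus, valid = [], []
    t = 0.0
    for word in words:
        bus.append((t, word))                          # drive the word onto the bus...
        valid.append((t + TX_PERIOD, 1))               # ...then raise the flag a cycle later
        valid.append((t + TX_PERIOD * (1 + HOLD), 0))  # bus and flag held for HOLD cycles
        t += TX_PERIOD * (1 + 2 * HOLD)                # flag then stays low for HOLD cycles
    captured = []
    sync1 = sync2 = 0
    edge = RX_PERIOD
    while edge < t + 10 * RX_PERIOD:
        prev = sync2
        # Two-flop synchronizer on the flag only; the bus bits are not synchronized
        sync2, sync1 = sync1, sample(valid, edge, rng)
        if sync2 and not prev:
            # Synchronized flag just rose: the bus has been stable for at least a cycle
            captured.append(sample(bus, edge, rng))
        edge += RX_PERIOD
    return captured

print(transfer([0xA5, 0x3C, 0xFF]))  # [165, 60, 255]
```

Whichever way the flag’s synchronizer resolves, each word is captured exactly once and intact, because the bus never changes near the edge that captures it.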

Another solution is to ensure that the data itself is “gray” with respect to the receiving clock, meaning that only one bit changes between any two successive values the receiving domain can sample. This is easier when crossing from a slower to a faster domain, because each value persists for at least one receiving clock cycle, so the receiving domain never sees multiple changes between two of its samples.
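A common way to make a multi-bit value “gray” is to encode a counter in reflected binary Gray code, in which consecutive values differ in exactly one bit. A minimal sketch of the encode and decode steps, with a check of the one-bit property (the function names are illustrative):

```python
def bin_to_gray(n: int) -> int:
    # Each Gray bit is the XOR of two adjacent binary bits
    return n ^ (n >> 1)

def gray_to_bin(g: int) -> int:
    # Undo the XOR chain: binary = g ^ (g >> 1) ^ (g >> 2) ^ ...
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

WIDTH = 4
codes = [bin_to_gray(i) for i in range(2 ** WIDTH)]
# Consecutive codes, including the wrap from the last value back to 0, differ in one bit
for a, b in zip(codes, codes[1:] + codes[:1]):
    assert bin(a ^ b).count("1") == 1
print([format(c, "04b") for c in codes[:5]])  # ['0000', '0001', '0011', '0010', '0110']
```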

Losing Data

It stands to reason that if new data is generated on every cycle of a continuous fast clock, there’s no way to transfer all of that data to a bus of the same width in a slower clock domain, even if metastability and reconvergence issues are addressed. This also applies to two clocks that are nominally the same frequency but happen to differ by a few parts per million; if the transmitting clock is the faster of the two, data will eventually be lost. The worst case is for jitter and phase shift to conspire between two almost identical clock domains such that the receiving clock experiences a hold time violation at the leading edge of the data and a setup time violation at the trailing edge of the data, so that the data is missed altogether.
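The loss rate in the “few parts per million” case is simple arithmetic: without feedback, the receiver can accept at most one word per receiving clock cycle, so the transmitter’s surplus of cycles is lost. A minimal sketch, assuming a hypothetical 100 MHz receiver and a transmitter running 20 ppm fast:

```python
F_RX_HZ = 100e6                              # receiving clock, nominal
OFFSET_PPM = 20.0                            # assumed: transmitter runs 20 ppm fast
F_TX_HZ = F_RX_HZ * (1 + OFFSET_PPM * 1e-6)

# One new word per transmitting cycle, at most one word accepted per receiving cycle
words_lost_per_second = F_TX_HZ - F_RX_HZ
seconds_between_losses = 1.0 / words_lost_per_second

print(f"{words_lost_per_second:.0f} words lost per second")
print(f"one word lost every {seconds_between_losses * 1e6:.0f} us")
```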

If all the data is to be transferred, data launched by the faster transmitting clock must be held long enough for the slower receiving clock to capture it: the data on the transmitting bus must be valid during an active edge of the receiving clock. With no synchronization between the clocks, one might argue that you must guarantee that the data is valid for one cycle of the receiving clock, plus the phase and jitter uncertainties of both clocks, plus the greater of the setup time or hold time. But without more advanced synchronization techniques, you can hold the transmitting data only for a whole number of transmitting clock cycles, and simply holding the data for a fixed number of cycles may not be feasible.
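The rough requirement above can be turned into a count of transmitting cycles, rounded up because only whole cycles are available. A minimal sketch; the function name and all timing numbers are hypothetical.

```python
import math

def min_hold_cycles(t_rx, t_tx, phase_unc, jitter_tx, jitter_rx, t_setup, t_hold):
    """Transmitting-clock cycles to hold data so some receiving edge lands in a valid window.

    Required valid time = one receiving period + phase and jitter uncertainty of
    both clocks + the greater of setup or hold time (the rough rule from the text),
    rounded up to a whole number of transmitting periods.
    """
    required = t_rx + phase_unc + jitter_tx + jitter_rx + max(t_setup, t_hold)
    return math.ceil(required / t_tx)

# Hypothetical example: 250 MHz transmitter (4 ns), 100 MHz receiver (10 ns); times in ns
cycles = min_hold_cycles(t_rx=10.0, t_tx=4.0, phase_unc=0.5, jitter_tx=0.1,
                         jitter_rx=0.1, t_setup=0.2, t_hold=0.1)
print(cycles)  # 3
```

Here the data is held for 12 ns to cover a 10.9 ns requirement. If the transmitter produces a new word on every cycle, holding each word for three cycles is impossible without buffering or flow control.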

CDC synchronization schemes

With any transfer of data between two asynchronous clock domains with no handshaking or other feedback, there are only three logical possibilities. Expressed as the ratio of data words sent to data words received, with X > 1, it can be X:1 (some data is lost), 1:X (some data is duplicated), or 1:1 (everything balances out). Since exact balance cannot be sustained between asynchronous clocks, assume for now that the third possibility can be ignored. For some systems, given unstable data rates of approximately equal frequency, it is possible both to duplicate data occasionally and to lose data occasionally.
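A minimal sketch of the first two cases, assuming ideal sampling (no metastability) and hypothetical clock periods in arbitrary time units: the transmitter puts a new word (its own cycle index) on the bus every cycle, the receiver samples the bus on each of its edges, and the function counts words that were never sampled and extra samples of words already seen. The function name is illustrative.

```python
from collections import Counter

def sample_stream(t_tx: float, t_rx: float, duration: float):
    """Return (words lost, duplicate samples) for a transfer with no feedback.

    Word k is on the bus during [k * t_tx, (k + 1) * t_tx); the receiver samples
    at multiples of t_rx. Time units are arbitrary but must match.
    """
    samples = []
    edge = t_rx
    while edge < duration:
        samples.append(int(edge // t_tx))  # index of the word on the bus at this edge
        edge += t_rx
    seen = Counter(samples)
    sent = range(samples[0], samples[-1] + 1)
    lost = sum(1 for word in sent if word not in seen)
    duplicated = sum(count - 1 for count in seen.values())
    return lost, duplicated

print(sample_stream(t_tx=5.0, t_rx=8.0, duration=800.0))  # (59, 0): fast to slow, X:1
print(sample_stream(t_tx=8.0, t_rx=5.0, duration=800.0))  # (0, 59): slow to fast, 1:X
```

With a transmitting period of 5 and a receiving period of 8, 59 of the 158 words on the bus are never sampled; swapping the periods loses nothing but produces 59 duplicate samples.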

In any CDC situation, it is important to know whether the transmitting domain or the receiving domain is faster, and by approximately how much. A solution that works for a frequency ratio greater than 2:1 may not work for a ratio of less than 2:1. You must also discover (or decide) how much data loss or data duplication is tolerable. The simplest case of CDC (assuming that data loss is a bad thing) is moving data from a slow domain to a fast domain and tolerating data duplication in the receiving domain.

One of the more robust, if more complex, CDC solutions uses a FIFO data buffer, in which data is written into a memory and then read out through an independent data port. In a RAM-based FIFO, each data address is considered occupied when the transmitting clock writes to it and unoccupied when the receiving clock reads from it. It is also possible to have an asynchronous fall-through FIFO. In either case, the FIFO will have “full” and “empty” states as part of its structure. A FIFO-based synchronization scheme can work only if, over time, the number of writes equals the number of reads; the size (or “depth”) of the FIFO dictates how long the two can differ before the FIFO fills or empties. If you use a FIFO, you should assume that the system will eventually have to deal with a full or empty condition.
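A behavioral sketch of a RAM-based dual-clock FIFO, assuming the widely used arrangement in which the read and write pointers cross domains as Gray code through two-register synchronizers (a common design choice, not something the text prescribes). Clock edges are modeled as explicit method calls, metastability is not modeled, a fall-through FIFO is not covered, and the class and method names are hypothetical.

```python
def bin_to_gray(n: int) -> int:
    return n ^ (n >> 1)

def gray_to_bin(g: int) -> int:
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

class AsyncFifo:
    """Dual-clock FIFO sketch with Gray-coded pointers.

    Each pointer has one more bit than needed to address the storage, so "full"
    and "empty" can be told apart. Each side sees the other's pointer two of its
    own clock edges late, which makes full and empty conservative, never optimistic.
    """

    def __init__(self, addr_bits: int):
        self.depth = 1 << addr_bits
        self.wrap = self.depth << 1      # pointer range is 2 * depth
        self.mem = [None] * self.depth
        self.wbin = self.rbin = 0        # binary pointers, each owned by one domain
        self.rgray_sync = [0, 0]         # read pointer as seen by the write domain
        self.wgray_sync = [0, 0]         # write pointer as seen by the read domain

    def full(self) -> bool:
        rbin_seen = gray_to_bin(self.rgray_sync[1])
        return (self.wbin - rbin_seen) % self.wrap == self.depth

    def empty(self) -> bool:
        return self.wgray_sync[1] == bin_to_gray(self.rbin)

    def write_clock(self, data=None) -> bool:
        """One transmitting-clock edge: store data if given and not full."""
        accepted = False
        if data is not None and not self.full():
            self.mem[self.wbin % self.depth] = data
            self.wbin = (self.wbin + 1) % self.wrap
            accepted = True
        # Two-register synchronizer for the Gray read pointer, clocked by the write clock
        self.rgray_sync = [bin_to_gray(self.rbin), self.rgray_sync[0]]
        return accepted

    def read_clock(self, want: bool = True):
        """One receiving-clock edge: return the next word, or None if empty."""
        data = None
        if want and not self.empty():
            data = self.mem[self.rbin % self.depth]
            self.rbin = (self.rbin + 1) % self.wrap
        # Two-register synchronizer for the Gray write pointer, clocked by the read clock
        self.wgray_sync = [bin_to_gray(self.wbin), self.wgray_sync[0]]
        return data

fifo = AsyncFifo(addr_bits=2)  # depth 4
sent, received, saw_full = [], [], False
next_word = 0
for step in range(200):
    # Hypothetical clock ratio: the write clock ticks every step, the read clock every third
    data = next_word if next_word < 20 else None
    saw_full = saw_full or fifo.full()
    if fifo.write_clock(data):
        sent.append(data)
        next_word += 1
    if step % 3 == 0:
        word = fifo.read_clock()
        if word is not None:
            received.append(word)

print(received == sent == list(range(20)), saw_full)  # True True
```

The writer outpaces the reader, hits “full,” and simply waits; every accepted word comes out once and in order.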

The only way to neither lose nor duplicate data is to create some sort of feedback from the receiving domain to the transmitting domain. If this feedback regulates the clock, then the two clocks become synchronous to a certain degree (depending on the response time of the feedback mechanism). If the feedback regulates the data rate without regulating the clock rate (so that data remains valid for a variable number of clock cycles in the faster domain), this qualifies as a handshake. Perhaps the simplest form of handshake is a variation of the “data valid” flag described above for data reconvergence, in which the receiving domain sends a signal back to the transmitting domain to indicate that the “data valid” flag should be deactivated.
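A behavioral sketch of that handshake, assuming the common four-phase (return-to-zero) variant: the transmitter raises a request flag with each word, the receiver captures the word once the synchronized request is seen and raises an acknowledge, and each flag is dropped only after the other side has seen it. The class names, signal names, and clock periods are hypothetical, and metastability is modeled only as the two-flop synchronizer delay.

```python
class Transmitter:
    """Transmitting side: drives data plus a request flag, waits for acknowledge."""

    def __init__(self, words):
        self.pending = list(words)
        self.data = None
        self.req = 0
        self.ack_sync = [0, 0]  # two-flop synchronizer for the returning ack

    def clock(self, ack):
        ack_seen = self.ack_sync[1]
        if self.req == 0 and ack_seen == 0 and self.pending:
            self.data = self.pending.pop(0)  # new word, held until the handshake completes
            self.req = 1
        elif self.req == 1 and ack_seen == 1:
            self.req = 0                     # receiver has the word: drop the request
        self.ack_sync = [ack, self.ack_sync[0]]


class Receiver:
    """Receiving side: captures data once the synchronized request is seen."""

    def __init__(self):
        self.ack = 0
        self.req_sync = [0, 0]  # two-flop synchronizer for the incoming req
        self.received = []

    def clock(self, req, data):
        req_seen = self.req_sync[1]
        if req_seen == 1 and self.ack == 0:
            self.received.append(data)  # data has been stable since before req was seen
            self.ack = 1
        elif req_seen == 0 and self.ack == 1:
            self.ack = 0                # request withdrawn: return to idle
        self.req_sync = [req, self.req_sync[0]]


def run(words, t_tx, t_rx, duration):
    """Interleave the two clocks' edges in time order (transmitter first on exact ties)."""
    tx, rx = Transmitter(words), Receiver()
    edges = [(k * t_tx, 0) for k in range(1, int(duration / t_tx) + 1)]
    edges += [(k * t_rx, 1) for k in range(1, int(duration / t_rx) + 1)]
    for _, domain in sorted(edges):
        if domain == 0:
            tx.clock(rx.ack)
        else:
            rx.clock(tx.req, tx.data)
    return rx.received


print(run([10, 20, 30], t_tx=3.0, t_rx=7.0, duration=1000.0))  # [10, 20, 30]
print(run([10, 20, 30], t_tx=7.0, t_rx=3.0, duration=1000.0))  # [10, 20, 30]
```

The same logic works whichever clock is faster, with neither loss nor duplication, at the cost of several cycles of each clock per word.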
