Some design teams spend a great deal of effort achieving high fault coverage in their test programs in pursuit of enhanced reliability. While this is a laudable goal, one must understand the difference between a fault and a defect to avoid unnecessary and unproductive effort.

The goal of a comprehensive test is not fault coverage but defect coverage, that is, the ability to discern defective chips from functional chips. While a defect is a physical flaw in a physical chip, many people fail to realize that a fault is only an abstract model of a defect, and as such has inherently limited accuracy. One must also realize that computer simulations are inherently incapable of predicting success; they can only predict failure. A simulation that “passes” has simply failed to predict a failure. A good simulation will account for the vast majority of possible failure modes without taking forever to run, and to fulfill both requirements it must compromise.

The most common fault model used in digital design is the single stuck-at model. The assumption here is that the defect behaves as though the affected node is shorted to power or ground, that is, stuck at one or at zero. Yet there are CMOS process engineers who have gone entire careers without coming across a literal stuck-at-one or stuck-at-zero defect. Most defects in CMOS are actually various sorts of opens and shorts that merely behave as though they are stuck at one or stuck at zero under most conditions, but not all.
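To make the fault model concrete, here is a minimal Python sketch of single stuck-at fault injection in a toy gate-level simulator. The three-gate netlist, its dictionary representation, and the test pattern are all invented for illustration; production fault simulators operate on full netlists with far more sophisticated algorithms, but the principle of forcing one net to a fixed value and comparing against the fault-free response is the same.

    # Toy gate library and netlist, invented for illustration.
    GATES = {
        "AND":  lambda a, b: a & b,
        "OR":   lambda a, b: a | b,
        "NAND": lambda a, b: 1 - (a & b),
    }

    # Each entry: output net -> (gate type, input net, input net),
    # listed in topological order (which Python dicts preserve).
    NETLIST = {
        "n1": ("AND",  "a",  "b"),
        "n2": ("OR",   "n1", "c"),
        "y":  ("NAND", "n2", "a"),
    }

    def simulate(inputs, fault=None):
        """Evaluate the netlist; `fault` is (net_name, stuck_value) or None."""
        nets = dict(inputs)
        if fault and fault[0] in nets:             # stuck-at on a primary input
            nets[fault[0]] = fault[1]
        for out, (gate, a, b) in NETLIST.items():
            nets[out] = GATES[gate](nets[a], nets[b])
            if fault and fault[0] == out:          # the fault overrides the gate output
                nets[out] = fault[1]
        return nets["y"]

    pattern = {"a": 1, "b": 1, "c": 0}
    good   = simulate(pattern)
    faulty = simulate(pattern, fault=("n1", 0))    # inject n1 stuck-at-0
    print("detected" if good != faulty else "missed")

With this pattern the fault-free circuit drives y to 0 and the faulty one drives it to 1, so the fault is detected at the output. Everything a fault simulator knows about the defect is contained in that single forced value; the physical mechanism behind it never enters the picture.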

As an example of the limitations of the single stuck-at model, consider the following true story: The prototype chips of a new design exhibit a consistent failure mode on the tester, but only in test pattern Block A; Block B works fine. A close examination of this failure mode leads the design team to postulate a stuck-at defect in a particular location. The simulation pattern for Block A is modified to include the corresponding stuck-at fault, the simulation is re-run, and lo and behold, the failure mode of Block A is replicated. Problem solved, right? Just for the sake of completeness, the team decides to run the same fault on Block B. That simulation fails too, even though Block B passes on the tester with the real chips. The defect in question behaves as though it is stuck-at under the conditions of Block A but not under the conditions of Block B. Even though the corresponding fault is detected by both tests, only Block A detects this particular defect. The fault and the defect do not behave the same under all conditions.
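The little Python sketch below caricatures this story. The triggering condition, a hypothetical neighboring net being high, is invented as a stand-in for whatever the real physics of the defect happened to be, but it shows how a fault model and a defect can agree under one block’s conditions and disagree under another’s.

    def stuck_at_fault(good_value, _context):
        # The abstract fault model: the net is 0 regardless of conditions.
        return 0

    def real_defect(good_value, context):
        # A made-up pattern-dependent defect: it only pulls the net to 0
        # when a neighboring net is high (a stand-in for the real physics).
        return 0 if context["neighbor"] == 1 else good_value

    # Block A's patterns happen to exercise the bad condition; Block B's do not.
    blocks = {"Block A": {"neighbor": 1}, "Block B": {"neighbor": 0}}

    for name, ctx in blocks.items():
        for label, model in [("stuck-at fault", stuck_at_fault),
                             ("real defect",    real_defect)]:
            observed = model(1, ctx)           # fault-free value of the net is 1
            verdict = "FAIL" if observed != 1 else "pass"
            print(f"{name}, {label}: {verdict}")

Run as written, the stuck-at model fails under both blocks while the defect fails only under Block A, exactly the divergence the team observed between simulation and tester.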

In fact, the single stuck-at fault model is only about 98% or 99% accurate. This means that chasing 99.5% or 99.9% fault coverage is a fool’s errand: once you surpass the accuracy of the fault model, you have no way of knowing whether the additional fault coverage detects any additional defects. Add to this the diminishing returns, in which each additional tenth of a percentage point of fault coverage takes more and more patterns, and it becomes obvious that you would be far better off pursuing other fault models, such as bridging faults or delay faults, than pushing the single stuck-at model beyond its inherent accuracy.
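To put rough numbers on the argument, the sketch below combines the 98% model-accuracy figure from above with invented pattern counts. The counts are illustrative only, but the shape is typical: each increment of coverage costs more patterns, and coverage beyond the model’s accuracy tells you nothing verifiable about defect coverage.

    model_accuracy = 0.98   # fraction of real defects the stuck-at model represents well

    # Hypothetical pattern counts for each fault-coverage level (illustrative only).
    coverage_vs_patterns = [(0.90, 1_000), (0.99, 5_000),
                            (0.995, 12_000), (0.999, 40_000)]

    for coverage, patterns in coverage_vs_patterns:
        knowable = min(coverage, model_accuracy)
        print(f"{coverage:.1%} fault coverage, {patterns:>6} patterns: "
              f"defect coverage knowable only up to ~{knowable:.1%}")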