(Taken from the website of Andy Lowe some time in 1999. Where are you now, Andy?)

One way that we have found to effectively organize thought in the absence of a well-defined methodology is what we call "The Ten Bugs in the Known Universe." Our statement that there are ten bugs in the universe of programming has been called bold (the estimate may in fact be high), but it is nonetheless useful.

Our list is similar in spirit to the style guidelines enumerated in The Elements of Programming Style by Brian Kernighan and P.J. Plauger. We are also indebted to the more rigorous "A Discipline of Programming" by the great Edsger W. Dijkstra. Dijkstra deprecates even the concept of "debugging" as being necessitated by sloppy thinking. Perhaps. Yet as we are not perfect, let us not allow defects in our thought processes to prohibit us from accomplishing anything. So, debug we must.

Let's assume for a moment that there are in fact only ten categories of bugs in all programming. It must follow that these ten bugs keep reappearing in different forms over and over again. If only we could itemize them and devise tools to distinguish them, we might be able to get a handle on the problems we have been trying to solve so frantically and reactively for so long. Maybe if we can name them, we can tame them. We try to do just that here, with our observation of the

Ten Bugs in the Known Universe.

  1. "It's not doing what I thought it would"
  2. We don't agree on our interfaces
  3. Uninitialized variable
  4. Memory trasher
  5. Off-by-one error
  6. "What specification?"
  7. You're not doing what you think you're doing!
  8. Poorly defined guards
  9. Poorly defined exit conditions
  10. It's not a bug, it's a feature

Let us examine each of these in turn and identify where they might crop up.


"It's not doing what I thought it would."

This problem is especially characteristic of novices, but even experts run into it in very complex systems that are hard to understand. It can have many different manifestations depending on the context, from syntactic typographical errors ("typos") to more fundamental "thinkos" like the one quoted above.

For example, consider use versus reference: mistaking a pointer for the structure it points to, and vice versa.
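One common form of the pointer-versus-structure mix-up can be sketched in C as follows. The struct record type and the function names are hypothetical, invented only for this illustration; the point is that sizeof applied to the pointer and sizeof applied to the thing it points to are different quantities, and the compiler accepts both.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct record {
    char name[32];
    int  count;
};

/* Thinko: sizeof(r) is the size of the pointer, not of the structure. */
size_t size_by_pointer(struct record *r)
{
    return sizeof(r);
}

/* Intended: sizeof(*r) is the size of the structure that r points to. */
size_t size_by_structure(struct record *r)
{
    return sizeof(*r);
}

/* Clears the whole structure, not just its first few bytes. */
void record_clear(struct record *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
}
```

Had record_clear() used sizeof(r) instead, it would have compiled cleanly and cleared only a pointer's worth of bytes.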

Good compilers and syntax checkers can find many of these syntactic problems. Sometimes the problem is really understanding what the compiler is trying to tell you.

Sources of error here include failing to understand the implications of a complex interface and failing to take into account latency or thrashing; examples are easy to find in socket programming.


We don't agree on our interfaces

The lack of a clear specification can be really vexing because in the presence of a well-defined body of code, you might expect that there is an interface, somewhere, even if it is just in the designer's head.

But if the interface isn't articulated, then there is no way to make reference to it. In this case, the programmer may make assumptions about the nature of his or her inputs or outputs. The resulting code may survive testing and even customer acceptance.

But then those assumptions are violated, and in the absence of a specification, it's impossible to say which side of the interface the bug is really on. Think about it. You have a component (the interface) to which no individual or group is assigned. Oops.

So one way to combat this is to try to extract an interface specification from the existing application, test it for validity in the normally expected usage cases, and compare that with how the interface actually behaves in clearly out-of-bounds conditions.

The real tough part comes right at the limits, implicitly or explicitly defined as the case may be.

In other words, if the interface is not defined, then the code's behavior itself defines the interface.
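As a rough sketch of what extracting such a specification might look like in C, the assertions below simply record what an undocumented routine does today. Both legacy_percent_to_level() and its behavior are hypothetical, made up for this illustration; the cases are grouped the way the text suggests: normal usage, clearly out of bounds, and right at the limits.

```c
#include <assert.h>

/* Hypothetical legacy routine with no written specification. */
int legacy_percent_to_level(int percent)
{
    if (percent < 0)
        return 0;
    if (percent > 100)
        return 10;
    return percent / 10;
}

/* Post-hoc specification: write down what the code actually does. */
void characterize_percent_to_level(void)
{
    /* Normally expected usage. */
    assert(legacy_percent_to_level(50) == 5);

    /* Clearly out-of-bounds conditions. */
    assert(legacy_percent_to_level(-1000) == 0);
    assert(legacy_percent_to_level(1000) == 10);

    /* Right at the limits, where the two sides most often disagree. */
    assert(legacy_percent_to_level(0) == 0);
    assert(legacy_percent_to_level(99) == 9);
    assert(legacy_percent_to_level(100) == 10);
}
```

Once the assertions exist, both sides of the interface have something to point at when an assumption is violated.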

Once you analyze the situation in this way, an inference can be made: if two pieces of code don't agree on the nature of their inputs or outputs, then what you have on your hands is a job of post-hoc specification. This is covered elsewhere (cf. Ted J. Biggerstaff & Alan J. Perlis, Software Reusability: Concepts & Models, New York: ACM (Addison-Wesley), 1989).


Uninitialized Variable

This passive problem is not nearly as nasty as its evil twin, the stack trasher, and its sinister cousin, the memory trasher, can sometimes be. It is one of the disciplines of programming to maintain consistency and completeness with respect to heap and stack memory usage, and manipulation of pointers and collections.

The C language is both notorious and beloved for its loose typing. Most static data structures behave deterministically. Errors in initializing these structures, or failure to initialize them at all, will generally cause the code either to always fail or to always succeed, and thus will be picked up in the first rounds of testing. (You did run it at least once before you checked it in, right?)
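A minimal C sketch of why static data is the tame case: objects with static storage duration start out zeroed, while automatic (stack) objects hold indeterminate values until the code initializes them. The struct counters type and the function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure, used only for illustration. */
struct counters {
    int hits;
    int misses;
};

/* Static storage: C guarantees this starts out as all zeros. */
static struct counters global_counters;

int static_hits(void)
{
    return global_counters.hits;
}

/* Automatic storage: indeterminate until initialized, so do it explicitly. */
struct counters make_counters(void)
{
    struct counters c = {0};   /* every member is set to zero */
    return c;
}

void record_hit(struct counters *c)
{
    assert(c != NULL);
    c->hits++;
}
```

Dropping the `= {0}` in make_counters() would still compile, and the result would depend on whatever happened to be on the stack at the time.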


Memory Trasher

What is it? Where does it come from? How do you find it?

It is like a meteor from space.

Normal C++ and Java memory management through disciplined use of closed allocators and deallocators (new and, in C++, delete), along with user hooks through constructors and destructors, is, in our opinion, much safer and more usable than working with live wires like malloc() and free(), whose easy abuse is responsible for catastrophic failures like rockets unintentionally crashing into Venus.

The memory trasher is a vicious bug and, together with a special case, the stack trasher, has probably cost more money in software development over the years than any other single category of problem. The prohibition of pointers in Java may be one of the greatest advances in software reliability in decades.

Debugging facilities for stack and heap mismanagement are, in our opinion, ripe for implementation in hardware, and you should use hardware support whenever you have it. Some architectures already support exception handling and the use of protected memory pages. These will help you to recover from certain categories of error (such as jumping through a null pointer, or accessing memory at an invalid address), but they will not help you find or correct them.

We have two recommendations for helping to find and correct memory trashers and leakers, especially the most vile, latent variety, which eludes conventional system testing mechanisms. The first is to use debugging memory allocators. Such memory allocators work in a variety of ways, but the simplest variety puts "guard rails" at the head and tail of each heap allocation. Some also keep linked lists of the allocated segments for later sanity checking, and record the module and line number of the call that allocated the block.

Then, at intervals throughout execution, the list can be traversed checking the guard rails for the appropriate test patterns. By progressively narrowing the intervals between tests, the instance of the error may be isolated.
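The following is a minimal sketch of such a debugging allocator in C, not a production design. All of the names (dbg_malloc, dbg_free, dbg_check_all, DBG_MALLOC) and the guard size and pattern are assumptions chosen for the example, and alignment is handled only to the extent that the header and guard sizes happen to keep the user block suitably aligned on common platforms.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16      /* bytes of guard rail on each side (assumed) */
#define GUARD_BYTE 0xA5    /* test pattern written into the guard rails */

/* Bookkeeping header placed in front of every allocation. */
struct dbg_block {
    struct dbg_block *next;   /* linked list of live blocks */
    const char *file;         /* module that allocated the block */
    int line;                 /* line number of the allocating call */
    size_t size;              /* size requested by the caller */
};

static struct dbg_block *dbg_live = NULL;

/* Layout of one allocation: [header][head guard][user data][tail guard] */
static unsigned char *head_guard(struct dbg_block *b) { return (unsigned char *)(b + 1); }
static unsigned char *user_data(struct dbg_block *b)  { return head_guard(b) + GUARD_SIZE; }
static unsigned char *tail_guard(struct dbg_block *b) { return user_data(b) + b->size; }

void *dbg_malloc(size_t size, const char *file, int line)
{
    struct dbg_block *b = malloc(sizeof(*b) + 2 * GUARD_SIZE + size);
    if (b == NULL)
        return NULL;
    b->file = file;
    b->line = line;
    b->size = size;
    b->next = dbg_live;
    dbg_live = b;
    memset(head_guard(b), GUARD_BYTE, GUARD_SIZE);
    memset(tail_guard(b), GUARD_BYTE, GUARD_SIZE);
    return user_data(b);
}

static int guard_intact(const unsigned char *g)
{
    for (int i = 0; i < GUARD_SIZE; i++)
        if (g[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Walk the list of live blocks; return the first damaged one, or NULL. */
const struct dbg_block *dbg_check_all(void)
{
    for (struct dbg_block *b = dbg_live; b != NULL; b = b->next)
        if (!guard_intact(head_guard(b)) || !guard_intact(tail_guard(b)))
            return b;
    return NULL;
}

void dbg_free(void *p)
{
    if (p == NULL)
        return;
    struct dbg_block *b = (struct dbg_block *)((unsigned char *)p - GUARD_SIZE) - 1;
    int found = 0;
    for (struct dbg_block **pp = &dbg_live; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == b) {
            *pp = b->next;    /* unlink the block from the live list */
            found = 1;
            break;
        }
    }
    assert(found);            /* freeing a block we never handed out */
    free(b);
}

#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
```

Calling dbg_check_all() at intervals, and then at narrower intervals around the first failure, brackets the code that stepped on a guard rail; the file and line fields of the returned block identify which allocation was damaged.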

Stack errors may be similarly trapped, but this requires support from the compiler or sophisticated debuggers to help tag the stack frames.

The second, and the best way to find and fix such problems, is disciplined and thorough unit testing at the module level -- before integration with other modules. If all modules undergo such testing, the chance of unanticipated errors cropping up during integration is significantly reduced.

During unit testing, it is also prudent to use a memory allocator which provides the ability to "squeeze" memory. That is, out of memory conditions can be simulated at progressively narrower intervals throughout program execution. This helps to flush out the latent problems hiding in your code.
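One simple way to "squeeze" memory, sketched here in C under stated assumptions, is an allocator wrapper that lets only a set number of allocations succeed before it starts returning NULL. The names squeeze_set(), squeeze_malloc(), join_copy() and squeeze_test_join() are hypothetical; the test driver moves the failure point forward one allocation at a time until the code under test finally succeeds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocations still allowed to succeed; a negative value means no limit. */
static long squeeze_budget = -1;

void squeeze_set(long successes_allowed)
{
    squeeze_budget = successes_allowed;
}

void *squeeze_malloc(size_t size)
{
    if (squeeze_budget == 0)
        return NULL;              /* simulated out-of-memory condition */
    if (squeeze_budget > 0)
        squeeze_budget--;
    return malloc(size);
}

/* Hypothetical code under test: joins two strings using two allocations. */
char *join_copy(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);

    char *tmp = squeeze_malloc(la + 1);
    if (tmp == NULL)
        return NULL;
    memcpy(tmp, a, la + 1);

    char *out = squeeze_malloc(la + lb + 1);
    if (out == NULL) {
        free(tmp);                /* do not leak on the failure path */
        return NULL;
    }
    memcpy(out, tmp, la);
    memcpy(out + la, b, lb + 1);
    free(tmp);
    return out;
}

/* Fail the first allocation, then the second, and so on, until the call
   succeeds. Returns the number of failure points that were exercised. */
int squeeze_test_join(void)
{
    int failures = 0;
    for (long n = 0; ; n++) {
        squeeze_set(n);
        char *s = join_copy("foo", "bar");
        if (s != NULL) {
            assert(strcmp(s, "foobar") == 0);
            free(s);
            break;
        }
        failures++;
    }
    squeeze_set(-1);              /* restore normal behavior */
    return failures;
}
```

Run under a leak checker or a debugging allocator, a driver like this also shows whether each failure path cleans up after itself.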

Kernighan and Plauger advise you to program defensively, meaning to write code that can handle erroneous input. We go a step further and advise you to integrate defensively. That is, do not attempt to integrate your code with other modules until you have not only verified your own behavior, but have also made sure your colleagues have been just as careful. Forcing them through such a gate will surely be your most reliable path to a successful (and uneventful!) integration party.


Off-by-one error

Off-by-one errors happen much more frequently than you might suspect, and typically result from confusion in counting. The famous fencepost error is helpful to think about: "If I want to put up a fence forty feet long, and I have four ten-foot sections, how many fenceposts do I need?" (Five, not four.) Other off-by-one errors are due to erroneous unit conversion or phase conversion, failing to agree on whether to begin counting at zero or one, rounding, unanticipated math errors, inaccuracies and so on. Whenever you have to count something, always ask yourself: "am I counting fenceposts or fence sections?" And also ask yourself, "do I start counting at zero or one?"
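The fencepost question and the zero-or-one question can both be written down in a few lines of C. This is only an illustrative sketch; the function names are made up, and fence_sections() assumes the fence length is a whole multiple of the section length.

```c
#include <assert.h>

/* Number of sections needed to span the fence. */
int fence_sections(int fence_length, int section_length)
{
    assert(section_length > 0 && fence_length % section_length == 0);
    return fence_length / section_length;
}

/* A straight fence has one more post than it has sections. */
int fence_posts(int fence_length, int section_length)
{
    return fence_sections(fence_length, section_length) + 1;
}

/* Counting from zero: the valid indices of an n-element array are 0 .. n-1. */
int sum_array(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)   /* i <= n would read one element too many */
        sum += a[i];
    return sum;
}
```

For the forty-foot fence, fence_sections(40, 10) is 4 and fence_posts(40, 10) is 5.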


"What specification?"

If you don't understand the problem, you can't provide the solution. This is really kind of a fundamental point and needs elaboration. Who owns the specification for your product? Who really understands it, in its entirety at least at some level?

If you answered "no one" then bzzzz -- wrong! At the very least, your customer does. If your pieces don't fit together well enough to solve his or her problem, then you lose.

We use the term specification to denote the user-requirements document. More normally, "specification" refers to the design specification, which is derived from the user requirements. What we're trying to get at here is that these documents represent a communication stream between the intended users of a product and its designers. Since the product cycle has been reduced from years to a year, to nine months, to next quarter, this communication must be effective, or the project will not achieve complete success.

Technology-driven organizations have a natural tendency to try to lead their audiences where they don't want, or don't know that they want, to go. That implies a learning process on the part of the customer. Market-driven organizations more often follow their customers into novel problem spaces. This implies a learning process on the part of the developers. The "What specification?" error, therefore, represents two different but symmetrical problems.

This problem is similar to the "We don't agree ..." error in that it represents a communication problem. The difference lies in which entities are communicating. The latter is probably more severe, or at least more embarrassing, in that components within the system don't even agree on what is happening. On the other hand, the former can be fatal to the extent that your customer may refuse to accept your product because of it.

An analogy might be politics -- in which you may have two models of politician. One leads through vision, and persuades, cajoles, orders or otherwise compels people to follow. With enlightened "generalship", if you will, this can be a successful strategy. This is not often followed in a democracy, where the concept is put on its head and the people (voters/customers/workers/investors) are the holders of the vision or wisdom, and a successful politician leads by following. That is, taking into consideration what the people want, he then decides what his positions are. So let's contrast these two styles of statesmanship through examples, pros and cons, etc.

Should we design an analytical model for these categories of statesman, we might be able to prove that all innovation stems from the visionary, whereas the democratic or opportunistic strategy is more successful at consolidation and maintenance.

What sort of statesman (or project manager) are you? Do you wish to lead your customer, or take a poll before you decide? In any case, clearly defined requirements have two characteristics: a well-defined problem statement and well-defined satisfiability criteria.

That is, only when you know what you are trying to do, can you unequivocally demonstrate when you have done it.

This would imply that satisfiability tests must be defined before design is begun, and that this must ultimately be the responsibility of the customer.


You're not doing what you think you're doing!

The differences between the "You're not doing ..." error and the "It's not doing ..." error are subtle, but important. The latter referred to a fundamental lack of comprehension of what is going on. As noted, this is common and expected behavior of novices, and sometimes even happens to experts. In contrast, the former never goes away. This problem refers to the situation where the diagnostician fully understands the difference between case A and case B, and the evidence points to case A, but he stubbornly only observes symptoms of case B. Consider a preacher who has since the Eisenhower administration predicted the arrival of Armageddon sometime in 1988. Imagine him giving his first sermon one bright Sunday morning in January of 1989. Such is the chagrin of an experienced programmer trapped in the vice of "You're not doing ..."!

We have observed the behavior of very talented programmers with a wide variety of training and education levels, and the breakthrough characteristic of the very best is, in our opinion, an intuitive ability to discriminate previously unobserved but relevant facts rather than continually review known but irrelevant ones. This "knack" may perhaps be taught and refined, but it was certainly not in our Computer Science curriculum.

If you suffer from lack of clarity, or are not concentrating, you must first become aware of that. High performers in all fields must develop the ability to concentrate, but computer programming can be uniquely taxing to the intellect (especially at 3:00 on Monday morning before the Comdex show opens). This is really a psychological capability -- an awareness that, for example, "My jaw is tight, maybe I need some sugar." When you sit on your butt for so long, it turns out that stimming (that twitchy leg motion people sometimes get) helps to keep your blood pressure up.

Maybe you suffer from gridlock of the mind, where you repeatedly retread the same erroneous thought patterns: "It's broken and I didn't change anything at all, except for that, but it can't be that."

Such examples seem obvious once pointed out, but many is the programmer hour spent in "You're not ...". The trick is simply to be aware of your situation, your own state of mind as well as the completeness and accuracy of your powers of observation. Not only do you need to know which tools to apply to discriminate the component and which to apply to identify the module and line of code causing your problem, you also need to know when the tool you are using is inadequate -- when to drop it in favor of another one.


Poorly-Defined Guards

We take the definition of a guard from A Discipline of Programming:

"a guard acts as a sentry to a body of code and does not permit execution of that body unless a defined set of conditions is met."

Poorly-defined guards are unfortunately very common. In standard C code, you normally think of guards in terms of error checking. For example, all logic depending on a successful call to malloc() should be guarded against it returning a null pointer. This is tedious and sometimes hurts the readability of the code. Readability directly relates to maintainability, and neither should inhibit reliability.
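In C, the malloc() guard looks roughly like the sketch below. The function copy_string() is a hypothetical example, not part of any library; each guard is a condition that must hold before the body that depends on it is allowed to run.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if s is NULL or memory is exhausted.
   The caller owns the result and must free() it. */
char *copy_string(const char *s)
{
    if (s == NULL)            /* guard: reject input the body cannot handle */
        return NULL;

    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)         /* guard: malloc() failed */
        return NULL;

    memcpy(copy, s, n);       /* body: reached only if both guards passed */
    return copy;
}
```

Without the second guard, the memcpy() would write through a null pointer the first time memory ran short, which may be long after testing has finished.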

Structured exception handling provides a mechanism to define default behavior for specific failure modes, so that reliability can be enhanced through an assurance that we can guard against all potential failures in some default way, without precluding the ability to guard specific failures in specific ways.

A body of code without a guard appears naked to the trained eye. More often, we see ill-defined guards. Such code masks potential "bogons" -- that is, latent bugs. The behavior of such logic outside its intended context cannot be predicted.


Poorly-Defined Exit Conditions

An exit condition is the test which decides if a computation is complete. The most common instance of this sort of problem is an infinite loop. There are several causes for infinite loops, but "Poorly-Defined ..." is the most obvious one of them. For example, consider an exit condition which tests against an invariant:

for (i=0, j=0; j<1; i++) ;   /* j is never modified, so j<1 is always true */

We're sure none of our readers will ever write a loop with an exit condition like this. Other instances of this bug are premature loop termination, premature function returns or block exits and so on.
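The premature-return variety is easier to write by accident than the infinite loop. A small C sketch, with function names invented for the illustration: the first version exits on the very first element it examines, while the second exits early only when it has found a counterexample and otherwise runs the loop to completion.

```c
#include <assert.h>
#include <stddef.h>

/* Bug: returns after inspecting only the first element. */
int all_positive_buggy(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            return 1;     /* premature exit: the rest is never checked */
        else
            return 0;
    }
    return 1;
}

/* Fixed: leave early only on a counterexample; otherwise finish the loop. */
int all_positive(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] <= 0)
            return 0;
    return 1;             /* every element was examined and passed */
}
```

For the array {1, -2, 3}, the buggy version reports success and the fixed version reports failure.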


It's not a bug, it's a feature!

This cliche more properly describes a subject of a marketing text than a software text. But without properly defined requirements, how can we label any behavior erroneous? On the other hand, we would be wise to keep our minds open to the possibility of a happy accident. Rare as they are, they still occur, and we never know when we might stumble over a gem.

Simply remember that redefining your requirements is sometimes an option. Relaxing your constraints expands the space of potentially successful solutions, and may move your solution across the line from failure to success.

When you find yourself with a defect or potential defect in your software, try to categorize it according to this enumeration. Remember that you may be looking at a compound problem. If you are looking at something that does not easily fit into any of these categories, try to break your problem down into two or more simpler ones. Become familiar with all of your tools, and don't neglect them.

Finally, we leave you with our motto: "Where there's one bug, there's two." Don't stop looking because you have found a problem. The chances are excellent that there is another one nearby.