What Are Software Bugs?
A software bug is an error, flaw, or fault in an application. This error causes the application to produce an unintended or unexpected result, such as crashing or producing invalid results.
As software developers, we deal with software bugs all the time. We know one when we see one, right? But what is a software bug, exactly? What can we learn from looking at all the different pieces of information that make up a software bug – its anatomy, if you will?
Since we spend so much time preventing, identifying, and correcting bugs, there's a great deal of value in examining, defining, and naming the various parts of a bug. This process helps us use more precise language and reasoning when we debug, which in turn may reduce the effort that debugging takes. Lastly, it may help software companies like ours plan processes and procedures that make us more effective at creating bug-free software.
What Causes Software Bugs?
Software bugs can be caused by many factors, including unclear requirements, programming errors, software complexity, lack of communication, timeline deviation, errors in bug tracking, documentation errors, deviation from standards, and much more.
How to Identify a Software Bug
The sighting is the event that lets you know that the bug exists. It could be a test failure, a customer report of a problem, a crash, or a hang. The information that is captured when the bug is first sighted is almost never enough to help us identify the cause or behavior of the defect itself.
The symptom is the specific way the program isn't behaving as expected. I think of it as "the program should do X and instead it does Y." It is more specific than the sighting because, when a program first fails, the person who makes the sighting often isn't paying close enough attention to describe a clear symptom. It may take two or three attempts before the symptom becomes clear.
The reproducer is the set of steps necessary for an arbitrary user to reproduce the symptom with at least some probability. It can include manual inputs and settings, data files or database contents, or configuration details.
The description is the full write-up of the bug. It should include the symptom as well as some articulation of the context in which the symptom can be seen. If a full reproducer is available, including it is ideal. Usually, the more precise the information, the better. In practice, however, most bug descriptions are less than perfect. The description often starts with minimal information and gets more precise as more is learned.
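One way to picture how a description grows from a bare sighting into a precise write-up is a small record type. The sketch below is only an illustration: the `BugReport` class, its field names, and the sample export bug are hypothetical and not taken from any particular bug tracker.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BugReport:
    """Hypothetical bug record; the fields mirror the terms defined above."""
    sighting: str                     # the event that revealed the bug
    symptom: Optional[str] = None     # "should do X, does Y", once known
    reproducer: List[str] = field(default_factory=list)  # steps, once known

    def description(self) -> str:
        """Assemble the write-up from whatever is known so far."""
        lines = [f"Sighting: {self.sighting}"]
        lines.append(f"Symptom: {self.symptom or 'not yet characterized'}")
        if self.reproducer:
            lines.append("Reproducer:")
            lines.extend(
                f"  {i}. {step}" for i, step in enumerate(self.reproducer, 1)
            )
        else:
            lines.append("Reproducer: none yet")
        return "\n".join(lines)


# A sighting alone is enough to open the report.
report = BugReport(sighting="Customer says export 'sometimes hangs'")
first_draft = report.description()

# Later, after a few attempts, the report becomes more precise.
report.symptom = "Export should finish in seconds; instead it never returns"
report.reproducer = ["Open a project with an empty table", "Choose Export as CSV"]
second_draft = report.description()
```

Making the symptom and reproducer optional reflects the point that a sighting is worth recording even before anyone can say exactly what went wrong.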
The failure is usually tied to the part of the program that is directly responsible for what happens when the symptom occurs. A program may, for example, crash because it dereferences an invalid memory address. Often it isn't too hard to find the failure part of a bug, but then you have to start looking for the cause. (Where did the invalid address come from?)
The cause-effect chain is the series of one or more steps of cause and effect that separate the initial defect in the code from the final failure that leads to the symptom.
The defect is the actual mistake in the program itself. It is the cause at the beginning of the effect chain. Sometimes it is a single line, word, or even character. Generally, it can only be found by analyzing the behavior of the program to trace each link in the cause-effect chain.
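To make the distance between defect and failure concrete, here is a minimal Python sketch. The functions `parse_port`, `load_config`, and `next_port` are hypothetical and invented for illustration. Python has no raw pointers, so a stray `None` stands in for the invalid address mentioned above.

```python
def parse_port(text):
    # DEFECT: a blank setting silently becomes None instead of being rejected.
    if text.strip() == "":
        return None
    return int(text)


def load_config(raw):
    # Cause-effect chain: the None is stored as if it were a valid port.
    return {"port": parse_port(raw.get("port", ""))}


def next_port(config):
    # FAILURE: this arithmetic is where the program actually raises.
    return config["port"] + 1


config = load_config({"port": "  "})
try:
    next_port(config)
    failure = None
except TypeError as exc:
    # The traceback points at next_port, not at the defect in parse_port.
    failure = exc
```

The exception surfaces in `next_port`, yet nothing is wrong there. Finding the defect means walking back through `load_config` to `parse_port`, one link of the chain at a time.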
How to Resolve Software Bugs
Once we identify the defect, how do we resolve the bug? And how do we prevent more in the future?
Many times, defects exist in the code without causing any effect that the user notices. The defect may be in code that is only executed in unusual circumstances, such as a confluence of multiple input values or settings, or in code that is never executed at all (dead code). The trigger is the set of all the conditions that are necessary for the defect and the effect chain to cause the symptom.
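A small sketch of a latent defect and its trigger may help here. The `mean_latency` function, its `discard_outliers` setting, and the 1000 threshold are all hypothetical. The defect is always present, but the symptom appears only when a particular setting and a particular input occur together.

```python
def mean_latency(samples, discard_outliers=False):
    """Average latency in milliseconds (hypothetical example function)."""
    if discard_outliers:
        # Drop any sample of 1000 ms or more.
        samples = [s for s in samples if s < 1000]
    # DEFECT: nothing guards against an empty list at this point.
    return sum(samples) / len(samples)


# The defect is present in all of these calls but stays invisible.
quiet_a = mean_latency([10, 20])
quiet_b = mean_latency([1500, 2000])
quiet_c = mean_latency([10, 1500], discard_outliers=True)

# TRIGGER: discard_outliers=True combined with every sample >= 1000.
try:
    mean_latency([1500, 2000], discard_outliers=True)
    triggered = False
except ZeroDivisionError:
    triggered = True
```

Neither the setting nor the input is enough on its own, which is why a bug like this can survive ordinary testing and show up only as a sporadic sighting.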
Sometimes during bug analysis, one or more techniques are discovered that prevent the symptom from occurring without actually addressing the defect. These are workarounds. The classic example is restarting or resetting a program before a resource leak reaches the point of termination. Another is forcing users to follow a constrained series of steps that avoids setting up the trigger conditions. Workarounds may be very helpful in the short term but should never be confused with resolutions.
Once the defect is identified, one or more resolutions may be proposed. This could be a one-line change, or it could involve a refactoring of the entire program.
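The difference between a workaround and a resolution can be sketched with the resource-leak example. Everything below is hypothetical: `LeakyCache` stands in for a program with a leak, `reset_if_large` plays the role of the periodic restart, and `BoundedCache` is one possible resolution among many, not the only correct one.

```python
from collections import OrderedDict


class LeakyCache:
    """Cache with a defect: entries are never evicted, so it grows forever."""

    def __init__(self):
        self._entries = {}

    def get(self, key, compute):
        if key not in self._entries:
            self._entries[key] = compute(key)
        return self._entries[key]

    def size(self):
        return len(self._entries)

    def clear(self):
        self._entries.clear()


def reset_if_large(cache, limit):
    """WORKAROUND: discard everything before growth becomes fatal.

    The symptom is avoided, but the defect in LeakyCache is untouched.
    """
    if cache.size() >= limit:
        cache.clear()


class BoundedCache:
    """RESOLUTION: evict the least recently used entry once at capacity."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, key, compute):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as recently used
            return self._entries[key]
        value = compute(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the oldest entry
        return value

    def size(self):
        return len(self._entries)
```

With the workaround, someone has to keep calling `reset_if_large` and every reset throws away useful entries. With the resolution, the unbounded growth cannot happen in the first place.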
Verification is the set of steps that can be used to confirm that the bug has been resolved. These steps can also inform the creation of a regression test, which will quickly detect this defect or a similar one if it is reintroduced at some later date.
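As a rough illustration of turning verification steps into a regression test, the sketch below uses Python's standard `unittest` module. The function `mean_or_none` and the bug it supposedly once had (a crash on empty input) are hypothetical.

```python
import unittest


def mean_or_none(samples):
    """Hypothetical fixed function: returns None for empty input
    instead of raising ZeroDivisionError."""
    if not samples:
        return None
    return sum(samples) / len(samples)


class EmptyInputRegressionTest(unittest.TestCase):
    def test_empty_input_does_not_crash(self):
        # Mirrors the verification step: the input that used to fail.
        self.assertIsNone(mean_or_none([]))

    def test_normal_input_unchanged(self):
        # Guards against a resolution that breaks the ordinary case.
        self.assertEqual(mean_or_none([2, 4]), 3)


def run_regression_suite():
    """Run the tests programmatically and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(EmptyInputRegressionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the triggering input in the test itself means the original reproducer keeps being exercised automatically, long after everyone has forgotten the bug.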
Root cause analysis involves examining the circumstances that occurred or the systems that were in place when the defect was introduced. What was it about the design, communication, documentation, or software development process that allowed the defect to be introduced in the first place?
Tools for Finding Software Bugs
As developers, we frequently talk about bugs because, let's face it, writing software is hard. We don’t always get it 100% right the first time. But when we talk about software bugs, we often do so in a way that minimizes or directs attention away from the hard and important work of discovering, properly specifying, analyzing, resolving, and testing bugs.
If we can agree that the components and artifacts that I have outlined above are all important and relevant to most bugs, then we can start asking ourselves how we can work better as software organizations to tackle bugs. For example, I’ve highlighted the difference between a sighting, a symptom, a reproducer, and a description. Often these get conflated.
When we fail to recognize that each of these bug parts is important, we get situations where we miss bugs. If we insist on a reproducer before we even start talking about a bug, we may miss sightings that are “sporadic” or that happen to a user who is unlikely or ill-equipped to write up a careful reproducer. That means we may have bug sightings but fail to record or resolve those bugs.
By recognizing the distinctions between the different parts of a bug, we can start to ask ourselves, “What can we do to make sure that we capture all the sightings?” This might mean creating descriptions that don’t quite rise to the standard of reproducers and then putting a process in place to monitor these. We can then create more detailed descriptions, including full reproducers as they are developed. Eventually, some or all of these bugs will be resolved.
Similarly, it's important to distinguish between the failure and the defect. We must recognize the value of workarounds, but also understand the difference between a workaround and a resolution. As software programmers and engineers, we have a responsibility to identify and resolve bugs in the software we create. This is a non-trivial task and one that is worth approaching methodically. Some questions to consider:
- What do you think about this taxonomy?
- Do the bugs you deal with resemble what I am describing?
- Do these features help you see any way your tech support, quality assurance, and software development processes could be improved?
One option may be to start using a debugging tool, like TotalView for HPC. Designed for high-performance computing (HPC) environments, TotalView provides powerful functionality to make debugging as easy as possible. You can correct bugs, memory issues, and crashes in your high-scale, parallel and multicore C, C++, and Fortran applications. With TotalView, you get unparalleled visibility into running programs, unmatched control over thread states, and a unique conceptual view to aid analysis.