Illustrative Explanation of Fault, Error, Failure, Bug, and Defect in Software
- July 29, 2024
- 17 min read
- Software quality
Software does not always behave as expected. Mistakes in the implementation or in the requirements specification cause issues in software. The common terminologies used to describe software issues are Fault, Error, Failure, Bug, and Defect.
The meanings of these terminologies are often ambiguous, with Error being the most ambiguous.
I believe we can clarify these ambiguities by:
- Separately looking at their meanings during the development and runtime phases of the software life cycle.
- Clearly defining the scope of the subject program on which the terminologies are applied, as well as its environment.
In this article, I will clarify the meanings of these terminologies using a taxonomy and show their relationships through illustrative examples.
Skip to the section Fault, Error, Failure, Bug, and Defect if you are only looking for the definitions.
Prerequisites
Let’s establish some ground concepts that will make this article easier to read. Keep the following in mind:
There is the expected (right) and the unexpected (wrong). To talk about issues, we need a clear domain for what is considered expected and what is considered an issue, regardless of the scope of the analysis. For example, a line of code is either right or wrong with regard to the rest of the code. The same goes for a program input, which is either expected (accepted) or unexpected (non-accepted).
All terminologies assume the subject program as reference. In other words, we define the terminologies in accordance with a reference program that has a clearly defined requirement specification. The specified requirements define what is expected and unexpected.
Everything that is not part of the developed code of the reference program is considered external. This includes all dependency libraries and any other component that belongs to the same software as the reference program. The reference program can be a function of a larger program.
Illustration
Suppose we have a requirement to create a Python function that dumps the `README.md` file of a local code repository directory to the terminal. The function is expected to:
- Take as input the path to the local code repository directory.
- Dump the content of the `README.md` file, located in the code repository directory, to the standard output.
- Return `True` if the dump is successful.
- Return `False` if the file does not exist.
- Work on both Windows and Linux operating systems (OS).
Below is an implementation of the function that contains problematic code, followed by the same function with the problematic code fixed.
Problematic:
```python
import os
def dump_topdir_readme(topdir: str):
    FILENAME = "README.md"

    # [issue-1] The path separator won't work on Linux
    file_path = topdir + "\\" + FILENAME

    print("# Dumping file path", file_path)

    # [issue-2] The path existence check is incorrect
    if not os.path.exists(topdir):
        print("# missing path")
        return False

    with open(file_path, "r") as fp:
        print(fp.read())

    return True
```
Correct:
```python
import os
def dump_topdir_readme(topdir: str):
    FILENAME = "README.md"

    # issue-1 corrected using 'os.path.sep'
    file_path = topdir + os.path.sep + FILENAME

    print("# Dumping file path", file_path)

    # issue-2 corrected using 'file_path'
    if not os.path.exists(file_path):
        print("# missing path")
        return False

    with open(file_path, "r") as fp:
        print(fp.read())

    return True
```
Looking at the problematic code, we see that on line 6, a backslash (`\`) is used as the file system path separator. This works on Windows but not on Unix-based OSes.
Additionally, on line 11, the code checks for the existence of `topdir` instead of the `README.md` file. As a result, the Python `open` built-in function may raise an unexpected exception on line 15.
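As an aside, both issues can also be avoided with `pathlib`, which joins path components with the separator of the current OS. The following is only an alternative sketch, not the corrected version used in this article; the function name `dump_topdir_readme_pathlib` and the use of `is_file()` in place of a plain existence check are assumptions made for illustration.

```python
from pathlib import Path


def dump_topdir_readme_pathlib(topdir: str) -> bool:
    """Hypothetical pathlib variant of the corrected function."""
    # The "/" operator of Path uses the separator of the current OS (issue-1).
    file_path = Path(topdir) / "README.md"

    print("# Dumping file path", file_path)

    # Check the README.md file itself, not only the directory (issue-2).
    if not file_path.is_file():
        print("# missing path")
        return False

    print(file_path.read_text())
    return True
```

With this variant, a `README.md` that exists but is a directory is also reported as missing, which the requirement above does not explicitly cover.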
Test scenarios
Assuming that the content of the `README.md` file is the string “Hello World!”, here are a few scenarios.
Case 1: Windows OS, `topdir` exists and contains the `README.md` file
Problematic:
```text
>>> dump_topdir_readme('repo')
# Dumping file path repo\README.md
Hello World!
True
```
Correct:
```text
>>> dump_topdir_readme('repo')
# Dumping file path repo\README.md
Hello World!
True
```
The problematic program behaves as expected.
Case 2: Windows OS, `topdir` exists but the `README.md` file does not
Problematic:
```text
>>> dump_topdir_readme('repo')
# Dumping file path repo\README.md
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 14, in dump_topdir_readme
FileNotFoundError: [Errno 2] No such file or directory: 'repo\\README.md'
```
Correct:
```text
>>> dump_topdir_readme('repo')
# Dumping file path repo\README.md
# missing path
False
```
The problematic program behaves unexpectedly.
Case 3: Windows OS, `topdir` does not exist
Problematic:
```text
>>> dump_topdir_readme('repo')
# Dumping file path repo\README.md
# missing path
False
```
Correct:
```text
>>> dump_topdir_readme('repo')
# Dumping file path repo\README.md
# missing path
False
```
The problematic program behaves as expected.
Case 4: Linux OS, `topdir` exists and the `README.md` file exists
Problematic:
```text
>>> dump_topdir_readme('repo')
# Dumping file path repo\README.md
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 14, in dump_topdir_readme
FileNotFoundError: [Errno 2] No such file or directory: 'repo\\README.md'
```
Correct:
```text
>>> dump_topdir_readme('repo')
# Dumping file path repo/README.md
Hello World!
True
```
The problematic program behaves unexpectedly.
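The directory layouts behind these cases can be reproduced on whichever OS is at hand by using a temporary directory. The sketch below copies the corrected function from the illustration and runs it against three layouts; the helper `run_scenarios` and its result keys are hypothetical names introduced only for this example.

```python
import os
import tempfile


def dump_topdir_readme(topdir: str):
    # Corrected version from the illustration.
    FILENAME = "README.md"
    file_path = topdir + os.path.sep + FILENAME
    print("# Dumping file path", file_path)
    if not os.path.exists(file_path):
        print("# missing path")
        return False
    with open(file_path, "r") as fp:
        print(fp.read())
    return True


def run_scenarios():
    """Hypothetical helper: return the result for each directory layout."""
    results = {}
    with tempfile.TemporaryDirectory() as repo:
        # topdir exists but README.md does not (layout of Case 2).
        results["missing_readme"] = dump_topdir_readme(repo)

        # topdir exists and contains README.md (layout of Cases 1 and 4).
        with open(os.path.join(repo, "README.md"), "w") as fp:
            fp.write("Hello World!")
        results["with_readme"] = dump_topdir_readme(repo)

        missing_dir = os.path.join(repo, "does-not-exist")

    # topdir does not exist (layout of Case 3).
    results["missing_topdir"] = dump_topdir_readme(missing_dir)
    return results


if __name__ == "__main__":
    print(run_scenarios())
```

For the corrected function, the helper returns `False` for the two missing-path layouts and `True` when `README.md` is present, matching the "Correct" outputs above.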
Fault, Error, Failure, Bug, and Defect
In this section, we provide the IEEE definitions of the terminologies and explain their nuances with the support of our illustration. We will examine these terms in relation to the artifact (program, input, output, etc.), the program scope, and the software lifecycle phases (development and runtime).
Error
1. human action that produces an incorrect result. [1]
2. difference between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition.
3. erroneous state of the system. [2]
Source: ISO/IEC/IEEE 24765:2017 Systems and software engineering — Vocabulary
When considering the software development phase, an error is a human mistake during development, also known as a programming mistake. We refer to this as a programming error. This view of an error aligns with the IEEE definition (1).
Examples of programming errors are the two errors in the problematic program from our illustration. The first error is mistakenly using a backslash (`\`) as a filesystem path separator (issue-1). The second error is checking for the existence of the top directory instead of the `README.md` file (issue-2).
Not every mistake during programming is a programming error. For example, a misspelling of the variable `file_name` as `file_nme` throughout the code would not lead to a fault.
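To make the misspelling example concrete, here is a minimal sketch. Both function names are hypothetical. The second function is an added contrast, not part of the example above: it shows that the same misspelling does become a fault once it is no longer applied consistently.

```python
import os


def readme_path(topdir: str) -> str:
    # "file_nme" is a misspelling of "file_name", but it is used
    # consistently, so the function still behaves as specified.
    file_nme = "README.md"
    return os.path.join(topdir, file_nme)


def readme_path_inconsistent(topdir: str) -> str:
    # Here the misspelling appears only on the second line, so the name
    # "file_nme" is never defined and the call raises NameError.
    file_name = "README.md"
    return os.path.join(topdir, file_nme)
```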
The following are the definitions for the various artifacts when considering the software runtime/operational phase:
Environment Dependencies: An error is a human mistake during the packaging or setup/installation of the program and its dependencies. Examples include missing or incorrect software libraries, incompatible OS, or unmet hardware requirements. We call this a dependency requirement error. Static linking or dynamic linking of a compiled program with the wrong library or version is also a dependency requirement error. This view of error aligns with the IEEE definition (1).
Input artifact: An error is an unexpected input value or action that the program does not accept. This is an input that falls outside the nominal working scope of the program unit. In this context, it is called an input error. This aligns with the IEEE definition (1).
Examples include Case 2 (correct) and Case 3 from our illustration. Another example is a program that divides a number `a` by a number `b`. An input error occurs if `b` is 0. A program input can also be a configuration (input specified by the developer or administrator). A mistake in configuration is called a configuration error.
Output artifact: An error is an output that intentionally indicates that something unexpected or non-accepted happened during execution. This output does not fall into the nominal working scope of the program unit, even though it follows the requirement specification. In this context, it is called an error output. This aligns with the IEEE definition (2).
Examples include Case 2 (correct) and Case 3 from our illustration, where `False` is returned when the program cannot perform the task. This error output is caused by an input error. Resource access errors may also cause error output. For example, a program may be unable to write to a file because the disk is full.
Similarly, some dependency requirement errors may cause error output. For instance, if a program that starts another program as a subprocess cannot find the required program, it may produce an error output.
Terminologies like error message and error code standardize the representation of error output in systems and libraries.
Environment Resources: An error is an error output resulting from resource access issues. Resource access refers to any interaction by the program with its environment, including processors, memory, system calls, and remote servers. This type of error is called a resource access error.
Program artifact: An error is an unexpected internal behavior or state of the program caused by problematic, incorrect, or imperfect code. In other words, a programming error or dependency requirement error introduces a fault in the program or its environment, which may cause an unexpected internal state. This internal state is called a program error. This view aligns with the IEEE definition (3).
An example is the Case 4 (problematic) test scenario from our illustration, where the value of `file_path` is unexpected after line 6.
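The relationship between an input error and an error output can be sketched with the division example. The code below is an illustrative assumption rather than part of the original illustration: `divide`, `OK`, and `ERROR_DIVISION_BY_ZERO` are hypothetical names, and returning a `(code, value)` pair is just one of several ways to represent an error output.

```python
from typing import Optional, Tuple

# Hypothetical error codes for this sketch.
OK = 0
ERROR_DIVISION_BY_ZERO = 1


def divide(a: float, b: float) -> Tuple[int, Optional[float]]:
    """Divide a by b and report problems through an error code."""
    if b == 0:
        # Input error: b falls outside the nominal working scope.
        # Error output: the function deliberately signals it, as specified.
        return ERROR_DIVISION_BY_ZERO, None
    return OK, a / b


print(divide(6, 3))  # prints (0, 2.0)
print(divide(1, 0))  # prints (1, None)
```

Neither call is a failure: the second one produces an error output that follows the (assumed) specification of `divide`.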
Summary of "error" taxonomies
Configuration Error
Dependency Requirement Error
Error Code
Error Message
Error Output
Input Error
Resource Access Error
Program Error
Programming Error
Fault
1. manifestation of an error in software.
2. incorrect step, process, or data definition in a computer program. [3]
3. situation that can cause errors to occur in an object. [4]
4. defect in a hardware device or component.
5. defect in a system or a representation of a system that if executed/activated could potentially result in an error. [2]
Note 1 to entry: A fault, if encountered, can cause a failure. Faults can occur in specifications when they are not correct.
Source: ISO/IEC/IEEE 24765:2017 Systems and software engineering — Vocabulary
The IEEE definitions of fault may seem contradictory due to the use of the word error. The term error is used to refer to the causes of faults (1) and the results of faults (3) and (5). However, as clarified in the section on error, the use of error in definition (1) refers to mistakes made during the software development phase (programming errors). In contrast, the use in definitions (3) and (5) refers to program errors.
Based on the IEEE definitions (1), (2), and (3), in the context of software—rather than hardware—we infer that a fault is the presence, absence, or both, of program elements that can lead to a program error upon execution. We refer to this as a program fault.
Program faults can lead to program errors, which can then lead to failures. Just as multiple programming errors can be made in the same program, a program may contain multiple program faults. Correcting one fault does not necessarily correct another.
There are two variants of program faults:
Programming faults: These are program faults resulting from programming errors in the target program.
Examples of programming faults are the two faults introduced by programming errors in the problematic program from our illustration. The first fault is the presence of a backslash (`\`) as the file system path separator instead of a forward slash (`/`) or an OS-specific path separator (issue-1). The second fault is the lack of a check for the existence of the `README.md` file path (issue-2).
Specification faults: These are program faults that arise not from programming errors but from mistakes in the requirement specification. Although the system can be implemented according to the specification, the behavior is unintended and execution leads to failure. These faults are present in the program due to mistakes in the specification.
Faults are injected into a program either during development or at runtime/operation.
Programming faults are introduced by programming errors during development.
Dependency requirement errors can lead to situations that cause program errors. According to IEEE definition (3), these situations are considered faults. We call them dependency faults. These faults are injected at runtime/operation.
Similarly, a program fault that exists in a program dependency will be injected into the program’s process at run time. This is also a dependency fault. The fault will be injected regardless of whether the dependency is included with the program at build time (static linking) or at runtime (dynamic linking).
A program fault in a dependency library of a reference program may cause a failure in the execution of the reference program. Such a fault is external to the reference program. We call it an external program fault.
A program fault such as mishandling a resource access error within the reference program is internal to it. We call this an internal program fault.
Resources of the reference program are part of the overall software system, and, according to IEEE definitions (4) and (5), they may also have faults. These faults can cause the resources to fail when accessed by the reference program. An example is a poorly implemented web server. We refer to faults in resources as resource faults.
Resource faults can be either permanent or temporary (transient/intermittent).
A resource fault does not necessarily lead to an (external) program fault.
Consider a program that displays the current temperature at a location by calling a weather server web API which returns the temperature as a numeric value in a string format. The program receives this string, parses it to obtain the numeric temperature, and produces an error output if the value is non-numeric.
If the weather server has a fault that causes it to return a non-numeric string, this fault will be handled by the program without causing an external fault.
However, if the weather server has a fault that alters the temperature value and returns an incorrect numeric value as a string, the effect of this fault will propagate to the program and cause a failure.
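The two weather server faults can be sketched as follows. No real web API is involved: the `fetch` parameter is a hypothetical stand-in for the call to the weather server, and `read_temperature` is a name chosen for this example only.

```python
from typing import Callable, Optional


def read_temperature(fetch: Callable[[], str]) -> Optional[float]:
    """Return the temperature, or None as an error output.

    `fetch` stands in for the call to the weather server web API.
    """
    raw = fetch()
    try:
        return float(raw)
    except ValueError:
        # The resource fault is detected and handled:
        # the program produces an error output, not a failure.
        print("# error: non-numeric temperature received")
        return None


# First fault: the server returns a non-numeric string.
# The program handles it and returns None.
handled = read_temperature(lambda: "N/A")

# Second fault: the server returns a wrong but well-formed value
# (suppose the real temperature is 21.5). The program cannot tell,
# so the wrong value reaches the user: a failure.
propagated = read_temperature(lambda: "-40.0")
```

The design point is that the program can only guard against faults that violate the format it expects. A wrong value in the right format passes every check it has.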
Note
A fault exists according to its context.
Let’s illustrate this with the example of a function `div` that divides an integer `a` by an integer `b`, and is part of a reference program:
```python
def div(a: int, b: int):
    return a / b
```
- If there is a place in the reference program where `div` is called with the possibility of `b` being `0`, then the program has a division by zero fault.
- If every call to `div` in the reference program ensures that `b` is non-zero, then the program does not have a division by zero fault.
- If the reference program is a library that only exposes `div` in its API, and the documentation specifies that `b` should not be zero, then the function does not have a division by zero fault. However, if this constraint is not specified in the documentation, there is a division by zero fault in `div`.
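The first two contexts can be sketched with two hypothetical callers of `div`. The functions `average` and `unguarded_average`, and the choice to return `0.0` for an empty list, are assumptions made for this example.

```python
def div(a: int, b: int):
    return a / b


def average(values: list) -> float:
    # The only call to div is guarded, so b is never 0 here:
    # in this context there is no division by zero fault.
    if not values:
        return 0.0
    return div(sum(values), len(values))


def unguarded_average(values: list) -> float:
    # b is 0 when values is empty:
    # in this context there is a division by zero fault.
    return div(sum(values), len(values))
```

The body of `div` is identical in both contexts. Whether the fault exists depends on the callers, or on the documented contract when `div` is exposed as a library API.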
Summary of "fault" taxonomies
Dependency Fault
External Fault
Internal Fault
Permanent Fault
Program Fault
Programming Fault
Resource Fault
Specification Fault
Temporary Fault
Failure
1. termination of the ability of a system to perform a required function or its inability to perform within previously specified limits; an externally visible deviation from the system’s specification. [2]
2. violation of a contract. [4]
Note 1 to entry: A failure can be produced when a fault is encountered.
Source: ISO/IEC/IEEE 24765:2017 Systems and software engineering — Vocabulary
A program fails when an externally visible deviation from the specification, contract, or expected behavior is observed during execution. A failure can manifest as [5]:
- Value Failure (incorrect result): An output value is computed that is inconsistent with proper execution.
- Timing Failure: The service is delivered either too early or too late.
- Halting Failure: The service is never delivered. This occurs when software crashes or has an unintended infinite loop.
Failures do not always have the same consequences. Some are benign, while others are catastrophic.
Moreover, a failure may be perceived the same by all users (consistent failure) or differently (inconsistent failure or Byzantine failure).
Examples of failures are observed in Case 2 and Case 4 of the problematic program in our illustration. In both cases, the program terminated with an unhandled `FileNotFoundError` instead of producing the specified result.
A failure of a resource caused by a resource fault is called a resource failure. Some resource failures can cause the reference program to fail. For example, the program described in the fault section, which obtains the local temperature through a web API, would experience a resource failure if the weather server has a fault.
Summary of "failure" taxonomies
Benign Failure
Catastrophic Failure
Consistent Failure
Halting Failure
Inconsistent Failure
Resource Failure
Timing Failure
Value Failure
Bug
ISO/IEC/IEEE 24765:2017 Systems and Software Engineering — Vocabulary considers “bug” and fault as equivalent.
However, bugs are often viewed not just as any faults, but specifically as faults that exist in released production or operational software. In this context, faults detected by various testing techniques are not typically referred to as bugs.
Bugs are usually reported after a failure has been observed. The severity of a bug is often tied to the consequences of the corresponding failures, as well as their probability of occurrence.
Defect
1. imperfection or deficiency in a work product where that work product does not meet its requirements or specifications and needs to be either repaired or replaced. [1]
2. an imperfection or deficiency in a project component where that component does not meet its requirements or specifications and needs to be either repaired or replaced. [6]
3. generic term that can refer to either a fault (cause) or a failure (effect). [7]
EXAMPLE: (1) omissions and imperfections found during early life cycle phases and (2) faults contained in software sufficiently mature for test or operation.
Source: ISO/IEC/IEEE 24765:2017 Systems and software engineering — Vocabulary
According to ISO/IEC/IEEE 24765:2017 Systems and Software Engineering — Vocabulary, the term “defect” may refer to either a fault or a failure, depending on the context.
Conclusion
In this article, we revisited the meanings of the main terminologies used when referring to software issues. The terminology with the most diversity in usage is error.
Most articles that attempt to clarify these terminologies overlook aspects of error that are not contrary to the requirement specification, such as input error and error output.
We also observed that some articles consider only programming errors and not program errors, while others focus on program errors and not programming errors.
To reduce confusion, Heimerdinger and Weinstock proposed avoiding the term “error” altogether, using only “fault” and “failure” to characterize software issues. They suggest viewing each program unit as a reference for the semantics of fault and failure. Thus, what was previously defined as a program error would now be considered a failure of the problematic program unit.
What are the standard methods to mitigate the impact of faults in software? The main approaches are:
- Fault Tolerance: Assume that faults will exist and prevent them from causing failures.
- Fault Removal: Find and remove faults in the software before it is released into production or operation.
- Fault Forecasting: Predict the most likely components to contain faults to reduce the cost and effort of fault removal.
[1] IEEE 1044-2009 IEEE Standard Classification for Software Anomalies, 2
[2] ISO/IEC 15026-1:2013 Systems and software engineering — Systems and software assurance — Part 1: Concepts and vocabulary
[3] ISO/IEC 25040:2011 Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Evaluation process, 4.27
[4] ISO/IEC 10746-2:2009 Information technology — Open Distributed Processing — Reference Model: Foundations, 13.6.3
[5] Barbacci, Mario; Klein, Mark; Longstaff, Thomas; & Weinstock, Charles. Quality Attributes. CMU/SEI-95-TR-021. Software Engineering Institute. 1995, 4.3.1
[6] A Guide to the Project Management Body of Knowledge (PMBOK® Guide) — Fifth Edition
[7] IEEE 982.1-2005 IEEE Standard Dictionary of Measures of the Software Aspects of Dependability, 2.1