Reasoning Your Way to Linux
Study Shows Open Source Code Superior

Steven J. Vaughan-Nichols
Monday, April 7, 2003 12:22:20 PM
All open source programmers believe in their heart of hearts that
open source is not only the best way to write software, it produces
the best possible software. It's a point that's been argued endlessly,
but until recently there hasn't been any hard proof from a non-partisan third party demonstrating that open source code was actually
superior to closed source. Now, thanks to research
(http://www.reasoning.com/downloads/opensource.html) done by
Reasoning, a leading automated software
inspection service vendor, objective proof is here: Open source is better.
Or, to be more precise, Reasoning, using their automated C and C++ source code inspection service, Illuma
(http://www.reasoning.com/solutions/index.html), found that there were 0.10 defects per thousand lines of source code (D/KLSC) in
Linux's 2.4.19 TCP/IP stack compared to an average of 0.55 D/KLSC in
five different proprietary TCP/IP implementations. Four of the five
proprietary stacks have been on the market for over ten years. In
short, Linux triumphed over mature, proven programs.
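As a rough illustration of the metric, the sketch below shows how a defects-per-thousand-lines figure is computed and how the two reported rates compare. The function name and the sample counts (5 defects in 50,000 lines) are invented purely to show the arithmetic; only the 0.10 and 0.55 rates come from the study.

```python
def defects_per_klsc(defects: int, source_lines: int) -> float:
    """Return defects per thousand lines of source code (D/KLSC)."""
    if source_lines <= 0:
        raise ValueError("source_lines must be positive")
    return defects / (source_lines / 1000)


# Hypothetical counts, chosen only to demonstrate the calculation.
example_rate = defects_per_klsc(defects=5, source_lines=50_000)

# Rates reported in the study.
LINUX_RATE = 0.10
PROPRIETARY_AVG_RATE = 0.55

# Linux's defect density as a fraction of the proprietary average.
relative_density = LINUX_RATE / PROPRIETARY_AVG_RATE

print(f"example: {example_rate:.2f} D/KLSC")
print(f"Linux vs. proprietary average: {relative_density:.1%}")
```

On the reported figures, Linux's defect density works out to roughly 18% of the proprietary average, a little under one fifth.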
TCP/IP was chosen as the target for this study because it's the
"fundamental protocol that underlies the Internet" and "the
functional requirements are well defined and stable, the
implementation is non-trivial, and it is a critical component of
every computer system and many embedded devices." In addition, by
keeping the study's focus on a narrow, well-defined area, Reasoning
was trying to avoid the apples-and-oranges problem of comparing full-scale applications and operating systems.
To put Linux's results in a broader context, in Reasoning's most
recent analysis of 200 commercial projects totaling 35 million lines
of source code, 33% of these programs had D/KLSCs below 0.36; 33% had
D/KLSCs between 0.36 and 0.71, and the remaining third had more than
0.71 D/KLSCs. "Thus, the TCP/IP implementation in the Linux operating
system ranks in the upper third, while the composite code quality of
the commercial implementations ranks in the middle." In short,
Linux's TCP/IP implementation is excellent, while the five commercial
implementations, though not awful, sit in the middle of the pack.
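The banding described above can be sketched in a few lines. This is only an illustration: the function name, the "lower third" label, and the treatment of values falling exactly on 0.36 or 0.71 are assumptions, since the study as quoted here gives only the two cut-off values.

```python
def quality_band(d_per_klsc: float) -> str:
    """Place a defect density into the thirds reported by Reasoning.

    Boundary handling (exactly 0.36 or 0.71) is an assumption.
    """
    if d_per_klsc < 0.36:
        return "upper third"
    if d_per_klsc <= 0.71:
        return "middle third"
    return "lower third"


print(quality_band(0.10))  # Linux 2.4.19 TCP/IP stack
print(quality_band(0.55))  # average of the five proprietary stacks
```

With the reported rates, the first call lands in the upper third and the second in the middle third, matching the ranking quoted from the study.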
If 0.71 D/KLSC sounds good to you, you're not a programmer. For
mature programs, it's downright lousy. And since Illuma is designed
to seek out critical coding errors, such as memory leaks, NULL
pointer dereferences, out-of-bounds array accesses, and uninitialized
variables, these are programming mistakes that come back to haunt
first users and then developers. These are the kinds of foul-ups that
can lead to program lock-ups and even system failures.
Now, Illuma is not perfect. For example, what appeared to be an
uninitialized variable in Linux's TCP/IP stack turned out to be a
variable that's assigned before use by a tiny built-in interpreter.
Still, the problems it finds are ones that any developer worth his
salt needs to investigate further if for no other reason than to
document questionable code that could be mistaken for an error by a
human quality assurance programmer. And, beyond that, the simple fact
remains that the open source code had less than 20% of the average
defect density of the proprietary code (0.10 versus 0.55 D/KLSC).
Why is that? Reasoning doesn't take a position, but simply lists the
usual reasons given by open source advocates. For example, open
source users don't just report bugs, but actually track them and fix
them. And, with peer source code review, defects tend to be found
quickly and only the best code survives. This, in turn, means that
programmers will present only their best efforts since they know that
the only way to rise to the top of the open source world is to
deliver excellent code that can withstand public scrutiny.
The study itself has some problems. Due to confidentiality
agreements, for example, we don't know which five proprietary TCP/IP stacks were
reviewed.
Automated code inspection has also been known to produce many false
positives. Reasoning tries to reduce them, and the time it takes to
deal with them, in two ways. Its error reports include the location
and circumstances of each problem, and it uses statistical analysis
and other tools to identify the parts of the code with the greatest
risk, so developers can focus their energies on what are potentially
the most critical problems.
Still, when all is said and done, the bottom line is quite simple.
Open source code is much cleaner than proprietary source code. What more need be said?