A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World

2016-04-18

In this article, the authors document their experience commercializing a static-analysis-based bug-finding tool. In particular, they outline typical pitfalls that future tool-smiths will find useful when transitioning an awesome research prototype into a functional and viable commercial product.

Article link

Abstract

In 2002, COVERITY commercialized a research static bug-finding tool. Not surprisingly, as academics, our view of commercial realities was not perfectly accurate. However, the problems we encountered were not the obvious ones. Discussions with tool researchers and system builders suggest we were not alone in our naïveté. Here, we document some of the more important examples of what we learned developing and commercializing an industrial strength bug-finding tool.


Thoughts

Overall, a well-written, easy-to-follow article. Although the lessons summarized here are in the domain of software defect detection, the advice is, in general, valuable for any tool-smith eager to transition a research prototype into a product. I summarize some of their observations below.

You can't check code you don't see.

"How do I run your tool?"

"Oh, it's easy. Just type 'cov-build' before your build command."

"Build command? I just push this [GUI] button..."

The authors provide instances when their tool could not even get to the source code that needed analysis, often because of the build processes involved, which came in many flavors, including non-standard and outdated ones. When, while hacking their way through a build system, the authors found issues with the process, customers were hostile toward suggestions to fix or upgrade it. The authors attribute this hostility to the additional work the customer perceives is needed just to make the tool run.

The tool needs to adapt to the target environment; expecting the target environment to change is a pitfall. This sentiment has also been echoed by Sadowski and colleagues in their work Tricorder: Building a Program Analysis Ecosystem.
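To make the "just type 'cov-build' before your build command" exchange concrete, here is a minimal sketch, not Coverity's actual implementation, of how a build-interposition wrapper can capture compile commands without asking the customer to change anything: a fake cc placed early in PATH that logs its arguments and then hands off to the real compiler. The log path and the real compiler's location are illustrative assumptions.

```c
/* Minimal sketch (not Coverity's actual cov-build) of build interposition:
 * a fake "cc" placed early in PATH logs each compiler invocation, then
 * execs the real compiler so the build behaves exactly as before.
 * The log file and /usr/bin/cc path are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    /* Record the full command line so the analyzer later knows which
     * translation units, flags, and include paths the build used. */
    FILE *log = fopen("/tmp/captured-compile-commands.log", "a");
    if (log) {
        for (int i = 0; i < argc; i++)
            fprintf(log, "%s%c", argv[i], i + 1 < argc ? ' ' : '\n');
        fclose(log);
    }

    /* Hand off to the real compiler; argv is already NULL-terminated. */
    argv[0] = "/usr/bin/cc";
    execv(argv[0], argv);
    perror("execv");    /* reached only if the exec fails */
    return EXIT_FAILURE;
}
```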

You can't check code you can't parse.

The C language does not exist; neither does Java, C++, and C#. While a language may exist as an abstract idea, and even have a pile of paper (a standard) purporting to define it, a standard is not a compiler.

The authors anecdotally comment on how a language is an abstract idea that is only concretized by the compiler a programmer uses. With the multitude of compilers and compiler extensions available, the seemingly simple activity of parsing code, a precursor to static analysis, becomes non-trivial. In some cases changing compilers requires costly re-certification, for instance in safety- and mission-critical software. In other cases, customers resorted to non-standard compiler extensions, especially in the embedded-systems domain, striving for efficiency at the expense of standards compliance.
Furthermore, customers questioned the abilities of the tool itself when it could not reliably parse a piece of code that clearly compiled in their environment.
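As an illustration of the parsing problem (my example, not the paper's), the snippet below uses a few GNU C extensions that a customer's gcc accepts without complaint but that a front end written strictly against the ISO C grammar will refuse to parse:

```c
/* Compiles under gcc's default GNU dialect; rejected under -std=c99 -pedantic-errors. */

typedef struct {
    int length;
    char payload[0];          /* zero-length array: a GNU extension */
} message_t;

int classify(int code) {
    switch (code) {
    case 200 ... 299:         /* case ranges: a GNU extension */
        return 1;
    default:
        return 0;
    }
}

int clamp(int x) {
    /* Statement expression and __typeof__: more GNU extensions. */
    return ({ __typeof__(x) y = x; y < 0 ? 0 : y; });
}
```

Multiply this by every vendor compiler in the embedded world and "just parse the code" stops being a simple step.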

This problem was something that I, as a software engineering researcher, had personally never really thought about.

Do bugs matter?

"So?"

"Isn't that bad? What happens if you hit it?"

"Oh, it'll crash. We'll get a call." [Shrug.]

The authors comment on how customers often disagree about defect severity, even though defects generally carry a negative connotation. Oftentimes, customers retort that the tool is broken when it reports a defect they do not understand or consider trivial. The authors note that, in general, a defect report the customer does not understand is deemed a false positive.
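As a hypothetical example of the kind of report that draws a shrug (mine, not one quoted in the paper): a checker flags an unchecked allocation, the team agrees it will crash, expects "a call", and moves on.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flagged defect; the function is made up for illustration. */
char *duplicate(const char *src) {
    char *copy = malloc(strlen(src) + 1);
    strcpy(copy, src);   /* tool: "copy may be NULL here" -- team: [shrug] */
    return copy;
}
```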

These days I have been very fortunate to collaborate with Brittany Johnson in the related area of studying miscommunications in tool notifications.

How to handle cluelessness.

You cannot often argue with people who are sufficiently confused about technical matters; they think you are the one who doesn't get it.

The authors provide a valuable gem for dealing with customers who lack the technical skills required to appreciate the tool's output. They recommend giving tool demonstrations to large groups at the client site. Their insight is that a large group increases the probability that at least one person appreciates the tool's output, having directly dealt with the issue it reports.

I find this observation very interesting. In retrospect, it is easy to connect the dots. For instance, whenever I presented my work to an audience outside my area (say, industry-day presentations), I was fortunate to find a champion in the crowd who supported my claims by relating my technical findings to issues they had encountered.

What happens when you can't fix all the bugs?

If you think bugs are bad enough to buy a bug-finding tool, you will fix them. Not quite.

The authors present the insight that not all reported defects will be fixed, even though customers pay good money to find them. The general heuristic is: if your development team has not touched a piece of code for years, do not modify it (even for a defect fix), to avoid any breakage.

The stated heuristic has some research grounding in the work of Yin and colleagues: How Do Fixes Become Bugs? A Comprehensive Characteristic Study on Incorrect Fixes in Commercial and Open Source Operating Systems, where they report how bug fixing can itself introduce new defects into a system.

Do missed errors matter?

Given the previous observation that not all reported defects will be fixed, tool-smiths may naively assume that the defects which go unreported do not matter. However, unreported defects, or false negatives, do matter: clients oftentimes deliberately introduce a defect into the source code for the tool to detect. Failing to find such a planted defect leads to the tool being labeled ineffective and to lost business. Thus, false negatives must be taken seriously.
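A hedged illustration of such an acceptance test (my sketch, not an example from the paper): the evaluator plants an obvious defect and waits to see whether the tool reports it; silence counts heavily against the tool.

```c
#include <stdlib.h>

/* Hypothetical "planted" defect used to test a bug-finding tool:
 * an obvious use-after-free the evaluator expects to see reported. */
int planted_defect(void) {
    int *p = malloc(sizeof *p);
    if (!p)
        return -1;
    *p = 42;
    free(p);
    return *p;   /* use after free: a silent tool is judged a failure here */
}
```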

Churn

Users really want the same result from run to run.

The authors argue that tool users do not take it kindly when the tool generates different output across releases because of changes in the analysis.

However, there is a gradual shift in the industry towards DevOps: rapid and incremental releases of software systems. Given this paradigm shift, it will be interesting to see whether the churn property as described by the authors still holds.

Myth: More analysis is always good.

While nondeterministic analysis might cause problems, it seems that adding more deterministic analysis is always good. Bring on path sensitivity! Theorem proving! SAT solvers! Unfortunately, no.

The authors argue that the best analyses are those that can run cheaply and efficiently, even if they are non-deterministic. This sentiment has also been echoed by Sadowski and colleagues in their work Tricorder: Building a Program Analysis Ecosystem.
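A sketch of the trade-off (my example, not the paper's): a path-insensitive checker flags the dereference below as a possible NULL use, while a path-sensitive one can suppress the report by correlating the two branches on flag. The deeper analysis is smarter, but its justification is now a path condition the user has to reconstruct and trust.

```c
/* Illustrative example; not taken from the paper. */
int process(int flag) {
    static int cached;
    int *p = NULL;
    if (flag)
        p = &cached;

    /* ... imagine many unrelated statements here ... */

    if (flag)
        return *p;   /* never NULL here, but proving that needs the path condition */
    return 0;
}
```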

No bug is too foolish to check for.

The authors state that, given enough code, developers will write almost anything you can think of. They argue that foolish defects are typically of a serious nature.
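Two hypothetical "foolish" defects of the kind the authors mean (my examples, not quotes from the paper): trivial to describe, easy to dismiss as something nobody would write, and serious when they ship.

```c
#include <string.h>

#define ROLE_ADMIN 2   /* made-up constant for illustration */

int is_authorized(int role) {
    if (role = ROLE_ADMIN)   /* assignment, not comparison: everyone is authorized */
        return 1;
    return 0;
}

void wipe_secret(char *buf, size_t len) {
    memset(buf, len, 0);     /* arguments swapped: zero bytes are written */
}
```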

False Positives

False positives do matter. In our experience, more than 30% easily cause problems.

The authors present a very different take on why false positives matter. From a business perspective, an expensive tool needs someone with power on the client side to champion it. Such people often have at least one internal enemy. A significant number of false positives is equivalent to supplying ammunition that embarrasses the tool's champion internally, resulting in loss of business.

This observation forced me to think of false positives in a social light, as opposed to the statistical measure I am used to. :smile:

Overall, I thoroughly enjoyed reading this article.