Coding standards: having more errors in code than code

August 20th, 2009

I ran LINT version 9, configured to report the violations of the rules in the MISRA C++ 2008 coding standard, on a C++ source file. LINT is perhaps the most famous tool for statically checking C and C++ source code. MISRA stands for the Motor Industry Software Reliability Association, mandating adherence to its coding standards throughout the automotive industry.

The source file I tried has several KLOC worth of code, and the output of the preprocessor takes about 1M – pretty normal for C++ where a "Hello, world!" program generates 3/4M of preprocessed output. The output of LINT takes 38M. That's 38x more errors than code.

We're not finished parsing this output so I'm not sure which rules cause most violations and whether they can be clustered somehow to compress the 38M into something resembling comprehensible narrative in contents and size. The only thing basic attempts at parsing revealed at this point is that the distribution of the violations is roughly geometric, with the majority of the errors reporting violations of a minority of the rules.

Therefore, my only way of conveying some insight into the MISRA rules enforced by LINT is to look at a toy example. My example will be a Hello, world program – 2 LOC or 3/4M worth of code depending on your perspective. I'll assume LINT is told to ignore standard libraries, so it will actually be closer to 2 LOC.

#include <iostream>
int main() { std::cout << "Hello, world" << std::endl; }

From this program, LINT will produce 4 error messages when configured to enforce MISRA C++ 2008:

The "int" in "int main" violates an advisory rule to avoid using built-in types and instead use typedefs indicating the size and signedness of the type, such as int32_t, INT or signed32T. Many an automotive project use a mixture of 2 or 3 of these conventions, which is compliant with the MISRA guidelines and presumably results from the history of merging or integrating code bases and/or teams. (I believe that in the particular case of main, the C and C++ standards both mandate the use of int; I didn't check if you can use a typedef to spell int but I'm certain that you can't have main() return an int32_t on a platform where int is 16b. Anyway, it appears that LINT doesn't bother to special-case main() – but you can do that yourself in its configuration file or right there in the source code, as you will have to do in many other cases.)
The first left shift operator violates a MISRA rule disallowing the use of bitwise shift on signed types, or so it does according to LINT, which presumably checks whether the operands are of an unsigned integral type and reports an error if they are not (the other option is that it figures an output stream or a literal character array are "signed", but I can't see how they can be unless it's a signature we're talking about rather than signedness). The MISRA rule is based on the fact that the behavior of bitwise shift is implementation-defined and thus not portable. I do believe that there does not exist a 32b machine which does not use the 2's complement representation for integers and is a target of an automotive application. A notable share of automotive applications use signed integers to represent fixed point numbers, and I believe all of them rely on the 2's complement semantics of bitwise shifts to emulate multiplication and division.
The second left shift operator is reported as violating the same rule.
The two left shift operators as a whole are reported to violate the rule disallowing dependence on C operator precedence. That is, in order to correctly understand this program, a reader would have to know that (std::cout << "Hello, world!") would be evaluated first and then its output would be shifted to the left by std::endl. MISRA strives to prevent confusion, based on a well-founded assumption that few programmers know the rules of operator precedence and evaluation order, and LINT enforces the rules defined based on these premises.

I hope this gives some insight on the general code/errors ratio.