[personal profile] dtm
So my coworker may have found a bug in one of our big jewels-of-the-company core libraries. We send off the bug report along with a short sample program that demonstrates the bug; to make a long bug short, it tries a certain sequence of operations in a specific order and fails (the library call returns a bizarre error status) half-way through.

We get back the message: "we couldn't reproduce the effects you mentioned; when we compiled the code here it coredumps at the first attempted operation".

So we go back and forth for quite some time about build environments, linking issues, and whatnot, and eventually are granted access to the exact files they are compiling with. (We asked for this in order to get the options being passed to the compiler; they also sent us the copy of the demo program source code that they were working with.) We attempt to compile it and run it, and sure enough, it coredumps, but it prints a little message first. Well, we'd never put in such a message, so we ask them about it and they say, "Oh, we just modified your code to put little 'here I am' messages in it."

Ok, we think, no problem, so we take the code back to our environment and attempt to compile it there - and it coredumps on startup. So we then decide on a whim to do a diff with our original code.

They had changed the library function call we had been using to one that did the same thing but with different semantics. They had adjusted most of the parameters, but not all, so the effect was that they were passing the library function call a null pointer.

We fix their code, and then my coworker (who is regrettably very soft-spoken on the phone; I have a feeling that he might be able to deliver a severe tongue-lashing in Russian, but English isn't his forte) explains very carefully to them the problem that they had and eventually they agree to try our fixed version.

Beware the programmer who claims only cosmetic changes, for they will change deep behavior at a whim and then deny it unto heaven.

The real problem, as I see it, was not so much the C programming mistakes (which are forgivable, especially in the person whose job it is to screen bug reports before handing them off to the big library gurus), but the fact that the person who modified our code to use a different function call didn't tell us.

And now a small rant about C and the equivalence of a null pointer and a bare, uncast 0.

The reason this guy didn't catch the passing-a-null-pointer error was that our version was calling this:

void ourfunctioncall(...., int misval, ...)

For that parameter we were using a preprocessor symbol defined in our big-jewels-library's header file. Call it NVAL. (The names of all symbols and functions have been changed to protect the innocent.) So our function call looked like:

ourfunctioncall(...., NVAL, ...)

When the person who modified our code decided to use a different function call, they used something with the prototype:

void otherfunctioncall(...., int *misvalarray, ...)

The function call they were using, as you might guess, differed from our version in that it took a whole array instead of a single value. They modified all the other parameters to be arrays, but they missed this one. And the compiler didn't complain.

Why? Because NVAL was defined in the preprocessor to be 0. Had it been any other value, the compiler never would have quietly accepted it as valid for an int * argument (at the very least they would have gotten a warning about converting an integer to a pointer), but a constant 0 is special: C treats it as a null pointer. This is one of those deeply embedded design flaws of C. null should be a special reserved word in the compiler, and 0 should not be an acceptable substitute.

(I know, I know, it would break almost every bit of C code out there. I still think it's a flaw in C.)
