Being on both sides of the bug database

Bug databases are fun. As a software developer, it's where you get to hear all of the ways in which you were stoned and the code you checked in last night was utter crap. If you're lucky, it just causes a minor problem, you check in a fix, quality assurance (QA) reports it as good, and you're done. If you're unlucky, you get the mystery bug from hell that everyone encounters on a daily basis but for which no one can identify the true cause. As no software is bug-free, handling bugs can seem Sisyphean at times, the list of bugs continuously growing almost as fast as you can shrink it.

Submitting bugs on someone else's product is an interesting experience, because in that case I'm on the other side of the fence: the tester instead of the developer. It's a bit humbling to have your bug kicked back as By Design because you missed something obvious in the UI. You learn to be a bit friendlier to testers when you have to fight to get a bug fixed the same way that they do. It also quickly shows that the worst kind of tester you can have is another developer, because they keep wondering why you can't fix the bug that seems so blindingly obvious to them (even though they don't have the source code).

Reproducing a bug

One of the major prerequisites for fixing a bug is getting it to happen locally. Occasionally you get a bug that is so blindingly obvious that you don't even need to run the program to find broken code, but that's rare, and even then that might not be enough, because the code bug you found might not actually be the one you're looking for. Besides, if you can't reproduce the bug, it's hard to tell if you've actually fixed it. I can't stress enough that finding a solid reproduction case greatly increases the chances of a bug getting fixed. At a very minimum, it'll take two tries to fix a bug: one to reproduce the bug, and a second to verify that it doesn't happen with the fix. It'll usually take a lot more, whether to ensure that the bug doesn't just happen randomly, because the first few fixes don't work, or to better verify the one that does. This is much more likely to go smoothly if the bug happens every time and in thirty seconds than if it only occurs 10% of the time and after three hours.

The best reproduction steps are those that are really easy and really quick. For instance, let's say VirtualDub crashes if you hit 'x' in the main window. Chances are I can fix this bug in five minutes, because I can get it to happen really quickly, there isn't a lot of code involved, and I can very quickly verify that it doesn't happen any more after I've fixed it. If I get a bug report that a crash happens about every tenth run of a 100GB file, that's bad news, because not only does it take forever to test, but odds are I can't create the same 100GB file, and I'd have to repeat the test about fifty times to be sure it didn't happen anymore. I rarely ever fix a bug on such a description; usually I have to go back and forth with the user to narrow down the cause, determine a smaller and faster set of repro steps, or meditate on the crash report and hope that it includes additional clues to narrow down the faulty code path.
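The "about fifty times" figure can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming each run is independent and an unfixed bug shows up with a fixed probability per run; the function names are made up for illustration.

```python
import math


def chance_of_false_pass(failure_rate: float, runs: int) -> float:
    """Probability that a still-present bug stays hidden for `runs` runs in a row.

    Assumes every run fails independently with probability `failure_rate`.
    """
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of clean runs after which an unfixed bug would have
    shown up at least once with probability >= `confidence`."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # A crash on roughly every tenth run, as in the 100GB file example.
    print(runs_needed(0.10, 0.99))
    print(chance_of_false_pass(0.10, 50))
```

Under these assumptions, fifty clean runs leave roughly a half-percent chance that the bug is still there and simply didn't fire, which is why a one-in-ten bug is so expensive to verify.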

Involving third-party code in a repro case is a bad idea. It makes the bug harder to test, because you have to get the third-party component and figure out how to use it, and it's not yours so you usually can't debug it, modify it, or look at the source code. Sure, it might be open source, but even then it's not necessarily code that you're familiar with or that you can change (especially if it comes with the OS). Worst of all, third-party components can have bugs; vendors write and ship buggy code like everyone else does. Some crashes, most notably those caused by resource leaks or memory trashing, can be extremely difficult to track to the culprit, even with a debugger. Thus, whenever I report or try to reproduce a bug, I always attempt to exclude third-party components from the repro case whenever possible. That way, nobody is wondering whether the bug is in the main program or in XYZ.DLL: it's undoubtedly your fault.

Incidentally, a sure way to piss off a developer is to report that a bug occurs 100% of the time and then admit later that you only saw it happen once and didn't try to reproduce the problem. Knowing whether a bug consistently occurs or not is valuable information, because it can indicate a random or timing-sensitive condition that may help isolate the location of the buggy code. Also, this generally means that there is irrelevant information in the bug report that can mislead, such as saying to press Q/R/T keys in a specific order when the bug occurs whether you do that or not.

Finally, if you know that only specific versions of a program have a particular problem, that can be critical information. For instance, say a user knows that only versions of VirtualDub between 1.5.6 and 1.5.8 trash video with a particular filter. That is very valuable information, because I keep source code for all versions of VirtualDub in a Perforce depot and can immediately isolate the problem to changes that went into those versions. The narrower the range, the fewer diffs in the suspect list. If that is not enough to find the bad code, I can start with the 1.5.5 code and begin introducing diffs until the problem occurs. This can greatly shorten the time to fix. People looking at bugs I've submitted at the Microsoft Product Feedback Center might wonder why I indicate whether Visual C++ 6.0 and Visual Studio .NET 2003 also have a bug that I report on Visual Studio 2005; this is the reason.
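Narrowing a version range down to a single change can be sketched in code. The paragraph above describes introducing diffs one at a time; the sketch below swaps in a binary search instead, which needs fewer builds when the list is long. It assumes the baseline is good, the full set of changes is bad, and the bug stays present once it appears. The changelist labels and the `is_broken_after` callback are hypothetical stand-ins for a real build-and-test step.

```python
from typing import Callable, Sequence


def first_bad_change(
    changes: Sequence[str],
    is_broken_after: Callable[[int], bool],
) -> int:
    """Return the index of the first change that makes the bug appear.

    `is_broken_after(i)` is expected to build the known-good baseline plus
    changes[0..i] and report whether the bug reproduces.
    """
    if not changes:
        raise ValueError("need at least one suspect change")
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_broken_after(mid):
            hi = mid        # bug already present: culprit is at mid or earlier
        else:
            lo = mid + 1    # still clean: culprit is after mid
    return lo


if __name__ == "__main__":
    # Hypothetical changelists that went in between a good and a bad version.
    suspects = ["cl1001", "cl1002", "cl1003", "cl1004",
                "cl1005", "cl1006", "cl1007", "cl1008"]
    # Stand-in for "build and run the repro steps": here the sixth change is bad.
    culprit = first_bad_change(suspects, lambda i: i >= 5)
    print(suspects[culprit])
```

If the bug is intermittent, each probe would need to be repeated enough times to trust a clean result, which is one more reason a fast, reliable repro case matters.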

Forwarding bugs

Ah, shifting the blame. Always fun to do, especially if you supply comments along with the forward. Sure, you could send the bug back to the testers as just "I can't reproduce," but why not push the stake in farther and include comments to the effect of "we think the problem may have something to do with your testing methodology" or "we would like you to reconfirm that this is happening"? Or even better yet, forward it to some unsuspecting peer whose list is too short for his own good. The best is when you can put on the asbestos suits and experience a full-fledged war of "it's a bug - it's not a bug" between engineering and QA.

Or not.

Although there is a lot of variation in the way different bug databases tend to classify bug status, there are also a lot of common themes. There are New bugs, bugs that have been Reproduced, and bugs that Couldn't be Reproduced. Then there are bugs for which new code has been checked in and are Probably Fixed, bugs that have been Fixed, and then bugs that still Aren't Fixed even with the new code. And then, there are the dreaded bugs that Can't be Fixed or Won't be Fixed in time for ship, because it's too risky. And finally, for the ultimate slap-from-a-white-glove, there's Not a Bug.

It can be very tempting to think that a tester is simply smoking hemp and punt the bug back as doesn't-happen, but if you're the submitter of a bug, this feels like you've been called a liar to your face. Machine configurations and usage patterns are varied enough that usually both sides are right — it always happens to the tester, and it never happens for the developer. Where conflicts can really occur is when development admits to a bad bug but says it won't be fixed in time, which can lead to QA massing a campaign to reverse that decision.

The tags used to forward bugs can contribute to this. Personally, I think that "Won't Fix" was a bad forwarding option to use on the Microsoft Product Feedback Center, because it implies that development doesn't want to fix a problem, whereas "Can't Fix" would indicate that there are reasons why the bug can't be fixed, such as someone depending on the bug. If I could set up a bug database, I would want the following set of tags:

no sh*t
oh sh*t
your sh*t
bullsh*t

Such tags would be clear and unambiguous in their meaning.

Yes, this is a bug, but I won't fix it. Nyaa-nyaa!

Q: If you can verify that a bug occurs and have a confirmed fix ready to go, why wouldn't you fix the bug?
A: Because you don't know what else would break.

A regression occurs when a change is made to a code base that introduces a new bug or causes a previously fixed one to reappear. Regressions are bad news, because they not only mean you aren't making progress toward a less buggy release, but you actually might be going backwards. Often these happen because of oversights when writing the fix, but sometimes they also happen indirectly — the fix might be correct, but it might change conditions such that another dormant bug pops up much more often. The more frequently regressions occur, the harder it is to fix bugs. A gnarly code base with lots of hidden and unexpected connections between code is more prone to accidental regressions, so regression rate can be an indicator of how bad a code base is.

Amusingly, regressions can also occur because two bugs cancel each other out. For historical reasons, VirtualDub's filter system works with bitmaps that are stored upside-down, because that is the memory order for scanlines in a Windows device-independent bitmap (DIB). Well, some filters ignore this and process the bitmaps as if they were stored right-side-up. This works because there are two opposing bugs: the bitmap is read backwards, but it is also written backwards. There is no harm in this case, but let's say one of the filters had a mode that tried to put a mark in the upper-left corner, and due to the flip put it in the lower-left instead. If someone were to try to "fix" this by making the filter read the bitmap correctly, the output would flip and be wrong. The person would have to find the other bug in the output code to make a correct fix. Of course, the other alternative would be to make the marking code wrong too, which would fix the bug but add another land mine onto the pile. Yes, this is lame, but it can happen if you're up against a shipping deadline.
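The cancelling-bugs situation is easier to see with a toy model. This is only an illustrative sketch, not VirtualDub's filter code: a bitmap is a list of rows, the stored buffer holds the bottom scanline first, and the three filter functions are invented for the example.

```python
from typing import List

Rows = List[List[str]]


def display(stored: Rows) -> Rows:
    """Convert a bottom-up stored buffer into rows as seen on screen (top first)."""
    return stored[::-1]


def buggy_filter(stored: Rows, mark: bool = False) -> Rows:
    # Bug 1: treats stored row 0 as the top scanline.
    rows = [row[:] for row in stored]
    if mark:
        rows[0][0] = "X"  # meant to be the upper-left corner
    # Bug 2: writes rows back in the same wrong order, which undoes bug 1.
    return rows


def half_fixed_filter(stored: Rows, mark: bool = False) -> Rows:
    # Read side corrected: rows[0] really is the top scanline now.
    rows = [row[:] for row in reversed(stored)]
    if mark:
        rows[0][0] = "X"
    # Write side untouched: rows go out top-first into a bottom-up buffer.
    return rows


def fixed_filter(stored: Rows, mark: bool = False) -> Rows:
    rows = [row[:] for row in reversed(stored)]
    if mark:
        rows[0][0] = "X"
    return rows[::-1]  # write back in bottom-up order


if __name__ == "__main__":
    image = [list("ab"), list("cd"), list("ef")]  # as seen on screen
    stored = image[::-1]                          # bottom-up memory order
    for f in (buggy_filter, half_fixed_filter, fixed_filter):
        print(f.__name__, display(f(stored, mark=True)))
```

In this model the original filter passes images through correctly and only the mark lands in the wrong corner, while the half fix flips the whole image. Only correcting both the read and the write puts the mark in the upper-left with the image intact.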

What this means is that fixing a bug isn't limited to testing the portion of code that was buggy. It also requires general testing of the area in question to ensure that a bug hasn't cropped up anywhere else. This means that the bottleneck in fixing bugs isn't always the software developers; it can also be testing. If the bug happens to be in low-level code, such as a file I/O layer, it's possible that the entire program may have to be re-tested. Considering that the number of possible code paths can increase exponentially with the number of variables (particularly configuration options), it shouldn't be a surprise that regressions can and often do occur. The closer a program is to shipping, the more reluctant everyone has to be to accept a fix, because it may make the program less stable. Sucks, doesn't it?
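To put a rough number on the growth, here is a small sketch that counts combinations of configuration options as a stand-in for code paths. The option names and values are invented for illustration, and real programs have paths that depend on far more than settings.

```python
import math
from itertools import product
from typing import Dict, Iterator, List


def configuration_count(options: Dict[str, List[object]]) -> int:
    """Number of distinct combinations of option values."""
    return math.prod(len(values) for values in options.values())


def all_configurations(options: Dict[str, List[object]]) -> Iterator[Dict[str, object]]:
    """Yield every combination of option values as a dict."""
    names = list(options)
    for combo in product(*(options[name] for name in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    # Hypothetical settings for a video tool.
    options = {
        "codec": ["a", "b", "c"],
        "threads": [1, 2, 4],
        "preview": [False, True],
        "audio": [False, True],
    }
    print(configuration_count(options))
    # Every additional on/off option doubles the total.
    options["dither"] = [False, True]
    print(configuration_count(options))
```

Four modest options already give 36 combinations and a fifth on/off switch makes it 72, so retesting everything after a low-level fix quickly stops being practical.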

There are some bugs, of course, that will always be fixed regardless of regression risks. If the next build of a program starts crashing on startup on all machines, it's hard to imagine how a fix could make the situation any worse.

You might say that the reluctance to fix bugs is due to having rigid ship dates, and that open-source development doesn't have this constraint: the program ships when it's ready. In some ways, that's true. However, open-source software can suffer from the opposite problem, where the lack of ship dates means that there are no stabilization phases, and thus the project has a constant flow of bugs as well as bug fixes. Larger projects that are well-managed tend to have milestones, alpha and beta periods, and check-in procedures to address this problem. If you're a small team, though, or even a single developer, it can be hard to justify having strict beta periods and parallel stable/unstable branches. It's really a question of development and release processes, not an open source / closed source question.

Finally, you might ask if there is a way to prove that code is bug-free, instead of hoping that it is and relying on testing to catch the failures. It isn't possible in the fully general case, but with appropriate restrictions on coding techniques it is possible to formally prove a program correct. Doing so can very quickly become prohibitively expensive, though: verifying a simple thread synchronization algorithm can involve a 30+ node sequence graph. The cost can be justified if you're writing control software for a missile, but not for most consumer-level software. The software industry will have to move toward more proactive strategies for avoiding bugs as software becomes more complex, but it isn't feasible to formally prove real-world applications right now.

Dec 21, 2005 at 23:43
