Just say no to cut-and-paste coding

¶Just say no to cut-and-paste coding

Far more often than I would like, I see code like this:

if (p[x+1]->GetSomeObject()) {
    p[x+1]->GetSomeObject()->DoSomething();
    p[x+1]->GetSomeObject()->DoSomethingElse();
}

...with lots and lots of redundant expressions. I've always wondered why people code like this, because it's an awful practice. Is it concise? No, it's nauseatingly redundant. Is it fast? No, it has lots of redundant operations. My only plausible guess is that it is easier to type somehow because the person who wrote it was adept at cut-and-paste operations, and even in that case it seems sub-optimal, when you could write this instead:

Blah *base = p[x+1];
SomeObject *obj = base->GetSomeObject();
if (obj) {
    obj->DoSomething();
    obj->DoSomethingElse();
}

I see this anti-pattern most often in C++, but that's mainly because I look at C++ code most often; I've also seen it in C# and other languages too. It seems widespread enough that I can only conclude that either:

People are lazy, or
It's being taught somehow in some central venue.

The first is a virtual certainty, but the second is something I can't discount, because I've seen some horrid practices taught in avenues of learning before. Academia is notorious for being littered with brilliant researchers who stink at programming, and so perhaps picking on it is a bit too easy; on the other hand, I still have nightmares about a popular C++ data structures textbook in which the author's code called exit() from a constructor to flag an error and had "friend void main();" in a class declaration.

Probably the most irritating issue with this practice, though, is that I've heard it defended with "the optimizer will fix it." Well, readability issues aside, this assumes there is an optimizer at all, which there isn't in a debug build. Without an optimizer, such explicitly redundant code can execute very slowly... to say nothing of being a pain to step through during debugging as well. This also puts unnecessary strain on the optimizer, making it re-deduce simple equivalences over and over when it should be spending its cycles doing more interesting optimizations like virtual method inlining and auto-vectorization.

The third problem is that it assumes the optimizer can remove the redundancy at all. In fact, it's fairly likely that it won't, because optimizers have incomplete information and are necessarily very pessimistic. In the above case, an optimizer has to assume that the return value from GetSomeObject() may change with each call unless it has concrete information indicating otherwise — and given code visibility, aliasing, and polymorphism concerns, this determination frequently fails. If there is any valid way in which the return values from the calls would not be the same, no matter how unusual, weird, or useless the code would be, the optimizer cannot collapse the calls. If it does, that's called bad code generation, and nobody wants that.

Don't be gratuitously redundant. Hoist out common subexpressions to variables, and both humans and compilers will appreciate it.

21 comments | Jan 19, 2008 at 01:32 | default

Current version

Navigation

Archives

¶Just say no to cut-and-paste coding

Comments