Current version

v1.10.4 (stable)

Navigation

Main page
Archived news
Downloads
Documentation
   Capture
   Compiling
   Processing
   Crashes
Features
Filters
Plugin SDK
Knowledge base
Contact info
Forum
 
Other projects
   Altirra

Search

Archives

01 Dec - 31 Dec 2013
01 Oct - 31 Oct 2013
01 Aug - 31 Aug 2013
01 May - 31 May 2013
01 Mar - 31 Mar 2013
01 Feb - 29 Feb 2013
01 Dec - 31 Dec 2012
01 Nov - 30 Nov 2012
01 Oct - 31 Oct 2012
01 Sep - 30 Sep 2012
01 Aug - 31 Aug 2012
01 June - 30 June 2012
01 May - 31 May 2012
01 Apr - 30 Apr 2012
01 Dec - 31 Dec 2011
01 Nov - 30 Nov 2011
01 Oct - 31 Oct 2011
01 Sep - 30 Sep 2011
01 Aug - 31 Aug 2011
01 Jul - 31 Jul 2011
01 June - 30 June 2011
01 May - 31 May 2011
01 Apr - 30 Apr 2011
01 Mar - 31 Mar 2011
01 Feb - 29 Feb 2011
01 Jan - 31 Jan 2011
01 Dec - 31 Dec 2010
01 Nov - 30 Nov 2010
01 Oct - 31 Oct 2010
01 Sep - 30 Sep 2010
01 Aug - 31 Aug 2010
01 Jul - 31 Jul 2010
01 June - 30 June 2010
01 May - 31 May 2010
01 Apr - 30 Apr 2010
01 Mar - 31 Mar 2010
01 Feb - 29 Feb 2010
01 Jan - 31 Jan 2010
01 Dec - 31 Dec 2009
01 Nov - 30 Nov 2009
01 Oct - 31 Oct 2009
01 Sep - 30 Sep 2009
01 Aug - 31 Aug 2009
01 Jul - 31 Jul 2009
01 June - 30 June 2009
01 May - 31 May 2009
01 Apr - 30 Apr 2009
01 Mar - 31 Mar 2009
01 Feb - 29 Feb 2009
01 Jan - 31 Jan 2009
01 Dec - 31 Dec 2008
01 Nov - 30 Nov 2008
01 Oct - 31 Oct 2008
01 Sep - 30 Sep 2008
01 Aug - 31 Aug 2008
01 Jul - 31 Jul 2008
01 June - 30 June 2008
01 May - 31 May 2008
01 Apr - 30 Apr 2008
01 Mar - 31 Mar 2008
01 Feb - 29 Feb 2008
01 Jan - 31 Jan 2008
01 Dec - 31 Dec 2007
01 Nov - 30 Nov 2007
01 Oct - 31 Oct 2007
01 Sep - 30 Sep 2007
01 Aug - 31 Aug 2007
01 Jul - 31 Jul 2007
01 June - 30 June 2007
01 May - 31 May 2007
01 Apr - 30 Apr 2007
01 Mar - 31 Mar 2007
01 Feb - 29 Feb 2007
01 Jan - 31 Jan 2007
01 Dec - 31 Dec 2006
01 Nov - 30 Nov 2006
01 Oct - 31 Oct 2006
01 Sep - 30 Sep 2006
01 Aug - 31 Aug 2006
01 Jul - 31 Jul 2006
01 June - 30 June 2006
01 May - 31 May 2006
01 Apr - 30 Apr 2006
01 Mar - 31 Mar 2006
01 Feb - 29 Feb 2006
01 Jan - 31 Jan 2006
01 Dec - 31 Dec 2005
01 Nov - 30 Nov 2005
01 Oct - 31 Oct 2005
01 Sep - 30 Sep 2005
01 Aug - 31 Aug 2005
01 Jul - 31 Jul 2005
01 June - 30 June 2005
01 May - 31 May 2005
01 Apr - 30 Apr 2005
01 Mar - 31 Mar 2005
01 Feb - 29 Feb 2005
01 Jan - 31 Jan 2005
01 Dec - 31 Dec 2004
01 Nov - 30 Nov 2004
01 Oct - 31 Oct 2004
01 Sep - 30 Sep 2004
01 Aug - 31 Aug 2004

Stuff

Powered by Pivot  
XML: RSS feed 
XML: Atom feed 

§ Just say no to cut-and-paste coding

Far more often than I would like, I see code like this:

if (p[x+1]->GetSomeObject()) {
    p[x+1]->GetSomeObject()->DoSomething();
    p[x+1]->GetSomeObject()->DoSomethingElse();
}

...with lots and lots of redundant expressions. I've always wondered why people code like this, because it's an awful practice. Is it concise? No, it's nauseatingly redundant. Is it fast? No, it has lots of redundant operations. My only plausible guess is that it is easier to type somehow because the person who wrote it was adept at cut-and-paste operations, and even in that case it seems sub-optimal, when you could write this instead:

Blah *base = p[x+1];
SomeObject *obj = base->GetSomeObject();
if (obj) {
    obj->DoSomething();
    obj->DoSomethingElse();
}

I see this anti-pattern most often in C++, but that's mainly because I look at C++ code most often; I've also seen it in C# and other languages too. It seems widespread enough that I can only conclude that either:

The first is a virtual certainty, but the second is something I can't discount, because I've seen some horrid practices taught in avenues of learning before. Academia is notorious for being littered with brilliant researchers who stink at programming, and so perhaps picking on it is a bit too easy; on the other hand, I still have nightmares about a popular C++ data structures textbook in which the author's code called exit() from a constructor to flag an error and had "friend void main();" in a class declaration.

Probably the most irritating issue with this practice, though, is that I've heard it defended with "the optimizer will fix it." Well, readability issues aside, this assumes there is an optimizer at all, which there isn't in a debug build. Without an optimizer, such explicitly redundant code can execute very slowly... to say nothing of being a pain to step through during debugging as well. This also puts unnecessary strain on the optimizer, making it re-deduce simple equivalences over and over when it should be spending its cycles doing more interesting optimizations like virtual method inlining and auto-vectorization.

The third problem is that it assumes the optimizer can remove the redundancy at all. In fact, it's fairly likely that it won't, because optimizers have incomplete information and are necessarily very pessimistic. In the above case, an optimizer has to assume that the return value from GetSomeObject() may change with each call unless it has concrete information indicating otherwise — and given code visibility, aliasing, and polymorphism concerns, this determination frequently fails. If there is any valid way in which the return values from the calls would not be the same, no matter how unusual, weird, or useless the code would be, the optimizer cannot collapse the calls. If it does, that's called bad code generation, and nobody wants that.

Don't be gratuitously redundant. Hoist out common subexpressions to variables, and both humans and compilers will appreciate it.

Comments

Comments posted:


Well, I'm not a casual C/C++ programmer, nor I can be thought of as one. Heck, the language I use mostly is MC68000 assembly!

Now, given the facts that I:

a) Don't like C/C++ too much
b) Haven't had a proper course in C/C++ in my life, nor I intend to (see (a))
c) I mostly taught C by myself, and googling around for some syntax issues I couldn't figure out on my own
d) I only had to code in C (dunno if it had C++ bits in it, it probably did) at work and that included small comms s/w that were about the only processes running on a P4 @ 2-3Ghz with half a gig of RAM.

I can declare that I didn't know (or think) of that idea you mentioned in your post. I've seen something similar to that in VB ("with" I think? It's not important anyway) and at some points I wish I had something like this when I wrote in C. But in the end cut-and-paste programming did work ok. Anyway I think it's safe to assume that just context switching between the system idle process and my program would take more clock cycles than those won by defining a pointer at some points.

It's my personal preference that when I want to stress my mind and code something optimal I use assembly directly. Any other language for me is used for convinience, getting something up and running fast, not for performance. That is not to say that I wouldn't use something I learned that seems elegant (that would make me stupid and stubborn). But I wouldn't look around for it if I didn't read it here.

(and yes, I know that properly written C can be pretty close or equal to hand tuned assembly. but learning to do this, figure out the compiler quirks and feeding it the right sequences reminds me of people back in the '90s that used an Atari ST language called GFA Basic to produce demos - if you took a look at the source code, it looked much messier than writing it in assembly, pure masochism)

ggn (link) - 19 01 08 - 04:36


I completely agree. But if that were the biggest of problems...

I especially hate when lazy people who code 200 line functions which could be split, on top of everything, call it func1 or something meaningless like that, then copy paste the ENTIRE function, change one line of code, and then call it func2. Now lets copy paste func2 to another file and call it func1 too! OMG.

And then, it is usually my task to fix this and attempt to teach these people at work what they are doing wrong, and the worst part is that most times, noone understands why it pisses me off so much.

Imagine trying to teach this people to use source control properly, or test-driven development.

It's so tempting to give up!

Mastermnd - 19 01 08 - 04:39


Howdy,

as an old programmer myself, I also was always upset from those repeated expressions someday. But then, that was the golden era of programming, for 640k... :)

Since then, the compiler technology has got so smart, where the compiler itself can do more optimization than most of the programmers. So, the criticized expression gets simplified by the compiler itself to the shown - corrected - example. If you care enough, the compiler can generate source code, assembly and c, you can watch it and see, how it can optimize further your very optimized code.

Gyorgy Kenez - 19 01 08 - 11:54


It's not the same thing, but pointers aren't always the most optimal way.

1. for(int i = 0; i < count; i++) ptr[i] = i;

mov edi, eax
xor eax, eax
$LL12@wmain:
mov DWORD PTR [edi+eax*4], eax
inc eax
cmp eax, 100
jl SHORT $LL12@wmain

2. for(int i = 0; i < count; i++) *ptr++ = i;

mov ecx, edi
xor eax, eax
$LL6@wmain:
mov DWORD PTR [ecx], eax
inc eax
add ecx, 4
cmp eax, 100
jl SHORT $LL6@wmain

As it can be seen in the first case the compiler can map indexed addressing directly to assembly code (or unroll the loop if it wants to).

Gabest - 19 01 08 - 12:28


C# has the notion of properties. for example:

SomeObject obj = new SomeObject();
if(obj.SubObj1.SubObj2 != null)
{
obj.SubObj1.SubObj2.A();
obj.SubObj1.SubObj2.B();
}

the reason for using the pattern you critisized in your article is twofold:

1. most properties are just {return backingField;} and {backingField = value;}, the compiler inlines the code and it become nearly es efficient as the optimized version you propose (it still has one pointer dereference more).
2. visual studio intellisense helps you when writing this pattern. you type:
SomeObject obj = ne ();
if(ob.S.S != nu)
{
ob.S.S.A();
ob.S.S.B();
}

as you can see the effort to write the duplicate subobject-access ist less than declaring an new variable.

the language is a tool for your job and you dont code in order to code but in order to get your job done. applying your pattern takes me 5 seconds which equals 10cents for my boss. is it worth spending 10cents to save 10 cycles? if you answer yes, then you also have to optimize many other examples of inefficient code with the same argument. this of course assumes that the code in question is no hot spot eg requires no optimization.

Tobias Rieper - 19 01 08 - 13:52


Performance is not the issue here. Damn, we live in an era now where computers have quad core processors and 4GB of RAM to do simple operations. Yes, compilers optimize, but in fact, nowadays interpreted programs in .net, java and python run fast enough. No need to do ASM or C unless you are developing some sort of special drivers or maintaining some old code.

What we ALL should be concerned about is not the darn 10 cents your boss looses when you type a variable declaration (you should get some speed typing lessons!!!) but it's the hundreds, if not thousands of dollars they will save on maintenance when your code has to be maintained by someone else in the NEAR future. We all know documentation always gets outdated and becomes a useless piece of crap. Noone ever bothers to update comments either (you can object it also costs money, but in the end, it's just lazyness, admit it), so AT LEAST code properly

Keep your code clean and follow the patterns!!!!!

Mastermnd - 19 01 08 - 15:33


Boy, I hope none of you who have poo-pooed this idea have ever had to debug someone else's code written like this. Ever tried to debug a crash in a build where someone wrote:

p->X()->Y()->Z.q().w()

Whoops, the debugger didn't save any of those intermediate pointers, so there's no easy way to look at the intermediate return values. Whoops, the stack trace only gives me line number information so I don't know which expression blew up. Congratulations, you saved 10 cents of time when you wrote the code and forced someone else to spend $1 later to clean it up.

As for C#, there are lots of cases in which properties do NOT have trivial implementations or are polymorphic. In WinForms code this can be very expensive and sooner or later people will wonder why your program slows to a crawl when 100 items are in a list. And do you name all of your properties S or do you use descriptive names that take a bit longer to type?

The true cost of what you write is determined by what it takes to write AND maintain it. If you are habitually coding in a manner that saves you time up front and causes both performance and debugging headaches down the road it is a net loss for your project.

And as for the optimizer, once again, NO, it will not always fix this mess for you. Virtual method? No, some subclass implementations may not be visible or even read yet. Possible aliasing? No, can't do it. Exceeded inlining depth or length? Can't do it. JITter out of time? Can't do it.

Phaeron - 19 01 08 - 16:11


Phaeron wrote:
"... And do you name all of your properties S or do you use descriptive names that take a bit longer to type? ..."

All though I agree with you, especially on readability, (have now idea on how good the c# optimizers are) I think you misunderstood Tobias Rieper's 2.
SomeObject obj = ne ();
if(ob.S.S != nu) ...
will of course be expanded with the help of intellisense be expanded to
SomeObject obj = new ();
if(obj.SubObj1.SubObj2 != null) ...
as you write it.

Henrik - 19 01 08 - 17:32


Hi, I totally agree with Mastermnd and Phaeron. I meet with this "opposite of the DRY (Don't Repeat Yourself) pattern" in my everyday work and I have never fully understood why most people write such a terrible code. I think in general it's because they are lazy or maybe not smart enough to really see the advantages of writing a proper and well documented code. Yesterday I even wrote an article about a similar issue, but it's written in Czech so unfortunately you probably won't understand it (but if so: http://malo.bloguje.cz/645724-mozna-jsem..). I'm happy that there are people like you guys who have the same opinion!

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." (Martin Fowler)

peta - 20 01 08 - 11:54


Ah yes, the the horror of watching countless javascripts doing the old: if (i

Saribro - 20 01 08 - 17:05


(Sorry for that empty comment, but a preview which requires you to retype the comment to post it is a bit strange...)

On one hand, I also get this itch if I see:

if (p[x+1]->GetSomeObject()) {
p[x+1]->GetSomeObject()->DoSomething(); p[x+1]->GetSomeObject()->DoSomethingElse();
}

on the other hand, once I’ve changed it too:

Blah *base = p[x+1];
SomeObject *obj = base->GetSomeObject();
if (obj) {
obj->DoSomething(); obj->DoSomethingElse();
}

I must admit that the first one is much clearer to me (aside of the possible side effect issues with DoSomething changing the result of GetSomeObject somehow) and ‘looks better’. But why? I can think of a few possbile things: – it’s still 2 lines less – it repeats more code… This is what is bad about it, but humans are basically pattern recognition machines, so it is probably also why it ‘looks’ good to me? – you immediately see why it is doing what it is doing. With the second one, you’re 3 lines in before you find out why it is doing the stuff in the first 2 lines, and then you have to go back and check exactly what was done.
In fact, I think the two combine to: the first one is more readable.

(Not intended as flamebait, I struggle with this a lot: writing code that looks good and is easy to read.)

Legolas - 20 01 08 - 18:34


sorry, that should have been @ Legolas and others

(I didn't realize the comment's author was displayed _beneath_ the comment, until I saw my own. And then for a split second I thought mine was empty too =)

Bramz (link) - 21 01 08 - 05:57


ARGH, I'm afraid I'm not really getting the way to submit comments here. I keep screwing things up. Anyway, here it goes again. Third time's a charm ...

@ Legolas and others,

You know, there is a little known construct in C++ that allows for the definition of obj _inside_ the if test:

if (SomeObject* obj = p[x+1]->GetSomeObject())
{
obj->DoSomething();
obj->DoSomethingElse();
}

IIRC this was invented especially to support type switches: lots of if tests like if (A* a = dynamic_cast(p))

The reason why the fixed fragment doesn't look so clear is because it takes a split second to realize what the obj in the if statement is. Somehow, we skim its definition and only start to wonder what it is when we start doing something with it.

Nevertheless, the best argument against the original code fragment is the day when someone comes it to fix p[x+1] to p[x] and ... well, you can guess the rest ;) For me that's sufficient.

bramz (link) - 21 01 08 - 05:59


i am delighted that my comment initiated such an intensive discussion ;-) i want to clear things up a bit because i think my message has been partially misunderstood.
1. as i already wrote performance is not the issue here (i agree with mastermind regarding this aspect).
2. my descision for "my" way to write such code is based on the fact that it is more readable, because, mainly, there is no variable definition and the code gets shorter. not shorter measured in chars or lines but shorter measured in "units of information". the brain can easily see the similarity between the sub-object accesses so it treats them the same. in contrast the additional variable complicates things a bit. this is of course marginal but it is there.
3. as phaeron correctly states there are cases where the reapeated-access pattern damages readability or performance. in this case you should avoid it (of course). there is no rule that is always valid.
4. brams: i like this construct and i think you should definitively use it in c++. unfortuantely it cannot be used easily in c#.
5. when talking about programming-attitude i oftenly notice that c++ prorammers are more focused on performance than "we" c#-developers. c# helps us to get things done faster at the expense of performance. if you have never coded in c# it might be an advantagous experience for you to give it a try, because the same problem can be solved in _much_ less time. this is caused by the fact that whole classes of bugs which are a plague in c++ a simply gone. for many scenarios c++ is outdated and the only reason it is still in heavy use is because of large legacy code-bases and not because of technology.

Tobias Rieper - 21 01 08 - 15:58


You were fine up until #5, where you started talking nonsense. I'd actually recommend this pattern MORE in C#, for two reasons: the ease of hiding expensive code in innocuous looking properties, and the need to explicitly Dispose() references. And by the way, I've actually written quite a bit of C# code -- it's just not publicly released.

As for C++ being outdated, you're on crack. C++ is still around because it is by far the most universal language available today. C# is nowhere near the magic bullet you think it is. What's it add? Its main advantage over C++ is the switch from pointers to garbage collected references. By no means is it the first language to bring GC to the mainstream, and by no means has it solved the problems that other languages have experienced in doing so, such as the fact that GC doesn't help you manage any type of resource other than memory. The other big change that people most often associate with C# is the addition of the .NET Framework base class libraries, which aren't part of the language itself. The Framework is definitely more extensive than the C++ standard library, but personally I've been disappointed with the lack of hindsight that I've seen in its development. It seems too geared toward making already simple tasks trivial, without adding sufficient depth. The container library, for example, I find much less powerful than the C++ STL.

The other issue I have with C# is that the Microsoft tools, to put it nicely, are still very immature. Have you ever worked on a C# project that had over a dozen projects and thousands of source code files? The incremental compilation time can be even worse than a C++ project because the environment recompiles entire assemblies over and over.

C# isn't a bad language, and for many classes of applications I think it could make ones that are as polished as a native application. The problem is that I basically never see them. When I talk about Java, I always bring up Azureus as the most polished application I've seen so far. For C#, the best I could come up with is Paint .NET, and that still feels quite a bit less solid.

I guess what annoys me in the end is that I see C# being pushed as the premier language for Windows development instead of earning the title. C++ didn't become king because Stroustrup pushed it as such. It became king because it had a practical migration path from the previous king at the time (C) and people found it a better choice for development. On the other hand, I constantly get C# fans telling me that C++ sucks and that C# is the new wave of the future. You know what? Come back to me in three years when the tools and libraries are mature and your language is actually geared toward the problems I'm trying to solve. Until then, I'll stick with C++.

Phaeron - 21 01 08 - 16:37


The reason why I do this is pretty embarrassing. It's to eliminate the need to create a variable name. Creating a new variable is always difficult. Does it convey the right meaning? Does it conflict with some other variable elsewhere in the code? Do I have to document the variable? Oh forget about it, I'll just cut-n-paste. Won't hurt nuthin.

Eddo - 22 01 08 - 17:47


I agree that the code written like that is awfull.

@Gyorgy Kenez:

You obviously never heard of -ansi-alias?

Igor (link) - 25 01 08 - 09:22


I've actually seen code like this before, in C#. The bad thing is that the GetData method actually went across the network and got info from a database. There's no way that can be optimized out because the compiler doesn't know if another app has modified the database. To be fair, the coder was still an intern, but I was surprised he thought something like this would be OK.

int f1 = new CustomObject().GetData(x).Field1;
int f2 = new CustomObject().GetData(x).Field2;
int f3 = new CustomObject().GetData(x).Field3;
int f4 = new CustomObject().GetData(x).Field4;
int f5 = new CustomObject().GetData(x).Field5;
int f6 = new CustomObject().GetData(x).Field6;

Matt - 26 01 08 - 18:25


That kind of code writing is good if and only if you're using the class multiple times in a row (say, more than five or 6 times), else it is less readable.

Anyway, some languages have a With-like feature in VB that will allow you to start a sentence with a single dot to use the class mentioned in the With block.

lie - 27 01 08 - 01:57


A comment about your so-called redundant code that I've been missing is not about compiler optimization but about compiler helping.

Often people think that splitting code in smaller steps or restructuring it generates more efficient code because uhm... well err... it looks more efficient, right?

But a compiler is not a human and it uses techniques most programmers aren't aware of. The compiler is the expert here, not the programmer, and trying to help it will more often than not result in less efficient code.

My main programming rules: write code as readable as possible for your fellow humans and do not try to help the compiler, the compiler knows better than you.

And finally, before claiming anything, test your compiler by looking at the assembly it generates. Not all compilers are created equal.

Clemens - 31 01 08 - 04:11


The compiler may have more techniques at its disposal, but it also has less knowledge of the problem than the human does. In a way, one way you could interpret this issue is that it's like trying to make your code more readable to the compiler; the less you make it try to deduce, the less chance it will fail to do so. It is possible for the redundant formulation to be faster, but for that to be the case the redundant access would have to be more efficient, and that is extremely unlikely to be the case when the access involves a call through an interface.

As for a programmer's intuition, I should note that it is seldom the case that reducing the number of basic operations in a code fragment results in slower code, optimizer or not; this is generally a safe assumption, and is especially so if the extra operations actually result in a rise in asymptotic complexity. Remember too that people often don't document the complexity of operations on their public interfaces; assuming that a call executes in O(1) is not a good idea. Doubling the time by calling a routine twice is one thing; repeating a call in a loop can turn a linear time operation into quadratic time, which is a BIG deal.

As for the readability issue, I have to ask again: how exactly is it more readable to have no description of the intermediate result (what is the meaning of the return value of "GetData()"?) and no indication of whether the return values are expected to be different between calls? And how do you handle the case of a null reference in the middle of such an expression? Don't say catch(NullReferenceException e) -- that way lies madness.

Phaeron - 31 01 08 - 04:51

Comment form


Please keep comments on-topic for this entry. If you have unrelated comments about VirtualDub, the forum is a better place to post them.
Name:  
Remember personal info?

Email (Optional):
Your email address is only revealed to the blog owner and is not shown to the public.
URL (Optional):
Comment: /

An authentication dialog may appear when you click Post Comment. Simply type in "post" as the user and "now" as the password. I have had to do this to stop automated comment spam.



Small print: All html tags except <b> and <i> will be removed from your comment. You can make links by just typing the url or mail-address.