Current version

v1.10.4 (stable)

Navigation

Main page
Archived news
Downloads
Documentation
   Capture
   Compiling
   Processing
   Crashes
Features
Filters
Plugin SDK
Knowledge base
Contact info
Forum
 
Other projects
   Altirra

Search

Archives

01 Dec - 31 Dec 2013
01 Oct - 31 Oct 2013
01 Aug - 31 Aug 2013
01 May - 31 May 2013
01 Mar - 31 Mar 2013
01 Feb - 29 Feb 2013
01 Dec - 31 Dec 2012
01 Nov - 30 Nov 2012
01 Oct - 31 Oct 2012
01 Sep - 30 Sep 2012
01 Aug - 31 Aug 2012
01 June - 30 June 2012
01 May - 31 May 2012
01 Apr - 30 Apr 2012
01 Dec - 31 Dec 2011
01 Nov - 30 Nov 2011
01 Oct - 31 Oct 2011
01 Sep - 30 Sep 2011
01 Aug - 31 Aug 2011
01 Jul - 31 Jul 2011
01 June - 30 June 2011
01 May - 31 May 2011
01 Apr - 30 Apr 2011
01 Mar - 31 Mar 2011
01 Feb - 29 Feb 2011
01 Jan - 31 Jan 2011
01 Dec - 31 Dec 2010
01 Nov - 30 Nov 2010
01 Oct - 31 Oct 2010
01 Sep - 30 Sep 2010
01 Aug - 31 Aug 2010
01 Jul - 31 Jul 2010
01 June - 30 June 2010
01 May - 31 May 2010
01 Apr - 30 Apr 2010
01 Mar - 31 Mar 2010
01 Feb - 29 Feb 2010
01 Jan - 31 Jan 2010
01 Dec - 31 Dec 2009
01 Nov - 30 Nov 2009
01 Oct - 31 Oct 2009
01 Sep - 30 Sep 2009
01 Aug - 31 Aug 2009
01 Jul - 31 Jul 2009
01 June - 30 June 2009
01 May - 31 May 2009
01 Apr - 30 Apr 2009
01 Mar - 31 Mar 2009
01 Feb - 29 Feb 2009
01 Jan - 31 Jan 2009
01 Dec - 31 Dec 2008
01 Nov - 30 Nov 2008
01 Oct - 31 Oct 2008
01 Sep - 30 Sep 2008
01 Aug - 31 Aug 2008
01 Jul - 31 Jul 2008
01 June - 30 June 2008
01 May - 31 May 2008
01 Apr - 30 Apr 2008
01 Mar - 31 Mar 2008
01 Feb - 29 Feb 2008
01 Jan - 31 Jan 2008
01 Dec - 31 Dec 2007
01 Nov - 30 Nov 2007
01 Oct - 31 Oct 2007
01 Sep - 30 Sep 2007
01 Aug - 31 Aug 2007
01 Jul - 31 Jul 2007
01 June - 30 June 2007
01 May - 31 May 2007
01 Apr - 30 Apr 2007
01 Mar - 31 Mar 2007
01 Feb - 29 Feb 2007
01 Jan - 31 Jan 2007
01 Dec - 31 Dec 2006
01 Nov - 30 Nov 2006
01 Oct - 31 Oct 2006
01 Sep - 30 Sep 2006
01 Aug - 31 Aug 2006
01 Jul - 31 Jul 2006
01 June - 30 June 2006
01 May - 31 May 2006
01 Apr - 30 Apr 2006
01 Mar - 31 Mar 2006
01 Feb - 29 Feb 2006
01 Jan - 31 Jan 2006
01 Dec - 31 Dec 2005
01 Nov - 30 Nov 2005
01 Oct - 31 Oct 2005
01 Sep - 30 Sep 2005
01 Aug - 31 Aug 2005
01 Jul - 31 Jul 2005
01 June - 30 June 2005
01 May - 31 May 2005
01 Apr - 30 Apr 2005
01 Mar - 31 Mar 2005
01 Feb - 29 Feb 2005
01 Jan - 31 Jan 2005
01 Dec - 31 Dec 2004
01 Nov - 30 Nov 2004
01 Oct - 31 Oct 2004
01 Sep - 30 Sep 2004
01 Aug - 31 Aug 2004

Stuff

Powered by Pivot  
XML: RSS feed 
XML: Atom feed 

§ Making a hash of floating point numbers

I've always thought that hash tables were well named, because often when you see how people have used them you wonder what they were smoking at the time. Often the problem revolves around a mistaken notion that switching a binary search tree for a hashed container bestows some sort of magical constant time lookup behavior, but sometimes it's more subtle. One case has to deal with the choice of hash function.

The hash function for a hashed container converts a key to a bucket index with the intent of trying to distribute data items as evenly as possible. Given a decent distribution for input values, the hash function for an integral key can be as simple as just using the integer value itself, with the container then applying a modulus operation to wrap it within the bucket count. Anyone who's gone down this route, however, then discovers the problem of trying to do this for a key that is of floating point type. Usually the first thing they try is something like this:

size_t operator()(float value) const {
    return (size_t)(value * 100);
}

This is unfortunately usually fairly slow due to poor performance in the float-to-int conversion. There's also the matter of slightly worse behavior around zero due to truncation toward zero instead of negative infinity.

At this point, the inclination is probably to just give up and either deal with it or use a different container. Others go "aha!" and use this hack instead:

size_t operator()(float value) const {
    return (size_t)*(const unsigned int *)&value;
}

This code uses the bit pattern of the float as the hash value. Yeah, it's non-portable. It's also got problems with the aliasing rules of the C language. In the not so unusual case of being able to depend on a 32-bit integral type and IEEE single precision floating point, though, it's a really neat and fast trick. And, sadly, it's also wrong. If you've done this or thought about it, don't feel bad. The .NET Framework team almost made this mistake, too.

Reason after the jump.


Negative zero is the problem. For IEEE floats, positive zero and negative zero differ by only the sign bit in bit pattern, but compare equal as floats. Using the bit pattern as the hash code therefore results in an inconsistency in the hashed container where two values compare equal but have different hash codes. I've also seen people burned by this when trying to compare floats quickly with integer compares (-0 < +0) and when implementing fast float-to-int functions (floorToInt(-0) = -1).

Comments

Comments posted:


Using floating point numbers as the key to a container seems like something people should almost never do in the first place. (Obviously there are exceptions.)

Given rounding errors and the usual need to define "equals enough" (i.e. an epsilon constant) in most projects, using operator== on floats should probably result in a compiler warning (or a programmer quiz to prove they know what they're doing!), IMO.

Sometimes I think using floats at all should result in a warning as they're so often mis-used.

(Obviously these are not absolute rules. In something like VirtualDub or a 3D engine floats make sense. I'm talking about all the general-purpose/application/financial code which uses floats when it really shouldn't.)

Leo Davidson (link) - 11 06 09 - 05:33


'After the jump' seems to appear in your RSS feed, in case you didn't mean it to appear.

nine - 11 06 09 - 07:14


is it possible that binary different floats compare equally? then you would have to account for that, too. its kind of a mess.

Tobias - 11 06 09 - 10:38


The compiler will have a better time optimizing if you use a union to get the float bits:

size_t operator()(float value) const {
union { float f; size_t i; } x = { value };
return x.i == 0x80000000 ? 0 : x.i;
}

You're still screwed with NaNs, since NaNs don't compare equal to anything, even values bitwise equal to themselves.

&#9829; - 11 06 09 - 12:16


In Java, Sun expressly defined Double(-0.0) to be != Double(0.0), and to hash that way as well, even though -0.0=0.0 I wonder why.

Foo - 11 06 09 - 20:26


I put "after the jump" in manually because lately I've been just putting everything in the main post.

I'm not in favor in general of using epsilon-based similarity comparisons on floating point numbers and find that I rarely need them. In my experience, they're overused by people who have heard about or bitten by FP issues, and then spam tolerance checks everywhere without thinking. Floating point arithmetic is well-behaved on modern processors, particularly with IEEE 754 compliance, and FP numbers don't just randomly accumulate error. If you are actually supposed to get equality, you should check for exact equality, and if you expect error, you need to know where that error is coming from and what epsilon is appropriate. There are also some nasty surprises with using epsilons on equality tests. For example, similarity is not transitive: similar(a, b) and similar(b, c) does not imply similar(a, c).

Zero is the only case I can think of in IEEE 754 arithmetic where two values with different bit patterns compare equally. Some FPUs support projective closure where positive and negative infinity are also equal, but I think that was an Intel x87 extension. Infinities are distinct from finites which are distinct from denormals, and NaNs never compare equal to any other number. (Since NaNs always compare as false to any other value, there's no problem with hashing them, but it's kind of pointless given the implications.)

I knew that Java had some weirdness with its FP definition, but I didn't realize that -0 was part of it. That's ugly. I imagine that breaks a bunch of identities and slows down Java FP arithmetic on a lot of CPUs.

Phaeron - 12 06 09 - 00:45


The fastest way to make positive and negative 32-bit floating point zero the same in comparisons is:

shl eax, 1
test eax, eax

Igor Levicki (link) - 17 06 09 - 11:40


Isn't add eax, eax faster? :)

Phaeron - 17 06 09 - 22:32

Comment form


Please keep comments on-topic for this entry. If you have unrelated comments about VirtualDub, the forum is a better place to post them.
Name:  
Remember personal info?

Email (Optional):
Your email address is only revealed to the blog owner and is not shown to the public.
URL (Optional):
Comment: /

An authentication dialog may appear when you click Post Comment. Simply type in "post" as the user and "now" as the password. I have had to do this to stop automated comment spam.



Small print: All html tags except <b> and <i> will be removed from your comment. You can make links by just typing the url or mail-address.