Current version

v1.10.4 (stable)

Navigation

Main page
Archived news
Downloads
Documentation
   Capture
   Compiling
   Processing
   Crashes
Features
Filters
Plugin SDK
Knowledge base
Donate
Contact info
Forum
 
Other projects
   Altirra

Search

Archives

01 Dec - 31 Dec 2013
01 Oct - 31 Oct 2013
01 Aug - 31 Aug 2013
01 May - 31 May 2013
01 Mar - 31 Mar 2013
01 Feb - 29 Feb 2013
01 Dec - 31 Dec 2012
01 Nov - 30 Nov 2012
01 Oct - 31 Oct 2012
01 Sep - 30 Sep 2012
01 Aug - 31 Aug 2012
01 June - 30 June 2012
01 May - 31 May 2012
01 Apr - 30 Apr 2012
01 Dec - 31 Dec 2011
01 Nov - 30 Nov 2011
01 Oct - 31 Oct 2011
01 Sep - 30 Sep 2011
01 Aug - 31 Aug 2011
01 Jul - 31 Jul 2011
01 June - 30 June 2011
01 May - 31 May 2011
01 Apr - 30 Apr 2011
01 Mar - 31 Mar 2011
01 Feb - 29 Feb 2011
01 Jan - 31 Jan 2011
01 Dec - 31 Dec 2010
01 Nov - 30 Nov 2010
01 Oct - 31 Oct 2010
01 Sep - 30 Sep 2010
01 Aug - 31 Aug 2010
01 Jul - 31 Jul 2010
01 June - 30 June 2010
01 May - 31 May 2010
01 Apr - 30 Apr 2010
01 Mar - 31 Mar 2010
01 Feb - 29 Feb 2010
01 Jan - 31 Jan 2010
01 Dec - 31 Dec 2009
01 Nov - 30 Nov 2009
01 Oct - 31 Oct 2009
01 Sep - 30 Sep 2009
01 Aug - 31 Aug 2009
01 Jul - 31 Jul 2009
01 June - 30 June 2009
01 May - 31 May 2009
01 Apr - 30 Apr 2009
01 Mar - 31 Mar 2009
01 Feb - 29 Feb 2009
01 Jan - 31 Jan 2009
01 Dec - 31 Dec 2008
01 Nov - 30 Nov 2008
01 Oct - 31 Oct 2008
01 Sep - 30 Sep 2008
01 Aug - 31 Aug 2008
01 Jul - 31 Jul 2008
01 June - 30 June 2008
01 May - 31 May 2008
01 Apr - 30 Apr 2008
01 Mar - 31 Mar 2008
01 Feb - 29 Feb 2008
01 Jan - 31 Jan 2008
01 Dec - 31 Dec 2007
01 Nov - 30 Nov 2007
01 Oct - 31 Oct 2007
01 Sep - 30 Sep 2007
01 Aug - 31 Aug 2007
01 Jul - 31 Jul 2007
01 June - 30 June 2007
01 May - 31 May 2007
01 Apr - 30 Apr 2007
01 Mar - 31 Mar 2007
01 Feb - 29 Feb 2007
01 Jan - 31 Jan 2007
01 Dec - 31 Dec 2006
01 Nov - 30 Nov 2006
01 Oct - 31 Oct 2006
01 Sep - 30 Sep 2006
01 Aug - 31 Aug 2006
01 Jul - 31 Jul 2006
01 June - 30 June 2006
01 May - 31 May 2006
01 Apr - 30 Apr 2006
01 Mar - 31 Mar 2006
01 Feb - 29 Feb 2006
01 Jan - 31 Jan 2006
01 Dec - 31 Dec 2005
01 Nov - 30 Nov 2005
01 Oct - 31 Oct 2005
01 Sep - 30 Sep 2005
01 Aug - 31 Aug 2005
01 Jul - 31 Jul 2005
01 June - 30 June 2005
01 May - 31 May 2005
01 Apr - 30 Apr 2005
01 Mar - 31 Mar 2005
01 Feb - 29 Feb 2005
01 Jan - 31 Jan 2005
01 Dec - 31 Dec 2004
01 Nov - 30 Nov 2004
01 Oct - 31 Oct 2004
01 Sep - 30 Sep 2004
01 Aug - 31 Aug 2004

Stuff

Powered by Pivot  
XML: RSS feed 
XML: Atom feed 

§ Good approximation, bad approximation

Numerical approximations are a bit of an art. There are frequently tradeoffs available between speed and accuracy, and knowing how much you can skimp on one to improve the other takes a lot of careful study.

Take the humble reciprocal operation, for instance.

The reciprocal, y = 1/x, is a fairly basic operation, but it's also an expensive one. It can easily be implemented in terms of a divide, but division is itself a very expensive operation  it can't be parallelized as easily as addition or multiplication and typically takes on the order of 10-20 times longer. There are algorithms to compute the reciprocal directly to varying levels of precision, and some CPUs provide acceleration for that. x86 CPUs are among them, providing the RCPSS and RCPPS opcodes to compute scalar and vectorized reciprocal approximations, respectively.

However, there is a gotcha.

For any approximation, the first question you should ask is how accurate it is. RCPSS and RCPPS are documented as having a relative error no greater than 1.5*2^-12, or approximately 12 bits of precision. That's fine, and good enough for a lot of purposes, especially with refinement. The second question you should ask is whether there are any special values involved that deserve special attention. I can think of five that ideally should be exact:

RCPSS/RCPPS do handle zero and infinity correctly, which is good. Sadly, 1.0 is handled less gracefully, as it comes out as 0.999756 (3F7FF000), at least on Core 2 CPUs. Even worse, if you attempt to refine the result using Newton-Raphson iterations:

x' = x * (2 - x*c)

...the result converges to 0.99999994039535522 (3F7FFFFF), a value just barely off from 1.0 that in many cases it will be printed as 1.0, such as in the VC++ debugger. This leads to lots of fun tracking down why calculations are slewing away from unity when they shouldn't, only to discover that the innocuous little 1.0 in the corner is actually an impostor, and then having to slow down an otherwise speedy routine to handle this special case. Argh!

If I had to take a guess as to why Intel did this, it's probably to avoid the need to propagate carries from the mantissa to the exponent, because otherwise the top 12 bits of the mantissa can go through a lookup table and the exponent can produce the result exponent and top bit of new mantissa. It's still really annoying, though. I have to go fix the rcp instruction in my shader engine, for instance, because the DirectX docs explicitly require that rcp of 1.0 stays 1.0. Curiously, they don't mention -1.0. I guess it just goes to show how hard it is to specify a good approximation.

Comments

Comments posted:


You'd have thought correct (hell, exact) maths would be trivial to implement for CPU builders. How hard is it to make the ones you specify part of the architecture, and correct each and every time? Why are rules so difficult to implement for "simple" correct mathematics?

Turkey Machine - 03 12 08 - 18:58


I still find it hard to think that handling an all-(or mostly-)zero mantissa as a special case would have cost too many transistors and would have given exact results for all powers of two...

Reimar - 04 12 08 - 04:55


I'd be interested to know if Intel has improved the handling of this in Nehalem. Anyone got an i7 to try it?

Trimbo (link) - 04 12 08 - 11:57


I was going to say that RCPSS/RCPPS are single clock operations, but they're not. Pentopt says they're 3 uop, 2 clocks. I wonder if the table lookup was the bottleneck.

In any case, I don't think they can change it now. I know for a fact that there are networked programs that rely on the returned values being exact between CPUs. Probably not the best idea, but no one said life was easy for CPU designers.

Phaeron - 05 12 08 - 02:05


The problem with approximation ist you either approximate or you don't. If you want 1 to be 1, you just can't use approximation - you even can't use a floating point variable as it is an approximation itself.
1 might be a special value in human understanding and in general math, but you can't treat one value out of an infinite pool differently just because it has some special meaning you need.
What if someone just magically solves this 1-issue? I'm sure the next value someone will have a problem with is pi.

.. just my 2 cents of philosophy ;)

Murmel - 05 12 08 - 16:06


Er, you can absolutely have an approximation that lets 1 be 1, as long as your underlying math and algorithm are deterministic. In fact, it's par for the course for polynomial approximations -- the degree of the polynomial used gives you a set number of degrees of freedom, which you can use to nail down specific results that you need out of the approximation.

The reason that I single out 1/1 here is that it's one of the special points that, if not correct, violates a well-known feature of the reciprocal that people can depend on. As another example, say you had an approximation for multiplication that didn't compute some of the partial products. (Apparently, this was common for Crays.) For a lot of purposes, that's fine as long as the average and worst case errors are within bounds for the output precision required. However, if you break the identity x*1 == x, you're going to cause some pretty major train wrecks in some algorithms, such as causing iterative algorithms to diverge instead of converge.

You're correct that an approximation will always fall down at some point, but you can definitely define an approximation that is good enough for a restricted set of uses. RCPSS/RCPPS are certainly still useful, as long as you can deal with this quirk.

Phaeron - 06 12 08 - 23:26

Comment form


Please keep comments on-topic for this entry. If you have unrelated comments about VirtualDub, the forum is a better place to post them.
Name:  
Remember personal info?

Email (Optional):
Your email address is only revealed to the blog owner and is not shown to the public.
URL (Optional):
Comment: /

An authentication dialog may appear when you click Post Comment. Simply type in "post" as the user and "now" as the password. I have had to do this to stop automated comment spam.



Small print: All html tags except <b> and <i> will be removed from your comment. You can make links by just typing the url or mail-address.