Good approximation, bad approximation
Numerical approximations are a bit of an art. There are frequently tradeoffs available between speed and accuracy, and knowing how much you can skimp on one to improve the other takes a lot of careful study.
Take the humble reciprocal operation, for instance.
The reciprocal, y = 1/x, is a fairly basic operation, but it's also an expensive one. It can easily be implemented in terms of a divide, but division is itself a very expensive operation — it can't be parallelized as easily as addition or multiplication and typically takes on the order of 10-20 times longer. There are algorithms to compute the reciprocal directly to varying levels of precision, and some CPUs provide acceleration for that. x86 CPUs are among them, providing the RCPSS and RCPPS opcodes to compute scalar and vectorized reciprocal approximations, respectively.
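For reference, the instruction is a one-liner from C via SSE intrinsics. This is a minimal sketch assuming an x86 target with SSE; rcp_approx and rcp_rel_error are my names for illustration, not a standard API:

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rcp_ss maps to RCPSS */
#include <math.h>

/* Approximate reciprocal of x via the RCPSS instruction.
   Intel documents the relative error as at most 1.5 * 2^-12. */
static float rcp_approx(float x)
{
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
}

/* Relative error of the approximation, measured as |x * rcp(x) - 1|,
   which should stay within the documented bound. */
static float rcp_rel_error(float x)
{
    return fabsf(rcp_approx(x) * x - 1.0f);
}
```

The exact bit patterns returned are implementation-specific (the 0.999756 result below is what a Core 2 produces), so only the documented error bound is portable.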
However, there is a gotcha.
For any approximation, the first question you should ask is how accurate it is. RCPSS and RCPPS are documented as having a relative error no greater than 1.5*2^-12, or approximately 12 bits of precision. That's fine, and good enough for a lot of purposes, especially with refinement. The second question you should ask is whether there are any special values involved that deserve special attention. I can think of five inputs whose reciprocals ideally should be exact: 0, 1.0, -1.0, +infinity, and -infinity.
RCPSS/RCPPS do handle zero and infinity correctly, which is good. Sadly, 1.0 is handled less gracefully, as it comes out as 0.999756 (3F7FF000), at least on Core 2 CPUs. Even worse, if you attempt to refine the result using Newton-Raphson iterations:
x' = x * (2 - x*c)   (where c is the value being reciprocated and x is the current estimate)
...the result converges to 0.99999994039535522 (3F7FFFFF), a value so close to 1.0 that in many cases it will be printed as 1.0, such as in the VC++ debugger. This leads to lots of fun tracking down why calculations are slewing away from unity when they shouldn't, only to discover that the innocuous little 1.0 in the corner is actually an impostor, and then having to slow down an otherwise speedy routine to handle this special case. Argh!
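This stuck-one-ULP-below behavior can be reproduced in plain C, without the instruction itself. The sketch below seeds the Newton-Raphson iteration with the bit pattern 3F7FF000 reported above for rcp(1.0) and shows why it stalls at 3F7FFFFF: once x is within one ULP of 1.0, the subtraction 2 - x rounds to exactly 1.0 in single precision, so x * 1.0 never moves again.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (and back) without
   violating strict aliasing. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* One Newton-Raphson refinement step for x ~= 1/c. */
static float nr_step(float x, float c)
{
    return x * (2.0f - x * c);
}

/* Refine the reported RCPSS output for c = 1.0 and return the bit
   pattern the estimate reaches after the given number of steps. */
static uint32_t refine_one(int iterations)
{
    float x = bits_to_float(0x3F7FF000u);  /* 0.999755859375 = 1 - 2^-12 */
    for (int i = 0; i < iterations; ++i)
        x = nr_step(x, 1.0f);
    return float_to_bits(x);
}
```

The first step is exact: (1 - 2^-12) * (1 + 2^-12) = 1 - 2^-24, which is representable as 3F7FFFFF. On the second step, 2 - (1 - 2^-24) = 1 + 2^-24 falls exactly halfway between two floats and rounds to even, i.e. to 1.0, fixing the iteration in place.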
If I had to take a guess as to why Intel did this, it's probably to avoid the need to propagate carries from the mantissa into the exponent: that way, the top 12 bits of the input mantissa can simply index a lookup table, while the exponent alone determines the result's exponent and the top bit of the new mantissa. It's still really annoying, though. I have to go fix the rcp instruction in my shader engine, for instance, because the DirectX docs explicitly require that rcp of 1.0 stays 1.0. Curiously, they don't mention -1.0. I guess it just goes to show how hard it is to specify a good approximation.
You'd have thought correct (hell, exact) maths would be trivial to implement for CPU builders. How hard is it to make the ones you specify part of the architecture, and correct each and every time? Why are rules so difficult to implement for "simple" correct mathematics?
Turkey Machine - 03 12 08 - 18:58
I still find it hard to think that handling an all-(or mostly-)zero mantissa as a special case would have cost too many transistors and would have given exact results for all powers of two...
Reimar - 04 12 08 - 04:55
I'd be interested to know if Intel has improved the handling of this in Nehalem. Anyone got an i7 to try it?
Trimbo (link) - 04 12 08 - 11:57
I was going to say that RCPSS/RCPPS are single-clock operations, but they're not. Pentopt says they're 3 uops and 2 clocks. I wonder if the table lookup was the bottleneck.
In any case, I don't think they can change it now. I know for a fact that there are networked programs that rely on the returned values being exact between CPUs. Probably not the best idea, but no one said life was easy for CPU designers.
Phaeron - 05 12 08 - 02:05
The problem with approximation is that you either approximate or you don't. If you want 1 to be 1, you just can't use approximation; you can't even use a floating point variable, as it is an approximation itself.
1 might be a special value in human understanding and in general math, but you can't treat one value out of an infinite pool differently just because it has some special meaning you need.
What if someone just magically solves this 1-issue? I'm sure the next value someone will have a problem with is pi.
.. just my 2 cents of philosophy ;)
Murmel - 05 12 08 - 16:06
Er, you can absolutely have an approximation that lets 1 be 1, as long as your underlying math and algorithm are deterministic. In fact, it's par for the course for polynomial approximations -- the degree of the polynomial used gives you a set number of degrees of freedom, which you can use to nail down specific results that you need out of the approximation.
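As a concrete illustration of spending degrees of freedom on exactness (my example, not from the discussion above): a quadratic Lagrange interpolant of 1/x through three chosen nodes uses its three degrees of freedom to make those three points exact by construction.

```c
/* Quadratic Lagrange interpolant of 1/x through the nodes
   x = 1, 1.5, 2.  Each basis polynomial is 1 at its own node and
   0 at the others, so the interpolant reproduces 1/1, 1/1.5 and
   1/2 exactly -- three degrees of freedom, three pinned points. */
static double recip_quad(double x)
{
    const double x0 = 1.0, x1 = 1.5, x2 = 2.0;
    const double y0 = 1.0, y1 = 1.0 / 1.5, y2 = 0.5;

    double l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2));
    double l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2));
    double l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1));
    return y0 * l0 + y1 * l1 + y2 * l2;
}
```

At the nodes the arithmetic is exact even in floating point: the winning basis term computes an identical numerator and denominator (ratio exactly 1.0) and the other terms contain an exact zero factor, so recip_quad(1.0) really is 1.0, not 0.999-something. Between nodes the error is bounded but nonzero, which is the tradeoff being described.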
The reason that I single out 1/1 here is that it's one of the special points that, if not correct, violates a well-known feature of the reciprocal that people can depend on. As another example, say you had an approximation for multiplication that didn't compute some of the partial products. (Apparently, this was common for Crays.) For a lot of purposes, that's fine as long as the average and worst case errors are within bounds for the output precision required. However, if you break the identity x*1 == x, you're going to cause some pretty major train wrecks in some algorithms, such as causing iterative algorithms to diverge instead of converge.
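A toy version of that failure mode, on entirely hypothetical hardware: an 8.8 fixed-point multiplier that skips the partial products generated by the low four bits of one operand saves adder rows and has tiny average error, but no longer satisfies x * 1 == x.

```c
#include <stdint.h>

/* 8.8 fixed point: 1.0 is represented as 0x0100. */
#define FIX_ONE 0x0100u

/* Exact 8.8 fixed-point multiply. */
static uint32_t fix_mul(uint32_t a, uint32_t b)
{
    return (a * b) >> 8;
}

/* "Cheap" multiply that drops the partial products from the low
   4 bits of a -- analogous to a hardware multiplier omitting some
   adder rows.  Worst-case error is small, but multiplying by
   FIX_ONE no longer returns the input unchanged. */
static uint32_t fix_mul_trunc(uint32_t a, uint32_t b)
{
    return ((a & ~0xFu) * b) >> 8;
}
```

An iterative algorithm that assumes multiplying by 1 is a no-op will see its state quietly eroded by fix_mul_trunc on every pass, which is exactly the kind of train wreck described above.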
You're correct that an approximation will always fall down at some point, but you can definitely define an approximation that is good enough for a restricted set of uses. RCPSS/RCPPS are certainly still useful, as long as you can deal with this quirk.
Phaeron - 06 12 08 - 23:26