§ Why not just go to floating point?

I've been thinking of putting together a new desktop machine -- to replace the ancient Socket 754 AMD64 machine that is currently serving as a doorstop -- and most likely it'd be Sandy Bridge-based. The nice thing about this is that I'd then be able to experiment with Advanced Vector Extensions (AVX). Currently my main machine is a laptop with a Core i7, so the highest CPU instruction set I have available is SSE 4.2. Of course, when I actually looked at AVX again, I found out to my disappointment that it's floating-point only like the original SSE was, and the AVX2 integer version won't arrive until a future chip, which pretty much torpedoed most of the ideas I had for using it.

Why not just switch to floating point?

Well, the main reason is that it would nuke the benefit of trying to use AVX in the first place, which is higher parallelism. AVX uses 256-bit vectors instead of 128-bit vectors, so it can process twice the number of elements per operation and thus get double the throughput. However, most of the data I work with is in bytes, so going to 32-bit floats means dividing throughput by four. Multiplying by two and dividing by four doesn't work in your favor. Then there are other reasons as well.
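
To put rough numbers on the throughput argument, here's a minimal sketch (not from the original post) comparing a byte-wise SSE2 loop with the equivalent AVX float loop. The function names are made up, and it assumes the element count is a multiple of the vector width:

    #include <cstddef>
    #include <cstdint>
    #include <emmintrin.h>   // SSE2
    #include <immintrin.h>   // AVX

    // 8-bit path: one SSE2 add covers 16 pixels per instruction.
    void add_bytes_sse2(uint8_t* dst, const uint8_t* src, size_t n) {
        for (size_t i = 0; i < n; i += 16) {
            __m128i a = _mm_loadu_si128((const __m128i*)(dst + i));
            __m128i b = _mm_loadu_si128((const __m128i*)(src + i));
            _mm_storeu_si128((__m128i*)(dst + i), _mm_adds_epu8(a, b));  // 16 elements/op
        }
    }

    // Float path: even with 256-bit AVX registers, one add covers only 8 elements,
    // so the same pixel count needs twice as many arithmetic instructions --
    // before counting any byte<->float conversion work at all.
    void add_floats_avx(float* dst, const float* src, size_t n) {
        for (size_t i = 0; i < n; i += 8) {
            __m256 a = _mm256_loadu_ps(dst + i);
            __m256 b = _mm256_loadu_ps(src + i);
            _mm256_storeu_ps(dst + i, _mm256_add_ps(a, b));              // 8 elements/op
        }
    }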

It's definitely not just a question of switching to vector float types. That isn't to say there aren't advantages to going FP, of course.

AVX does appear to have some niceties for integer routines, like 3-argument syntax, but truth be told, I haven't had too many problems with excess register moves lately. It's a bit of a bummer to go from "yeah, this would probably run much faster with 256-bit vectors" to "hmm, I'd have to convert this to floats and then it would probably run slower." :-/
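
For a sense of what "convert this to floats" actually costs, here is an illustrative sketch (again not code from the post, and the function name is invented) of fanning one 16-byte load out into four float vectors with SSE4.1; the narrowing and repacking on the way back out is a similar amount of work:

    #include <cstdint>
    #include <smmintrin.h>   // SSE4.1 (pmovzxbd); also pulls in SSE2

    // One 16-byte vector becomes four 4-float vectors just to get into FP form,
    // before any actual processing happens.
    static inline void bytes_to_floats(const uint8_t* src, __m128 out[4]) {
        __m128i b = _mm_loadu_si128((const __m128i*)src);
        out[0] = _mm_cvtepi32_ps(_mm_cvtepu8_epi32(b));                     // bytes 0-3
        out[1] = _mm_cvtepi32_ps(_mm_cvtepu8_epi32(_mm_srli_si128(b, 4)));  // bytes 4-7
        out[2] = _mm_cvtepi32_ps(_mm_cvtepu8_epi32(_mm_srli_si128(b, 8)));  // bytes 8-11
        out[3] = _mm_cvtepi32_ps(_mm_cvtepu8_epi32(_mm_srli_si128(b, 12))); // bytes 12-15
    }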

Comments

Comments posted:


Sounds like the inventor of the product does not use it himself.

tobi - 08 07 11 - 22:43


AMD has some new integer instructions with Bulldozer:
http://en.wikipedia.org/wiki/XOP_instruc..

Rumbah - 09 07 11 - 01:39


"going to 32-bit floats means dividing throughput by four" - AVX has 256-bit registers, but as far as I know, byte vector arithmetic instructions are limited to 128-bit half (16 bytes only). So you are dividing your throughput by two.

Paul Jurczak - 09 07 11 - 05:52


> AMD has some new integer instructions with Bulldozer:

The SSE5/XOP instructions look far more interesting for what I do than AVX/AVX2, particularly since many operations take bytes or words. Unfortunately, being AMD only pretty much dooms them to obscurity. The same thing happened with 3DNow!, which had some useful instructions that Intel never replicated (pmulhrw, pi2fd, pf2id). :(

We also don't know how fast they will be. It looks like they haven't been extended to 256-bit, which already puts them at a throughput disadvantage. If they end up having too high a latency or too low a throughput, AVX2 might win when it comes out. Intel ran into this same problem with SSE 4.1; I'm told that a bunch of the new instructions, like the unpacked moves, aren't any faster than the old ways.

> "going to 32-bit floats means dividing throughput by four" - AVX has 256-bit registers, but as far as I know, byte vector arithmetic instructions are limited to 128-bit half (16 bytes only). So you are dividing your throughput by two.

You're jumping ahead to the combined result. Converting to floats by itself gives quarter throughput at the same register size.

Phaeron - 09 07 11 - 07:43


Avery, you should wait for Ivy Bridge CPUs if you want to use integer SIMD with AVX vector size. They are not that far away.

Regarding 1/4 of throughput in going from byte to float -- that is based on the assumption that you already had the ability to process 16-byte vectors without losing throughput, which I sincerely doubt is possible.

Igor Levicki (link) - 10 07 11 - 08:31


> Regarding 1/4 of throughput in going from byte to float -- that is based on the assumption that you already had the ability to process 16-byte vectors without losing throughput which I sincerely doubt is possible.

Why do you say that? If anything it's actually easier to keep calculations moving with the integer vectors as there are fewer scheduling bottlenecks.

Phaeron - 10 07 11 - 08:45


Just checked roadmaps... if I'm not mistaken, AVX2 isn't due to come out with Ivy Bridge, but the rev after that (Haswell).

Phaeron - 10 07 11 - 08:53


For what I work with, AVX has one significant advantage: porting SSE intrinsic code is as simple as recompiling with AVX support to get the boost from the three-parameter instructions.

Of course, using the full 256-bit registers will require rewriting code, but it is nice to get a small speed boost just from recompiling.

I find that when working with floats, having a min/max function available makes things significantly easier. Also, integer multiplication may not always be easy to do, since some of the combinations are only available from SSSE3 or SSE4.1 onward.

Klaus Post (link) - 10 07 11 - 20:00
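
To make the availability point in the comment above concrete, here is a quick sketch; every intrinsic in it is real, the wrapper function is just a container for illustration, and the comments note the ISA level each operation requires:

    #include <smmintrin.h>   // SSE4.1 header; also covers SSE/SSE2/SSSE3 intrinsics

    // Which min/max and multiply forms you get depends heavily on the ISA level.
    void availability_sketch(__m128 a, __m128 b, __m128i ia, __m128i ib) {
        (void)_mm_min_ps(a, b);          // SSE    -- float min has been there from the start
        (void)_mm_min_epu8(ia, ib);      // SSE2   -- unsigned byte min
        (void)_mm_min_epi16(ia, ib);     // SSE2   -- signed word min
        (void)_mm_min_epi32(ia, ib);     // SSE4.1 -- signed dword min, much later
        (void)_mm_mulhrs_epi16(ia, ib);  // SSSE3  -- rounded high-word multiply
        (void)_mm_mullo_epi32(ia, ib);   // SSE4.1 -- full 32-bit low multiply
    }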


Anyone who tries to update his code to AVX, thinking it might get faster somehow, is going to see how unusable the instruction set turned out to be. Without the integer "promotion", as they call it, there are walls everywhere.

For image processing, floats are useless anyway; 16-bit integer is all you need. SSE2 can process 8 color components per instruction, so there's hardly any reason to do that with AVX and floats.

There is also a bug in VS2010 if you use the intrinsics:
https://connect.microsoft.com/VisualStud..

Gabest - 13 07 11 - 11:11
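
As a concrete illustration of the eight-components-per-instruction point in the comment above (a sketch only, with an assumed Q16 gain format and an invented function name):

    #include <cstdint>
    #include <emmintrin.h>   // SSE2

    // Scale eight unsigned 16-bit color components by a Q16 gain in [0, 1),
    // e.g. 0.75 == 49152. PMULHUW keeps the high half, i.e. (c * gain) >> 16,
    // so one multiply handles all eight components at once.
    static inline __m128i scale8_q16(__m128i components, uint16_t gain_q16) {
        __m128i g = _mm_set1_epi16((short)gain_q16);
        return _mm_mulhi_epu16(components, g);
    }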


Bleargh... I just looked at the AVX intrinsics. The existing intrinsics are bad enough to use, but with new ones like _mm256_castps128_ps256 they've managed to make intrinsics-based code even uglier. It's sad when the asm is more readable. :(

Floats aren't *quite* useless -- like I said earlier, you're going to have a hard time writing a decently accurate and performant fixed point version of an algorithm that has divides and square roots in it. At least in MMX I usually ended up going through a lookup table stage to do the divide and took a hit in accuracy. I suppose some neat lookup tricks might be possible in SSSE3... PSHUFB is kind of useful.

Phaeron - 13 07 11 - 15:53
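
For the PSHUFB lookup tricks mentioned in the comment above, here is a minimal illustrative sketch: a 16-entry table indexed by each byte's low nibble, which amounts to 16 table lookups in a single instruction. The table contents here (per-nibble popcounts) are just a placeholder; any 16-byte table works.

    #include <tmmintrin.h>   // SSSE3

    // PSHUFB as a 16-way parallel table lookup: each byte's low nibble selects
    // a table entry.
    static inline __m128i lut16_low_nibbles(__m128i v) {
        const __m128i table = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3,
                                            1, 2, 2, 3, 2, 3, 3, 4);
        __m128i idx = _mm_and_si128(v, _mm_set1_epi8(0x0f));  // keep 4-bit indices
        return _mm_shuffle_epi8(table, idx);                  // 16 lookups in one op
    }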
