§ Auto-vectorization in the Visual Studio 11 Express preview

Okay, it's actually the Microsoft Visual Studio 11 Express for Windows Developer Preview, but that's a ridiculously long name. I hope they call it something like vs11ew internally.

One thing I didn't expect to see in the VC11 compiler is auto-vectorization:
http://msdn.microsoft.com/en-us/library/dd547188%28v=VS.110%29.aspx#BKMK_VCPP

This attempts to produce vectorized code by analyzing your scalar loops. Now, this isn't going to do miracles -- particularly with poor support in C/C++ for alignment -- and you'll still have to go to intrinsics or assembly for the fastest code. However, the advantage of auto-vectorization is that the compiler can still do it when you're lazy -- which is great when you're prototyping, and can help in code you can't afford to focus on. As I've said before, I don't consider intrinsics to be very readable and it's been a long time since I considered manual register allocation fun, so even though I wouldn't want to have to rely on auto-vectorization, I'm still in favor of it.
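To make the trade-off concrete, here is a minimal sketch of the kind of scalar loop an auto-vectorizer would analyze, next to a hand-written SSE version of the same thing. The function names are made up for illustration, and the intrinsics path assumes an x86 target with SSE available; it is guarded so the snippet still builds elsewhere and falls back to the scalar tail loop.

```cpp
#include <cstddef>

// Assumption: SSE intrinsics are only pulled in when the target advertises SSE.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Plain scalar loop: the form an auto-vectorizer would try to transform.
void add_scalar(float *dst, const float *a, const float *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

// Hand-written equivalent: four floats per iteration with unaligned
// loads/stores, then a scalar loop for the leftover elements.
void add_sse(float *dst, const float *a, const float *b, std::size_t n) {
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i)  // tail (or the whole array when SSE is unavailable)
        dst[i] = a[i] + b[i];
}
```

The scalar version is obviously easier to read, which is the whole appeal of letting the compiler do the work when the loop is simple enough.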

After doing some testing with the x86 compiler (17.00.40825.2), the first thing I can say is that, at least with this early implementation, you probably won't be relying on auto-vectorization for video or image processing code. I was not able to get the compiler to vectorize any code processing 8-bit or 16-bit integers. The only types I could get to vectorize were 32-bit integers, 64-bit integers, floats, and doubles, and that excludes a huge amount of decoding/encoding/filtering code.

For vectorization to apply, the target CPU needs to support SSE for floats and SSE2 for ints or doubles; however, the developer preview compiler is pretty broken and I was often able to get it to generate SSE or SSE4.1 instructions inappropriately. For now we'll overlook that and just look at the operations that it can vectorize. For ints, I was able to get these operations to vectorize:

64-bit ints don't work very well -- x+y vectorizes while x+1 doesn't. Inversion (~) didn't work, and surprisingly, neither did negation (unary minus), so 0-x runs better than -x. Probably the most disappointing part is that neither conditionals nor relationals vectorize, so writing branchless mask-based code isn't possible. I couldn't get min/max or masked writes out of it, either.
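For anyone unfamiliar with the idiom, this is roughly what branchless mask-based code looks like in scalar form, along with negation written as 0-x. It is only a sketch: the function names are hypothetical, and nothing here claims that any particular compiler will vectorize it. The point is that the comparison feeding the mask is exactly the kind of relational op that didn't vectorize in my tests.

```cpp
#include <cstddef>
#include <cstdint>

// Branchless max: turn the comparison result into an all-ones or all-zeros
// mask, then blend the two inputs with AND/OR instead of branching.
void max_branchless(std::int32_t *dst, const std::int32_t *a,
                    const std::int32_t *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // 0xFFFFFFFF when a[i] > b[i], otherwise 0.
        std::uint32_t mask = 0u - static_cast<std::uint32_t>(a[i] > b[i]);
        std::uint32_t ua = static_cast<std::uint32_t>(a[i]);
        std::uint32_t ub = static_cast<std::uint32_t>(b[i]);
        // Assumes the usual two's complement wraparound on the cast back.
        dst[i] = static_cast<std::int32_t>((ua & mask) | (ub & ~mask));
    }
}

// Negation spelled as a subtraction from zero rather than unary minus.
void negate(std::int32_t *dst, const std::int32_t *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = 0 - src[i];
}
```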

For floats, more operations are supported:

Unary minus, fmodf(), fabsf(), transcendentals, min/max, and relational ops failed. I got float-to-unsigned casts to vectorize, but the generated code was bad (truncated all numbers above 2^31). The auto-vectorization is thus more powerful with floats, but there are still noticeable holes in operation support.
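A quick way to check for the float-to-unsigned problem is a conversion loop fed with values on both sides of 2^31. The sketch below is a hypothetical self-check, not the test I actually ran; the names are made up, and all inputs are chosen to be exactly representable as floats and below 2^32 so the expected results are unambiguous.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar float-to-unsigned conversion loop, the pattern described above.
void float_to_uint(std::uint32_t *dst, const float *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::uint32_t>(src[i]);
}

// Returns true if values at or above 2^31 survive the conversion.
// Note: with constant inputs the compiler may fold the whole thing at
// compile time, so a real check should feed the loop runtime data.
bool float_to_uint_ok() {
    const float in[8] = {0.0f, 1.5f, 255.0f, 65536.0f,
                         2147483520.0f,   // 2^31 - 128
                         2147483648.0f,   // 2^31
                         3000000000.0f,
                         4294967040.0f};  // 2^32 - 256
    const std::uint32_t expected[8] = {0u, 1u, 255u, 65536u,
                                       2147483520u, 2147483648u,
                                       3000000000u, 4294967040u};
    std::uint32_t out[8];
    float_to_uint(out, in, 8);
    for (std::size_t i = 0; i < 8; ++i)
        if (out[i] != expected[i])
            return false;
    return true;
}
```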

Another issue with the current auto-vectorization implementation is that it universally emits unaligned loads and stores (movups/movdqu). I tried copying to a local array with forced alignment, but even that wasn't enough to get movaps. That's an easy gain for intrinsics/asm over the auto-vectorizer, unfortunately. It does, however, emit code that is aliasing-tolerant: it checks whether the destination and source arrays overlap and branches to either vectorized or unrolled code depending on the result. __restrict wasn't effective in removing the check.
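Conceptually, the overlap check and the alignment requirement look something like the following. This is a hand-written approximation under my own assumptions, not the code the compiler actually emits: the function names are hypothetical, and the "wide" path uses plain scalar code that loads four values before storing any of them, to stand in for a vector load/store pair.

```cpp
#include <cstddef>
#include <cstdint>

// True if the byte ranges [p, p+n) and [q, q+n) share any memory.
bool ranges_overlap(const float *p, const float *q, std::size_t n) {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t b = reinterpret_cast<std::uintptr_t>(q);
    std::uintptr_t bytes = n * sizeof(float);
    return a < b + bytes && b < a + bytes;
}

// True if p sits on a 16-byte boundary, which is what movaps requires.
bool is_aligned16(const void *p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15) == 0;
}

void scale(float *dst, const float *src, float k, std::size_t n) {
    if (!ranges_overlap(dst, src, n)) {
        // No overlap: reading four elements before writing any is safe.
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            float t0 = src[i] * k, t1 = src[i + 1] * k;
            float t2 = src[i + 2] * k, t3 = src[i + 3] * k;
            dst[i] = t0; dst[i + 1] = t1; dst[i + 2] = t2; dst[i + 3] = t3;
        }
        for (; i < n; ++i)
            dst[i] = src[i] * k;
    } else {
        // Overlap: keep strict element-by-element order.
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] * k;
    }
}
```

Both branches have to exist in the binary, which is where the extra code size comes from even when only one of them can ever run.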

The third problem with the auto-vectorizer is that currently you can't turn it off by itself, only by reducing the global optimization level. This means a significant amount of code bloat with full optimization even if the vectorized code will never run (cases of guaranteed partial overlap). It also makes the developer preview a bit fragile since it means you can't easily escape the code generation bugs in the vectorizer. Hopefully there will be ways to control the auto-vectorizer like the inliner (command line switch + pragmas).
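For reference, the kind of per-loop control I have in mind looks like this. The sketch assumes the `#pragma loop(no_vector)` form that later released versions of Visual C++ document; it is not something the developer preview discussed here is known to accept. The function name is hypothetical, and the pragma is guarded so other compilers don't complain about it.

```cpp
#include <cstddef>

// A loop that would normally be a vectorization candidate, with the
// vectorizer switched off for just this loop on compilers that support it.
void add_no_vector(int *dst, const int *a, const int *b, std::size_t n) {
#ifdef _MSC_VER
    #pragma loop(no_vector)  // assumption: released-compiler syntax
#endif
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The results are the same either way; the pragma only affects what code gets generated for the loop.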

Anyway, it'll be interesting to see how this evolves. After Visual Studio .NET 2002, my general rule is that you should assume everything in a public Visual Studio beta is as it will ship unless it's already known to be changing, enough people complain about it, or it's clearly a showstopper. The level of codegen bugs in this compiler version is a lot higher than usual, though, so I have to assume this is earlier in the development cycle (or else the compiler team is in trouble!).

Comments



What kind of codegen bugs are you seeing?

They made a lot of compiler changes to make the reference-counted WinRT objects work in C++. I can't think of an obvious reason why this should affect codegen, but they might also have ripped out and rewritten various parts of the compiler pipeline from scratch. That would certainly cause a lot of regressions.

Tom - 16 09 11 - 02:33


Hi... thanks for cracking open the preview Dev 11 compiler.

1.) I think we have all the missing functionality called out going into the final product right now back here in Redmond. The Dev-11 compiler is still in flight.
2.) Will reply later with some technical discussion and data about code size – within the context of *current* micro-architectures.
3.) Also a blurb about unaligned memory references.
4.) Turning off the vectorizer is a feature that’s going to be available in pragma form for loop level granularity (We are considering command level)
5.) We ask that you be a little patient with the technology.
We are a V1 vectorizer and internally we are able to handle ARM Neon, Intel SSE2, some SSE4.1 and Intel AVX - while correctly building all of Windows 8, SQL, Office and Developer Division Tools.
We try to focus on mission critical correctness before attaining peak performance in all domains.

More later,
Jim Radigan (Architect/Dev Lead C++ optimizer team)

Jim - 16 09 11 - 07:05


Interesting post, Avery. I'd love to hear what you find from the "auto-parallelizer" described here

http://msdn.microsoft.com/it-it/library/..

Trimbo (link) - 17 09 11 - 05:07


@Tom:
> What kind of codegen bugs are you seeing?

Just the ones in the auto-vectorizer; they're only a problem because you can't turn it off, so your existing float or int based code can be broken. The float-to-uint one is the main one I've found that could likely bite. So far VirtualDub and Altirra are running fine after being rebuilt with VC11, but I don't have much code written the way that the auto-vectorizer likes and isn't already overlaid with hand-optimized versions for SSE2.

@Jim:
Thanks for the response! You can't expect people to be patient after putting in something like this, though. :)

Good to hear that the #pragmas are going in. It'll be great if the final version of this does support byte/word sizes and conditionals.

@Trimbo:
> I'd love to hear what you find from the "auto-parallelizer" described here

There's almost no information on it, unfortunately. I tried the specified #pragma and the compiler issued an undefined pragma warning, so either it didn't get in, the pragma needs more args, or some switch needs to be thrown to enable it.

Phaeron - 17 09 11 - 14:30
