§ Intrinsics code generation in VC11 preview compiler

I now have the Visual Studio 11 developer preview installed in Windows 7, which makes stressing the new compiler much easier than with the Windows 8 DP in VirtualBox, which freezes for minutes at a time. The compiler version is fortunately the same: 17.00.40825.2. I happened to have a VC10-converted version of Altirra that built without problems after switching to the v110 toolset; VirtualDub needed a VC8-to-VC11 conversion, which meant stripping some quotes from the converted psa.props and fixing a runtime library setting mismatch. Both programs ran fine, so no big codegen problems.

A few things I've discovered about the new compiler:

1) SSE2 code generation is now the default.

This is confusing since neither the docs nor the project system UI have been updated, but if you don't specify any compiler switches or have enhanced instruction set usage set to Not Set in your project, the compiler will act as if /arch:SSE2 was set. You need to use /arch:IA32 to disable enhanced instruction set usage. (See MS response to bug 688736.)

2) Commutativity-based optimizations are now applied.

I wrote a while back that the compiler generates intrinsics exactly as you write them, so you can sometimes get extraneous moves unless you swap some parameters around. This appears to be fixed and both fold1() and fold2() generate the shorter output.

3) Intrinsics register allocation has improved.

The VC11 compiler does a better job on the SSE FIR routine example I posted earlier. It no longer generates the MOVSS orgy through temps at the top of the loop and also recognizes that zero is easily regenerated, the result being that it is able to hoist two of the four kernel vectors permanently into registers.

I browsed through the intrinsics list, and unfortunately it doesn't look like there are any new intrinsics in the existing instruction sets (still no min/max or round-to-int), but at least it looks like intrinsics code will generally run a bit faster with VC11.
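
For point 1 above, one way to see which instruction set level a translation unit was actually compiled for is the MSVC predefined macro _M_IX86_FP, which is defined for 32-bit x86 targets (0 for /arch:IA32, 1 for /arch:SSE, 2 for /arch:SSE2 or higher). The sketch below is only an illustration: the function name TargetArchLevel is made up, and it is an assumption that the VC11 preview reports its new default through this macro the same way released compilers do.

```cpp
// Reports the code generation level this translation unit was built for.
// _M_IX86_FP is an MSVC predefined macro that exists only on 32-bit x86
// targets; TargetArchLevel is a hypothetical helper name for this sketch.
const char *TargetArchLevel() {
#if defined(_M_IX86_FP)
    #if _M_IX86_FP >= 2
        return "x86 with SSE2 or higher";
    #elif _M_IX86_FP == 1
        return "x86 with SSE";
    #else
        return "x86 without SSE (IA32)";
    #endif
#else
    // Other compilers, or MSVC targeting something other than 32-bit x86.
    return "not a 32-bit x86 MSVC build";
#endif
}
```

Printing the result from a build with no /arch switch and again with /arch:IA32 should show whether the default really changed.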


§ Auto-vectorization in the Visual Studio 11 Express preview

Okay, it's actually the Microsoft Visual Studio 11 Express for Windows Developer Preview, but that's a ridiculously long name. I hope they call it something like vs11ew internally.

One thing I didn't expect to see in the VC11 compiler is auto-vectorization:
http://msdn.microsoft.com/en-us/library/dd547188%28v=VS.110%29.aspx#BKMK_VCPP

This attempts to produce vectorized code by analyzing your scalar loops. Now, this isn't going to do miracles -- particularly with poor support in C/C++ for alignment -- and you'll still have to go to intrinsics or assembly for the fastest code. However, the advantage of auto-vectorization is that the compiler can still do it when you're lazy -- which is great when you're prototyping, and can help in code you can't afford to focus on. As I've said before, I don't consider intrinsics to be very readable and it's been a long time since I considered manual register allocation fun, so even though I wouldn't want to have to rely on auto-vectorization I'm still in favor of it.
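
As a rough illustration of what "scalar loops" means here, the sketch below shows the kind of loop auto-vectorizers generally target: unit-stride accesses, no dependency between iterations, and a trip count known on entry. The function name ScaleAndAdd is made up, and whether the VC11 preview actually vectorizes this exact loop is an assumption, not something verified here.

```cpp
#include <cstddef>

// Plain scalar loop: each output element depends only on the matching input
// element, so a vectorizer is free to process several elements per iteration.
// ScaleAndAdd is a hypothetical example, not code from VirtualDub.
void ScaleAndAdd(float *dst, const float *src, float scale, float bias,
                 std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * scale + bias;
}
```

Note that nothing tells the compiler how dst and src are aligned or whether they overlap, which is part of why hand-written intrinsics can still come out ahead.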


§ Interesting DirectX changes in Windows 8 Developer Preview

I've managed to install the Windows 8 Developer Preview into a VirtualBox session, after spending a few hours in frustration trying to find a way around not having a dual layer DVD-R handy, and all of the suggested workarounds either requiring a large USB flash drive or not working on a UEFI boot machine. I'll just start by saying, yeah, Metro gets in the way so far. I'll leave it at that since it's clearly unfinished and I'd like to talk about other things.

Specifically, DirectX changes.

I've only had a couple of hours to dig into it, and so far, VirtualDub and Altirra run fine. So far, so good, nothing catastrophic like the display panes totally breaking *cough*Vista*cough*. One issue I have found is that the display code refuses to switch into DirectDraw mode. Not a big deal, since DirectDraw is basically neutered under WDDM anyway. (Amusingly, over time many of the DirectX APIs seem to be faring worse than their base Win32 counterparts.) The reason is a bit strange, though: in Windows 7, DirectDrawEnumerateEx() returns entries for the primary monitor with both NULL and non-NULL monitor handles, whereas in Windows 8 DP the callback is only getting called once with a NULL HMONITOR. This is causing the current tip display code to fail to find a matching monitor. Easily worked around, but strange nevertheless.
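
One plausible shape for such a workaround, sketched below, is to prefer an exact monitor handle match and otherwise fall back to the NULL-handle entry when the target is the primary monitor. This is not the actual VirtualDub display code: DDrawEntry, MonitorHandle, and FindEntryForMonitor are hypothetical names, and the handle type is a stand-in for HMONITOR so the sketch compiles without the Windows headers.

```cpp
#include <vector>

// Stand-in for a Win32 HMONITOR (assumption made for portability).
using MonitorHandle = const void *;

// One record per DirectDrawEnumerateEx() callback invocation (hypothetical).
struct DDrawEntry {
    int deviceIndex;
    MonitorHandle monitor;  // null when enumerated without a monitor handle
};

// Returns the entry to use for 'target', or nullptr if none fits.
// An exact non-null handle match wins; if there is none and the target is
// the primary monitor, the first null-handle entry is used instead.
const DDrawEntry *FindEntryForMonitor(const std::vector<DDrawEntry>& entries,
                                      MonitorHandle target,
                                      bool targetIsPrimary) {
    const DDrawEntry *fallback = nullptr;
    for (const DDrawEntry& e : entries) {
        if (e.monitor != nullptr && e.monitor == target)
            return &e;
        if (e.monitor == nullptr && fallback == nullptr)
            fallback = &e;
    }
    return targetIsPrimary ? fallback : nullptr;
}
```

With the Windows 7 style of enumeration the exact match is found as before; with a single NULL-handle entry, only a primary-monitor lookup succeeds.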

The other issue is more interesting, and has to do with this debug trace:

VideoDisplay/DX9: 3D device is lame -- reason: raster caps check failed


§ Weird PREfast problem

For a while now, Microsoft has made the /analyze mode of the VC++ compiler -- a.k.a. PREfast -- available through the Windows SDK. I've run it a couple of times before out of curiosity, and it's found a few interesting null pointer dereference paths, but like most static analysis tools it has a huge problem with spewing dubious results when you first sic it on a codebase. This mainly results from C/C++ being an unexpressive language, and while you can fix that and improve the results by peppering your source code with annotations, it's not something I've gotten around to doing given the time and risk involved.

One of the problems specifically with PREfast is that it seems to have a habit of issuing bogus warnings about bad array indices. For instance, take this simplified code:

void foo(int *p) {
    static const int data[16] = {0};
    for(int i=0; i<16; ++i) {
        if (i != 0)
            p[i] = data[i-1];
    }
}

The if() clearly prevents any pointer dereferencing when i = 0, but PREfast gives this output:

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

oops.cpp
oops.cpp(6) : warning C6200: Index '-1' is out of valid index range '0' to '15' for non-stack buffer 'int const * const `void __cdecl foo(int *)'::`2'::data'
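
For comparison, the same loop can be written so that the index expression is never negative by construction, by starting at 1 instead of guarding with an if(). The name foo_from_one is made up for this sketch, and whether /analyze stops reporting C6200 for this form has not been checked here.

```cpp
// Equivalent to foo(): p[0] is left untouched and p[1..15] receive
// data[0..14]. Because i starts at 1, i-1 is always in the range 0..14.
// foo_from_one is a hypothetical name for this illustration.
void foo_from_one(int *p) {
    static const int data[16] = {0};
    for (int i = 1; i < 16; ++i)
        p[i] = data[i - 1];
}
```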

All such warnings turned out to be impossible cases in code. I tracked down some of them to PREfast being confused by an assert macro, but this is the one case I've been able to pin down to the static analyzer ignoring an obvious prohibition in control flow. A related problem is that it isn't very good about indicating how sure it is about a buffer overflow, so instead of indicating that it has found a possible overflow, it assumes some rather large numbers:

warning C6201: Index '18446744073709551615' is out of valid index range '0' to '255' for possibly stack allocated buffer 'mStackLevels'

Kind of hard to do that when indexing with a uint8_t in 32-bit code.
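
To spell out why that index is impossible: 18446744073709551615 is 2^64 - 1, which is what -1 becomes when converted to a 64-bit unsigned integer, while a uint8_t can only hold 0 through 255. The sketch below checks both facts at compile time. ReadLevel and the int element type are assumptions made for illustration; only the 0 to 255 range comes from the warning text.

```cpp
#include <cstdint>
#include <limits>

// The index in the C6201 message is the largest 64-bit unsigned value,
// i.e. -1 converted to a 64-bit unsigned integer.
static_assert(static_cast<std::uint64_t>(-1) == 18446744073709551615ull,
              "-1 as uint64_t is 2^64 - 1");

// A uint8_t index tops out at 255, the last valid slot of a 256-entry buffer.
static_assert(std::numeric_limits<std::uint8_t>::max() == 255,
              "uint8_t cannot exceed 255");

// Hypothetical accessor: any uint8_t value is a valid index into a
// 256-entry array, so this access cannot go out of bounds.
int ReadLevel(const int (&levels)[256], std::uint8_t index) {
    return levels[index];
}
```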

I did try cppcheck once as a possible alternative for a free code analysis tool. It was a bit better about false positives and focused more on structural issues rather than dynamic ones, but its problem was speed -- due to a combination of not supporting precompiled headers and slow parsing performance it was taking over five minutes per .cpp file. It turned out to be primarily due to an unexpectedly mediocre implementation of std::string::operator==(const char *) in the VC++ STL and an O(N^2) implementation of a core string pattern matching algorithm, but even after fixing those it was still prohibitively slow.


§ Taking a look at D3D10.1's WARP driver

I recently went through the exercise of writing a basic Direct3D 10.1 display backend for VirtualDub. The primary motivation was to take advantage of Direct3D 10.1 command remoting... until I realized that the DirectX SDK I was using was a bit old and its documentation didn't mention that D3D10.1 command remoting had been removed in Windows 7 RTM. I did get it working in windowed mode, however, and since I had a working D3D10.1 path I figured I might as well check out the WARP driver.

WARP, or Windows Advanced Rasterization Platform, is a software driver that ships with the Direct3D 11 runtime. As far as I know, it's the first widely available and full-featured software renderer that Microsoft has shipped. The DirectX SDK has long shipped with the reference rasterizer (refrast), but that has several shortcomings: it's not redistributable, it can't be instantiated in a headless environment, and it's so abysmally slow that it barely works for debugging, much less running anything. Microsoft also created RGBRast for DirectX 9, which .NET 3.5 used as a software fallback, but AFAIK it doesn't support shaders and is pretty minimal. The OpenGL software rasterizer works but is pretty slow and lacking in features. WPF has its own software rasterizer that I've written about before and isn't too bad, but it only does pixel shaders on rectangular blits and is internal to WPF. Now we have WARP, which is fully featured, fast, and widely available.

Having used WARP a little bit, I can tell you that you won't be ditching your 3D graphics card anytime soon. When I say WARP is fast, I mean it's fast by software rasterizer standards, which means it might beat an S3 ViRGE. It's still very slow compared to any modern graphics accelerator, even one with "Integrated" in its name, and I get dropped frames drawing one 1440x900 full screen quad on an i5-2500K. That's even before you take into account that even to get that level of performance you have to give up a lot of CPU power that could be used for something else. The main benefit of WARP is that programs can now use 3D rendering without worrying about being 100% screwed in the unusual case where no 3D hardware acceleration whatsoever is available. Considering the difficulty of writing a general 3D software driver, that's a big benefit.

Now, with that out of the way, time to look at the details: let's look at what code WARP generates.
