Current version

v1.10.4 (stable)


§ Weird optimizer behavior

Take this simple function:

int foo(int p0, int p1) {
    return p1 + ((p0 - p1) & ((p0 - p1) << 31));
}

This should generate four operations: a subtract, a shift, a bitwise AND, and an add. Well, with VS2005 it generates something a bit different:

  00000000: 8B 54 24 08        mov         edx,dword ptr [esp+8]
  00000004: 8B 4C 24 04        mov         ecx,dword ptr [esp+4]
  00000008: 2B CA              sub         ecx,edx
  0000000A: 8B C1              mov         eax,ecx
  0000000C: 69 C0 00 00 00 80  imul        eax,eax,80000000h ???
  00000012: 23 C1              and         eax,ecx
  00000014: 03 C2              add         eax,edx
  00000016: C3                 ret

Somehow the optimizer changed the shift to a multiply, which is a serious pessimization and thus results in a rare case where the code is actually faster with the optimizer turned off!
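Why is the substitution legal at all? In 32-bit two's-complement arithmetic, shifting left by 31 gives the same bits as multiplying by 2^31 (0x80000000) modulo 2^32, so the `imul` computes the right answer, just more slowly. Below is a minimal C check of that identity. It uses `uint32_t` to sidestep the undefined behavior of shifting into the sign bit of a signed `int`, and the function names are made up for illustration:

```c
#include <stdint.h>

/* Left shift by 31: keeps only bit 0 of x, moved up into bit 31. */
static uint32_t shl31(uint32_t x) {
    return x << 31;
}

/* Multiply by 2^31; unsigned arithmetic wraps modulo 2^32,
   so this produces exactly the same bits as shl31(). */
static uint32_t mul_2pow31(uint32_t x) {
    return x * UINT32_C(0x80000000);
}
```

Because the transformation is value-preserving, the problem is purely cost. On CPUs of that era, a 32-bit `imul` typically has a latency of a few cycles, while a shift takes one.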

Oddly enough, manually hoisting out the common subexpression (p0 - p1) fixes the problem. I've seen this behavior before in VC++ with 64-bit expressions of the form (a*b+c). My guess is that the compiler normally converts left shifts to multiplications and then converts them back later, but somehow the CSE optimization breaks this. Yet another reason that being lazy and repeating common subexpressions all over the place while relying on the optimizer to clean up your mess isn't the greatest idea.

The reason for the repeated subexpression, by the way, is that this is an expanded version of a min() macro. I called the function foo above instead of min because it's actually broken: the left shift should be a right shift. As long as you can put up with the portability and range quirks, this strange formulation has the advantages of (a) being branchless and (b) sharing a lot of code with a max() on the same arguments.
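For reference, here is a sketch of what the corrected macro presumably looks like when written as functions, with the common subexpression hoisted by hand as recommended above. It assumes a 32-bit `int`, an arithmetic right shift of negative values (implementation-defined in C, but what mainstream x86 compilers do), and that `p0 - p1` does not overflow. Those are the portability and range quirks in question. The names `branchless_min` and `branchless_max` are illustrative, not taken from VirtualDub.

```c
/* Assumes 32-bit int, arithmetic >> on negative values,
   and that p0 - p1 does not overflow. */
static int branchless_min(int p0, int p1) {
    int diff = p0 - p1;          /* hoisted common subexpression */
    int mask = diff >> 31;       /* -1 (all ones) if p0 < p1, else 0 */
    return p1 + (diff & mask);   /* p1 + diff == p0 when p0 < p1 */
}

static int branchless_max(int p0, int p1) {
    int diff = p0 - p1;
    int mask = diff >> 31;
    return p0 - (diff & mask);   /* p0 - diff == p1 when p0 < p1 */
}
```

The `diff & mask` term is identical in both functions. That is the code sharing mentioned above: a combined min/max only needs to compute it once.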


§ VirtualDub 1.8.7 and 1.9.0 released

I haven't had as much time as I'd like to work on VirtualDub, which is unfortunately why it's been three months since the last release. Time to rectify that.

Both 1.8.7 and 1.9.0 are now up on SourceForge. 1.8.7 is a bugfix-only release, and its one major fix is to the distributed job system. It turns out that the distributed job code wasn't that stable and would often attempt to run the same job on multiple machines, due to what was essentially a race condition in the filesystem. The new version detects job start conflicts and retries with an exponentially increasing delay, which should be more reliable. I also rewrote the conflict resolution logic, which now works more like the two-way and three-way merges that a revision control system has to deal with.
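The post doesn't show the retry logic, but the general shape of "detect a conflict, then retry with exponential delay" is easy to sketch. In this C sketch, the claim and sleep operations are hypothetical callbacks. In a shared-folder job system, claiming might mean writing a marker file and re-reading it to confirm ownership. None of this is VirtualDub's actual code.

```c
#include <stdbool.h>

/* Hypothetical callbacks: try_claim returns true if this machine won the job;
   sleep_ms blocks for the given number of milliseconds. */
typedef bool (*claim_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned ms, void *ctx);

/* Delay before retry number `retry` (1-based): base_ms, 2*base_ms, 4*base_ms, ...
   capped at max_delay_ms. Doubling stops at the cap, so it cannot overflow. */
unsigned backoff_delay_ms(int retry, unsigned base_ms, unsigned max_delay_ms) {
    unsigned delay = base_ms;
    for (int i = 1; i < retry && delay < max_delay_ms; ++i)
        delay = (delay > max_delay_ms / 2) ? max_delay_ms : delay * 2;
    return delay < max_delay_ms ? delay : max_delay_ms;
}

/* Tries to claim a job up to max_attempts times, backing off after each
   conflict. Returns the 1-based attempt that succeeded, or 0 on failure. */
int claim_with_backoff(claim_fn try_claim, sleep_fn sleep_ms, void *ctx,
                       int max_attempts, unsigned base_ms, unsigned max_delay_ms) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_claim(ctx))
            return attempt;
        if (attempt < max_attempts)   /* don't sleep after the final failure */
            sleep_ms(backoff_delay_ms(attempt, base_ms, max_delay_ms), ctx);
    }
    return 0;
}
```

A real implementation would normally add random jitter to each delay, so that two machines that collided once don't keep retrying in lockstep. Jitter is left out here to keep the sketch deterministic.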

1.9.0 is, of course, the new experimental build, and it contains a number of new features and changes. First, I spent some time closing the functionality gap between the x86 and AMD64 builds. Although the AMD64 build may still not be as well optimized, several features that were previously missing from it are now implemented. I've also thrown in a built-in AMD64-capable Huffyuv decoder that handles some of the popular post-2.1.1 extensions.

Second, the internal display and blitter libraries got overhauled quite a bit. The uberblit system that backs the resampler in the 1.8.x series has been cleaned up and expanded. It now handles many of the complex blit scenarios that were previously handled by custom code or multi-stage blits. As a result, VirtualDub 1.9.0 can now handle several new image formats, including the 10-bit-per-channel v210 format and the interleaved NV12 format. The display library has also been upgraded to handle the new formats; in particular, the Direct3D module can now accelerate display of 10-bit/channel v210 video with dithering. The new formats are not yet exposed to video filters (mainly because the thought of trying to work directly in v210 scares me), although I'm not ruling out the possibility of a 14-bit fixed-point linear color format in the future.
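To see why working directly in v210 is unappealing, consider its layout. The format packs three 10-bit components into each little-endian 32-bit word, at bits 0-9, 10-19 and 20-29, with the top two bits unused. Six 4:2:2 pixels therefore occupy four words (16 bytes), in the order Cb0 Y0 Cr0 | Y1 Cb1 Y2 | Cr1 Y3 Cb2 | Y4 Cr2 Y5. The C sketch below unpacks one such group. It is not VirtualDub's uberblit code, and the function name is made up. It assumes the words have already been loaded as host integers, which matches memory order only on a little-endian machine. It also ignores the row padding (rows are normally padded to a multiple of 128 bytes).

```c
#include <stdint.h>

/* Unpacks one 16-byte v210 group (four 32-bit words, six 4:2:2 pixels)
   into 10-bit Y, Cb and Cr samples. */
void unpack_v210_group(const uint32_t w[4],
                       uint16_t y[6], uint16_t cb[3], uint16_t cr[3]) {
    /* Component k (0..2) of a word lives in bits 10*k .. 10*k+9. */
#define V210_C(word, k) ((uint16_t)(((word) >> (10 * (k))) & 0x3FFu))
    cb[0] = V210_C(w[0], 0); y[0]  = V210_C(w[0], 1); cr[0] = V210_C(w[0], 2);
    y[1]  = V210_C(w[1], 0); cb[1] = V210_C(w[1], 1); y[2]  = V210_C(w[1], 2);
    cr[1] = V210_C(w[2], 0); y[3]  = V210_C(w[2], 1); cb[2] = V210_C(w[2], 2);
    y[4]  = V210_C(w[3], 0); cr[2] = V210_C(w[3], 1); y[5]  = V210_C(w[3], 2);
#undef V210_C
}
```

Luma and chroma samples alternate positions from word to word, so even a simple per-pixel operation has to unpack or shuffle first. This is the main reason a filter would rather see a planar or wider fixed-point format.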

Changelists are after the jump.


§ "10 is the new 6" my #*&

I'm trying to give the Visual Studio team the benefit of the doubt with their "10 is the new 6" push, but I just tested something in the VS2010 CTP and nearly blew my top. Therefore, it's rant time.

A long time ago, I filed a bug on Visual Studio 2005, or really, VS2003:

https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=100052

This bug's really simple: if you have a stack of overlapping controls in the Visual Studio dialog editor and click on them, the editor selects the one on the bottom. It's a pain in the butt when you've got a bunch of overlapping controls that you're trying to fix, such as after a copy-and-paste. You try clicking on one to drag it and end up messing up the positioning of some other control, so you've got to undo that and then find some other way to select the one you actually wanted. You can't marquee the errant control, because the marquee is inclusive and invariably also picks up the group box that surrounds it. Shift-clicking out the unwanted controls is dangerous, because it sometimes registers as a double-click, and then you get a random event handler added to your code or something.
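For contrast, the behavior being asked for is ordinary z-order hit-testing: walk the controls from topmost to bottommost and take the first one under the cursor. Below is a minimal sketch. The `Rect` type and the function name are hypothetical, and the array is assumed to be ordered from the bottom of the z-order to the top.

```c
#include <stddef.h>

/* Hypothetical control bounds: half-open rectangle [left, right) x [top, bottom). */
typedef struct {
    int left, top, right, bottom;
} Rect;

/* Returns the index of the topmost control containing (x, y), or -1 if none.
   Assumes controls[] is ordered bottom-to-top in z-order. */
int hit_test_topmost(const Rect *controls, size_t count, int x, int y) {
    for (size_t i = count; i-- > 0; ) {   /* scan from the top down */
        const Rect *r = &controls[i];
        if (x >= r->left && x < r->right && y >= r->top && y < r->bottom)
            return (int)i;
    }
    return -1;
}
```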

I checked VS2010. It still picks the control on the bottom. It's still broken. It's been broken since Visual Studio .NET 2002; I've been waiting more than eight years for them to fix this bug, and it is STILL BROKEN. Does anyone actually use this anymore??

I really do want to give the Visual Studio team the benefit of the doubt. I do like a lot of the improvements in the compiler. Yet I'm afraid that if I ever met a member of the IDE team, I would wrap my hands around his neck and strangle him for the sheer amount of pain his team has inflicted upon me. I mean, c'mon. What visual editor with draggable components selects the one on the bottom?? There are so many other pet peeves of mine that still aren't fixed. All I have to do is look at the nearly unchanged project settings dialog to get the sinking feeling that the team still doesn't really get what they need to do to achieve "10 is the new 6." Then I look at the new MSBuild-based C++ project system in progress, which takes more than 30 seconds to load the converted VirtualDub.sln and prints a 400+ column command line by default for every file group it builds, and I get really depressed. And when I look over the fence at other stuff like the XAML editor, things don't look much rosier over there, either.

Please, make VS2010 better. I tried Eclipse once and I hated it. I'd have to wear a bag over my head if I had to resort to EMACS. I don't want to succumb to the temptation of writing my own IDE. I don't need new features. I just need what's there now to work well.


§ Good approximation, bad approximation

Numerical approximations are a bit of an art. There are frequently tradeoffs available between speed and accuracy, and knowing how much you can skimp on one to improve the other takes a lot of careful study.

Take the humble reciprocal operation, for instance.

The reciprocal, y = 1/x, is a fairly basic operation, but it's also an expensive one. It can easily be implemented in terms of a divide, but division is itself a very expensive operation: it can't be parallelized as easily as addition or multiplication, and it typically takes on the order of 10-20 times longer. There are algorithms to compute the reciprocal directly to varying levels of precision, and some CPUs provide acceleration for that. x86 CPUs are among them, providing the RCPSS and RCPPS instructions to compute scalar and vectorized reciprocal approximations, respectively.
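From C or C++, the scalar form is available through the SSE intrinsic `_mm_rcp_ss` in `<xmmintrin.h>`; `_mm_rcp_ps` is the four-wide version. The sketch below wraps it and falls back to a true division when SSE isn't available, so it also compiles on other targets. The wrapper and the `HAVE_RCPSS` macro are illustrative names, not part of any library.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_RCPSS 1
#endif

/* Approximate 1/x. With SSE this is a single RCPSS (roughly 12 bits of
   precision); without SSE it falls back to an exact divide. */
float approx_rcp(float x) {
#ifdef HAVE_RCPSS
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
#else
    return 1.0f / x;
#endif
}
```

For ordinary normal inputs, the result should fall within the documented relative error bound of 1.5*2^-12. The exact bits can differ between CPU vendors and generations.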

However, there is a gotcha.

For any approximation, the first question you should ask is how accurate it is. RCPSS and RCPPS are documented as having a relative error no greater than 1.5*2^-12, or approximately 12 bits of precision. That's fine, and good enough for a lot of purposes, especially with refinement. The second question you should ask is whether there are any special values involved that deserve special attention. I can think of five that ideally should be exact:

RCPSS/RCPPS do handle zero and infinity correctly, which is good. Sadly, 1.0 is handled less gracefully, as it comes out as 0.999756 (3F7FF000), at least on Core 2 CPUs. Even worse, if you attempt to refine the result using Newton-Raphson iterations:

x' = x * (2 - x*c)

...the result converges to 0.99999994039535522 (3F7FFFFF), a value so close to 1.0 that in many cases it is printed as 1.0, such as in the VC++ debugger. This leads to lots of fun tracking down why calculations are slewing away from unity when they shouldn't, only to discover that the innocuous little 1.0 in the corner is actually an impostor. Then you have to slow down an otherwise speedy routine to handle this special case. Argh!
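The stall can be reproduced without the instruction itself. Start from the Core 2 estimate 3F7FF000, which is exactly 1 - 2^-12, with c = 1.0. One step gives (1 - 2^-12)(1 + 2^-12) = 1 - 2^-24 exactly, and that is 3F7FFFFF. On the next step, 2 - x*c = 1 + 2^-24 lies exactly halfway between two floats and rounds to 1.0, so the iteration stops moving. The sketch below demonstrates this and adds the special-case check described above. It assumes IEEE single precision with round-to-nearest and no extended-precision intermediates (the default with SSE arithmetic; x87 code can behave differently). The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static float float_from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
static uint32_t bits_from_float(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* One Newton-Raphson step for 1/c: x' = x * (2 - x*c). Roughly doubles the
   number of correct bits, but cannot escape 3F7FFFFF when c == 1.0. */
static float nr_step(float x, float c) {
    return x * (2.0f - x * c);
}

/* Refines an estimate, then patches c == +/-1.0 so they map back to themselves.
   The extra compare is the cost of handling the special case. */
static float refined_rcp(float estimate, float c, int steps) {
    float x = estimate;
    for (int i = 0; i < steps; ++i)
        x = nr_step(x, c);
    if (c == 1.0f || c == -1.0f)
        x = c;   /* 1/1 == 1 and 1/-1 == -1 exactly */
    return x;
}
```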

If I had to guess why Intel did this, it's probably to avoid the need to propagate carries from the mantissa into the exponent. Without that carry, the top 12 bits of the mantissa can go through a lookup table, while the exponent alone produces the result exponent and the top bit of the new mantissa. It's still really annoying, though. I have to go fix the rcp instruction in my shader engine, for instance, because the DirectX docs explicitly require that rcp of 1.0 stays 1.0. Curiously, they don't mention -1.0. I guess it just goes to show how hard it is to specify a good approximation.
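The carry theory can be made concrete with a toy model. Suppose the hardware indexes a table with the top 12 mantissa bits and stores 2/m at each interval's midpoint with 11 fraction bits, the precision that 3F7FF000 suggests. Suppose also that it always produces the biased exponent 253 - e for input exponent e. Every table entry then lies in [1, 2), except that the first interval's ideal value rounds up to 2.0, which would need a carry into the exponent. Clamping that entry to 1.11111111111b yields exactly 3F7FF000 for an input of 1.0. This only illustrates the guess above. The table size, rounding and clamping are assumptions, not Intel's documented design.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of a table-driven reciprocal estimate; NOT Intel's actual design.
   Valid for normal x whose biased exponent e is in 1..252. */
static uint16_t rcp_table[4096];  /* 11-bit fraction of 2/m for each interval */

static void init_rcp_table(void) {
    for (int i = 0; i < 4096; ++i) {
        double mid = 1.0 + (i + 0.5) / 4096.0;      /* midpoint of m's interval */
        double frac = (2.0 / mid - 1.0) * 2048.0;   /* 2/m lies in (1, 2) */
        long q = (long)(frac + 0.5);                /* round to nearest */
        /* Clamp rather than round up to 2.0, which would need a carry
           into the exponent (in this model, only the first entry). */
        rcp_table[i] = (uint16_t)(q > 2047 ? 2047 : q);
    }
}

static float model_rcp(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    uint32_t e = (u >> 23) & 0xFFu;       /* biased exponent */
    uint32_t index = (u >> 11) & 0xFFFu;  /* top 12 mantissa bits */
    /* x = 2^(e-127) * m, so 1/x = 2^(126-e) * (2/m): biased exponent 253 - e. */
    uint32_t r = (u & 0x80000000u) | ((253u - e) << 23)
               | ((uint32_t)rcp_table[index] << 12);
    float f;
    memcpy(&f, &r, sizeof f);
    return f;
}
```

In this model, -1.0 comes out as BF7FF000 for the same reason.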
