§ UYVY vs. YUY2

Video formats on Windows are described primarily by a "four character code," or FOURCC for short. While most FOURCCs describe compressed formats such as Cinepak and the various MPEG-4 variants, FOURCCs are also assigned to uncompressed YCbCr formats that aren't natively supported by the regular GDI graphics API. These FOURCCs allow video codecs to recognize their own formats for decoding purposes, and allow two codecs to agree on a common interchange format.

Two common YCbCr FOURCCs are UYVY and YUY2. They are interleaved, meaning that all YCbCr components are stored in one stream. Chroma (color) information is stored at half horizontal resolution compared to luma, with chroma samples located on top of the left luma sample of each pair; luma range is 16-235 and chroma range is 16-240. The formats are named after the byte order, so for UYVY it is U (Cb), Y (luma 1), V (Cr), Y (luma 2), whereas the luma and chroma bytes are swapped for YUY2 -- Y/U/Y/V.

On Windows, it seems that YUY2 is the more common of the two formats -- Avisynth and Huffyuv prefer it, the MPEG-1 decoder lists it first, etc. Most hardware is also capable of using both formats. Ordinarily I would consider supporting only YUY2, except that the Adaptec Gamebridge device I recently acquired only supports UYVY. Now, when working with these formats in regular CPU based code, the distinction between these formats is minimal, as permuting the indices is sufficient to accommodate both. (In VirtualDub, conversion between UYVY and YUY2 is lossless.) When working with vector units, however, the difference between them can become problematic.
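Since the two formats differ only in byte order, the lossless conversion is just a fixed permutation of each 4-byte group. A quick sketch in Python (illustrative only, not VirtualDub's actual conversion code):

```python
def swap_uyvy_yuy2(data: bytes) -> bytes:
    """Convert UYVY to YUY2 or back by permuting the bytes within each
    4-byte macropixel. The permutation is its own inverse, so the round
    trip is lossless."""
    if len(data) % 4:
        raise ValueError("buffer must be a multiple of 4 bytes")
    out = bytearray(len(data))
    # UYVY: U Y0 V Y1  <->  YUY2: Y0 U Y1 V
    out[0::4] = data[1::4]
    out[1::4] = data[0::4]
    out[2::4] = data[3::4]
    out[3::4] = data[2::4]
    return bytes(out)

uyvy = bytes([0x80, 0x10, 0x80, 0x20])   # U Y0 V Y1: two dark gray pixels
yuy2 = swap_uyvy_yuy2(uyvy)
print(yuy2.hex())                         # 10802080
print(swap_uyvy_yuy2(yuy2) == uyvy)       # True: round trip is exact
```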

In my particular case, I'm looking at Direct3D-accelerated conversion of these formats to RGB, so the graphics card's vector unit is the pertinent one.

There are a few reasons I'm pursuing this path. One is that DirectDraw support on Windows Vista RTM seems to be pretty goofed up; video overlays seem to be badly broken on the current NVIDIA drivers for Vista, even with Aero Glass disabled. Second, I'm experimenting with real-time shader effects on live video, and want to eliminate the current RGB-to-YCbCr CPU-based conversion that occurs when Direct3D display is enabled in VirtualDub. Third, I've never done it before.

If you're familiar with Direct3D, you might wonder why I don't just use UYVY or YUY2 hardware support. Well, unfortunately, although YCbCr textures are supported by ATI, they're not supported on NVIDIA hardware. Both do support StretchRect() from a YCbCr surface to an RGB render target, but there are luma range problems when doing this. So it's down to pixel shaders.

Now, I have a bit of fondness for older hardware, and as such, I want this to work on the lowest pixel shader profile, pixel shader 1.1. The general idea is to upload the UYVY or YUY2 data to the video card as A8R8G8B8 data and then convert it to RGB in the pixel shader. The equations for converting UYVY/YUY2 data to RGB are as follows:

R = 1.164(Y-16) + 1.596(Cr-128)
G = 1.164(Y-16) - 0.813(Cr-128) - 0.391(Cb-128)
B = 1.164(Y-16) + 2.018(Cb-128)
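In plain scalar code the conversion is straightforward; here's an illustrative Python version of the equations above, with rounding and clamping to 8 bits:

```python
def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Convert one TV-range YCbCr sample to 8-bit RGB using the
    equations above, clamping the results to [0, 255]."""
    yp = 1.164 * (y - 16)
    r = yp + 1.596 * (cr - 128)
    g = yp - 0.813 * (cr - 128) - 0.391 * (cb - 128)
    b = yp + 2.018 * (cb - 128)
    clamp = lambda v: max(0, min(255, round(v)))
    return (clamp(r), clamp(g), clamp(b))

print(ycbcr_to_rgb(16, 128, 128))    # black -> (0, 0, 0)
print(ycbcr_to_rgb(235, 128, 128))   # reference white -> (255, 255, 255)
```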

As it turns out, this works out very well for UYVY. Cb and Cr naturally fall into the blue and red channels of the A8R8G8B8 texture; chroma green can be computed via a dot product and merged with a lerp. A little logic for selecting between the two luma samples based on even/odd horizontal position, and we're done. Heck, we can even use the bilinear filtering hardware to interpolate the chroma, too.

YUY2, however, is more annoying because Cb and Cr fall into green and alpha, respectively. Pixel shader 1.1 is very restricted in the channel manipulation available: instructions can neither swizzle the RGB channels nor write to only some of them, and there is no dp4 instruction in 1.1 for including alpha in a dot product. Just moving the scaled Cb and Cr into position consumes two of the precious eight vector instructions:

def c0, 0, 0.5045, 0, 0    ;c0.g = Cb_to_B_coeff / 4
def c1, 1, 0, 0, 0.798     ;c1.rgb = red | c1.a = Cr_to_R_coeff / 2
dp3 r0.rgb, t1_bx2, c0     ;decode Cb (green) -> chroma blue / 2
+ mul r0.a, t1_bias, c1.a  ;decode Cr (alpha) -> chroma red / 2
lrp r0.rgb, c1, r0.a, r0   ;merge chroma red

The net result is that so far, my YUY2 shader requires one instruction pair more than the UYVY shader. I don't know if this is significant in practice, since the underlying register combiner setup of a GeForce 3 is very different and considerably more powerful than Direct3D ps1.1 -- it can do dot(A,B)+dot(C,D) or A*B+C*D in one cycle -- but I have no idea how effective the driver is at recompiling the shader for that architecture.

(If you're willing to step up to a RADEON 8500 and ps1.4, all of this becomes moot due to availability of channel splatting, arbitrary write masks, and four-component dot product operations... but where's the fun in that!?)

It seems that, at least for vector units without cheap swizzling, UYVY is a better match for BGRA image formats than YUY2 due to the way that channels line up. I've been trying to think of where YUY2 might be more appropriate, but the best I can come up with is ABGR, which is a rare format. The other possibility is that someone was doing a weird SIMD-in-scalar trick on a CPU that involved taking advantage of the swapped channels; doing an 8 bit shift on an 80286 or 68000 would have been expensive.


§ Fix list for Visual Studio 2005 Service Pack 1

Glenn Maynard writes:

I sure wish I could find a good list of what's actually changed.  The "new in SP1" list is empty.

Ah, yes. Microsoft made it really hard to tell what actually got fixed in Visual Studio 2005 Service Pack 1. There are actually some fix lists out, though (and I do mean lists).

The "what's new in SP1" selection referred to by the SP1 release notes is indeed empty -- sloppy. One of the commenters on the VC++ blog found a list of hotfixes that made it into the service pack, though:

http://support.microsoft.com/default.aspx/kb/918526

This is not a comprehensive list of fixes, as it only lists hotfixes; other bugs that were submitted on Connect have also been fixed. Some, but not all, of the fixes that were tentatively listed for VC++ SP1 (http://blogs.msdn.com/vcblog/archive/2006/06/22/643325.aspx) made it into the build. For instance, a fix for a for-scoping bug that I submitted got backed out -- I'm pretty sure it was in the beta build -- but an obscure alloca() bug was fixed.

If you dig further into the help, you can find a new SP1 section added to the "What's New in Visual C++ Compiler, Language, and Tools" section, listing new features added to the service pack (naughty, naughty).

http://msdn2.microsoft.com/en-us/library/f0tby9k9(VS.80).aspx

The new features are minor. The SP1 page says that it adds "new processor support (e.g. Core Duo) for code generation," but don't get too excited. All of the new VC++ features pertain only to kernel mode -- specifically, intrinsics were added for the new hardware virtualization instructions, and for declaring whether a 32-bit pointer is signed or unsigned. Okay, you could use __ud2() and __nop() in user space, but they're not terribly useful.

Update:

Kevin Frei on the VC++ team posted a little note on his blog about compiler level fixes in SP1:

http://blogs.msdn.com/freik/archive/2006/12/19/new-job-sp1.aspx

It looks like SP1 is worthwhile if you do AMD64 or use profile guided optimization (PGO).


§ Visual Studio 2005 Service Pack 1 is out (and will eat your hard drive alive)

Visual Studio 2005 Service Pack 1 Final is finally out.

Before you go and install it, though, check your disk space on C: and installed memory. No, really, bad things happen otherwise.

Thanks to the wonderful technology that is the Microsoft Installer, VS2005 SP1 eats quite a bit more disk space than it should. In order to install it, you should really have at least 3GB of free disk space on your system drive (NOT necessarily where you installed Visual Studio); people have reported VS2005 installations breaking when they tried installing with less, or even the VC8 runtime library going missing from the system (BAD). The installer will make about four copies of the 400MB archive during the unpacking process and also load the entire patch into memory to check it, so it can take a good 20-30 minutes to install the patch, and more if you're low on memory and the system starts swapping. I've heard that you can defuse some of this with some switches on the top-level setup application and/or by manually extracting the .msp with an unzip utility.

What is perhaps much worse is that you will also lose more disk space in the end than you might expect, because the Installer will copy the entire 400MB .msp file into the C:\Windows\Installer folder AND copy a ton of uncompressed backup files into a subdirectory below that. I only installed Visual C++ and Visual C#, but in the end I lost 1.3GB of space on C: after installing VS2005 SP1. I filed a bug on this behavior during the SP1 beta, but apparently the VS team didn't find a way to address it. I can't faithfully express in words how lame this is, but the next time I have to reinstall VS2005, I'm going to try slipstreaming the patch into the installation media, because increasing the installation footprint of Visual Studio by 5x is ridiculous.


§ Amusing OpenGL extension specification

The OpenGL 3D graphics API has a well-defined mechanism for extension by third parties, and vendor extensions provide much of its modern power. Part of providing an extension is writing a specification document, which has a standard form with name, version, authorship, functions/tokens, issues, and revisions to the language of the base specification and other extensions.

You can see some of the extension specifications at the OpenGL Extension Registry, although it may be incomprehensible reading if you're not experienced in 3D graphics.

For the most part, the specification documents are to the point... but occasionally a snarky statement does sneak in. Take WGL_NV_render_texture_rectangle, for instance:

Additions to the WGL Specification

First, close your eyes and pretend that a WGL specification actually
existed. Maybe if we all concentrate hard enough, one will magically
appear.

(This must have been a pet peeve of the fellow from NVIDIA, because this text appears in WGL_NV_render_depth_rectangle, too.)


§ And I thought my implementation of Deflate was bad

When I added PNG support to 1.7.0, I wrote my own routines to handle the Deflate compression algorithm. The Deflate algorithm is also the underlying compression algorithm behind the well-known 'zip' format, and is surprisingly complex, with a sliding window matching algorithm combined with three layers of Huffman trees. It was quite an educational experience to write a fast sliding window compressor.

Someone posted an interesting bug in the Visual Studio public bug database about the new zip file support enlarging files, so I decided to try .NET Framework's Deflate compressor:

using System;
using System.IO;
using System.IO.Compression;
namespace VCZipTest {
    class Program {
        static void Main(string[] args) {
            using (FileStream src = new FileStream(args[0], FileMode.Open)) {
                using (FileStream dst = new FileStream(args[1], FileMode.Create)) {
                    using (GZipStream def = new GZipStream(dst, CompressionMode.Compress, true)) {
                        byte[] buf = new byte[4096];
                        for (; ; ) {
                            int act = src.Read(buf, 0, 4096);
                            if (act <= 0)
                                break;
                            def.Write(buf, 0, act);
                        }
                    }
                    System.Console.WriteLine("{0} ({1} bytes) -> {2} ({3} bytes)",
                        args[0], src.Length, args[1], dst.Length);
                }
            }
        }
    }
}

D:\proj\win\VCZipTest\bin\Debug>vcziptest f:\shuffle.mp3 f:\test.bin.gz
f:\shuffle.mp3 (1439439 bytes) -> f:\test.bin.gz (2114778 bytes)

Yes, the .NET Framework implementation actually enlarged the file by 47%, and yes, gunzip was able to "decompress" the 2.1MB file correctly. I ran the file through VirtualDub's zip decompressor and found that the literal/length Huffman tree in the stream was pretty badly unbalanced, with most of the literal codes allocated 14 bits. Deflate should never enlarge a stream by more than a tiny fraction, since it allows individual blocks to be stored uncompressed.
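That bound is easy to demonstrate with zlib, which implements Deflate: even incompressible input should only grow by roughly 5 bytes per 64KB stored block, plus a small header and checksum. An illustrative Python sketch:

```python
import random
import zlib

# 256 KiB of pseudorandom noise -- essentially incompressible.
random.seed(1)
data = bytes(random.getrandbits(8) for _ in range(1 << 18))

compressed = zlib.compress(data, 9)
overhead = len(compressed) - len(data)

# Deflate falls back to stored blocks when coding doesn't help, so the
# overhead stays tiny (tens of bytes here), never anything like +47%.
print(overhead)
print(zlib.decompress(compressed) == data)   # True
```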

In contrast, even "gzip -1" was able to compress the file from 1,439,439 bytes to 1,416,488 bytes, and gzip -9 got it down to 1,406,973 bytes.


§ SSE warning in 1.7.0

Many of you have reported a warning like this showing up in the experimental 1.7.0 release:

[E] Internal error: SSE state was bad before entry to external code at
    (.\source\w32videocodecpack.cpp:670). This indicates an uncaught bug
    either in an external driver or in VirtualDub itself that could cause
    application instability.  Please report this problem to the author!
    (MXCSR = 00009fc0)

In all of the cases I've seen, this warning is harmless and can be ignored. I'll be tweaking the codebase in 1.7.1 to prevent these from appearing in such circumstances.

From what I can tell, the warnings appear because modules compiled with Intel C/C++ with fast math optimizations enabled cause the runtime to flip the Flush to Zero (FTZ) and Denormals Are Zero (DAZ) bits on in the CPU's SSE control register. SSE stands for Streaming SIMD Extensions and refers to a streamlined vector math instruction set added starting with the Intel Pentium III and AMD Athlon XP CPUs. In my opinion, the runtime really shouldn't be flipping these settings, because they affect math precision in other code running in the thread, and I'm pretty sure it's against the Win32 calling convention to call external code with those bits set. It's definitely against the Windows x64 calling convention, which explicitly defines them as part of nonvolatile state and as normally disabled. Nevertheless, it appears that several video codecs and video filters are compiled in this manner, and thus trip the problem.
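For the curious, the MXCSR value in the warning above decodes as exactly that state: FTZ and DAZ set, all exceptions masked, rounding still at nearest-even. A quick sketch using the published IA-32 bit positions (this is just illustration, not VirtualDub's actual check):

```python
# MXCSR control/status register bit positions, per the IA-32 architecture.
MXCSR_DAZ       = 1 << 6        # denormals-are-zero
MXCSR_EXC_MASKS = 0x3F << 7     # the six exception mask bits
MXCSR_RC        = 3 << 13       # rounding control (00 = round-to-nearest)
MXCSR_FTZ       = 1 << 15       # flush-to-zero

mxcsr = 0x9FC0                  # the value from the warning message

print(bool(mxcsr & MXCSR_FTZ))                        # True: FTZ is on
print(bool(mxcsr & MXCSR_DAZ))                        # True: DAZ is on
print((mxcsr & MXCSR_EXC_MASKS) == MXCSR_EXC_MASKS)   # True: all masked
print((mxcsr & MXCSR_RC) >> 13)                       # 0: nearest-even
```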

The reason I added this check in 1.7.0 is due to an additional vulnerability that I gained when moving to Visual Studio 2005, which is sensitivity to SSE math settings. There has always been a problem with third-party DLLs screwing around with x87 floating point state, like changing precision and unmasking exceptions, which causes all sorts of mayhem such as crashes and calculations going haywire. For this reason, VirtualDub monitors the x87 state in and out of all external calls to video codecs and filters, and fixes the x87 state whenever it detects a violation. However, it didn't check the SSE control register, because it didn't use any SSE math.

1.7.0 is different, however, because starting with Visual Studio .NET (2002), the C runtime library will use SSE2-optimized versions of math functions when possible, such as pow(). These implementations use sequences of primitive operations (add/multiply/etc.) instead of the microcoded transcendental math instructions in the FPU. This is often faster, but an unfortunate side effect is that it can be a lot more inaccurate when the rounding mode in the FPU is inappropriately set. Shortly before shipping 1.7.0, I discovered that the resize video filter had a long-standing bug where it would switch the FPU from round-to-nearest-even to truncate and not restore it properly. This was harmless in 1.6.x because the filter code auto-fixed the x87 state, but in Visual Studio 2005 the _control87() function also changes the SSE state. As a result, the levels filter started showing speckling errors, which I tracked down to a bizarre result of something like pow(0.997, 0.998) = 1.001, which in turn was caused by the bad rounding mode. Thus, after fixing the resize filter, I added code to check for and fix the analogous SSE violations. Unfortunately, I didn't have any video filters or codecs installed that were compiled with Intel C/C++'s aggressive optimization settings, so I missed the warning problem. There was also a bug in the startup code which caused the SSE check to be enabled too late, so any video filters that tripped this problem showed up as an internal error instead of properly tagging the violator.

FTZ (flush-to-zero) and DAZ (denormals-are-zero) are flags which, when set, tell the FPU to allow slight violations of IEEE math rules for faster speed. The numbers in question are denormals, which are really tiny numbers so small that they are missing the leading 1 bit normally implicit in IEEE-encoded floating point numbers; for a float, these are smaller in magnitude than about 1.18*10^-38. The FPU normally handles these special cases by grinding the pipeline to a halt and executing special microcode. Most applications won't need the additional accuracy provided by denormals, though, so setting these bits can increase performance slightly by treating the tiny numbers as zero. It's not that huge of a deal on the x86 architecture because microcoded execution is still hardware support, whereas on some RISC CPUs denormals actually cause a trap to a software emulation handler, which is thousands of times slower than the hardware unit.
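You can poke at denormal behavior from Python, whose floats are IEEE doubles (so the thresholds are far smaller than the single-precision figure above):

```python
import sys

smallest_normal = sys.float_info.min   # 2**-1022 for doubles
denormal = smallest_normal / 2         # below the normal range: a denormal

print(denormal > 0.0)                    # True: still representable...
print(denormal * 2 == smallest_normal)   # True: ...and exact at this scale
print(5e-324 / 2)                        # 0.0: halving the smallest denormal
                                         # finally underflows to zero
```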

The change in accuracy caused by enabling FTZ and DAZ is very minor compared to flipping precision or rounding modes; I was unable to find any computations not involving denormals which were affected by their absence. As a result, 1.7.1 will simply ignore those bits in the external code check.
