§ Avoiding burnout on the Internet

I've been a longtime reader and occasional commenter on Raymond Chen's blog, The Old New Thing. Raymond's an old-time programmer on the Windows team and has a lot of good experience and advice to share... but of late he's become increasingly frustrated with the comments on his blog, to the point that he's actually begun pre-emptively attacking people by name. Raymond freely admits to having the social skills of a thermonuclear device, but given the buildup I've been seeing over the past couple of weeks that started with the Nitpicker's Corner, it seems to me that the fellow's getting a bit too close to blowing up. Sure, some of the comments on his blog are annoying or incendiary, but perhaps he should disable comments or take a break for a while.

One of the things I've learned about being on the Internet is that it's a really, really big place... global, in fact. That means you're really insignificant, and it's easy to be bowled over by the magnitude of it all -- especially if you attract a lot of attention, as Raymond has due to his skill and writing style. I've had to deal with this too in some ways, due to the popularity of VirtualDub. In order to avoid getting blown out myself, I've adopted some rules:

(Read more....)

§ Troubles with _mm_loadl_epi64()

Alright, who was the dork who designed this SSE2 compiler intrinsic:

__m128i _mm_loadl_epi64(__m128i const *p);

What does this intrinsic do? It loads a 64-bit integer value from memory and stores it into the low 64 bits of an XMM register, zeroing the upper 64 bits. It's the compiler intrinsic version of the MOVQ instruction. MOVQ is fairly important for image processing routines in SSE2 for a couple of reasons: it's very convenient to process 64 bits of data, since eight 8-bit samples can be loaded and expanded as 16-bit words in a 128-bit register, and 128-bit memory accesses can't be misaligned like 64-bit memory accesses can.
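To make the load-and-expand pattern concrete, here is a minimal sketch, assuming an x86 target with SSE2 available. The helper name `widen8` is made up for illustration and is not from any VirtualDub source. Note the cast the declaration forces on the caller: the parameter is typed as a pointer to a full 128-bit value even though only 8 bytes are read.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: zero-extend eight unsigned 8-bit samples to 16-bit words.
inline void widen8(const std::uint8_t *src, std::int16_t *dst) {
    // MOVQ: load 64 bits into the low half of the register, zeroing the high half.
    // The intrinsic is declared as taking __m128i const *, hence the cast.
    __m128i v = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(src));

    // Interleave the low eight bytes with zero bytes, which zero-extends
    // each sample to a 16-bit word.
    __m128i w = _mm_unpacklo_epi8(v, _mm_setzero_si128());

    // Unaligned 128-bit store of the eight resulting words.
    _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), w);
}
```

Calling `widen8(src, dst)` on an 8-byte source should leave `dst[i] == src[i]` for each of the eight samples.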

Anyway, I ran into this while porting my old AP-922 based IDCT routine to intrinsics in order to recompute the constants according to a tip I'd found in a whitepaper (folding column rounding into row pass, genetic algorithm to tune... don't ask). I figured, hey... maybe I'll try intrinsics again... couldn't hurt, right? Visual C++ tends not to do well with MMX intrinsics, i.e. it misgenerates code, so I first emulated the MMX instructions with scalar code. When that worked, I tried rewriting the wrappers with SSE2 for speed.

Only to have the routine utterly and completely blow up.

(Read more....)

§ VirtualDub 1.8.0 Released

VirtualDub 1.8.0 is out -- this is a new experimental release that contains many changes I've been working on in the background for months. As this is an experimental release, it is recommended that you stick with 1.7.8 for production use. However, any feedback on changes in 1.8.0 is appreciated and will be used as the 1.8.x branch eventually becomes the new stable branch.

The main big change in 1.8.0 is enhanced audio support, including:

The VBR warning is still displayed by default, although it can be disabled in Preferences; turns out, some people were using it to detect files that were unlikely to play properly on their hardware players.

The video filter subsystem has also been overhauled for 1.8.0. A side effect of the changes is that some video filters -- in particular, those that use GDI to draw on video frames -- may run slightly slower. However, there are other changes which can allow the filter chain to run much faster as well. The changes:

Video filter authors interested in adding frame rate modification or YCbCr support to their filter should consult the VirtualDub Plugin SDK, version 0.7. The Plugin SDK is still pre-release, but comments and questions are welcome.

There are other miscellaneous changes in 1.8.0, as well as bug fixes that were too risky or extensive to push into 1.7.8.

As I write this, there is an issue on the SourceForge project servers that is preventing me from updating the download page for 1.8.0. If this is still an issue when you read this, visit the VirtualDub project page on SourceForge, and you should be able to download both 1.7.8 and 1.8.0. For those of you who are signed up for new release notifications, the file releases have now been split into stable and experimental packages, so you should subscribe to the virtualdub-experimental package if you wish to be notified when a new experimental release is available. I'd also encourage you to visit the Testing/Bug Reports section of the forum occasionally, as bleeding-edge test releases also appear there.

Changelist after the jump....

(Read more....)

§ They called _what_ in the inner loop??

AMD just open sourced the AMD Performance Library as Framewave, which at least from my perspective seems like a good thing. Not that I'm going to attempt to use it, but I perused the source out of curiosity, and it looks like there are some useful goodies in there.

And then there's some... marginal stuff.

One thing that I wanted to look at was their 8x8 2D-IDCT source. The 8x8 2D inverse discrete cosine transform (IDCT) is popular and used in a number of video compression formats. There are a million ways to implement it quickly, and although everyone's seen Intel's AP-922 SSE2 algorithm for it by now, I hadn't seen one by AMD before. So I grab the source and dig around in the JPEG module, and I see this:

int IdctQuant_LS_SSE2(const Fw16s *pSrc, Fw8u *pDst, int dstStp, const Fw16u *pQuantInvTable)
{
... pedx = (Fw16s *) fwMalloc(128); //64 array of Fw16s type

Who the #*@&*( calls malloc() in an optimized IDCT routine???

It looks like there are indeed a number of well-optimized SSE2 routines in the Framewave library, but after seeing things like the above a few times I was left scratching my head a bit....
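For comparison, a 128-byte scratch buffer is small enough to live on the stack. The following is only a sketch of that alternative, not Framewave code: `transpose_block` is a hypothetical stand-in that does an 8x8 transpose through the scratch buffer, since the buffer handling is the only point here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a routine needing 64 words of scratch space.
// It transposes an 8x8 block of 16-bit values from src into dst.
inline void transpose_block(const std::int16_t *src, std::int16_t *dst) {
    // 64 x 16 bits = 128 bytes, 16-byte aligned so aligned SSE2 accesses
    // would be legal. It lives on the stack, so no allocator call is made.
    alignas(16) std::int16_t scratch[64];

    for (std::size_t row = 0; row < 8; ++row)
        for (std::size_t col = 0; col < 8; ++col)
            scratch[col * 8 + row] = src[row * 8 + col];

    for (std::size_t i = 0; i < 64; ++i)
        dst[i] = scratch[i];
}
```

The buffer costs nothing beyond a stack pointer adjustment on entry, and there is no failure path or free call to worry about.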

Another ugliness I saw, which isn't restricted to Framewave unfortunately, is assembly language routines that have been translated to intrinsics. The result is a nasty C++ routine that has variables like "pedx" and "pesi," but has instruction names translated so that what used to be an understandable "paddw" is now "_mm_add_epi16." I know this was a hack job for portability, but the result sure is unreadable.

(Read more....)

§ VirtualDub 1.7.8 released

1.7.8 is out -- fetch!

Not really anything new, just a bunch of bug fixes. Most of them are for bugs and crashes in the capture module, but there are a couple that hit the editing module as well.

Those of you who frequent the forums know that I've been putting a lot of changes into another release, which was labeled as 1.7.X in test builds. That build is also nearly completed and is going to become the new 1.8.0 experimental build after it passes final checks to make sure it has an acceptably low level of stupid bugs. Unlike the jump from 1.6.x to 1.7.x, there will be no jump in minimum system requirements for 1.8.x; pretty much the only time that happens is when I'm forced to do so, and I'm not switching to VS2008 this time.

Changelist after the jump.

(Read more....)

§ The hidden danger of the Win32 TreeView

Random performance anecdote time.

I once had a bug filed on VirtualDub regarding a performance problem in its hex viewer on large files. (I have a habit of putting random features into my open-source tools; it never ceases to amaze me that people actually use them.) The problem turned out to be in a code fragment like this:

while(GetNextChunk(chunkInfo)) {
    TVINSERTSTRUCT tvItem;
    CreateTreeViewItem(tvItem, chunkInfo);

    // TreeView_InsertItem() takes a pointer to the insert structure.
    TreeView_InsertItem(hwndTV, &tvItem);
}

I had expected that I'd done something stupid in the hex viewer code. When I profiled the routine under VTune, though, I discovered that for large files it was spending almost no time in VirtualDub.exe itself -- it was spending a huge amount of time in the TreeView_InsertItem() call. This is a call to the Win32 tree view control to insert an item. Investigation into the disassembly around the hotspot revealed that the Win32 tree view internally stores its nodes as a singly-linked list, and adding an item to the end takes linear time in the number of items. This meant that adding N items to the tree required on the order of N^2 steps in total, making the tree initialization quadratic time. In case you're not familiar with asymptotic complexity, here's my cheat sheet:

I ended up solving this problem in two ways: I changed the routine to insert items in reverse order at the beginning instead of in forward order at the end, and I split the chunk list into two levels to reduce the maximum child count within a tree node.
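The effect of the first fix can be reproduced without Win32 at all. The sketch below is a toy model, not the tree view's actual implementation: `ChildList` is a hypothetical singly-linked list with no tail pointer that counts how many links it follows during insertion. In Win32 terms, the front insertion presumably corresponds to setting `hInsertAfter` to `TVI_FIRST` instead of `TVI_LAST`, but that mapping is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Toy singly-linked list with no tail pointer. Nodes live in a pool and
// link to each other by index; -1 marks the end of the list.
struct Node { int value; int next; };

struct ChildList {
    std::vector<Node> pool;
    int head = -1;
    std::size_t steps = 0;  // links followed while inserting

    // Append at the end: must walk the whole list to find the tail.
    void push_back(int value) {
        pool.push_back({value, -1});
        int idx = static_cast<int>(pool.size()) - 1;
        if (head < 0) { head = idx; return; }
        int cur = head;
        while (pool[cur].next >= 0) { cur = pool[cur].next; ++steps; }
        pool[cur].next = idx;
    }

    // Insert at the front: constant time, no walking.
    void push_front(int value) {
        pool.push_back({value, head});
        head = static_cast<int>(pool.size()) - 1;
    }

    // Values in list order.
    std::vector<int> items() const {
        std::vector<int> out;
        for (int cur = head; cur >= 0; cur = pool[cur].next)
            out.push_back(pool[cur].value);
        return out;
    }
};

// Forward order, appending at the end: quadratic number of link steps.
inline ChildList build_forward(int n) {
    ChildList list;
    for (int i = 0; i < n; ++i) list.push_back(i);
    return list;
}

// Reverse order, inserting at the front: same final order, no link steps.
inline ChildList build_reverse(int n) {
    ChildList list;
    for (int i = n - 1; i >= 0; --i) list.push_front(i);
    return list;
}
```

Both builders produce the same item order, but for N items the forward build follows (N-1)(N-2)/2 links while the reverse build follows none.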

Scalability problems are the worst kind of performance issues to deal with because the performance effects can be drastic and the fixes dangerously invasive, i.e. rewrite. The main danger is that it's really easy to nest fairly fast operations and end up with a composite operation that is O(N^2) or worse. On more than one occasion I've seen people unnecessarily calling linear-time operations like strlen() in a loop, and that simple error ends up turning an ordinarily fast operation into a painfully slow one.
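The strlen() case looks innocent in code, which is why it keeps happening. Here is a minimal sketch with made-up function names: the first version re-measures the string in the loop condition on every iteration, and the second measures it once. An optimizer may sometimes hoist the call on its own, but that is not something to count on.

```cpp
#include <cstddef>
#include <cstring>

// Quadratic as written: strlen() rescans the whole string each time the
// loop condition is evaluated.
inline std::size_t count_spaces_slow(const char *s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < std::strlen(s); ++i)
        if (s[i] == ' ') ++n;
    return n;
}

// Linear: the length is computed once, before the loop.
inline std::size_t count_spaces_fast(const char *s) {
    std::size_t n = 0;
    const std::size_t len = std::strlen(s);
    for (std::size_t i = 0; i < len; ++i)
        if (s[i] == ' ') ++n;
    return n;
}
```

The two functions return the same result; only the amount of work per iteration differs.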

(Read more....)