§ If you receive a pitch/stride, you can't memcpy() across the image

It's fairly common for images to be represented as a structure that includes width, height, and a field called pitch or stride. The pitch of an image is the distance in bytes from the start of one scanline to the start of the next. It allows an image to be specified such that the scanlines aren't adjacent in memory, which is useful for achieving a given alignment. Software tends to run slightly faster if 8- or 16-byte alignment can be achieved; hardware sometimes has more draconian alignment requirements, like 128 bytes.
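To make the definition concrete, here is a minimal sketch of such a structure and of how pitch is used to find a scanline. The Image type, its field names, and the two helper functions are made up for illustration and don't correspond to any particular API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image descriptor; the field names are illustrative only. */
typedef struct {
    void     *data;    /* address of scanline 0 */
    ptrdiff_t pitch;   /* bytes from one scanline start to the next */
    int       width;   /* pixels per scanline */
    int       height;  /* number of scanlines */
    int       bpp;     /* bytes per pixel */
} Image;

/* Address of scanline y: always step by pitch, never by width * bpp. */
static uint8_t *image_row(const Image *img, int y) {
    return (uint8_t *)img->data + (ptrdiff_t)y * img->pitch;
}

/* Round a scanline's byte length up to a multiple of align,
   which is assumed to be a power of two. */
static ptrdiff_t aligned_pitch(int width, int bpp, ptrdiff_t align) {
    ptrdiff_t row_bytes = (ptrdiff_t)width * bpp;
    return (row_bytes + align - 1) & ~(align - 1);
}
```

With 16-byte alignment, a 5-pixel-wide, 3-byte-per-pixel image has 15 valid bytes per scanline and a pitch of 16, so the last byte of each row is padding that doesn't belong to the image.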

A fairly common mistake I see when handling an image with pitch is code like this:

memcpy(image.Data, srcImage, image.Pitch * image.Height);

This is a great way to cause image corruption or even a crash.

You can't just write across the entire memory block, because the area of memory outside of the valid scanlines is usually both unspecified and reserved. DirectDraw, for instance, interleaves surfaces and thus writing in this manner may trash other surfaces in video memory. This also generally assumes a specific pitch, which defeats the purpose of having it specifiable in the first place. Some people have gotten used to this because Direct3D always uses a specific alignment for system memory and managed surfaces that tends to give adjacent scanlines; switch to video memory resources and suddenly the code gives corrupted images.

Oh, and by the way, pitches can be negative in some frameworks, in which case the code above will nicely crash. I got burned by this issue when I was upgrading the filter pipeline to support YCbCr formats in 1.8.0, during which I discovered that a not so insignificant number of third-party filters crashed hard when given bitmaps with negative pitches. That's why in 1.8.0, if you manage to get an RGB image in the filter chain that has a top-down orientation, the pipeline will force a blit to flip it and you'll see a [C] conversion marker next to the filter entry. This is skipped if the filter advertises support for image format negotiation, so if you're planning on upgrading a video filter to support YCbCr, you're gonna have to fix your code or it'll blow up.
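A rough sketch of why a negative pitch is fatal for the naive copy, assuming the usual arrangement where the data pointer addresses scanline 0 and scanline 0 happens to be the last row in memory. The View type and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strided view: row0 addresses scanline 0, wherever it lives. */
typedef struct {
    uint8_t  *row0;
    ptrdiff_t pitch;
    int       height;
} View;

/* Present a buffer whose rows are stored in ascending order as a
   vertically flipped view: scanline 0 becomes the last row in memory
   and the pitch becomes negative. */
static View flip_vertical(uint8_t *first_byte, ptrdiff_t stride, int height) {
    View v;
    v.row0   = first_byte + (ptrdiff_t)(height - 1) * stride;
    v.pitch  = -stride;
    v.height = height;
    return v;
}

/* The byte count the naive memcpy() would be handed. With a negative
   pitch the product is negative and wraps to a huge size_t. */
static size_t naive_copy_size(ptrdiff_t pitch, int height) {
    return (size_t)(pitch * height);
}
```

Even if the size didn't wrap, the copy would start at scanline 0 and run upward in memory, which is exactly the direction in which a flipped view has no more rows.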

When reading from a strided surface you can more often get away with such sloppy code, but there is still a gotcha: technically, the description of such a surface doesn't guarantee that there is readable padding beyond the end of the last scanline, in which case reading pitch bytes can fault. In the end, it's not worth the trouble -- just write a MemcpyRect() routine once that copies scanlines individually, and use it everywhere. You can even optimize it for the case where the scanlines do turn out to be contiguous and in ascending order.
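One possible shape for such a routine is sketched below. The signature is an assumption, not VirtualDub's actual MemcpyRect(); it takes the address of scanline 0 and the pitch for each surface, plus the number of valid bytes per scanline.

```c
#include <stddef.h>
#include <string.h>

/* Copy `height` scanlines of `row_bytes` valid bytes each between two
   strided surfaces. dst and src address scanline 0 of each surface;
   the pitches may differ from each other and may be negative. */
static void MemcpyRect(void *dst, ptrdiff_t dst_pitch,
                       const void *src, ptrdiff_t src_pitch,
                       size_t row_bytes, size_t height)
{
    if (row_bytes == 0 || height == 0)
        return;

    /* Fast path: both surfaces are tightly packed and ascending, so the
       scanlines form one contiguous block with no padding in between. */
    if (dst_pitch == src_pitch && dst_pitch == (ptrdiff_t)row_bytes) {
        memcpy(dst, src, row_bytes * height);
        return;
    }

    /* General path: touch only the valid bytes of each scanline. */
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t y = 0; y < height; ++y) {
        memcpy(d + (ptrdiff_t)y * dst_pitch,
               s + (ptrdiff_t)y * src_pitch,
               row_bytes);
    }
}
```

The row addresses are computed from y inside the loop rather than by bumping the pointers, so no pointer is ever formed past the last scanline, which matters when the pitch is negative.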

There is one case where it's generally OK to step outside of the bounds of a scanline, and it's when you're trying to optimize away unaligned access. To be more specific, you can assume (at least on x86) that an unaligned word can safely be accessed by reading the two aligned words that hold it. You may drive analysis tools like Purify nuts, but you can gain a lot of performance in the unaligned case this way. The resampler in VirtualDub 1.8.0 optimizes for the planar 8-bit case by taking advantage of this: it precomputes filter kernels that have been appropriately padded and shifted by the misalignment amount. This results in a small amount of extra computation in some cases, but completely avoids misalignment penalties.
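As a rough illustration of the two-aligned-words idea (not the resampler's actual code), the sketch below fetches an unaligned 32-bit value using only aligned 4-byte reads. The function name is made up, and it assumes a little-endian machine such as x86. When the pointer is misaligned, the aligned words can extend a few bytes outside the object being read, which the C standard doesn't bless; the trick relies on the x86 paging behavior described above.

```c
#include <stdint.h>
#include <string.h>

/* Load the 32-bit little-endian value at p using only 4-byte-aligned reads. */
static uint32_t load_u32_via_aligned(const uint8_t *p) {
    uintptr_t misalign = (uintptr_t)p & 3;   /* 0..3 bytes past alignment */
    const uint8_t *base = p - misalign;      /* aligned address at or below p */
    unsigned shift = (unsigned)misalign * 8; /* misalignment in bits */

    uint32_t lo, hi;
    memcpy(&lo, base, 4);                    /* aligned word holding the start */
    if (shift == 0)
        return lo;                           /* already aligned: one word only */
    memcpy(&hi, base + 4, 4);                /* aligned word holding the rest */
    return (lo >> shift) | (hi << (32 - shift));
}
```

Note that the aligned case returns early without touching the second word. That word isn't covered by the argument above, since it could sit on the next page.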

Comments

Comments posted:


> Oh, and by the way, pitches can be negative in some frameworks, in
> which case the code above will nicely crash. I got burned by this
> issue when I was upgrading the filter pipeline to support YCbCr
> formats in 1.8.0, during which I discovered that a not so
> insignificant number of third-party filters crashed hard when given
> bitmaps with negative pitches.

Heh. I knew RGB did such crazy things (not that I didn't have to fix it after the fact in my ffdshow AviSynth filter), but I couldn't believe the same thing to happen with YV12... until libtheora came along and happily did it... :(

Probably should've prevented that when I fixed the RGB case... :/

np: Kings Of Convenience - Stay Out Of Trouble (Riot On An Empty Street)

Leak (link) - 10 03 08 - 13:30


> you can assume (at least on x86) that an unaligned word can safely be accessed by reading the two aligned words that hold it.
Where is this documented in the Intel manuals? By word, do you mean 2 contiguous bytes or does it apply to MMX/SSE2 register widths as well?

pshufb - 11 03 08 - 14:03


It's not explicitly mentioned, but it comes out of the definition of a page fault: an access violation can only be detected by page fault, which requires crossing an aligned page boundary. If the unaligned word doesn't cross a page boundary, then the two aligned words containing it must be in the same page, and can't fault. If the unaligned word does cross a page boundary, then the two aligned words lie on the two pages, and again those cannot fault if the unaligned access doesn't. This is true for any power of two word size smaller than the page size (4K) -- doesn't matter if you're dealing with 16-bit, 32-bit, 64-bit, or 128-bit accesses.

In reality, fetching aligned words is actually how the CPU itself handles misaligned accesses. On current Intel CPUs, an L1 cache line is fetched and pushed through a shifter, which accesses the desired words. If the access crosses L1 cache lines, a data cache unit split event occurs (with associated penalty), bytes are shifted out from the adjacent L1 cache line, and the results are combined to produce the misaligned words.

Phaeron - 12 03 08 - 00:49


Thanks for the insightful explanation. I guess you were careful to use the terminology 'unaligned word' because there is a subtle point that if a pointer was at an aligned address that was one wordsize below a 4KB boundary, the action of getting two consecutive words from that address could cause an access violation. What are some good ways to avoid this problem other than making a separate routine for the aligned case?

pshufb - 12 03 08 - 22:39


The simplest strategy is simply to use special-case code for the final steps that would overread. A slightly more complex strategy, if you're writing something like a resampler, is to compensate in the offset table. What I do is detect when the filter would overlap the right border of the bitmap, clamp the offset, and shift the filter kernel accordingly. This fails if the filter kernel overlaps both the left and right sides of the bitmap, but in that case you might as well just reduce the filter kernel length.

Phaeron - 13 03 08 - 01:39
