§ AVI timing and audio sync

Last time I promised I would write up some information about how VBR audio is popularly implemented in an AVI file; I'm going to generalize this slightly and talk about the timing of AVI streams. I'm not going to argue whether VBR audio in AVI is proper: almost everyone knows how I feel about it, and that doesn't change the fact that VBR files are out in the wild and will be encountered by applications that accept the AVI format. Instead, here are the technical details, so you will at least know how it works and what issues arise as a result.

I should note that I didn't devise the VBR scheme; I simply reverse engineered it from the Nandub output when I started receiving reports that newer versions of VirtualDub suddenly were not handling audio sync properly on some files. The technique I describe below varies slightly from Nandub's output, as I omit some settings that, as far as I can tell, are not necessary to get VBR-in-AVI working.

As usual, any and all corrections are welcome.

AVI streams, both audio and video, are composed of a series of samples which are evenly spaced in time. For a video stream, a stream sample is a video frame, and the stream rate is the frame rate; for an audio stream, a stream sample is an audio block, which for PCM is equivalent to an audio sample. These stream samples are in turn stored in chunks, where there is generally one sample per chunk for a video stream, and multiple samples per chunk for an audio stream. These chunks are then pointed to by the index, which lists all chunks in the file in their stream order.

Timing of an AVI stream is governed by several variables:

There is one last tidbit missing: where exactly each sample starts and ends. The standard set by DirectShow is that the start time for the initial sample is zero, so assuming dwStart=0, the first sample in a 25/sec stream would occupy [0ms, 40ms), the second [40ms, 80ms), etc. This can be interpreted as nearest neighbor sampling, which means that an interpolator would consider the samples to be in the center of each interval at 20ms, 60ms, and so on.

Note that, based on the above, the timing of a sample is determined solely by its position in the stream -- that is, a sample N always has a start time of (dwStart + N)*(dwScale/dwRate) seconds regardless of its position in the file. In particular, the grouping of samples into chunks or the position of a stream's chunks relative to another stream's chunks doesn't matter. This means that interleaving of a file doesn't affect synchronization between two streams. That doesn't mean that interleaving doesn't affect performance, and if a player has strict playback constraints as hardware devices often do, poor interleave may render a player unable to maintain correct sync or even uninterrupted playback. However, a non-realtime conversion on a hard disk (or other random access medium) on a PC should not have such constraints.
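As a rough illustration of the formula above, here is a minimal Python sketch. The function names are invented for this example; dwStart, dwScale, and dwRate are the stream header fields referred to in the text, and the 25/sec stream is assumed to be described by dwRate=25, dwScale=1.

```python
from fractions import Fraction

def sample_start_seconds(n, dw_start, dw_scale, dw_rate):
    """Start time of stream sample n: (dwStart + N) * (dwScale / dwRate)."""
    return (dw_start + n) * Fraction(dw_scale, dw_rate)

def sample_interval_ms(n, dw_start, dw_scale, dw_rate):
    """Half-open interval [start, end) in milliseconds occupied by sample n."""
    start = sample_start_seconds(n, dw_start, dw_scale, dw_rate) * 1000
    end = sample_start_seconds(n + 1, dw_start, dw_scale, dw_rate) * 1000
    return start, end

# A 25/sec stream with dwStart=0 (assumed here: dwRate=25, dwScale=1).
first = sample_interval_ms(0, 0, 1, 25)   # first sample:  [0 ms, 40 ms)
second = sample_interval_ms(1, 0, 1, 25)  # second sample: [40 ms, 80 ms)
print(first, second)
```

Note that nothing in the calculation depends on which chunk holds the sample or where that chunk sits in the file, which is the point made above about interleaving. Exact fractions are used so that rates such as 30000/1001 do not accumulate rounding error.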

Now, about VBR....

You might think that setting dwSampleSize=0 for an audio stream would allow it to be encoded as variable bitrate (VBR) like a video stream, where each sample has a different size. Unfortunately, this is not the case -- Microsoft AVI parsers simply ignore dwSampleSize for audio streams and use nBlockAlign from the WAVEFORMATEX audio structure instead, which cannot be zero. Nuts. So how is it done, then?

The key is in the translation from chunks to samples.

Earlier, I said that the number of samples in a chunk is determined from the size of the chunk in bytes, since samples are a fixed size. But what happens if the chunk size is not evenly divisible by the sample size? Well, DirectShow, the engine behind Windows Media Player and a number of third-party video players that run on Windows, rounds up. This means that if you set nBlockAlign to be higher than the size of any chunk in the stream, DirectShow will regard all of them as holding one sample, even though they are all different sizes. Thus, to encode VBR MP3, you simply have to set nBlockAlign to at least 960, the maximum frame size for MPEG layer III, and then store each MPEG audio frame in its own chunk. Since each audio frame encodes a constant amount of audio data -- 1152 samples for 32 kHz or higher, 576 samples for 24 kHz or lower -- this permits proper timing and seeking despite the variable bitrate. This can also be done for other compressed audio formats, provided that the encoding application is able to determine the compressed block boundaries and the maximum block size, and the decoders accept non-standard values for the nBlockAlign header field.
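The round-up behavior described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DirectShow's actual code; the function names and the chunk sizes are made up for the example.

```python
def samples_in_chunk(chunk_size, block_align):
    """Stream samples counted in a chunk when the division rounds up."""
    return -(-chunk_size // block_align)  # integer ceiling division

def min_block_align(chunk_sizes):
    """Smallest nBlockAlign for which every non-empty chunk counts as one sample."""
    return max(chunk_sizes)

# Hypothetical sizes in bytes of MPEG audio frames, stored one frame per chunk.
chunk_sizes = [417, 626, 313, 835, 522]

# Computing the bound from the actual chunks before writing the header:
block_align = min_block_align(chunk_sizes)
one_each = [samples_in_chunk(size, block_align) for size in chunk_sizes]

# With nBlockAlign smaller than some chunks, the per-chunk counts diverge
# and the timing derived from them no longer matches the audio.
too_small = [samples_in_chunk(size, 400) for size in chunk_sizes]

print(block_align, one_each, too_small)
```

Since every chunk then counts as exactly one stream sample, chunk N maps to stream sample N, and its time follows from the constant amount of audio each frame holds.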

The advantages of this VBR encoding:

Now, the downsides:

As I mentioned in the introduction, I will refrain from saying whether VBR audio should or shouldn't be used, as I've already done the subject to death. Hopefully now those of you trying to write AVI parsers will have some idea about how to read and detect VBR files, however.

Comments

The maximum frame size for MPEG1 layer 3 is 320*144/32 = 1440 bytes, so nBlockAlign should be at least 1440 for MP3. Unfortunately, the NanDub hack uses 1152, which is also not sufficient. AVIs with MP3 frames larger than 1152 bytes usually have the video sped up.

On the other hand, it is not very likely that MP3 will be encoded at 32 kHz and an average bitrate of say 290 kbps... not by sane people anyway.

stephanV - 17 11 04 - 09:25


The reason I specified 961 is that layer III frames cannot exceed 7680 bits in size, as that is the buffer size prescribed by the standard. I guess the frame size would still exceed 7680 bits even if the data portion didn't, though. Actually, I believe it can far exceed 1441 bytes (1440 bytes + 1 pad) if MPEG-2.5 is used. It's not a huge deal in practice since you can trivially compute the required lower bound before you begin writing the stream.

Phaeron - 17 11 04 - 23:53


A very interesting topic, Phaeron. I cannot add any intelligent advice as I have no experience of software coding, but it sounds as though you have things figured out to a T.
It's obvious you spend a lot of hard work reading up on things to improve your free VirtualDub software.
When you have released a fully stable version of your newly coded VirtualDub, you could definitely sell it because I know a lot of people would be willing to pay for it, me included!

Luke - 18 11 04 - 19:18


DirectShow's avi muxer cannot remux vbr audio simply because MS's avi splitter cannot timestamp every sample in this case. When using another source (another splitter) the muxer can handle the stream, but I was experimenting with this so long ago that I can't remember whether the output was in sync with the video or playable at all.

Gabest (link) - 19 11 04 - 09:16


It's really funny that the AVIFile API rounds down, even down to zero: If a chunk were incomplete, the AVIIF_FIRSTPART or AVIIF_LASTPART flag in the index would have to be set. Thus, it is logical to assume that a chunk smaller than nBlockAlign bytes contains one frame (or "sample") as described in the AVI headers. Assuming zero frames in a complete chunk is imho a bit weird.

There is no such flag in the opendml index structures. However, that AVIFile API does not even support OpenDML (IIRC)...

Alexander Noé - 20 11 04 - 11:46


This is a great program and I use it frequently. However, I downloaded an AVI file and it plays fine on the computer, but when I converted it to DVD the sound is too high and the people sound like chipmunks. What can I do or change to record it properly?

Judy - 17 04 08 - 18:13


So how is out of sync sound to picture corrected in VirtualDub? Can it do the job?

Ianpb - 04 02 09 - 11:45


@Ianpb: Yes, go to Audio > Interleaving > Audio skew correction.

Martin - 10 07 09 - 01:02


Hi! I am a bit of a rookie with digital video manipulation, so, I hope this is not too dummy...
I have an AVI whose video stream has a non-zero start value and VirtualDub warns me of that. I know that VirtualDub does not support non-zero start values but, is there a way to fix this? Is there a tutorial or something like that on the web to show me the way to fix this? Thanks a lot!

Fred Mach - 11 08 09 - 00:07
