§ AVI timing and audio sync

Last time I promised to write up how VBR audio is popularly implemented in an AVI file; I'm going to generalize this slightly and talk about the timing of AVI streams. I'm not going to comment on the propriety of VBR audio in AVI, because almost everyone knows how I feel about it, and my opinion doesn't change the fact that VBR files are out in the wild and will be encountered by applications that accept the AVI format. Instead, here are the technical details, so you will at least know how it works and what issues arise as a result.

I should note that I didn't devise the VBR scheme; I simply reverse engineered it from the Nandub output when I started receiving reports that newer versions of VirtualDub suddenly were not handling audio sync properly on some files. The technique I describe below varies slightly from Nandub's output, as I omit some settings that, as far as I can tell, are not necessary to get VBR-in-AVI working.

As usual, any and all corrections are welcome.

AVI streams, both audio and video, are composed of a series of samples which are evenly spaced in time. For a video stream, a stream sample is a video frame, and the stream rate is the frame rate; for an audio stream, a stream sample is an audio block, which for PCM is equivalent to an audio sample. These stream samples are in turn stored in chunks, where there is generally one sample per chunk for a video stream, and multiple samples per chunk for an audio stream. These chunks are then pointed to by the index, which lists all chunks in the file in their stream order.

Timing of an AVI stream is governed by several variables:

There is one last tidbit missing: where exactly each sample starts and ends. The standard set by DirectShow is that the start time for the initial sample is zero, so assuming dwStart=0, the first sample in a 25/sec stream would occupy [0ms, 40ms), the second [40ms, 80ms), etc. This can be interpreted as nearest neighbor sampling, which means that an interpolator would consider the samples to be in the center of each interval at 20ms, 60ms, and so on.

Note that, based on the above, the timing of a sample is determined solely by its position in the stream -- that is, a sample N always has a start time of (dwStart + N)*(dwScale/dwRate) seconds regardless of its position in the file. In particular, the grouping of samples into chunks or the position of a stream's chunks relative to another stream's chunks doesn't matter. This means that interleaving of a file doesn't affect synchronization between two streams. That doesn't mean that interleaving doesn't affect performance, and if a player has strict playback constraints as hardware devices often do, poor interleave may render a player unable to maintain correct sync or even uninterrupted playback. However, a non-realtime conversion on a hard disk (or other random access medium) on a PC should not have such constraints.
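As a rough illustration of the timing rule above, here is a minimal Python sketch. The function name `sample_interval` and its parameter names are made up for this example; only dwRate, dwScale, and dwStart come from the stream header, and the half-open interval follows the DirectShow convention described earlier. Exact fractions are used so that rates such as 30000/1001 do not accumulate rounding error.

```python
from fractions import Fraction

def sample_interval(n, dw_rate, dw_scale, dw_start=0):
    """Return the (start, end) time in seconds of stream sample n.

    Illustrative helper, not part of any AVI API. The interval is
    half-open: [start, end).
    """
    period = Fraction(dw_scale, dw_rate)   # duration of one stream sample
    start = (dw_start + n) * period        # (dwStart + N) * (dwScale / dwRate)
    return start, start + period

# A 25 samples/sec stream, assumed here to be stored as dwRate=25, dwScale=1.
for n in range(3):
    start, end = sample_interval(n, dw_rate=25, dw_scale=1)
    print(f"sample {n}: [{float(start * 1000):g} ms, {float(end * 1000):g} ms)")
```

Note that the function takes no chunk or file position at all, which is the point: only the sample's index in the stream matters.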

Now, about VBR....

You might think that setting dwSampleSize=0 for an audio stream would allow it to be encoded as variable bitrate (VBR) like a video stream, where each sample has a different size. Unfortunately, this is not the case -- Microsoft AVI parsers simply ignore dwSampleSize for audio streams and use nBlockAlign from the WAVEFORMATEX audio structure instead, which cannot be zero. Nuts. So how is it done, then?

The key is in the translation from chunks to samples.

Earlier, I said that the number of samples in a chunk is determined from the size of the chunk in bytes, since samples are a fixed size. But what happens if the chunk size is not evenly divisible by the sample size? Well, DirectShow, the engine behind Windows Media Player and a number of third-party video players that run on Windows, rounds up. This means that if you set nBlockAlign to be at least as large as any chunk in the stream, DirectShow will regard all of them as holding one sample, even though they are all different sizes. Thus, to encode VBR MP3, you simply have to set nBlockAlign to at least 960, the maximum frame size for MPEG layer III, and then store each MPEG audio frame in its own chunk. Since each audio frame encodes a constant amount of audio data -- 1152 samples for 32 kHz or higher, 576 samples for 24 kHz or lower -- this permits proper timing and seeking despite the variable bitrate. This can also be done for other compressed audio formats, provided that the encoding application is able to determine the compressed block boundaries and the maximum block size, and the decoders accept non-standard values for the nBlockAlign header field.
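To make the round-up trick concrete, here is a small Python sketch of how a parser might turn chunk sizes into sample counts and start times. The helper names and the chunk sizes are invented for illustration, and the stream header values (dwRate=44100, dwScale=1152, for 44.1 kHz MP3 with one frame per stream sample) are an assumption about a typical file rather than something every VBR file is guaranteed to use.

```python
from fractions import Fraction

def chunk_sample_count(chunk_bytes, n_block_align):
    """Samples in a chunk, rounding up as DirectShow is described doing."""
    return -(-chunk_bytes // n_block_align)   # integer ceil(chunk / align)

def chunk_start_times(chunk_sizes, n_block_align, dw_rate, dw_scale, dw_start=0):
    """Start time in seconds of each audio chunk, in stream order."""
    period = Fraction(dw_scale, dw_rate)
    times = []
    sample = dw_start
    for size in chunk_sizes:
        times.append(sample * period)
        sample += chunk_sample_count(size, n_block_align)
    return times

# Hypothetical VBR MP3 frames, one per chunk, all different sizes.
frames = [417, 626, 209, 1044]

# nBlockAlign >= the largest chunk: every chunk counts as exactly one sample.
print([chunk_sample_count(f, 1152) for f in frames])
print([float(t) for t in chunk_start_times(frames, 1152, 44100, 1152)])

# nBlockAlign too small: larger frames count as 2 or 3 samples, so timing drifts.
print([chunk_sample_count(f, 400) for f in frames])
```

Nothing in the sketch detects VBR or knows about MP3; the variable frame sizes simply disappear into the rounding, which is why the block alignment only needs to be large enough rather than any particular value.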

The advantages of this VBR encoding:

Now, the downsides:

As I mentioned in the introduction, I will refrain from saying whether VBR audio should or shouldn't be used, as I've already done the subject to death. Hopefully now those of you trying to write AVI parsers will have some idea about how to read and detect VBR files, however.

Comments

Comments posted:


The maximum frame size for MPEG1 layer 3 is 320*144/32 = 1440 bytes, so nBlockAlign should be at least 1440 for MP3. Unfortunately, the NanDub hack uses 1152, which is also not sufficient. AVIs with MP3 frames larger than 1152 bytes usually have the video sped up.

On the other hand, it is not very likely that MP3 will be encoded at 32 kHz and an average bitrate of say 290 kbps... not by sane people anyway.

stephanV - 17 11 04 - 09:25


The reason I specified 961 is that layer III frames cannot exceed 7680 bits in size, as that is the buffer size prescribed by the standard. I guess the frame size would still exceed 7680 bits even if the data portion didn't, though. Actually, I believe it can far exceed 1441 bytes (1440 bytes + 1 pad) if MPEG-2.5 is used. It's not a huge deal in practice since you can trivially compute the required lower bound before you begin writing the stream.

Phaeron - 17 11 04 - 23:53


A very interesting topic, Phaeron. I cannot add any intelligent advice as I have no experience of software coding, but it sounds as though you have things figured out to a T.
It's obvious you put in a lot of hard work reading up on things to improve your free VirtualDub software.
When you have released a fully stable version of your newly coded VirtualDub, you could definitely sell it because I know a lot of people would be willing to pay for it, me included!

Luke - 18 11 04 - 19:18


DirectShow's avi muxer cannot remux vbr audio simply because MS's avi splitter cannot timestamp every sample in this case. When using another source (another splitter) the muxer can handle the stream, but I was experimenting with this so long ago that I can't remember whether the output was in sync with the video or playable at all.

Gabest (link) - 19 11 04 - 09:16


It's really funny that the AVIFile API rounds down, even down to zero: If a chunk were incomplete, the AVIIF_FIRSTPART or AVIIF_LASTPART flag in the index would have to be set. Thus, it is logical to assume that a chunk smaller than nBlockAlign bytes contains one frame (or "sample") as described in the AVI headers. Assuming zero frames in a complete chunk is imho a bit weird.

There is no such flag in the opendml index structures. However, that AVIFile API does not even support OpenDML (IIRC)...

Alexander Noé - 20 11 04 - 11:46


This is a great program and I use it frequently. However, I downloaded an AVI file that plays fine on the computer, but when I converted it to DVD the sound is pitched too high and the people sound like chipmunks. What can I do or change to record it properly?

Judy - 17 04 08 - 18:13


So how is out of sync sound to picture corrected in VirtualDub? Can it do the job?

Ianpb - 04 02 09 - 11:45


@Ianpb: Yes, go to Audio > Interleaving > Audio skew correction.

Martin - 10 07 09 - 01:02


Hi! I am a bit of a rookie with digital video manipulation, so I hope this is not too dumb a question...
I have an AVI whose video stream has a non-zero start value, and VirtualDub warns me about that. I know that VirtualDub does not support non-zero start values, but is there a way to fix this? Is there a tutorial or something like that on the web that shows how to fix it? Thanks a lot!

Fred Mach - 11 08 09 - 00:07


Hello. I also am looking for a way to fix stream "non-zero start position" problems. Thanks in advance.

twipley - 18 08 10 - 01:37


@http://www.virtualdub.org/blog/pivot/ent..
Avery Lee (Phaeron) wrote that "the dwInitialFrames field, [in direct relation with non-zero start positions], can pretty much be ignored at this point as whatever purpose it originally had has rotted away."

twipley - 16 09 10 - 03:41


Hmm, sorry... I seem to have been mistaken, there.

Phaeron actually said something to a completely other effect.

twipley - 17 09 10 - 00:27


in reply to 'Fred Mach' & 'twipley', re: non-zero start position msg in virtualdub. had same issue here.

edited line 80: just after the - needs to be 00 (mine read 01) think vdub reads it as 0 anyway. my AVI is fine now.

open vdub>tools>hex editor

line 80: ?? ?? ?? ?? ?? ?? ?? ??-00 00 00 00 ?? ?? ?? ??

worked for me anyhow, peace

jonny_e - 09 10 10 - 10:00


Thanks Phaeron for this information!
One question, if writing a parser/decoder... can one assume that sample size (dwSampleSize) will be zero if it was encoded in VBR and non-zero if not? It sounds like for each audio chunk I can simply round up to one.

I'm testing a VBR file which has 1152 for the block align, and AverageBytesPerSec 16000. Given what chunk I'm on... for all other formats I can determine the time elapsed by doing something like so:
SampleStart * BlockAlign / AvgBytesPerSec Where SampleStart is how many samples have passed by per chunk. If I plug in these with a VBR and I passed 2 chunks... 2 * 1152 / 16000 = 0.144. That is not the correct time it would be something like (2 * 1152) / 44100 ~= 0.052...
Unfortunately I cannot use that equation for non-VBR types (SampleCount * BlockAlign / SampleRate). If I can assume that dwSampleSize will be zero for VBR, then I could switch equations. It would be nice if there were one which could work with both, though.

James Killian (link) - 19 10 10 - 04:14


@James Killian:
dwSampleSize is *ignored* for audio streams. The solution you are looking for is actually much simpler: nBlockAlign is the size of a sample, and when determining how many samples are in a chunk, just round up.

Phaeron - 19 10 10 - 15:08


Thanks for the quick response. I get how to work out how many samples are in a chunk, but given the sample count, how would you determine how much time has elapsed? Using the example in the previous entry, one VBR sample lasts 1152/44100 seconds, whereas one sample of a non-VBR stream would last 1/44100 seconds. From what I have researched so far, I have not found a clear-cut way to tell when the AVI is using VBR audio. :( So far, I am assuming that if the BlockAlign size is 960 or greater then it is using VBR, where the sample time is block align / sample rate. If there is a better way to identify when VBR is being used, or a universal equation to compute time... please let me know. Thanks.

James Killian (link) - 20 10 10 - 02:46


Ah, I see where the problem is.

The format of the stream isn't used to determine the sample rate. That's determined by the dwRate/dwScale fraction in the stream header, which gives the stream sample rate in rational form. The reason for the 1152 denominator is that at sampling rates of 32 kHz or higher, each MPEG layer III frame produces 1152 output samples. Of course, since this is a fraction, it could also have been encoded as 1225/32.

What's important to realize is that while the dwRate/dwScale fraction does need to be a specific value, the nBlockAlign value *doesn't*. The block alignment only has to be at least as large as the biggest frame, to avoid having one frame interpreted as more than one sample; 1152 is frequently used, but nBlockAlign actually has no relation to output samples/frame as dwScale does. Therefore, changing nBlockAlign in your example to 1000, 2000, etc. wouldn't make a difference in timing. It can be much lower than 960 for VBR MP3 streams; for instance, a 128 kbps stream at 44.1 kHz can go as low as 418 bytes (ceil(144000 * 128 / 44100)). That also assumes that only one frame is included per sample. It is possible to use a lower value to pack multiple CBR/ABR frames into a single sample to reduce index overhead while still retaining some degree of frame-level precision in editing and seeking.

To put it in formula terms:

chunk_sample_count = ceil(chunk_byte_count / format.nBlockAlign);
stream_sample_rate = header.dwRate / header.dwScale;
sample_start_time_in_seconds = (sample_number_from_zero + header.dwStart) / stream_sample_rate;

Note that nowhere is there VBR detection or any MP3 specific code; remember that the Microsoft DirectShow parser doesn't actually support such a feature, but that this was discovered to be a side effect of the parser implementation. Video streams use almost the same formulas except that header.dwSampleSize is used instead of format.nBlockAlign.

Phaeron - 20 10 10 - 06:51


Ah ha! It didn't occur to me to use the header rate and scale... that was what I needed to hear. Thanks :)

James Killian (link) - 21 10 10 - 06:39


Hi,
Is the 11-bit sync word of the MPEG-1 Layer 2 audio frame header fixed?
I want to decode audio (MPEG-1 Layer 2) that is coming in live.
I got example code from ffmpeg-2.0, but it is only able to decode a standalone MPEG-1 Layer 2 file.
When I modified it for live streaming, it shows the error "Header Missing".
Please guide me.

Jeetendra - 04 10 13 - 22:40
