§ Pipelines and performance

VirtualDub, although not really designed for multi-CPU systems, is moderately multithreaded. On an average render it will use about 4-5 threads for UI, reading from disk, processing, and writing to disk. Keeping these threads busy is a challenge, so VirtualDub's rendering engine is pipelined -- all of the stages attempt to work in parallel, with queues between them. The idea is that you add enough buffering between the different threads that they are all working on different places in the output, and the stages only block on the single bottleneck within the system, which is usually either disk (I/O) bandwidth or CPU power.
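As a concrete illustration of the structure (a minimal sketch of the general technique, not VirtualDub's actual code), here are three stages -- read, process, write -- connected by bounded queues. Each stage runs on its own thread and blocks only when its input queue is empty or its output queue is full, which is exactly the "block only on the bottleneck" behavior described above:

// Minimal producer/processor/consumer pipeline with bounded queues.
// Illustrative sketch only -- not VirtualDub's actual engine.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(size_t capacity) : mCapacity(capacity) {}

    void push(T item) {                       // blocks while the queue is full
        std::unique_lock<std::mutex> lock(mMutex);
        mNotFull.wait(lock, [&] { return mItems.size() < mCapacity; });
        mItems.push(std::move(item));
        mNotEmpty.notify_one();
    }

    T pop() {                                 // blocks while the queue is empty
        std::unique_lock<std::mutex> lock(mMutex);
        mNotEmpty.wait(lock, [&] { return !mItems.empty(); });
        T item = std::move(mItems.front());
        mItems.pop();
        mNotFull.notify_one();
        return item;
    }

private:
    std::mutex mMutex;
    std::condition_variable mNotFull, mNotEmpty;
    std::queue<T> mItems;
    size_t mCapacity;
};

int main() {
    // -1 acts as an end-of-stream marker; 32 is an arbitrary buffer count.
    BoundedQueue<int> readQueue(32), writeQueue(32);

    std::thread reader([&] {                  // stage 1: read frames from disk
        for (int frame = 0; frame < 100; ++frame)
            readQueue.push(frame);
        readQueue.push(-1);
    });

    std::thread processor([&] {               // stage 2: filter/compress frames
        for (;;) {
            int frame = readQueue.pop();
            writeQueue.push(frame);            // (real work would happen here)
            if (frame < 0) break;
        }
    });

    std::thread writer([&] {                  // stage 3: write frames to disk
        for (;;) {
            int frame = writeQueue.pop();
            if (frame < 0) break;
            std::printf("wrote frame %d\n", frame);
        }
    });

    reader.join();
    processor.join();
    writer.join();
}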

In VirtualDub, pipelining parameters are set under Options > Performance in the editing mode; in capture mode, you can adjust disk buffering parameters in Capture > Disk I/O. But what do the parameters mean, and how should they be set?

First, a classic laundry analogy: think of pipeline stages in terms of your washer and dryer. The reason your washer and dryer are separate instead of being one washer-dryer amalgam is that you can get more done by keeping both busy: while one load is drying, you can start the next one in the washer. However, there are some gotchas to this:

- The laundry only gets done as fast as the slower of the two machines -- the bottleneck.
- Making the machine that isn't the bottleneck faster doesn't get the laundry done any sooner.
- You need somewhere to put clothes between the washer and the dryer, or one machine ends up waiting on the other.
- Washing one sock at a time means you spend most of your time loading and unloading instead of washing.
- Washing everything in one giant load means the dryer can't start until the entire wash is finished, and anything else you need cleaned waits behind it.

Okay, what does this have to do with anything? I ain't washin' my video!

The first two points are about bottlenecks and are very important. If you are doing a high-bandwidth operation with Huffyuv or uncompressed video and only a light amount of processing, you are almost certainly going to be disk bound. This means that your CPU utilization is actually going to drop well below 100% because reading and writing from the disk is the bottleneck. At this point you largely don't care about MMX/SSE2 optimized code or other kinds of CPU-based performance optimizations, because the only thing they'll do is make your CPU run cooler. Conversely, if you're using a ton of video filters and your render is running at 1 fps, it probably isn't so bad to have your video files stored over the network, because the network will keep up just fine.
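To put rough numbers on the disk-bound case (my arithmetic, not figures from the article): uncompressed 640x480 RGB24 at 30 fps is about 640 x 480 x 3 bytes/frame x 30 frames/sec, or roughly 26MB/sec, and a recompression job both reads and writes a stream like that, so the disk subsystem has to sustain on the order of 50MB/sec plus seeks -- close to or beyond what many single drives of the day could deliver -- while the CPU does little more than copy bytes.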

The third point is a bit stretched, but has to do with insufficient pipelining and the fact that the pipeline stages process discrete blocks, not a continuous stream. If you don't have enough buffering between two stages then the pipeline will flood when the first stage writes to it, causing the first stage to stall, and then empty when the second stage pulls data from it, causing the second stage to stall. When this happens you are essentially locking the stages together (serialization) and losing the benefits of pipelining. This happens most often in VirtualDub when doing very fast Direct Stream Copy operations, where the render engine can process thousands of frames per second on a highly-compressed file because it's doing nothing but copying them into the write buffer. In this case having only 32 buffers is likely to be a bottleneck because the render engine will exhaust the pipeline buffer very quickly and then stall while waiting for the disk to bring in the next batch, so you'll want more buffers. Another way you get stalls like this is if you're filtering compressed video with frame rate decimation on, because the engine will decompress a bunch of frames in a burst and then settle in for a while to do the image processing. However, if you're processing 640x480 uncompressed video with some filters you really don't want 256 pipeline buffers, because neither the disk nor the render engine can process frames that fast and you'll potentially waste a ton of RAM -- as much as hundreds of megabytes. Using so many pipeline buffers that you swap is really counterproductive.
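For a sense of scale on that RAM cost (my arithmetic, not the article's): one 640x480 frame at 24 bits per pixel is about 900KB, so 256 pipeline buffers of that size tie up roughly 225MB -- easily enough to push a machine of that era into swapping.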

The fourth point has to do with block size. You want to process data in big chunks to minimize the management and shuffling overhead. The primary culprit in this department is I/O. Windows likes to do I/O in relatively small chunks (4K-64K), so VirtualDub has quite a bit of logic to defeat buffered I/O and force the kernel to do larger blocks. Between blocks the hard disk may have to seek somewhere else and that can cost 20-50ms, so you want to keep the blocks large. I like to aim for no more than about 10 blocks per second when tuning capture I/O, so when I capture uncompressed or Huffyuv I'll typically use a 1MB block size.
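For reference, the usual way an application bypasses the cache and forces large transfers on Windows is to open the file with FILE_FLAG_NO_BUFFERING and hand the kernel big, sector-aligned blocks directly. The following is a simplified sketch of that technique, not VirtualDub's actual write code; the file name and block count are arbitrary:

// Sketch: write a file in large, sector-aligned blocks, bypassing the
// Windows file cache. Illustrative only -- not VirtualDub's actual code.
// Error handling is reduced to the bare minimum.
#include <windows.h>
#include <cstdio>

int main() {
    const DWORD kBlockSize = 1 << 20;   // 1MB blocks, as discussed above

    // FILE_FLAG_NO_BUFFERING requires sector-aligned buffers and transfer
    // sizes; VirtualAlloc returns page-aligned memory, which satisfies that.
    void* buffer = VirtualAlloc(nullptr, kBlockSize,
                                MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!buffer)
        return 1;

    HANDLE file = CreateFileA("capture.avi", GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS,
                              FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                              nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    // In a real capture loop another thread would be filling the buffer
    // with video/audio data; here we just write zeroed blocks.
    for (int block = 0; block < 64; ++block) {
        DWORD written = 0;
        if (!WriteFile(file, buffer, kBlockSize, &written, nullptr)) {
            std::printf("write failed: %lu\n", GetLastError());
            break;
        }
    }

    CloseHandle(file);
    VirtualFree(buffer, 0, MEM_RELEASE);
}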

The fifth point is even more contrived but has to do with blocks that are too large -- specifically, I/O blocks. There are diminishing returns to larger and larger block sizes, and there are latency problems as well. 1MB is likely to work just as well as 2MB, and gives the system more flexibility in scheduling. If you configure VirtualDub to do 8MB block writes to disk and then try to open Internet Explorer, you're going to be waiting a while, because every time IE wants to read a 4K file off disk it has to wait for 8MB of data to go through! The long latency associated with waiting for a large disk write also has implications for buffering efficiency. A disk buffer that is configured as 8 x 1MB can buffer more effectively than one split as 2 x 4MB, even though they're both 8MB, because with the 2 x 4MB buffer the front end can't put anything more into the buffer until an entire 4MB block has finished writing, whereas with the 8 x 1MB buffer the front end can write and move on as soon as a single 1MB block has gone through. What this means is that there's a sweet spot for disk block size -- bigger isn't always better.
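A quick back-of-the-envelope comparison (my numbers, not the article's): at a sustained 40MB/sec, committing a 4MB block takes about 100ms, so with a 2 x 4MB buffer the front end can stall for up to a tenth of a second waiting for a slot to come free; with 8 x 1MB blocks the oldest block finishes in about 25ms, and there are seven other slots to fill in the meantime.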

So, what signs should you look out for?

Highly fluctuating frame rate during a render may be a sign of insufficient buffering. Figure out whether you're disk bound or CPU bound, and then examine Task Manager or the HDD light. If you're CPU bound, you want that sucker at 100% CPU; if you're disk bound, you want that light always on and the HDD not making too much noise seeking. The real-time profiler will also help diagnose this, by showing whether the threads are serializing against each other, working a chunk at a time rather than running smoothly in parallel. Generally, though, the disk and memory buffer defaults are fine; one of the queues is usually empty and the other full, and the buffers are large enough to cover any momentary bumps. As noted above, though, there are exceptions: in the Direct Stream Copy case you should consider using more memory buffers, and if you're working with really big video frames, like 640x480 uncompressed, consider a larger file write buffer.

If you're dropping frames during capture when CPU utilization is low and the bandwidth isn't that high, you may be using too large a disk I/O block size, such that the huge writes are disruptive. This is unlikely to happen with the defaults, but might if you're capturing to a slow device or to a highly fragmented drive, in which case you should consider lowering the block size. In VirtualDub 1.6.1 or higher, you can use the real-time profiler to detect this case, as it will show up as the audio or video channels blocking for long periods of time on a write. Note that there is currently a wrinkle in VirtualDub's AVI write routines: they periodically extend the file ahead of the current write position, which greatly reduces seeks to the directory and free space bitmap on disk, but unfortunately I recently found out that file extension is a synchronous operation even if overlapped I/O is used. (1.6 uses overlapped I/O under NT to pack disk requests back-to-back.) 1.6.3 does file extension in a separate thread and shouldn't show the video/audio channels blocking on writes unless the disk hitches for a moment and backs up the buffer.
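For those curious what "file extension in a separate thread" looks like in practice: pre-extending a file on Windows just means moving the file pointer past the current end and calling SetEndOfFile, and doing that from a worker thread keeps the synchronous extension off the write path. This is a simplified sketch of the general idea, not VirtualDub 1.6.3's actual implementation (the file name and 64MB figure are arbitrary):

// Sketch: grow a file well ahead of the current write position from a
// worker thread, so the thread doing the actual writes never blocks on
// the (synchronous) extension. Illustrative only.
#include <windows.h>
#include <thread>

// SetEndOfFile() is what actually performs the extension, and it
// completes synchronously.
bool ExtendFile(HANDLE file, LONGLONG newSize) {
    LARGE_INTEGER pos;
    pos.QuadPart = newSize;
    if (!SetFilePointerEx(file, pos, nullptr, FILE_BEGIN))
        return false;
    return SetEndOfFile(file) != FALSE;
}

int main() {
    HANDLE file = CreateFileA("capture.avi", GENERIC_READ | GENERIC_WRITE, 0,
                              nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                              nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    // Worker thread pre-extends the file ~64MB ahead. In VirtualDub's case
    // the writes use overlapped I/O with explicit offsets, so the moving
    // file pointer doesn't disturb them.
    std::thread extender([&] { ExtendFile(file, 64ll << 20); });

    // ...the write thread would be streaming data into the file here...

    extender.join();
    // When done, the file would be truncated back to its true length with
    // one final SetFilePointerEx + SetEndOfFile.
    CloseHandle(file);
}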

If your render operation is projected to take 14 hours and its speed is better measured in SPF than FPS, just leave the buffering settings alone. Tweaking them isn't likely to speed anything up.

Comments



Since Direct Stream Copy mode might benefit from more memory buffers than other processing modes, maybe VirtualDub should:

A. Automatically boost the memory buffers in Direct Stream Copy mode (e.g. multiplying the value by some (possibly configurable) factor)

B. Provide different settings for Direct Stream Copy mode from other modes.

James - 19 12 04 - 17:28


I agree with the options idea. Set up two slide bars: one for DSC mode and the other for regular work.

UA=42 (link) - 19 12 04 - 23:14


Why not allow VirtualDub to set these parameters itself on the fly? Seems like it could tune itself for best performance in any situation.

Cyberia - 20 12 04 - 19:58


I thought about two sliders, but you can get into wildly different situations even with Direct/Direct operations, depending on the size of the video frames involved.

Autocalibration would indeed be the best solution, but I'm not sure how the feedback mechanism would work. The clearest sign of insufficient buffering is encountering both full and empty conditions; however, this can also be caused by heavy swapping that causes the threads to run erratically. This would make it possible for the feedback mechanism to increase the number of buffers further and make the swapping worse, and thus spin out of control until the program or the system crashed. I'd need to find a feedback mechanism that doesn't suffer from such problems.

Phaeron - 21 12 04 - 00:47


Well, could you start small (small blocks, few buffers) and start increasing them until it starts to be counterproductive?

Cyberia - 21 12 04 - 20:01


I usually use VirtualDub to process XviD/DivX video, with some filtering (like adding subs). My aim is to obtain the best quality video possible, so I use multipass encode mode and the 'slow encode' performance setting. On my computer, during the 'Multipass Nth pass' phase, which is the more time-consuming one, CPU usage is always about 50%, so I was thinking the speed could be improved by using a faster disk, because I thought the bottleneck wasn't the CPU in that case. Now I have the opportunity to test that: I have a second machine, identical to the first in CPU (3GHz/800MHz P4 w/HT), motherboard, and memory (1GB DDR2/533MHz split in 2 memory modules configured in dual channel mode), but with a SATA RAID-0 configuration with 2 SATA 150 disks. Surprisingly (to me), encoding the same video, the speed is almost the same on both computers, and on both the CPU utilization is about 50%. I also went to Options/Performance and increased all the items to the maximum on both computers, with the same results. Does anybody know why the CPU utilization is the same on both computers, and why there is no speed increase with the RAID-0 configuration?
Thank you in advance,
Juancho

Juancho - 26 04 05 - 11:30


Since you're running an HT enabled P4, 50% processor utilization in Task Manager is actually 100% of ONE (of the two) logical CPU(s). Disk access is not your bottleneck, CPU power is... It's just that the codec you are using is single-threaded, so it can't take advantage of the two logical CPUs. 50% is the maximum CPU utilization that you will ever see in that instance.

I've found that using AVISynth to frameserve into VirtualDub, and splitting my filters between both apps, works very well for eliminating some of the CPU bottleneck. I can usually get an HT P4 up to around 65-70% utilization (using about one-and-a-half of the two logical processors) that way.

Hooz - 29 06 05 - 13:20
