§ Does Hyperthreading Technology speed up VirtualDub?

"VirtualDub currently isn't multithreaded and only takes 50% of my hyperthreaded CPU. Would it run twice as fast if it were?"

No, and the premise is incorrect anyway.

VirtualDub is multithreaded -- if it weren't, the UI would lock every time it rendered a frame. Rendering operations use three threads: UI, I/O, and processing. Preview operations also create a fourth thread for timing blits. CPU utilization is low on the second and subsequent logical CPUs during a render because all video operations are serialized in the processing thread. Audio operations take place on the I/O thread, however, so any audio filtering or compression will execute in parallel with the video work. To take full advantage of dual CPUs you have to balance operations across multiple threads and make sure you're not just wasting time ping-ponging data between the CPUs. VirtualDub isn't written to do this currently.
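The thread layout described above can be sketched roughly as follows. This is only an illustration of the structure, not VirtualDub's actual code: the function name, the queue depth, and the string "work" are all made up for the example, and in CPython the threads would not give a real speedup anyway.

```python
import queue
import threading

def run_pipeline(num_frames):
    """Toy render: an I/O thread feeds one processing thread (hypothetical names)."""
    frames = queue.Queue(maxsize=4)  # bounded, so the I/O thread cannot run far ahead
    video_out, audio_out = [], []

    def io_thread():
        for n in range(num_frames):
            # Audio filtering/compression stays on the I/O thread.
            audio_out.append(f"audio {n} done")
            # The video frame is handed to the processing thread.
            frames.put(n)
        frames.put(None)  # sentinel: no more frames

    def processing_thread():
        while True:
            n = frames.get()
            if n is None:
                break
            # All video work is serialized here, one frame at a time.
            video_out.append(f"video {n} done")

    workers = [threading.Thread(target=io_thread),
               threading.Thread(target=processing_thread)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return video_out, audio_out
```

Because every video frame passes through the single processing thread, adding a second CPU only helps to the extent that the I/O thread has real work of its own to do.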

But what about Pentium 4 CPUs with Hyperthreading Technology? Everyone has one, so VirtualDub should be tuned for dual CPUs soon, right?

Well, not quite.

In a traditional symmetric multiprocessing (SMP) system, you have two separate CPUs. Each has its own set of caches, execution resources, and buses. As long as you don't have to transfer too much data between the CPUs, and the CPUs don't fight too much for the bus, and you can keep both CPUs busy, then you can get twice as much performance as a single CPU system. This is easiest when the threads are chewing through workloads that require a lot of processing over little data; it's harder in the opposite case, where the CPUs are using memory so heavily that they begin to contend for memory bandwidth.

In a hyperthreaded CPU, however, the situation is different. Here you only have one set of caches, one set of execution resources, and one bus, but you have two logical CPUs contending for those, each running its own different thread. Unlike the SMP case, in an HT CPU one of the threads can take most or all of the resources that the other thread doesn't use, including everything if there is only one thread to run. While Windows Task Manager might report that only 50% of your CPU is active, that could -- and often does -- represent one of the two logical CPUs taking 90%+ of the execution resources. Only 10% more execution power is available for another thread, and beyond that the second thread starts slowing down the first thread as it fights for resources. Hyperthreading is mainly useful for filling in the holes in one thread with another -- that is, while one thread is executing sparsely and not taking advantage of many CPU resources, another thread can sneak in and keep the execution units busy, and get another 5-15% of performance out of the CPU.
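To put rough numbers on that, here is a toy model. It assumes, purely for illustration, that a second thread can only use whatever share of the execution resources the first thread leaves idle; real CPUs are messier, and the function name is invented for the example.

```python
def best_case_speedup(single_thread_share):
    """Upper bound on the throughput gain from a second hardware thread,
    assuming it can only use resources the first thread leaves idle."""
    if not 0.0 < single_thread_share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    # One thread alone delivers `single_thread_share` of peak throughput;
    # two threads together can deliver at most 1.0.
    return 1.0 / single_thread_share

ht_gain = best_case_speedup(0.90)  # one thread already uses 90% of the core
smp_ideal = 2.0                    # two separate CPUs, perfectly balanced work
print(f"HT best case: {ht_gain:.2f}x, SMP ideal: {smp_ideal:.2f}x")
```

Under this model a thread that already uses 90% of the core leaves room for about an 11% gain at best, which is in line with the 5-15% figure above and nowhere near the 2x that Task Manager's 50% reading suggests.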

A major issue with hyperthreading is a design flaw in the Northwood-core Pentium 4s, called 64K aliasing, that can cause the two logical CPUs to seriously interfere with each other. Basically, incomplete tag bit encoding in the L1 cache means that two data blocks that are a multiple of 64K apart can boot each other out of the cache, making the cache's associativity useless and greatly reducing its effectiveness. Even worse, two prefetched streams that alias on top of each other can boot each other's prefetches out of the L1 cache, wasting a lot of bandwidth. Windows also allocates virtual memory on 64K boundaries, which makes 64K aliasing likely to happen for thread stacks. This means that Hyperthreading Technology can also get you a net loss in performance if threads execute in non-HT-friendly ways, even if they execute well on an SMP system. The 64K aliasing flaw was supposedly fixed in the new Prescott core, but I haven't heard whether that improves hyperthreading performance.
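The aliasing condition is simple enough to check with address arithmetic. The sketch below uses made-up stack addresses and a made-up 1K stagger. Offsetting each thread's stack by a different amount is a commonly suggested workaround, not something VirtualDub is known to do.

```python
ALIAS_STRIDE = 64 * 1024  # 64K, the distance at which blocks collide

def aliases_64k(addr_a, addr_b):
    """True if two distinct addresses are a multiple of 64K apart."""
    return addr_a != addr_b and (addr_a - addr_b) % ALIAS_STRIDE == 0

def staggered_stack_top(base, thread_index, stagger=1024):
    """Hypothetical mitigation: shift each thread's stack top by a
    per-thread offset so the hot stack areas no longer line up."""
    return base - thread_index * stagger

# Two thread stacks allocated on 64K boundaries (illustrative addresses).
stack0, stack1 = 0x00130000, 0x00240000
print(aliases_64k(stack0, stack1))                          # both 64K-aligned
print(aliases_64k(staggered_stack_top(stack0, 0),
                  staggered_stack_top(stack1, 1)))          # after staggering
```

Since both stacks start on a 64K boundary, locals at the same depth in each thread land exactly a multiple of 64K apart, which is why thread stacks are such a likely place for the problem to show up.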

So what is Hyperthreading Technology really good for?

First of all, it improves system response similarly to the way that SMP does; the kernel can respond faster to interrupts and program UI can react immediately while a processing thread still cranks away. If you have a program that is attempting to consume 100% of the CPU in the background, your web browser will respond much more snappily on an HT system than on a single-CPU system. Second, HT systems expose the same kinds of multithreading bugs as SMP systems. As more programmers get HT-capable systems, expect more threading bugs in programs to be resolved, and the overall stability of programs to rise, especially on SMP systems, where traditionally a lot of drivers and programs have simply crashed.

When it comes to VirtualDub, I've had enough trouble in performance-critical code with execution bandwidth on a single thread. The problem is that the Pentium 4 can only issue MMX operations on one execution port, meaning that it tends to get very badly bottlenecked when executing optimized MMX code. The situation improves considerably when SSE2 is used, because two-cycle, 128-bit operations can be issued in only one clock, and thus it is possible to keep both the multiplier and the add/shift units running in parallel -- but this can be difficult with only eight registers and with the P4's long latencies.

Overall, if you have a choice between running VirtualDub on a system with hyperthreading and on the same system with HT disabled, I would say to leave HT enabled, because if nothing else the system will run more smoothly.

Comments

However on a 2-processor system, the rendering pipe avisynth - vdub - xvid/mp2 often fully utilizes both CPUs. So where's the multiprocessing done? With one CPU it's almost twice the rendering time, so it's not a fight for semas.

AK - 11 10 04 - 13:45


Avisynth is an AVIFile driver, so it executes in the I/O thread like VirtualDub audio processing, and thus in parallel to VirtualDub's processing thread. This has the disadvantage that the I/O thread is now doing major non-I/O operations, but this is seldom a concern as you are rarely I/O bottlenecked with a moderately complex Avisynth script.

phaeron - 11 10 04 - 23:25


>>-- but this can be difficult with only eight registers and with the P4's long latencies.

Does this imply that an AMD 64-bit processor would be able to utilize HyperThreading better due to more registers and a shorter pipeline? (Assuming AMD were to add this ability.)

Cyberia - 12 10 04 - 19:53


>Does this imply that an AMD 64-bit processor would be able to utilize HyperThreading[...]

On the contrary. The much shorter pipeline on AMD will make Hyperthreading pretty pointless. Memory latency of course still exists on the K8, but the massive penalty of a branch misprediction on the P4 is not as bad on the K8. The K8 is also better at pairing MMX/SSE operations.

So in essence there isn’t as much “unused” CPU power on a K8, where the CPU is stalled for various reasons, which leaves much less for a secondary thread. As you also might have noticed in recent news, AMD is opting for dual-core instead of Hyperthreading.

sh0dan (link) - 14 10 04 - 05:32
