The "center cut" algorithm

Semi-recent versions of VirtualDub have an audio filter known as "center cut." This filter attempts to isolate the central components of the incoming signal and separate them from the side signals. The result is a stereo output with the ambience, and a mono output with the foreground sounds and vocals. To test this filter in VirtualDub, enable advanced audio filtering, then add input/center cut/output/discard filters to the filter graph, in that order (or swap discard and output).

Someone asked me by email if I could describe this algorithm, and I thought it would be blog-worthy.

Disclaimer: I'm not an audio researcher and am not familiar with the audio literature or with existing advanced algorithms for vocal removal, so excuse me if I use the wrong terms or fail to acknowledge past work. I came up with this algorithm one day after a discussion about vector projection in a lower-division math class, and I didn't do any research before devising it.

The algorithm

"Center cut" is a separation algorithm that works in the frequency domain: it compares the phase of audio components of the same frequency on the left and right channels and attempts to determine the approximate center channel. The center channel is then subtracted from the original input to produce the side channels. One immediate limitation is that center cut requires stereo input. However, unlike the traditional method of vocal separation, which takes the difference of the left and right channels and so yields only a mono result, the center cut algorithm can both produce stereo ambience output and extract the center channel.
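For comparison, the traditional difference method fits in a few lines. This is a minimal sketch only; the function name `difference_karaoke` and the test signal (a "vocal" panned dead center plus a "guitar" on the left only) are made up for illustration.

```python
import math

def difference_karaoke(left, right):
    """Traditional vocal removal: subtract the channels, giving ONE mono signal."""
    return [l - r for l, r in zip(left, right)]

# Hypothetical test signal: the vocal is identical in both channels,
# the guitar exists only in the left channel.
n = 64
vocal = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
guitar = [0.5 * math.sin(2 * math.pi * 9 * i / n) for i in range(n)]
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

# The centered vocal cancels; only the off-center guitar remains.
sides = difference_karaoke(left, right)
```

The subtraction removes anything identical in both channels, but the output is a single channel and the center itself is never produced, which is the gap a frequency-domain approach tries to close.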

The algorithm, as implemented in VirtualDub, is as follows:

Obviously, it isn't necessary to use a Fast Hartley Transform (FHT) to do this; a Fast Fourier Transform (FFT) would do equally well, as the results of an FHT and a real FFT can be easily exchanged. Also, the vector normalizations and the quadratic solve must be guarded against degenerate cases. As one or both source vectors shrink, the problem becomes increasingly ill-conditioned and the derived phase of the center vector becomes erratic; fortunately, in this case the magnitude of the center vector also shrinks, so the phase stability matters less. Finally, the components at DC and at the Nyquist rate have only a real component, so they are simply set to zero for the center channel.
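To illustrate how Hartley and Fourier results can be exchanged, here is a small sketch using naive O(N^2) transforms rather than fast ones. It assumes the usual conventions (Fourier kernel exp(-2*pi*i*k*n/N), Hartley kernel cos + sin); the helper names are hypothetical and this is not VirtualDub's code.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, kernel exp(-2*pi*i*k*j/N)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]

def dht(x):
    """Naive discrete Hartley transform, kernel cas(t) = cos(t) + sin(t)."""
    n = len(x)
    return [sum(x[j] * (math.cos(2 * math.pi * k * j / n)
                        + math.sin(2 * math.pi * k * j / n))
                for j in range(n))
            for k in range(n)]

def hartley_from_fourier(spectrum):
    """For real input: H[k] = Re(F[k]) - Im(F[k])."""
    return [f.real - f.imag for f in spectrum]

def fourier_from_hartley(h):
    """F[k] = (H[k] + H[N-k])/2 + i*(H[N-k] - H[k])/2, indices modulo N."""
    n = len(h)
    return [complex((h[k] + h[-k % n]) / 2, (h[-k % n] - h[k]) / 2)
            for k in range(n)]

signal = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0, 0.125, 0.3]
F = dft(signal)
H = dht(signal)
```

Under these conventions each complex Fourier bin is recovered from a pair of Hartley outputs at indices k and N-k, so a per-bin algorithm can be written against either transform.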

The algorithm can be optimized to avoid explicit normalization, but care must be taken with regard to accuracy when doing so. In single precision, the alpha value from the quadratic solve already has marginal significand precision (~12 bits), and moving operations around can affect the output. IIRC, when I tried to do so, I got greater leakage from the center channel into the side channels. Double precision would probably fare much better.
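One way a normalization-free quadratic solve could look for a single frequency bin is sketched below. It rests on an assumption that is not spelled out in the text here: the center is taken as C = alpha * (L + R), with alpha chosen so that the residual side vectors L - C and R - C are perpendicular when viewed as 2-D vectors. Expanding that condition gives a quadratic in alpha whose smaller root is 0.5 - 0.5 * |L - R| / |L + R|. The names and the `NEAR_ZERO` threshold are hypothetical, and Python floats are double precision.

```python
import math

NEAR_ZERO = 1e-20  # hypothetical guard threshold for the degenerate case

def dot(a, b):
    """Dot product of two complex numbers viewed as 2-D vectors."""
    return a.real * b.real + a.imag * b.imag

def center_for_bin(l, r):
    """Assumed center estimate for one bin; l and r are complex bin values."""
    s = l + r
    d = l - r
    sum_sq = dot(s, s)
    if sum_sq <= NEAR_ZERO:
        # Degenerate: L and R (nearly) cancel, so the center direction is undefined.
        return 0j
    # Smaller root of alpha^2*|S|^2 - alpha*|S|^2 + L.R = 0.
    alpha = 0.5 - 0.5 * math.sqrt(dot(d, d) / sum_sq)
    return alpha * s

def split_bin(l, r):
    """Return (center, left side, right side) for one bin."""
    c = center_for_bin(l, r)
    return c, l - c, r - c

center, side_l, side_r = split_bin(3 + 1j, 1 + 2j)
```

Under this assumption, identical inputs go entirely to the center, a signal present in only one channel gets no center at all, and no explicit vector normalization is needed, only one square root and one division per bin.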

Opportunities for improvement

I'm sure a better overlap-and-add setup could be used.
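As a point of reference for what an overlap-and-add setup involves, here is a generic sketch using a periodic Hann window at 50% overlap, a combination whose shifted copies sum to exactly one. It is not the windowing VirtualDub uses; the frame length, the hop size, and the `process` callback are assumptions made for illustration.

```python
import math

def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def overlap_add(signal, frame_len, process):
    """Window each half-overlapping frame, process it, and sum the results."""
    hop = frame_len // 2
    window = hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frame = process(frame)  # e.g. transform, modify bins, inverse transform
        for i in range(frame_len):
            out[start + i] += frame[i]
    return out

signal = [math.sin(0.3 * i) for i in range(64)]
# With an identity "process", the interior of the signal is rebuilt exactly.
rebuilt = overlap_add(signal, 16, lambda frame: frame)
```

Only the first and last half-frames, which are covered by a single window, fail to reconstruct. When the frames are actually modified in the frequency domain, a second (synthesis) window after the inverse transform is commonly added to smooth discontinuities at frame edges.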

The center cut algorithm whacks the phase of the left and right channels, so it has a tendency to move them apart in time and cause echoing effects in the side channels. This phenomenon becomes worse as the FHT window is increased, which is unfortunate as increasing the window size improves the quality of separation.

Overall imbalances in volume between the incoming left and right channels result in center leakage into the louder channel. It may be possible to add an adaptive normalizer into the algorithm to fix this.

Comments

This blog was open for comments when this entry was first posted. Comments were later closed, and then removed, because of spam and a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.