I finally got a little bit of time to look at my live capture audio playback problems again, and I think I have a better idea of what's going on now.
I hacked VirtualDub to display both the buffer status and the tracking rate on the DirectSound Renderer while I played a bit, and what I found was that the renderer was stepping the playback rate up and down 25Hz at a time, converging on around 49750Hz for the DirectSound playback rate. That's fine, since we have both playback and rendering clocks that are likely off a little bit. What was more disturbing was the buffering level, which had both a random initial level and a very slow upwards climb towards 90%, regardless of where it started. Furthermore, even after the buffer had hit 90%, the renderer still did not change the playback rate, which meant that it was most likely dropping samples periodically at that point in order to keep the buffer level in check.
After starting at the IAMAudioRendererStats entry points and tracing through some portions of the DirectSound Renderer code, I think I have a good idea of how the rate matching algorithm works. The main function is CWaveSlave::AdjustSlaveClock(), which monitors both the master clock (the reference clock set on the renderer) and the slave clock (the DirectSound playback position). The routine monitors the difference between the two and adjusts the playback rate up and down slowly to compensate. I don't know how low it can go, but for a 48KHz sample rate, it clamps out at 48600Hz at the high end. Within these bounds, the renderer matches the rate of the playback clock against the rate of the reference clock.
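To make the clock-only behavior concrete, here is a rough model of that kind of rate slaving. This is not the actual CWaveSlave::AdjustSlaveClock() code: the struct name, the deadband, and the lower clamp are all made up for illustration, and only the 25Hz step and the 48600Hz upper clamp come from what I observed.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of clock-only rate slaving. Names and most constants
// are assumptions; this is NOT the DirectSound Renderer's real code.
struct ClockSlaveModel {
    double rateHz      = 48000.0;  // current playback rate
    double stepHz      = 25.0;     // observed step size
    double minHz       = 47400.0;  // ASSUMED lower clamp (actual value unknown)
    double maxHz       = 48600.0;  // observed upper clamp for 48KHz audio
    double deadbandSec = 0.001;    // ASSUMED tolerance before adjusting

    // masterSec: elapsed time on the reference clock.
    // slaveSec:  elapsed time according to the playback position.
    // Note that neither buffer level nor incoming data rate is an input.
    double adjust(double masterSec, double slaveSec) {
        assert(minHz <= maxHz);
        const double errorSec = masterSec - slaveSec;
        if (errorSec > deadbandSec)
            rateHz += stepHz;      // playback is behind the reference: speed up
        else if (errorSec < -deadbandSec)
            rateHz -= stepHz;      // playback is ahead of the reference: slow down
        rateHz = std::max(minHz, std::min(maxHz, rateHz));
        return rateHz;
    }
};
```

Calling adjust() repeatedly with the playback position lagging the reference walks the rate up 25Hz per call until it sticks at the upper clamp.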
This sounds good... until you realize the ramifications. In particular, neither buffer level nor data rate is an input into this algorithm, only the clocks. Therefore, the algorithm is only accurate as long as the incoming data rate matches the reference clock. This is fine for file playback, but not such a great assumption for a live source. In my case, there appears to be a discrepancy such that the incoming data rate is actually slightly higher than 48KHz, and therefore the DirectSound Renderer is matching a slightly wrong rate, causing its buffer to fill over time. The other consequence is that the renderer makes no attempt to target any particular latency as long as it isn't encountering either a buffer underflow or overflow condition, which is disturbing.
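A toy simulation shows why a small rate mismatch is enough to cause the slow climb. The function name, the one-second buffer size, and the 48010Hz incoming rate below are hypothetical numbers picked for illustration, not measurements, and the buffer is simply capped at its full capacity rather than at the 90% level I saw.

```cpp
// Toy model: samples arrive at inHz and are played at outHz, once per second.
// All names and numbers here are illustrative assumptions, not measured values.
struct BufferState {
    long long level;    // samples currently buffered
    long long dropped;  // samples discarded because the buffer was full
    long long starved;  // samples the player wanted but the buffer lacked
};

BufferState simulateBuffer(long long inHz, long long outHz,
                           long long capacity, long long initial,
                           long long seconds) {
    BufferState s{initial, 0, 0};
    for (long long t = 0; t < seconds; ++t) {
        s.level += inHz - outHz;               // net change over one second
        if (s.level > capacity) {              // overflow: excess is dropped
            s.dropped += s.level - capacity;
            s.level = capacity;
        } else if (s.level < 0) {              // underflow: player starves
            s.starved += -s.level;
            s.level = 0;
        }
    }
    return s;
}
```

With a 48000-sample buffer starting half full, simulateBuffer(48010, 48000, 48000, 24000, 60) only gains 600 samples in a minute, but the level never stops rising, and once the buffer is full every extra sample is dropped. Feeding in 48600 as the playback rate instead models the debugger "fix" in the next paragraph: the buffer drains and then stays in underflow.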
As it turns out, I inadvertently discovered an amusing workaround, which is to stop the process for a long period of time in the debugger. Upon resuming, this kicks the rate matching algorithm so far off that it clamps out at 48600Hz and then gradually slides into a constant buffer underflow condition. Although the resulting low latency is rather nice, I'm not particularly fond of this "fix."
It looks like the real solution is going to involve me writing a custom filter, either as a transform filter in front of the DirectSound Renderer or just a custom renderer. I'm not sure which one is easier, but I'm actually leaning toward the custom renderer as the random latency issue is a bit bothersome.
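For what it's worth, one way a custom renderer could target a specific latency is to feed the buffer level into the rate decision. The following is only a sketch of that idea under my own assumptions: the function name, the proportional gain, and the deviation limit are hypothetical, and I haven't decided that this is how the final filter will work.

```cpp
#include <algorithm>

// Hypothetical proportional controller: nudge the playback rate according to
// how far the buffer fill is from a target fill. Not an existing DirectShow API.
//   fillFraction, targetFraction: buffer fill as 0.0 .. 1.0
//   gainHzPerUnit:  Hz of correction per 1.0 of fill error (assumed tuning knob)
//   maxDeviationHz: clamp on the correction, e.g. 600Hz around 48000Hz
double rateForFill(double nominalHz, double fillFraction,
                   double targetFraction, double gainHzPerUnit,
                   double maxDeviationHz) {
    // Fuller than target: play faster to drain. Emptier: play slower to refill.
    double correctionHz = gainHzPerUnit * (fillFraction - targetFraction);
    correctionHz = std::max(-maxDeviationHz,
                            std::min(maxDeviationHz, correctionHz));
    return nominalHz + correctionHz;
}
```

Unlike clock-only matching, this converges on a chosen buffer level no matter where the buffer started, which is what would get rid of the random latency.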