Current version

v1.10.4 (stable)

Navigation

Main page
Archived news
Downloads
Documentation
   Capture
   Compiling
   Processing
   Crashes
Features
Filters
Plugin SDK
Knowledge base
Donate
Contact info
Forum
 
Other projects
   Altirra

Search

Archives

01 Dec - 31 Dec 2013
01 Oct - 31 Oct 2013
01 Aug - 31 Aug 2013
01 May - 31 May 2013
01 Mar - 31 Mar 2013
01 Feb - 29 Feb 2013
01 Dec - 31 Dec 2012
01 Nov - 30 Nov 2012
01 Oct - 31 Oct 2012
01 Sep - 30 Sep 2012
01 Aug - 31 Aug 2012
01 June - 30 June 2012
01 May - 31 May 2012
01 Apr - 30 Apr 2012
01 Dec - 31 Dec 2011
01 Nov - 30 Nov 2011
01 Oct - 31 Oct 2011
01 Sep - 30 Sep 2011
01 Aug - 31 Aug 2011
01 Jul - 31 Jul 2011
01 June - 30 June 2011
01 May - 31 May 2011
01 Apr - 30 Apr 2011
01 Mar - 31 Mar 2011
01 Feb - 29 Feb 2011
01 Jan - 31 Jan 2011
01 Dec - 31 Dec 2010
01 Nov - 30 Nov 2010
01 Oct - 31 Oct 2010
01 Sep - 30 Sep 2010
01 Aug - 31 Aug 2010
01 Jul - 31 Jul 2010
01 June - 30 June 2010
01 May - 31 May 2010
01 Apr - 30 Apr 2010
01 Mar - 31 Mar 2010
01 Feb - 29 Feb 2010
01 Jan - 31 Jan 2010
01 Dec - 31 Dec 2009
01 Nov - 30 Nov 2009
01 Oct - 31 Oct 2009
01 Sep - 30 Sep 2009
01 Aug - 31 Aug 2009
01 Jul - 31 Jul 2009
01 June - 30 June 2009
01 May - 31 May 2009
01 Apr - 30 Apr 2009
01 Mar - 31 Mar 2009
01 Feb - 29 Feb 2009
01 Jan - 31 Jan 2009
01 Dec - 31 Dec 2008
01 Nov - 30 Nov 2008
01 Oct - 31 Oct 2008
01 Sep - 30 Sep 2008
01 Aug - 31 Aug 2008
01 Jul - 31 Jul 2008
01 June - 30 June 2008
01 May - 31 May 2008
01 Apr - 30 Apr 2008
01 Mar - 31 Mar 2008
01 Feb - 29 Feb 2008
01 Jan - 31 Jan 2008
01 Dec - 31 Dec 2007
01 Nov - 30 Nov 2007
01 Oct - 31 Oct 2007
01 Sep - 30 Sep 2007
01 Aug - 31 Aug 2007
01 Jul - 31 Jul 2007
01 June - 30 June 2007
01 May - 31 May 2007
01 Apr - 30 Apr 2007
01 Mar - 31 Mar 2007
01 Feb - 29 Feb 2007
01 Jan - 31 Jan 2007
01 Dec - 31 Dec 2006
01 Nov - 30 Nov 2006
01 Oct - 31 Oct 2006
01 Sep - 30 Sep 2006
01 Aug - 31 Aug 2006
01 Jul - 31 Jul 2006
01 June - 30 June 2006
01 May - 31 May 2006
01 Apr - 30 Apr 2006
01 Mar - 31 Mar 2006
01 Feb - 29 Feb 2006
01 Jan - 31 Jan 2006
01 Dec - 31 Dec 2005
01 Nov - 30 Nov 2005
01 Oct - 31 Oct 2005
01 Sep - 30 Sep 2005
01 Aug - 31 Aug 2005
01 Jul - 31 Jul 2005
01 June - 30 June 2005
01 May - 31 May 2005
01 Apr - 30 Apr 2005
01 Mar - 31 Mar 2005
01 Feb - 29 Feb 2005
01 Jan - 31 Jan 2005
01 Dec - 31 Dec 2004
01 Nov - 30 Nov 2004
01 Oct - 31 Oct 2004
01 Sep - 30 Sep 2004
01 Aug - 31 Aug 2004

Stuff

Powered by Pivot  
XML: RSS feed 
XML: Atom feed 

§ Implementing cardinal spline bicubic filtering on 3D hardware

I had originally intended to post a massive pair of blog entries about bicubic filtering, but I committed the sin of Exiting Without Saving, so I guess I'll wing it instead.

Bicubic filtering is generally the next step up from bilinear filtering; it's not nearly as big of an improvement as bilinear is from point sampling (a.k.a. no filtering), but it's noticeably better if you are doing enlargement or gradient determination. In terms of implementing it on hardware, all decent 3D graphics hardware supports bilinear filtering, but none of them support bicubic. (Well, ATI's R600 is touted as doing so, but I haven't heard any information about how you'd actually use it, so as far as I'm concerned it doesn't exist.) That means bicubic has to be emulated, and as usual, the cheaper the implementation, the better.

Bicubic filtering, or rather, cardinal spline bicubic filtering, is a 4x4 (16 tap) filter composed of two 1-D 4 tap cubic filters. Well, actually only for interpolation, but that's still a very useful case. The main difficulty in implementing this in a shader is that the filter kernel contains two negative lobes, which corresponds to the outer two taps on the discrete filter. GPU Gems 2 has a way to implement B-spline bicubic via the bilinear filter hardware, but that relies on all of the taps being positive. NVIDIA also has a direct 2D implementation in bicubic.fx in their SDK, but it's absurdedly expensive, at ~50 clocks per pixel -- pretty close to not being real-time.

How to improve it, then?

Generally, if you can help it, you never want to implement a bicubic filter as a direct 2D filter. From a pure operation count, implmenting it as two 1-D horizontal/vertical passes takes half the operations, and it drops further when you realize that one of the passes often operates on fewer pixels than the other. This only works if you are interpolating pixels on an axis-aligned grid, however. That's fine if you're doing a resize; it's not applicable if you're doing something like a rotation. Anyway, the straightforward way of doing this in 3D is to use a filter texture containing the four weights for the four taps, and then four texture samplers each fetching the source taps. The pixel shader then just adds the weighted source pixels, i.e. w0*x0 + w1*x1 + w2*x2 + w3*x3. There are a surprising number of caveats in doing this (many of these apply to software implementations as well):

The worst issue I've found, though, is in the UV positioning. The filter pattern has to restart exactly when the source taps step. If it doesn't, you sometimes get an ugly glitch on screen where the filter is off by almost a full pixel. For compatibility with pixel shader 1.1, I've used wrapping for the filter texture, which means that it relies on extremely high precision from the texture coordinate interpolators -- to give you some idea, 6-bit subpixel positioning with a 512x512 source texture requires 16 bit UV accuracy. I've never been able to get this quite right. You can see this if you enable Direct3D display mode and bicubic filtering in VirtualDub 1.6.19. NVIDIA's bicubic.fx instead clamps the filter texture coordinates and uses frac() in the pixel shader, but that doesn't work on ps1.1 and is slower due to a dependent texture read. I haven't seen that version glitch, but I suspect that it would if a non-power-of-two texture were used (I couldn't test this because FX Composer always uses pow2 textures).

So what's the alternative?

If you look closely at the bicubic filter kernel and how the four tap weights vary, you'll notice interesting behavior with regard to the stepping boundary that's causing the problem. Specifically, the outer two taps (negative) always go to zero at that boundary, so no glitch is possible there; it's only the inner two (positive) taps that are problematic. The trick I came up with is to replace one of the two point-sampled inner taps with a linearly interpolated tap and have the linear and point taps both centered. Given two original inner tap weights (w1, w2), the linearly interpolated tap weight (wi) and point sampled tap weight (wp) can be computed as follows:

t = phase (0 - 0.5)
w1 = wi*(1-t) + wp
w2 = wi*t

wi = w2/t
wp = w2*(1-t)/t - w1

The equations for t in [0.5, 1] can be computed analogously by moving the +wp contribution from w1 to w2, corresponding to when the point-sampled tap hops over a pixel; alternatively, just mirror it as the results are symmetric. There's a little problem with dividing by zero at t=0 or t=1 -- you can solve that either by substituting in the original cubic tap weight expressions for w1 and w2 and determining the limits, or simply fudge them (w2=0 at t=0 and w1=0 at t=1).

Why is this advantageous? Well, for one thing, wi and wp are also the same at t=0 and t=1, meaning that it's also possible to enable linear filtering on the filter texture (although that may introduce precision-related normalization problems in fixed point). More importantly, however, it moves the critical point where the point sampled tap hops from t=0/1 to t=0.5. At that point w1==w2 and if you run that through the above equations, you'll find that the contribution from that tap drops to zero, meaning that a glitch there is insignificant. The result is much less sensitivity to the precision of the texture interpolators, and thus a much more robust solution.

I'll end with a few teasers. One, there are a number of patterns in the cubic filter equations that I haven't exploited, one being that the sum of the outer two weights is At(1-t) and the sum of the inner two is (1 - At(1-t)). Two, it's possible to implement 1D cardinal spline cubic filtering on a GeForce 3/4 in a single pass, although not with great accuracy. The above solution doesn't work because you need five samplers, one for the filter kernel and four for the source taps. I have been able to get it down to four samplers, however, using three bump mapped, luma corrected fetches (tex + texbeml + texbeml + texbeml).  Three, it's possible to accelerate the direct 2D solution, although in a totally different manner: XORing a checkerboard into the source texture. My current implementation, which I use for displacement mapping, requires 9 texture fetches instead of the regular 18.

Comments

Comments posted:


More or less unrelated, but this post is also recent and includes the words 'cardinal spline'! http://erikloyer.com/index.php/blog/flex..

nine - 18 08 07 - 00:36

Comment form


Please keep comments on-topic for this entry. If you have unrelated comments about VirtualDub, the forum is a better place to post them.
Name:  
Remember personal info?

Email (Optional):
Your email address is only revealed to the blog owner and is not shown to the public.
URL (Optional):
Comment: /

An authentication dialog may appear when you click Post Comment. Simply type in "post" as the user and "now" as the password. I have had to do this to stop automated comment spam.



Small print: All html tags except <b> and <i> will be removed from your comment. You can make links by just typing the url or mail-address.