How to make a resampler that doesn't suck
Resizing an image to a different size requires a basic image processing technology called a resampler. This is one of the elementary operations in any image processing toolkit. Yet, I've seen many, many cases where people get resampling algorithms subtly wrong, slightly wrong, or even blatantly wrong. Some of these were even in supposedly professional image processing applications! It isn't that hard to create a quality resampler. It does take some careful thought, and a little ingenuity to implement, but doesn't cost a lot of performance and makes the resampler act a lot more consistently.
I will freely confess to having violated all of the rules I'm about to outline in various older versions of VirtualDub, where sometimes you would see a line of pink pixels on the right, or the image would be a tiny bit larger or smaller than it was supposed to be, etc. All of these should be worked out in modern versions and the resize filter should obey all of the rules. If you think there is a violation, feel free to ask about it.
The resampler takes a source image of size (sw, sh) and converts it to a destination image of size (dw, dh), where sw/sh/dw/dh are in pixels.
The coordinate system for the source image is from (0,0)-(sw,sh), and for the destination image it is (0,0)-(dw,dh). The right and bottom edges of each rectangle are not included in the image (this is a top-left convention).
Each pixel occupies a square from (x,y)-(x+1, y+1), where x and y are integers. Again, the right and bottom borders are not included. The pixel center is located at (x+0.5, y+0.5) and indicates the location where the pixel's color is authoritative. For the destination image, the color at this point represents the color for the whole pixel.
The image is processed via a reverse mapping that correlates each point in the destination image to a single point in the source image. The destination image is produced by iterating over every pixel in the destination, converting the location of each destination pixel's center to a source location, and taking the color of the source image at that point. This results in exact coverage of the destination image, with no overdraw or dropouts.
Point sampling is the simplest resampling algorithm with the lowest quality. It simply chooses the color of the source pixel whose center is closest to the desired source point.
1. Resampling the image by N% shall scale image features by N%.
If I scale my portrait by 2x, I want my face to be twice the size and my nose to be twice the size. Not 1.97x, not 2.03x, exactly 2x. This means that a span of M pixels in the source must correspond to M*N% pixels in the destination. So, if my nose is 20 pixels wide originally, I expect it to be 40 pixels wide afterward.
In a point-sampled resampler, where pixels can only be duplicated or removed, it isn't possible to do this exactly at every point in the source or output image. The source pixels chosen, though, should still be as close as possible to the ideal location, and on average the image features should land in the right spots.
Violating this rule causes chains of resample operations to be inconsistent. If you do a 2x enlargement three times, you would expect the result to correspond to a single 8x enlargement. If the resampler gives you 1.97x in features when the frame doubles in size, though, then three consecutive 2x operations would give you 1.97^3 = 7.65x, whereas a single 8x operation would give 7.88x.
2. There shall be no image shift.
This is desirable for similar reasons to rule (1). If you enlarge an image by 200% and then reduce it by 50%, you'd expect to get the same size image, just with some possible degradation from the resampling. It shouldn't be shifted overall three pixels to the left.
This rule is simple to devise and tricky to get right. You can reduce it to either matching the centers or matching the corners. A continuous reverse mapping from destination to source that satisfies this criterion might look like this:
dst(x, y) = src(x * (sw/dw), y * (sh/dh))
The catch is that this is a continuous mapping. You want to sample with (x,y) being on half-pixel coordinates, so that the first pixel is (0.5, 0.5), the second (1.5, 0.5), etc. This places the destination points exactly on pixel centers. For point sampling, the source pixel is then chosen by flooring the source coordinates to the nearest equal or lower integer. If you plug integer destination coordinates into the mapping instead, you will get a half-pixel shift. This can be fixed by mapping the centers of the corner pixels onto each other instead, which results in ((sw-1)/(dw-1)) and ((sh-1)/(dh-1)) for the ratios, but image features are then no longer scaled by dw/sw and dh/sh, which violates rule (1).
When point sampling, an integer enlarging factor should result in a regular pattern throughout the image. For instance, 300% enlargement will make each source pixel into a 3x3 block. There should be no runt columns or rows of 1 or 2 pixels on the borders.
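As a rough sketch of the center-based mapping for point sampling, here is one way it could be written in C. The function names and the 8-bit grayscale, row-major layout are assumptions made for illustration, not VirtualDub's actual code. The expression floor((dx + 0.5) * sw / dw) is evaluated in integer arithmetic as ((2*dx + 1) * sw) / (2 * dw), which sidesteps floating-point rounding at the borders.

```c
#include <assert.h>
#include <stddef.h>

/* Map the center of destination pixel dx back to a source pixel index.
 * dw and sw are the destination and source sizes in pixels.
 * (Hypothetical helper, for illustration only.) */
static int point_sample_index(int dx, int dw, int sw)
{
    assert(dw > 0 && sw > 0 && dx >= 0 && dx < dw);

    /* The destination center is dx + 0.5; scaling by sw/dw and flooring
     * gives floor((2*dx + 1) * sw / (2 * dw)), computed exactly here. */
    return (int)(((2LL * dx + 1) * sw) / (2LL * dw));
}

/* Point-sample an 8-bit grayscale image stored row-major without padding. */
static void resample_point(const unsigned char *src, int sw, int sh,
                           unsigned char *dst, int dw, int dh)
{
    for (int y = 0; y < dh; ++y) {
        int sy = point_sample_index(y, dh, sh);
        for (int x = 0; x < dw; ++x) {
            int sx = point_sample_index(x, dw, sw);
            dst[(size_t)y * dw + x] = src[(size_t)sy * sw + sx];
        }
    }
}
```

With this mapping, a 300% enlargement of a two-pixel row yields each source pixel exactly three times, with no runts at either end, and a 3-to-1 reduction picks the middle source pixel.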
A good stress test for a resampler is to do a big series of forward and inverse transforms, such as 200% followed by 50% ten times. Not only does this expose subtle subpixel shifts in a resampler, but it also shows how good the resampling filter is.
3. When doing an identity transform, the image shall remain exactly the same.
This one is common sense. If I ask for a 320x240 image to be resampled to 320x240, there shouldn't be any change. If you got rules (1) and (2) right, this should easily follow.
What is less obvious is that this applies to a single axis as well. That is, resampling a 320xN image to 320xM should result in only a vertical resampling: there should be no crosstalk between columns, which should be entirely independent in their processing. This follows because the mapping equations treat the horizontal and vertical axes independently, so a change in one doesn't affect the other. Most resampling filters are separable and thus implemented as separate row and column passes, which automatically guarantees this property.
It's easy to get this right with simple point sampling, but it's often broken when filtering is involved. An interpolation filter should return exactly one pixel's value when asked to sample exactly on top of a source pixel center; it shouldn't blend in any other adjacent pixels. Otherwise, you'll get a subpixel shift in the image, which violates rule (2).
Sometimes it is advantageous to choose a filter that applies mild blurring in order to reduce aliasing artifacts. In this case, the result won't be exactly the same, but at least it should be unbiased. One way to check is to flip the image on input and output and look for differences in the result.
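The flip check can be sketched in C as follows. This is only an illustration under assumptions: it uses a hypothetical one-dimensional linear resampler with clamped borders on double-precision samples, and compares "flip, then resample" against "resample, then flip". All names here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Linear 1D resample with clamped borders. Destination pixel centers
 * (dx + 0.5) are mapped to the source using the ratio sw/dw. */
static void resample_linear_1d(const double *src, int sw, double *dst, int dw)
{
    assert(sw > 0 && dw > 0);
    for (int dx = 0; dx < dw; ++dx) {
        double f = (dx + 0.5) * sw / dw - 0.5;  /* relative to pixel centers */
        int x0 = (int)f;
        if (f < x0) --x0;                       /* floor for negative f */
        double r = f - x0;
        int xa = x0 < 0 ? 0 : (x0 > sw - 1 ? sw - 1 : x0);
        int xb = x0 + 1 < 0 ? 0 : (x0 + 1 > sw - 1 ? sw - 1 : x0 + 1);
        dst[dx] = src[xa] + (src[xb] - src[xa]) * r;
    }
}

static void reverse_in_place(double *v, int n)
{
    for (int i = 0, j = n - 1; i < j; ++i, --j) {
        double t = v[i]; v[i] = v[j]; v[j] = t;
    }
}

/* Largest absolute difference between resample(flip(src)) and
 * flip(resample(src)). An unbiased resampler gives (nearly) zero. */
static double flip_error(const double *src, int sw, int dw)
{
    double *flipped = malloc(sizeof *flipped * (size_t)sw);
    double *a = malloc(sizeof *a * (size_t)dw);
    double *b = malloc(sizeof *b * (size_t)dw);
    double worst = 0.0;
    assert(flipped && a && b);

    resample_linear_1d(src, sw, a, dw);         /* resample, then flip */
    reverse_in_place(a, dw);

    for (int i = 0; i < sw; ++i) flipped[i] = src[sw - 1 - i];
    resample_linear_1d(flipped, sw, b, dw);     /* flip, then resample */

    for (int i = 0; i < dw; ++i) {
        double d = a[i] - b[i];
        if (d < 0) d = -d;
        if (d > worst) worst = d;
    }
    free(flipped); free(a); free(b);
    return worst;
}
```

A resampler with a subpixel shift shows up here as a clearly nonzero error, since the shift goes in opposite directions in the two runs. A small tolerance is needed because the two orders of operations round differently in floating point.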
4. When stretching an image with image filtering, border pixels shall sample from outside the source image.
Bilinear interpolation improves the quality of a resampler by doing crude linear interpolation between the 2x2 block of pixels closest to the desired source point. The closer the source point is to one of the pixels, the more that pixel contributes to the output, and if the source point is exactly on top of the pixel, the result is just that pixel. For pixels A-D in book order within the 2x2 block, and fractional offsets x and y in range 0-1 from A, the result is lerp(lerp(A, B, x), lerp(C, D, x), y), where lerp(E, F, r) = E*(1-r) + F*r = E + (F-E)*r.
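A minimal C sketch of that formula, assuming an 8-bit grayscale, row-major source and source coordinates (u, v) in the continuous system described earlier, where pixel (x, y) has its center at (x+0.5, y+0.5). The function names are hypothetical, and out-of-range neighbors are simply clamped to the nearest border pixel here, which is one of several possible choices.

```c
#include <assert.h>

/* lerp(E, F, r) = E + (F - E) * r */
static double lerp(double e, double f, double r)
{
    return e + (f - e) * r;
}

/* Floor to int; a plain cast truncates toward zero instead. */
static int floor_to_int(double t)
{
    int i = (int)t;
    return (t < i) ? i - 1 : i;
}

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Bilinear sample at continuous source point (u, v). */
static double sample_bilinear(const unsigned char *src, int sw, int sh,
                              double u, double v)
{
    assert(src && sw > 0 && sh > 0);

    /* Shift by half a pixel so integer values land on pixel centers. */
    double fx = u - 0.5, fy = v - 0.5;
    int x0 = floor_to_int(fx), y0 = floor_to_int(fy);
    double rx = fx - x0, ry = fy - y0;      /* offsets from A, in [0, 1) */

    /* Assumed border handling: clamp neighbors into the image. */
    int xa = clamp_int(x0, 0, sw - 1), xb = clamp_int(x0 + 1, 0, sw - 1);
    int ya = clamp_int(y0, 0, sh - 1), yb = clamp_int(y0 + 1, 0, sh - 1);

    double a = src[ya * sw + xa], b = src[ya * sw + xb];   /* A B */
    double c = src[yb * sw + xa], d = src[yb * sw + xb];   /* C D */

    return lerp(lerp(a, b, rx), lerp(c, d, rx), ry);
}
```

Sampling exactly on a pixel center gives rx = ry = 0, so the result is just A with nothing blended in from the neighbors.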
If you stretch an image in a way consistent with the above rules, some of the sample points will fall on the outer edges of the border pixels of the image. The source sampling points will never fall outside of the source image, but they'll get closer than 1/2 pixel to the border. Problem is, if you're filtering, the filter window requires pixels outside of the image, even for the itty-bitty 2x2 bilinear window. Forgetting this results in junk in the image or a crash in the resampler. All too often, I see people fix this by just shrinking the source bounds until the problem goes away. Don't do this! It not only breaks rule (1), but it also requires an adjustment that depends on the size of the filter kernel, which makes no sense.
The way you solve this is by introducing a rule that defines the source pixels that fall outside the source rectangle. Some useful rules are:
- Clamp: Choose the nearest pixel along the border. This takes the border pixels and extends them out to infinity. I use this one, because it's the fastest rule.
- Mirror: Take the source point and reflect it across the border. This extends out the entire image by alternately flipping it out to infinity. This one avoids streaking, but can still look odd with a "bounce" effect at the border. One bug that I often see when this rule is implemented is only mirroring the source point coordinate once. If the filter kernel is large or the scaling factor is very high, the coordinate may be sufficiently far out that mirroring it across one border results in it still being out of bounds past the opposite one. Bouncing it back and forth across both borders eventually puts it in-bounds.
- Wrap/tile: Pretend the source image is tiled infinitely and wrap the coordinates to the opposite side when a border is crossed. This rule isn't very useful unless the source image is naturally repeating, such as a texture that's meant to be tiled. Wrapping the source coordinates when resampling such an image, however, prevents seams from developing when the resampled image is tiled.
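The three rules above can be sketched as functions that turn any integer pixel index x into a valid index in 0..n-1, where n is the image size along that axis. The names are made up for illustration. The mirror version relies on the fact that a repeatedly reflected image has a period of 2n pixels, so it handles coordinates that are arbitrarily far out of bounds rather than reflecting only once.

```c
#include <assert.h>

/* Clamp: extend the border pixels out to infinity. */
static int border_clamp(int x, int n)
{
    assert(n > 0);
    return x < 0 ? 0 : (x >= n ? n - 1 : x);
}

/* Wrap/tile: treat the image as repeating with period n. */
static int border_wrap(int x, int n)
{
    assert(n > 0);
    int r = x % n;              /* can be negative in C when x < 0 */
    return r < 0 ? r + n : r;
}

/* Mirror: reflect across the borders as many times as needed.
 * The reflected image repeats with period 2n:
 * 0..n-1, then n-1..0, then 0..n-1 again, and so on. */
static int border_mirror(int x, int n)
{
    assert(n > 0);
    int r = border_wrap(x, 2 * n);
    return r < n ? r : 2 * n - 1 - r;
}
```

For a 4-pixel row, index -1 mirrors to 0 and index 4 mirrors to 3, while index -5 bounces off both borders and lands on 3.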
The size of the border that requires this handling is half a source pixel. A 300% enlargement results in a 1.5 pixel wide clamped/wrapped/mirrored border in the output. This means that for most usual factors, the border is hardly noticeable.
I test resamplers for mistakes in border handling by creating a 2x2 black-and-white checkerboard image and then stretching it to 1000x1000. One popular image editing program I tried this on gave me a giant green blotch in the output, almost certainly the result of reading memory outside of the source image. Oops.
5. Resampling a solid color should give a solid color.
If I stretch or shrink a solid red image, it should remain solid red. Doesn't matter if I pick point sampling, bilinear, bicubic, Lanczos3, or 256-tap windowed sinc. Obviously some concessions can be made for limited computing precision under extreme conditions, but in general, the smaller the difference, the better, and ideally it is zero.
What this means is that any resampling filter used should have all weights in its kernel sum to exactly one. This is called unity gain. If it doesn't, the sum is multiplied into all colors in the output. It also means that if any weights are smaller than zero or larger than one, intermediate results can't be clamped to 0-1; only the final result can be, or else artifacts will show up in the image where the clamping occurs. Even worse, these artifacts will be position-dependent.
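One way to enforce unity gain, sketched in C with hypothetical names and floating-point weights: divide each set of weights by its sum before use, accumulate without clamping, and clamp only once at the end. A fixed-point implementation would need extra care so that the rounded integer weights still add up exactly, which this sketch doesn't cover.

```c
#include <assert.h>
#include <stddef.h>

/* Scale a set of filter weights so that they sum to one (unity gain). */
static void normalize_weights(double *w, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += w[i];
    assert(sum != 0.0);
    for (size_t i = 0; i < n; ++i) w[i] /= sum;
}

/* Apply n weights to n 8-bit source samples. With negative weights the
 * running sum can leave the 0-255 range, so only the final value is
 * clamped and rounded. */
static unsigned char apply_kernel(const unsigned char *px, const double *w,
                                  size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) acc += w[i] * px[i];
    if (acc < 0.0) acc = 0.0;
    if (acc > 255.0) acc = 255.0;
    return (unsigned char)(acc + 0.5);
}
```

For example, the weights (-1, 5, 5, -1) normalize to (-0.125, 0.625, 0.625, -0.125). Applied to four samples of 255, the running sum peaks at 286.875 before the last tap brings it back to exactly 255; clamping partway through would have darkened the result.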
The rules I've outlined above are consistent with OpenGL texture mapping:
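/* Texture coordinates are in source pixels here, as with rectangle
   textures; normalized 2D textures would use 0..1 instead. */
glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(0, 0);
    glTexCoord2f(0, sh); glVertex2f(0, dh);
    glTexCoord2f(sw, sh); glVertex2f(dw, dh);
    glTexCoord2f(sw, 0); glVertex2f(dw, 0);
glEnd();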
Those of you with 3D experience may realize at this point that there is no dependence on integer coordinates in the above code. Indeed, if you have constructed the destination-to-source mapping correctly and are careful with fill conventions, there is no reason you cannot resample a 320.4x760.5 region to 480.6x360.2. VirtualDub's internal resampler allows this, and this is how the resize filter supports fractional target sizes.
In a similar vein, the 3D analogy implies that the above rules apply to rotation engines as well as resamplers; they shouldn't shift or distort the image either. You can test this case with VirtualDub's rotate2 video filter.