v1.10.4 (stable)

§ ¶Fast noise

A user on my forum asked how to generate video noise, and I was intrigued enough to do a bit of tinkering.

The first issue to deal with is what random number generator to use. The easiest thing to do is just to use the C runtime library rand() function... until you realize how slow it is. The first reason is that it's commonly a linear congruential generator of the form:

x' = (x * A + B) mod 2^N

The multiply already isn't great, especially since it's usually a 32-bit multiply, but what makes this worse is that the seed used by rand() is shared state, which means that on a multithreaded CRT you're also taking a hit for thread-local storage. You can bypass the TLS issue easily enough just by reimplementing rand() locally, but then you run into the problem of how to vectorize this generator. 32-bit multiplication and addition are needed for this generator, and that's a pain in MMX/SSE2/3/4. And a 16-bit generator has way too short of a period.

In these cases, I prefer to use a linear feedback shift register, because they only require very simple logical operations. Here's a 31-bit generator:

x' = (x << 1) + (((x >> 27) ^ (x >> 30)) & 1);

That is, you XOR two bits in the register and shift the result in at one end. With appropriately chosen taps, this gives you a period of 2^N-1 for a register of size N. Just make sure to avoid zero, since the generator will get stuck there. LFSRs are a bit deficient in some properties typically desired for a random number generator, but that's less of an issue for noise generation than for, say, statistical sampling.

There is a problem with LFSRs, though, which is that if you use the straight formulation that is normally given for more than single bit output, you'll end up with a pretty crappy RNG in practice, because you get strings of values that are obviously just doubles or halves of the previous value due to the single bit shift. This looks like a lot of gradients if you're dealing with chunky images and a terrible "magic eye" image if you're filling bitplanes. Fortunately, it's fairly easy to generate more bits at a time, because you just do the generation in parallel:

x' = (x << 16) + (((x >> 12) ^ (x >> 15)) & 0xffff);

The straightforward way to vectorize this with SSE2 would be to run four generators in parallel in a 128-bit vector, and then extract the low 16 bits of each generator. The problem with doing this is that you spend a decent amount of time doing the masking and packing. A better way is to instead split the high and low halves of each generator into different registers, thus making both the extraction and 16 bit shift trivial:

`  ;xmm0 = high 16 bits of 8 generators  ;xmm1 = low 16 bits of 8 generators  paddw   xmm0, xmm0    ;xmm0 = hi << 1  movaps  xmm2, xmm1    ;xmm2 = lo  psrlw   xmm1, 12      ;xmm1 = lo >> 12  movaps  xmm3, xmm0    ;xmm3 = hi << 1  pxor    xmm3, xmm1    ;xmm3 = (hi << 1) ^ (lo >> 12)  psllw   xmm0, 3       ;xmm0 = hi << 4  psrlw   xmm1, 3       ;xmm1 = lo >> 15  pxor    xmm1, xmm0    ;xmm1 = (hi << 4) ^ (lo >> 15)  movaps  xmm0, xmm2    ;xmm0 = lo  pxor    xmm1, xmm3    ;xmm1 = (hi << 1) ^ (lo >> 12) ^ (hi << 4) ^ (lo >> 15)`

This generates eight 16-bit random values in 10 instructions using four registers, two being temporaries. It's also possible to generate 8-bit values from 16 parallel generators in a similar fashion, although then you'd need four registers just for the shift registers. You might wonder why I used a 31-bit generator instead of a 32-bit generator, since that would eliminate one of the shifts. That's because a 32-bit LFSR annoyingly requires four taps to hit maximum period instead of just two.

The last and trickiest part is how to preroll the eight generators so that they're all reading from different parts of the sequence. If they're too close together, they'll form an obvious pattern in the noise image. I'm guessing this may be better known in hardware engineering circles, but I had to figure this one out. Basically, you can model an N-bit LFSR as a multiplication by a NxN modulo-2 matrix:

x' = x * M

Note that in this case, scalar "multiplication" is a logical AND, and "addition" is an XOR.

Since multiplying by M advances the state by one step, multiplying by M twice advances it by two, and since matrix multiplication is associative, you can instead multiply by M^2:

x' = (x * M) = x * (M * M) = x * M^2

From here, you can then compute powers of two of M, and multiply together combinations of those results in order to quickly compute M to any desired power. This basically gives a closed-form solution to directly compute the state of the LFSR at any position, without having to iteratively step it up to 2^31 times. Not only does this make it easy to preroll the eight generators to adjacent portions of the sequence, but it also means that the noise generator can be made fully deterministic and support arbitrary seeking and decoding order.

You might have noticed that I used post-multiplication as the convention above. The reason is that this convention is often used along with a row-major storage ordering in memory, which is advantageous from a computation perspective as it turns vector-matrix multiplication into a series of parallel multiplies and adds instead of dot products. This is true as well for the binary matrices here, and in fact my vector-matrix multiply routine looks like this:

`uint32 Mult(uint32 v, const BinMat32& m) {    uint32 r = 0;`
`    for(int i=0; i<32; ++i) {        if (v & (1 << i))            r ^= m.m[i];    }`
`    return r;}`

Matrix-matrix multiplication is then just this applied to each row of the left-hand matrix. This isn't so fast that you'd do it in the inner loop, of course, but it's a lot faster than nested 31x31 loops and if done once a frame it's barely going to show a blip on a profile. The base matrix and the power of two matrices are properties of the LFSR and can be precomputed.

I implemented this as a VirtualDub filter, and it does generate noise very quickly. It still doesn't look like natural noise, though, because the spectrum isn't right. I tried the trick of adding groups of samples to approximate a normal distribution with a binomial, and it still doesn't look right due to the spatial spectrum. At that point I lost interest, but I think the next thing to try would have been to generate several octaves of noise and blend them together, which in my experience generates much more interesting noise.

You wouldn't happen to have an implementation of this as a pixel shader for MPC-HC rolling around? ;)

(Adding a bit of noise after resizing to fake details and hide banding using shaders would be killer...)

np: Yage - An Odd Question From A Forest Bird (The Woodlands Of Old)

Leak - 11 02 09 - 12:24

I know another good method:

1. get an analog tv tuner
2. disconnect cable
3. ???
4. profit!

Gabest - 12 02 09 - 23:15

You can't do this method in a pixel shader since it's a recursive algorithm. Each batch of pixels is dependent on the PRNG state of the previous batch. Pixel shader algorithms must be pure functions. I suppose you could do the closed-form algorithm, but that'd be an insanely heavy pixel shader. Better to use a texture-based approach instead.

Phaeron - 13 02 09 - 00:29

Here's a method that works in a pixel shader. I don't how well it works though. http://www.rgba.org/articles/sfrand/sfra..

pshufb - 24 02 09 - 03:26

Alas, the technique above is not suitable for a pixel shader. Somehow I associated the float random output with pixel shaders.

pshufb - 24 02 09 - 03:30

That's actually a very old trick for generating floating-point random numbers. I couldn't help but chuckle at that page due to the quote:

"// ripped from VC libraries (disassembling rulez)"

...since Microsoft has shipped source code for the CRT, including rand.c, for quite some time.

Phaeron - 25 02 09 - 01:32

I don't know if this helps, but have you look into Mersenne Twister RNG? It should be pretty straightforward to implement SSE-based machines.
http://www.math.sci.hiroshima-u.ac.jp/~m..

Someone - 04 03 09 - 01:21

You could try a recursive integrator on every noise-pixel. This should give you a temporal 1/f noise ie normal distributed.
On this you may have to apply a separable spatial recursive integrator again to get spatial 1/f noise.

I am curious if such a noise would look natural.

Ray - 11 12 10 - 04:32

Comment form

Please keep comments on-topic for this entry. If you have unrelated comments about VirtualDub, the forum is a better place to post them.
 Name: Remember personal info? Yes No Email (Optional): Your email address is only revealed to the blog owner and is not shown to the public. URL (Optional): Comment: / An authentication dialog may appear when you click Post Comment. Simply type in "post" as the user and "now" as the password. I have had to do this to stop automated comment spam. Small print: All html tags except and will be removed from your comment. You can make links by just typing the url or mail-address.