Current version

v1.10.4 (stable)

Navigation

Main page
Archived news
Downloads
Documentation
   Capture
   Compiling
   Processing
   Crashes
Features
Filters
Plugin SDK
Knowledge base
Donate
Contact info
Forum
 
Other projects
   Altirra

Search

Archives

01 Dec - 31 Dec 2013
01 Oct - 31 Oct 2013
01 Aug - 31 Aug 2013
01 May - 31 May 2013
01 Mar - 31 Mar 2013
01 Feb - 29 Feb 2013
01 Dec - 31 Dec 2012
01 Nov - 30 Nov 2012
01 Oct - 31 Oct 2012
01 Sep - 30 Sep 2012
01 Aug - 31 Aug 2012
01 June - 30 June 2012
01 May - 31 May 2012
01 Apr - 30 Apr 2012
01 Dec - 31 Dec 2011
01 Nov - 30 Nov 2011
01 Oct - 31 Oct 2011
01 Sep - 30 Sep 2011
01 Aug - 31 Aug 2011
01 Jul - 31 Jul 2011
01 June - 30 June 2011
01 May - 31 May 2011
01 Apr - 30 Apr 2011
01 Mar - 31 Mar 2011
01 Feb - 29 Feb 2011
01 Jan - 31 Jan 2011
01 Dec - 31 Dec 2010
01 Nov - 30 Nov 2010
01 Oct - 31 Oct 2010
01 Sep - 30 Sep 2010
01 Aug - 31 Aug 2010
01 Jul - 31 Jul 2010
01 June - 30 June 2010
01 May - 31 May 2010
01 Apr - 30 Apr 2010
01 Mar - 31 Mar 2010
01 Feb - 29 Feb 2010
01 Jan - 31 Jan 2010
01 Dec - 31 Dec 2009
01 Nov - 30 Nov 2009
01 Oct - 31 Oct 2009
01 Sep - 30 Sep 2009
01 Aug - 31 Aug 2009
01 Jul - 31 Jul 2009
01 June - 30 June 2009
01 May - 31 May 2009
01 Apr - 30 Apr 2009
01 Mar - 31 Mar 2009
01 Feb - 29 Feb 2009
01 Jan - 31 Jan 2009
01 Dec - 31 Dec 2008
01 Nov - 30 Nov 2008
01 Oct - 31 Oct 2008
01 Sep - 30 Sep 2008
01 Aug - 31 Aug 2008
01 Jul - 31 Jul 2008
01 June - 30 June 2008
01 May - 31 May 2008
01 Apr - 30 Apr 2008
01 Mar - 31 Mar 2008
01 Feb - 29 Feb 2008
01 Jan - 31 Jan 2008
01 Dec - 31 Dec 2007
01 Nov - 30 Nov 2007
01 Oct - 31 Oct 2007
01 Sep - 30 Sep 2007
01 Aug - 31 Aug 2007
01 Jul - 31 Jul 2007
01 June - 30 June 2007
01 May - 31 May 2007
01 Apr - 30 Apr 2007
01 Mar - 31 Mar 2007
01 Feb - 29 Feb 2007
01 Jan - 31 Jan 2007
01 Dec - 31 Dec 2006
01 Nov - 30 Nov 2006
01 Oct - 31 Oct 2006
01 Sep - 30 Sep 2006
01 Aug - 31 Aug 2006
01 Jul - 31 Jul 2006
01 June - 30 June 2006
01 May - 31 May 2006
01 Apr - 30 Apr 2006
01 Mar - 31 Mar 2006
01 Feb - 29 Feb 2006
01 Jan - 31 Jan 2006
01 Dec - 31 Dec 2005
01 Nov - 30 Nov 2005
01 Oct - 31 Oct 2005
01 Sep - 30 Sep 2005
01 Aug - 31 Aug 2005
01 Jul - 31 Jul 2005
01 June - 30 June 2005
01 May - 31 May 2005
01 Apr - 30 Apr 2005
01 Mar - 31 Mar 2005
01 Feb - 29 Feb 2005
01 Jan - 31 Jan 2005
01 Dec - 31 Dec 2004
01 Nov - 30 Nov 2004
01 Oct - 31 Oct 2004
01 Sep - 30 Sep 2004
01 Aug - 31 Aug 2004

Stuff

Powered by Pivot  
XML: RSS feed 
XML: Atom feed 

§ Fast noise

A user on my forum asked how to generate video noise, and I was intrigued enough to do a bit of tinkering.

The first issue to deal with is what random number generator to use. The easiest thing to do is just to use the C runtime library rand() function... until you realize how slow it is. The first reason is that it's commonly a linear congruential generator of the form:

x' = (x * A + B) mod 2^N

The multiply already isn't great, especially since it's usually a 32-bit multiply, but what makes this worse is that the seed used by rand() is shared state, which means that on a multithreaded CRT you're also taking a hit for thread-local storage. You can bypass the TLS issue easily enough just by reimplementing rand() locally, but then you run into the problem of how to vectorize this generator. 32-bit multiplication and addition are needed for this generator, and that's a pain in MMX/SSE2/3/4. And a 16-bit generator has way too short of a period.

In these cases, I prefer to use a linear feedback shift register, because they only require very simple logical operations. Here's a 31-bit generator:

x' = (x << 1) + (((x >> 27) ^ (x >> 30)) & 1);

That is, you XOR two bits in the register and shift the result in at one end. With appropriately chosen taps, this gives you a period of 2^N-1 for a register of size N. Just make sure to avoid zero, since the generator will get stuck there. LFSRs are a bit deficient in some properties typically desired for a random number generator, but that's less of an issue for noise generation than for, say, statistical sampling.

There is a problem with LFSRs, though, which is that if you use the straight formulation that is normally given for more than single bit output, you'll end up with a pretty crappy RNG in practice, because you get strings of values that are obviously just doubles or halves of the previous value due to the single bit shift. This looks like a lot of gradients if you're dealing with chunky images and a terrible "magic eye" image if you're filling bitplanes. Fortunately, it's fairly easy to generate more bits at a time, because you just do the generation in parallel:

x' = (x << 16) + (((x >> 12) ^ (x >> 15)) & 0xffff);

The straightforward way to vectorize this with SSE2 would be to run four generators in parallel in a 128-bit vector, and then extract the low 16 bits of each generator. The problem with doing this is that you spend a decent amount of time doing the masking and packing. A better way is to instead split the high and low halves of each generator into different registers, thus making both the extraction and 16 bit shift trivial:

  ;xmm0 = high 16 bits of 8 generators
;xmm1 = low 16 bits of 8 generators
paddw xmm0, xmm0 ;xmm0 = hi << 1
movaps xmm2, xmm1 ;xmm2 = lo
psrlw xmm1, 12 ;xmm1 = lo >> 12
movaps xmm3, xmm0 ;xmm3 = hi << 1
pxor xmm3, xmm1 ;xmm3 = (hi << 1) ^ (lo >> 12)
psllw xmm0, 3 ;xmm0 = hi << 4
psrlw xmm1, 3 ;xmm1 = lo >> 15
pxor xmm1, xmm0 ;xmm1 = (hi << 4) ^ (lo >> 15)
movaps xmm0, xmm2 ;xmm0 = lo
pxor  xmm1, xmm3 ;xmm1 = (hi << 1) ^ (lo >> 12) ^ (hi << 4) ^ (lo >> 15)

This generates eight 16-bit random values in 10 instructions using four registers, two being temporaries. It's also possible to generate 8-bit values from 16 parallel generators in a similar fashion, although then you'd need four registers just for the shift registers. You might wonder why I used a 31-bit generator instead of a 32-bit generator, since that would eliminate one of the shifts. That's because a 32-bit LFSR annoyingly requires four taps to hit maximum period instead of just two.

The last and trickiest part is how to preroll the eight generators so that they're all reading from different parts of the sequence. If they're too close together, they'll form an obvious pattern in the noise image. I'm guessing this may be better known in hardware engineering circles, but I had to figure this one out. Basically, you can model an N-bit LFSR as a multiplication by a NxN modulo-2 matrix:

x' = x * M

Note that in this case, scalar "multiplication" is a logical AND, and "addition" is an XOR.

Since multiplying by M advances the state by one step, multiplying by M twice advances it by two, and since matrix multiplication is associative, you can instead multiply by M^2:

x' = (x * M) = x * (M * M) = x * M^2

From here, you can then compute powers of two of M, and multiply together combinations of those results in order to quickly compute M to any desired power. This basically gives a closed-form solution to directly compute the state of the LFSR at any position, without having to iteratively step it up to 2^31 times. Not only does this make it easy to preroll the eight generators to adjacent portions of the sequence, but it also means that the noise generator can be made fully deterministic and support arbitrary seeking and decoding order.

You might have noticed that I used post-multiplication as the convention above. The reason is that this convention is often used along with a row-major storage ordering in memory, which is advantageous from a computation perspective as it turns vector-matrix multiplication into a series of parallel multiplies and adds instead of dot products. This is true as well for the binary matrices here, and in fact my vector-matrix multiply routine looks like this:

uint32 Mult(uint32 v, const BinMat32& m) {
uint32 r = 0;
    for(int i=0; i<32; ++i) {
if (v & (1 << i))
    r ^= m.m[i];
  }
    return r;
}

Matrix-matrix multiplication is then just this applied to each row of the left-hand matrix. This isn't so fast that you'd do it in the inner loop, of course, but it's a lot faster than nested 31x31 loops and if done once a frame it's barely going to show a blip on a profile. The base matrix and the power of two matrices are properties of the LFSR and can be precomputed.

I implemented this as a VirtualDub filter, and it does generate noise very quickly. It still doesn't look like natural noise, though, because the spectrum isn't right. I tried the trick of adding groups of samples to approximate a normal distribution with a binomial, and it still doesn't look right due to the spatial spectrum. At that point I lost interest, but I think the next thing to try would have been to generate several octaves of noise and blend them together, which in my experience generates much more interesting noise.

Comments

Comments posted:


You wouldn't happen to have an implementation of this as a pixel shader for MPC-HC rolling around? ;)

(Adding a bit of noise after resizing to fake details and hide banding using shaders would be killer...)

np: Yage - An Odd Question From A Forest Bird (The Woodlands Of Old)

Leak - 11 02 09 - 12:24


I know another good method:

1. get an analog tv tuner
2. disconnect cable
3. ???
4. profit!

Gabest - 12 02 09 - 23:15


You can't do this method in a pixel shader since it's a recursive algorithm. Each batch of pixels is dependent on the PRNG state of the previous batch. Pixel shader algorithms must be pure functions. I suppose you could do the closed-form algorithm, but that'd be an insanely heavy pixel shader. Better to use a texture-based approach instead.

Phaeron - 13 02 09 - 00:29


Here's a method that works in a pixel shader. I don't how well it works though. http://www.rgba.org/articles/sfrand/sfra..

pshufb - 24 02 09 - 03:26


Alas, the technique above is not suitable for a pixel shader. Somehow I associated the float random output with pixel shaders.

pshufb - 24 02 09 - 03:30


That's actually a very old trick for generating floating-point random numbers. I couldn't help but chuckle at that page due to the quote:

"// ripped from VC libraries (disassembling rulez)"

...since Microsoft has shipped source code for the CRT, including rand.c, for quite some time.

Phaeron - 25 02 09 - 01:32


I don't know if this helps, but have you look into Mersenne Twister RNG? It should be pretty straightforward to implement SSE-based machines.
http://www.math.sci.hiroshima-u.ac.jp/~m..

Someone - 04 03 09 - 01:21


You could try a recursive integrator on every noise-pixel. This should give you a temporal 1/f noise ie normal distributed.
On this you may have to apply a separable spatial recursive integrator again to get spatial 1/f noise.

I am curious if such a noise would look natural.

Ray - 11 12 10 - 04:32

Comment form


Please keep comments on-topic for this entry. If you have unrelated comments about VirtualDub, the forum is a better place to post them.
Name:  
Remember personal info?

Email (Optional):
Your email address is only revealed to the blog owner and is not shown to the public.
URL (Optional):
Comment: /

An authentication dialog may appear when you click Post Comment. Simply type in "post" as the user and "now" as the password. I have had to do this to stop automated comment spam.



Small print: All html tags except <b> and <i> will be removed from your comment. You can make links by just typing the url or mail-address.