
§ They called _what_ in the inner loop??

AMD just open-sourced the AMD Performance Library as Framewave, which at least from my perspective seems like a good thing. Not that I'm going to attempt to use it, but I perused the source out of curiosity, and it looks like there are some useful goodies in there.

And then there's some... marginal stuff.

One thing I wanted to look at was their 8x8 2D inverse discrete cosine transform (IDCT), which is used in a number of video compression formats. There are a million ways to implement it quickly, and although everyone's seen Intel's AP-922 SSE2 algorithm by now, I hadn't seen one from AMD before. So I grab the source and dig around in the JPEG module, and I see this:

int IdctQuant_LS_SSE2(const Fw16s *pSrc, Fw8u *pDst, int dstStp, const Fw16u *pQuantInvTable)
{
    ...
    pedx = (Fw16s *) fwMalloc(128); // array of 64 Fw16s

Who the #*@&*( calls malloc() in an optimized IDCT routine???

It looks like there are indeed a number of well-optimized SSE2 routines in the Framewave library, but after seeing things like the above a few times I was left scratching my head a bit....

Another ugliness I saw, which unfortunately isn't restricted to Framewave, is assembly language routines that have been mechanically translated to intrinsics. The result is a nasty C++ routine that has variables like "pedx" and "pesi," but has instruction names translated so that what used to be an understandable "paddw" is now "_mm_add_epi16." I know this was a hack job for portability, but the result sure is unreadable.

Comments

Well - their library seems to seriously lack optimization for static operations. Their resizers can't retain their coefficients, so there's no possibility of re-use.

A much bigger WTF is that their resizers split the image without overlap, so you get incorrect interpolation where the library splits it. You might also enjoy knowing that they call 4 mallocs per thread per resize. 16-bit resizes convert to and from float in the inner loop. Wow. ;)

Maintaining this beast seems horrible. The 8-bit unsigned _cubic_ resizer alone is 1488 lines of code - yes, 1488. It seems they just gave up on Lanczos and implemented it in plain C++.

Colorspace conversions have no interpolation whatsoever. No interlaced 4:2:0 RGB - nice - you have to separate fields yourself - but considering their lack of interpolation, that shouldn't be much of a concern.

These are the only ones I've looked at - and honestly - I would only use it for prototyping, since it's easy to use and (presumably) well tested.

Klaus Post (link) - 21 02 08 - 06:10


Is fwMalloc == malloc? Are they using a custom preallocated memory pool to ensure locality or something?

Nick S - 21 02 08 - 06:23


It would've been nice to actually explain what the problem is with calling malloc and what a better implementation would have been.

Calvin K - 21 02 08 - 07:11


Until we see what fwMalloc() actually does, there is no valid point in bashing it. If it's using some sort of fast pool allocator, I don't see anything wrong with it.

Bojan - 21 02 08 - 07:55


From the little digging I did, it seems to be as follows:

    void *STDCALL fwMalloc(int length) { return AlignedMalloc(length); }

where AlignedMalloc is platform-specific:

    Windows: return _aligned_malloc(length, 32);
    OS X:    void *memptr = malloc(length); return memptr;
    Linux:   void *memptr; posix_memalign(&memptr, 32, length); return memptr;
    Solaris: void *memptr = memalign(32, length); return memptr;

So it seems they all call memory allocators that return an aligned piece of memory. It does seem quite wasteful to call that in an otherwise tight routine, but I don't know nearly enough to say how else you could get aligned memory.

Jarkko Miettinen - 21 02 08 - 09:42


how else you can get aligned memory
=> allocate an oversized buffer and use only the aligned middle of it.

Tobias Rieper - 21 02 08 - 10:03


Tobias: Presumably, that's exactly what the system calls do, too.

Jake McArthur (link) - 21 02 08 - 10:24


I've seen stuff like that pulled only in research code so far. Anything you're going to use often that needs temporary buffers should allocate them once, in advance, or at least be gracious enough to let you specify the temp buffer manually to avoid crap like that.

Stefan - 21 02 08 - 11:18


malloc() is really expensive, for many reasons. It's an external routine, it may have to lock, and it has to deal with requests for variable memory sizes. In a case like this, where you have a fixed-size allocation with local scope, you can handle the allocation privately much, much faster than any sort of general purpose allocator.

I suppose it's not obvious to people who haven't written this kind of routine before, but for a relatively small buffer like this it's much faster to just allocate it on the stack as a local variable. There are a number of advantages, including: the allocation failure case is tightly controlled (generally impossible), it requires minimal code, the memory is likely already in L1 cache, you can control alignment against other buffers to reduce cache line address aliasing, and you don't need an extra register to point to it. C++ compilers generally provide a way to enforce alignment on local variables, such as __declspec(align(n)) in VC++, and in C# you can use stackalloc in an unsafe method. You can make the allocation call hurt less by implementing a fast allocator, but there's no way you'll beat stack allocation, even with a super-fast thread-local implementation or even a compacting garbage collector. The function call alone is significant in a routine as critical as an IDCT, which commonly executes in less than 2,000 clocks.

Phaeron - 21 02 08 - 22:53


I just took a look at the cubic and Lanczos resamplers that Klaus mentioned, and they're worse than he lets on -- in fact, they're implemented as direct 2D resamplers. For the cubic, this results in 16 multiplies / 16 adds per pixel, versus 8m7a for a row/column implementation. Memory bandwidth issues _might_ even that out. I have a hard time believing that holds for the Lanczos case, though, which uses a 6x6 kernel and where the difference is roughly 3:1 in ALU ops (36m35a vs. 12m11a).

Quality looks like it might be an issue as well. They're using pmullw to do the weighting, which means they had to use only 6 bits of fractional precision for the weights. (I don't remember what Avisynth uses, but VirtualDub uses 14 bits since it uses pmaddwd.) That's somewhat marginal for a 1D implementation, and really sketchy for a 2D kernel... and they're also converting the weights to fixed point using truncation instead of rounding. They do at least renormalize the weights _after_ conversion to integer; skipping that is a common mistake.

The data flow is also a bit complex for the 8bpp case. Rather than using slightly larger pre-aligned kernels, they use a scalar loop to copy the pixels into a temporary buffer and then load that into SSE2 registers. That's a bunch of load/store traffic and a series of store forwarding stalls. They seem to be doing this in order to compute 8 pixels in parallel and not have to do a horizontal add, but I wonder if this is actually faster than using pmaddwd and then a four-way, 32-bit horizontal add.

I'd like to point out that it wasn't my intention to rip on the Framewave library in general, just on the particular IDCT routine. It does look like a number of routines in the library are underoptimized, however. I think one goal of the library is to be API compatible with the Intel Performance Primitives (IPP) library, though, so we should give leeway where the original Intel API lacks flexibility. Does anyone know how well Intel IPP fares in comparison?

Phaeron - 22 02 08 - 00:37


For the record, Avisynth also uses 14-bit coefficients. I rewrote the vertical resizer as a dynamic SSSE3 version, which uses pmulhrsw, so the accumulator fraction is only 6 bits. It falls back to MMX if there are more than 4 taps.

Klaus Post - 22 02 08 - 15:41


"Another uglyness I saw, which isn't restricted to Framewave unfortunately, is assembly language routines that have been translated to intrinsics. The result is a nasty C++ routine that has variables like "pedx" and "pesi," but has instruction names translated so that what used to be an understandable "paddw" is now "_mm_add_epi16." I know this was a hack job for portability, but the result sure is unreadable."
Though I can understand why this was needed; the VC++ x64 compiler does not support inline assembly.

Yuhong Bao - 27 02 08 - 20:41


"Does anyone know how well Intel IPP fares in comparison?"

I haven't tested those particular functions but I hear there is a free trial of IPP :)

Igor Levicki (link) - 20 04 08 - 23:56
