DirectX vs. OpenGL
Not only that Avery, just look at yourself using the platform-specific DirectX API instead of OpenGL -- how did they manage that?
Uh, well, it shouldn't be too surprising. I'm primarily a Win32/x86 programmer. To some people, it's a minor miracle anytime we write code that is in any way portable and doesn't declare at least one HRESULT variable per function.
In this specific case, it's more of an issue of practicality. I know both, I've used both, sometimes I prefer one over the other. I do like portable and multivendor APIs, but for me, deployment, ease of use, stability, and licensing are also factors.
Truth be told, I do like OpenGL better. It has a real spec, written in a precise manner, and the base API is better thought out than Direct3D, which has been hacked together over the years and still has garbage such as the infamous D3DERR_DEVICELOST. I learned a lot about 3D graphics by reading the OpenGL API and extension specs. It's portable, it makes easy things easy (like drawing one triangle), it's extensible, and it's faster since the application talks directly to the user-space IHV driver. Microsoft tried improving its API with Direct3D 10, but the documentation is still incomplete, it's still non-extensible, and it's even less portable than before since it's Vista-only.
Before anyone says anything else, though, theory and practice are a bit different. In practice, OpenGL drivers tend to have their own nice collections of bugs, and some are better optimized than others. Most games use Direct3D, so a lot of effort has been put into optimizing drivers and hardware for that API. Extensions also mean extension hell, and figuring out which extensions are widely available and work as advertised is just as frustrating as Direct3D caps hell. And then there's the shader issue: whereas Direct3D has standardized assembly and HLSL shader languages, OpenGL has gone through numerous vendor-specific combiner-style, assembly-style, and high-level language shader extensions.
Now, as for how this pertains to what I do: VirtualDub isn't exactly a 3D-centric application. It does have 3D display paths, and I have dabbled with hardware acceleration for video rendering as well, so here's my VirtualDub-centric take:
When you write OpenGL code, you don't have to write an entire framework to make sure the application doesn't blow up when the screensaver appears. (I use lots of render targets, which have to be allocated in D3DPOOL_DEFAULT.) What other API requires you to free all objects of certain types at random times?
The shader situation on OpenGL sucks. When I was implementing OpenGL-accelerated capture support, I wanted to support older cards, so I looked into the NV_fragment_shader and ATI_fragment_shader extensions. Programming those directly was so painful that I wrote my own assembler for them ("asuka glc" in the build process). I later looked into Cg, and that was a big improvement, except that it was a bugfest: on the first day, I broke the compiler with the vector expression (a + (a-b)*1.0). I haven't tried ARBfp or GLSL, but I'm hoping those work somewhat more reliably.
OpenGL's coordinate system makes sense: bottom-up orientation, pixel and texel centers on half-integer coordinates, and normalized device coordinates running -1 to 1 on all axes. Direct3D is a mess: device coordinates are bottom-up with centers on integers (which has the nice side effect of making the projection transform viewport-dependent), but functions that take integer screen-space rects are top-down, and textures are top-down with centers on half-integers. NDC X and Y are -1 to 1, but Z is 0 to 1. Argh!
In terms of off-screen rendering, OpenGL has a distinct advantage over D3D9 due to better support for readback. Reading back the results from the video card into system memory where it can be processed by the CPU or written to disk is a major bottleneck when using the GPU to accelerate video. Direct3D 9 has the infuriating GetRenderTargetData(), which has the stupid restriction that it can't do a subrect read and also tends to stall in unexpected ways. OpenGL not only has the more flexible glReadPixels(), but in my tests it did readback noticeably faster on both ATI and NVIDIA cards. With asynchronous readback via pixel buffer objects (PBOs) on NVIDIA hardware, the readback advantage rises to ~2x. (If you're on Vista with a WDDM driver, Direct3D 9.L can supposedly do a subrect readback via StretchRect. If anyone knows if this is faster and if it can be done asynchronously I'd be interested in knowing.) I believe that NVIDIA's CUDA is also able to push into buffer objects in OpenGL, which allows for texture upload, whereas with D3D9 it can only push into a vertex buffer.
On the flip side, one of the more annoying aspects of OpenGL I've found is the tying of sampler parameters (filtering, addressing, mipmap LOD bias, etc.) to textures. For a software renderer that modifies texture data in response to these changes, this makes sense, but it doesn't for modern hardware, where these are usually sampler states. It can also make sense if you consider them intrinsic to the texture, but that breaks down when you want to do something beyond what the hardware can support. A high-quality image processor really needs to support better filtering than bilinear; bilinear filtering gives a really crappy gradient map. To do this, you have to emulate the higher-order filtering using lower-order filtering, and you run into the problem that if you ever need the same texture bound with different parameters in the same pass, you're screwed, because you only have one set of parameters on the texture. I do this when rendering bicubic in VirtualDub's D3D9 driver, and I can't port it to OpenGL because it's impossible to do so. I looked at the source for various OpenGL binding layers, and they all seem to emulate sampler states by pushing them into texture parameters. This sucks.
I guess I should say something else good about Direct3D. Well, the diagnostic tools are better, at least on Windows. I have NVPerfHUD, PIXWin, and D3D debug mode for debugging, I have NVShaderPerf, FX Composer, and GPU ShaderAnalyzer for shaders, and I have debugging symbols for the Direct3D and D3DX runtimes from Microsoft's public symbol server. For OpenGL, well, I have glGetError() which returns GL_INVALID_OPERATION if I remember to call it, GL debug mode if I'm using an NVIDIA card, and a GL debugging tool that everyone's pushing but is only available as a trial edition unless I pay about as much as a full VTune license costs.
Another annoying aspect of OpenGL is that NVIDIA's practically the only one really supporting it on the desktop side. They're pushing out all the cool functionality via extensions and have their OpenGL docs about as well updated as the Direct3D ones. ATI, well... not that they have many useful docs on their site anyway, but their OpenGL docs are way behind and their extension support in their OpenGL driver was way behind the last time I checked. They've put more effort into Direct3D support, even "extending" D3D9 with API hacks to support Fetch4 and R2VB. Disclaimer: I am an NVIDIA fanboy. I still want ATI to better support OpenGL so that it continues to be a viable competitor to Direct3D 9/10 on Windows.
With both APIs, it would be nice to have more flexibility in application structure. Even in games, I think we're long past the era where programs are all single-threaded, have nothing better to do with the CPU than spin in Present() or SwapBuffers() waiting for vertical blank, and don't need to render anything except to the screen. I want better support for multithreading, better ability to avoid unexpected stalls in the driver, the ability to detect/count/wait for vertical blank intervals without polling, and to not lose 3D rendering capability whenever someone locks their workstation or logs in remotely. What I do with 3D in VirtualDub is simple: most of the time, I draw a quad. The complexity is that I still have to write a whole lot of framework around that code to handle lost devices, shaders, hopping commands between threads, managing invisible placeholder windows, and dynamically linking to 3D APIs.
The shader situation in OpenGL doesn't suck. GLSL is great.
Not without some warts. The spec says that code should always run, even if it can't be accelerated, because for some reason they think that handling this in application code is unreasonable, so you can't write several levels of shaders and test to see which one the user's system can actually handle. This makes capability testing impossible, short of unreliable empirical framerate/speed testing. Also, code is compiled by the end user's drivers, and nVidia drivers tend to accept more than they should, so expect to get error reports back with compiler errors (maybe they fixed this).
OpenGL seems to have a wide-ranging problem with extensions: they leave stupid warts around with the excuse that "someone can fix this in a future extension". FBOs can fail with unexplained errors, and the spec says the solution is just to keep trying different configurations until you trip over one that works, and maybe they'll fix this with another extension. It makes the whole extension unpredictable, with an error return that can happen for unspecified reasons and can't be recovered from short of disabling the extension.
Glenn Maynard - 10 01 08 - 12:16
"The shader situation in OpenGL doesn't suck. GLSL is great."
Ultimately what will count is how many games are created under OpenGL vs DirectX.
We all know the answer, so logically OpenGL will have to improve a LOT more to gain even a tiny new advantage.
she - 11 01 08 - 00:53
You're right on the PC platform, but OpenGL is sneaking back in via mobile phones and OpenGL ES. There is a lot of interest in that market and the IHVs are devoting a decent amount of attention to it. Also, since D3D10 has such high OS and hardware requirements, the vast majority of users are still stuck with Direct3D 9. Although OpenGL hasn't improved much lately, D3D9 hasn't advanced at all in years. Kinda sucks across the board, really. The good news is that it's given people who don't spend hundreds of dollars on bleeding-edge video cards a chance to catch up, and the disparity in functionality between low-end and high-end consumer hardware has narrowed a bit.
In theory OpenGL has leapt ahead functionality-wise with NVIDIA back-porting nearly all of the new D3D10 functionality to XP via OpenGL extensions, but software doesn't really seem to be taking advantage of this. The other possibility is that both OpenGL and Direct3D have mostly covered what people need to do and thus no one cares much about one over the other anymore.
Phaeron - 11 01 08 - 01:31
Nvidia was, indeed, the only OpenGL-interested vendor for quite a long time, along with XGI; you may want to check AMD/Ati again though: they have rewritten their OpenGL driver over the summer (at least the Linux one, but if I'm not mistaken, they're basing both Windows GL and Linux GL on the same sources), and started releasing it.
These drivers have seen (on the Linux side) a surge in performance (up to 3 times faster), and documentation seems more available.
From the Wine developers' WWN, it seems that HLSL and GLSL are pretty much equivalent.
I would recommend you have a look at it, or at least check out recent Phoronix articles.
Mitch 74 - 11 01 08 - 11:45
In DirectX it's possible to use a query to flush the command buffer to the GPU and start rendering. Then, while the GPU is rendering, you can check whether it's done and Sleep(1) in the meantime, so it's not necessary to wait when fetching the render target from the GPU (so it is a sort of async readback). But as you noted, it sometimes stalls.
The code looks like this:
LPDIRECT3DDEVICE9 pDevice; // should be initialized somewhere else
LPDIRECT3DQUERY9 pQuery;   // created once with pDevice->CreateQuery(D3DQUERYTYPE_EVENT, &pQuery)
// Mark the end of the queued commands and flush the command buffer
pQuery->Issue(D3DISSUE_END);
pQuery->GetData(NULL, 0, D3DGETDATA_FLUSH);
// Do CPU-heavy stuff here, concurrent with the GPU
// While the GPU is rendering, sleep
DWORD endtick = GetTickCount() + 10000; // assure that we don't sleep for more than 10 sec (Hyperthreading sometimes doesn't work with the query)
LOG("sleep while query not flushed...")
while ((S_FALSE == pQuery->GetData(NULL, 0, D3DGETDATA_FLUSH)) && (GetTickCount() < endtick))
    Sleep(1);
tsp - 11 01 08 - 17:10
You're assuming that all commands can be put into the command buffer. This is generally true for drawing triangles, but other commands cannot always be batched. GetRenderTargetData(), in particular, seems to stall on some drivers. In theory, GetRenderTargetData() should always take negligible time and locking the system buffer with DONOTWAIT should prevent stalling. Well, in theory.
Phaeron - 11 01 08 - 20:49
But in practice, the command buffer is first executed when GetRenderTargetData() is called, so it takes a long time (at least with NVIDIA hardware). To prevent this, you use the above code to execute the buffer (it takes a long time with many draw calls) just before the GetRenderTargetData() call, and use the CPU cycles for something useful instead of stalling in GetRenderTargetData().
tsp - 12 01 08 - 17:01
Still sucks... wasting time on the CPU polling the GPU to finish, and then wasting time on the GPU waiting for the next round of commands to arrive. Unusable if you are attempting to do field-locked flipping on the display at the same time, because you can't queue frames. Whereas with OpenGL, you do a glReadPixels(), which doesn't block, keep drawing until the next frame, and then do a glMapBuffer(), which doesn't block. No wasted time on either unit and no goofy polling.
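Roughly, that PBO path looks like the sketch below, assuming a context where ARB_pixel_buffer_object is available and the two PBOs in pbo[] were created up front with glGenBuffers() and sized with glBufferData(GL_PIXEL_PACK_BUFFER, width*height*4, NULL, GL_STREAM_READ); this is an illustration, not drop-in code:

```cpp
// Frame N: kick off an async readback into this frame's PBO.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[frame & 1]);
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0); // returns immediately
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// ...render frame N+1 while the DMA transfer proceeds...

// Frame N+1: map the *previous* frame's PBO; by now the copy is
// done, so the map doesn't block either.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[(frame + 1) & 1]);
const void *pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// ...hand pixels to the CPU-side video pipeline...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```

The key point is that both the glReadPixels() into a bound pack buffer and the one-frame-late glMapBuffer() can complete without a CPU/GPU sync.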
Phaeron - 12 01 08 - 17:46
Well, that is the default behavior if you don't use a query with DirectX. By using the query, you at least get some control over when the command buffer is flushed, so you can place some CPU-heavy code after the flush, or upload textures to the GPU, instead of waiting for GetRenderTargetData() to stall because the command buffer isn't flushed until GetRenderTargetData() is called. The query was implemented in Brook; more info here: http://www.gpgpu.org/forums/viewtopic.ph..
tsp - 13 01 08 - 18:05
DX10 has much faster copy functions, and they work on depth/stencil buffers too (just for reading, 10.1 will add writing). Yes, being vista-only sucks.
Gabest - 16 01 08 - 09:11
Let me post a few comments since I was responsible for the creation of this topic.
DirectX is a mess. There is no upward compatibility: you have to change code and recompile in order to get new functionality. You can't just slap on another extension.
OpenGL can be a problem too with its cumbersome way of detecting and using extensions.
However, the GLEW library helps a lot (preferably linked with #define GLEW_STATIC). GLUT can also be replaced if you know how to set a proper pixel format for your dummy rendering window and don't shy away from some Win32 API programming.
As for shading languages, the answer is simple -- Cg.
It is easy to use with OpenGL, it works under Linux (perhaps even Mac, haven't checked), it has solid documentation, and bugs get fixed via driver updates.
As for readback performance, I am getting ~700-900 MB/sec with an 8800 GTX here. The SSE4.1 instruction MOVNTDQA is supposed to improve that a lot once video card driver vendors implement support for it.
CUDA is a strange beast, I would stay away from it until some new version comes out.
Finally -- Avery, I would suggest you sign up as an NVIDIA registered developer.
Igor (link) - 25 01 08 - 11:16
The texel/pixel addressing mismatch was resolved in Direct3D 10.
I have also used both D3D and OpenGL and I have to completely disagree with you. OpenGL is full of old garbage, while Direct3D has evolved into a great API.
Btw., as far as I know, GLSL can actually compile some really simple shaders to nVidia register combiners (for old cards).
Regarding offscreen rendering in D3D9, GetRenderTargetData is usually a bad idea for realtime apps. It's better to render stuff to a texture and blit that back.
Another thing: the CPU will not spin in Present() under normal circumstances, unless the GPU has problems keeping up (and it isn't necessarily the Present() call where it will stall). But there are ways to reclaim those CPU cycles (e.g. the one someone mentioned using queries, or locking texture data that comes from rendering with the do-not-wait flag). You may also consider double buffering your system memory texture, so that you always lock what was rendered in the previous frame; in that case the copy won't have to be a CPU/GPU synchronization point.
Kornel Lehocz - 02 07 08 - 07:57