Didn't Radeon cards already implement pixel shaders as OpenGL fragment shaders originally? I mean that pixel shaders (in the DirectX sense) weren't first added to ATI's Radeon line, so it could be done using OpenGL on both NVIDIA and ATI.
Pharaoh Atem (link) - 08 02 09 - 12:39
yes, those fragments are in opengl and mplayer uses them in its opengl video output. georgi was asking about how he could do it with direct3d only, for mplayer on windows.
that's a good question about support for older cards. that will have to be tested later. iirc, the direct3d vo was created for vista, so mplayer would not disable aero (as it does when using -vo directx).
compn - 08 02 09 - 14:46
OpenGL speaks of fragment shaders, whereas Direct3D speaks of pixel shaders. They're basically the same thing, although the OpenGL term is probably more correct if you consider that the shader can run multiple times per pixel in the supersampling case.
As for the original Radeon and pixel/fragment shaders, the answer is no, but only due to specifications. The real answer is that for hardware around that time there actually is no distinction between a shader and a fixed function pipeline setup except for the language or API calls you use to configure the hardware. Neither the GeForce 1/2 nor the Radeon 7500 has enough horsepower to support even the minimum profile of any standard shader language, but both have OpenGL extensions that support enough flexibility to layer a restricted shader language on top. In fact, there is a custom build stage in VirtualDub that compiles custom pixel-shader-like assembly into data for the OpenGL NV_register_combiners extension, and there is enough power in the GeForce 1/2 register combiners to run the equivalent of ~2-4 ps1.1 instructions.
Mplayer should not be disabling Aero Glass if it needs to use DirectX -- that means something is wrong in its DirectX usage, like it is locking the primary in windowed mode. Current versions of VirtualDub can blit to the screen without forcing desktop composition off. Unless the plan is to take advantage of the extra shading/filtering power of Direct3D, switching to that won't be an advantage because essentially the end of every frame is just a DirectDraw DDI blit through the Present() call.
Phaeron - 08 02 09 - 14:58
Thank you very much! I didn't even remotely expect such a nice explanation. After everything you explained, and my research over those 2-3 days, I finally understood that it would be really hard to do what I want without pixel/vertex shaders. The reason I asked this question was to understand whether it is worth learning how to program vertex/pixel shaders, but now I see that doing so is the easier path. After reading your explanation I think that it's close to impossible (given my limited knowledge) to accomplish the task otherwise.
My other idea was to target exactly those cards without pixel/vertex shaders in order to reach a broader audience with my driver, but now I see that this is not feasible.
Since the Direct3D video out in MPlayer targets Windows Vista and beyond (because, as you know, DirectDraw disables Aero), all video cards running Aero support pixel/vertex shaders anyway. The previous DirectDraw driver in MPlayer was just manipulating the video overlay's controls, which include everything I'm asking for. Of course, since I no longer use an overlay, but a simple offscreen surface, which I StretchRect to the backbuffer, I'll have to go with pixel/vertex shaders.
Thanks again for the good explanation. Now the question changes slightly (180 degrees, actually). I already have some idea where to start, but do you have any suggestions that come to mind about already-implemented brightness/contrast/hue/saturation/etc. as pixel shader code? These days I'll look into some open source projects that implement these corrections and try to find an existing implementation. I really doubt that such a thing hasn't been done so far, and since I don't want to reinvent the wheel, I'll just give credit to the author and use his code.
Someone already suggested to use the Effects framework of Direct3D. I still haven't had time to study it, but I suppose that it is exactly what I need - addition of a pixel/vertex processing before the actual output:
Thanks again and greetings for the excellent blog, which is quite interesting. I'm new to DirectX as a whole, but here many new things can be learned!
Keep up the good work!
Georgi Petrov (gogothebee) - 09 02 09 - 04:53
I'm sorry - the link got misinterpreted and doesn't work...
Georgi Petrov (gogothebee) - 09 02 09 - 04:54
"Mplayer should not be disabling Aero Glass if it needs to use DirectX -- that means something is wrong in its DirectX usage, like it is locking the primary in windowed mode. Current versions of VirtualDub can blit to the screen without forcing desktop composition off. Unless the plan is to take advantage of the extra shading/filtering power of Direct3D, switching to that won't be an advantage because essentially the end of every frame is just a DirectDraw DDI blit through the Present() call."
I just saw this comment. The previous driver (written by compn, who also commented here) uses hardware overlay. You said that VirtualDub doesn't disable Aero and can blit, but it doesn't use HW overlay. I think that you are using DirectDraw, but without overlay: just the equivalent of an offscreen surface, which is then blitted to the screen (if there is such a thing in DirectDraw). I think this is the case. The reason I started writing the Direct3D driver was to get rid of the HW overlay usage, which seems outdated in a future of desktop composition. Unless I'm missing something, of course.
Additionally, DirectDraw is marked obsolete in the API and may be dropped somewhere in the future, but this is only speculation of mine. That would render all games using DX7 unplayable, so I don't know if it will ever happen. I wanted to write this Direct3D driver to benefit MPlayer users in the future and because it's really fun to see the result of your work on the screen :)
Georgi Petrov (gogothebee) - 09 02 09 - 05:12
i didnt write vo_directx , that was Sascha...
for d3d shaders, you might want to look at the mpc-hc project.
i linked to it in the original mplayer-dev-eng thread when you proposed to write a d3d output. sourceforge.net/projects/mpc-hc/
compn - 09 02 09 - 07:14
My apologies to Sascha...
I'll check out the MPC HC.
Georgi Petrov (gogothebee) - 09 02 09 - 07:55
Ah, if Mplayer is deliberately turning off desktop composition to use overlay, then that makes sense. As far as I know, Windows Vista does not automatically do this, and in fact I had to put special code into VirtualDub to block overlay usage if desktop composition is detected, because although the driver reports overlay availability, it doesn't actually work.
As I said, I recommend looking into the YCbCr conversion issue first, because that is by far the most important aspect and the main advantage given up by dropping overlays. Color conversion, brightness, contrast, and saturation are all linear transforms and as such can all be handled on the CPU side by matrix arithmetic without having to swap shaders. If possible you do not want to get into having a dozen different shaders for various operations as it's a pain to manage.
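The "one matrix, no shader swapping" idea can be sketched in plain C. This is a hypothetical illustration, not MPlayer's or VirtualDub's actual code: the BT.601 limited-range coefficients are standard, but the parameter conventions (contrast/saturation as gains, brightness as an offset) are assumptions.

```c
#include <assert.h>

/* Fold BT.601 YCbCr->RGB conversion, contrast, saturation and
   brightness into a single 3x4 matrix, so the shader constants (or
   CPU path) change without swapping shaders. Illustrative sketch. */
typedef struct { double m[3][4]; } ColorMatrix;

static ColorMatrix make_matrix(double contrast, double saturation,
                               double brightness) {
    const double y = 1.164383;  /* 255/219, limited-range luma gain */
    double base[3][4] = {
        { y,  0.0,       1.596027, 0.0 },   /* R */
        { y, -0.391762, -0.812968, 0.0 },   /* G */
        { y,  2.017232,  0.0,      0.0 },   /* B */
    };
    ColorMatrix cm;
    for (int r = 0; r < 3; ++r) {
        /* Contrast scales everything; saturation scales chroma only. */
        cm.m[r][0] = base[r][0] * contrast;
        cm.m[r][1] = base[r][1] * contrast * saturation;
        cm.m[r][2] = base[r][2] * contrast * saturation;
        /* Fold the -16/-128 input offsets plus brightness into the
           constant term, so applying the matrix is one dot product. */
        cm.m[r][3] = -16.0 * cm.m[r][0]
                   - 128.0 * (cm.m[r][1] + cm.m[r][2])
                   + brightness;
    }
    return cm;
}

static void apply(const ColorMatrix *cm, const double ycbcr[3],
                  double rgb[3]) {
    for (int r = 0; r < 3; ++r) {
        double v = cm->m[r][0] * ycbcr[0] + cm->m[r][1] * ycbcr[1]
                 + cm->m[r][2] * ycbcr[2] + cm->m[r][3];
        rgb[r] = v < 0.0 ? 0.0 : v > 255.0 ? 255.0 : v; /* clamp */
    }
}
```

With neutral settings (contrast 1, saturation 1, brightness 0) this maps studio white (Y=235) to RGB 255 and studio black (Y=16) to 0, and any user adjustment only changes the constants, not the shader.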
If you do decide to support fixed function, I'd recommend prototyping with shaders anyway as it is MUCH easier to prototype in HLSL than in fixed function. There are a couple of other issues I forgot to mention, though: texture format availability, and non-pow2 textures. Planar YCbCr is harder to support efficiently if you do not have L8 textures, so you will need to check caps. The DirectX SDK contains a hidden card caps Excel spreadsheet in one of the samples that is very useful for figuring out what formats are realistically available. The non-pow2 issue is more annoying as cards in that range require pow2 textures to some extent -- at best you have NONPOW2CONDITIONAL support, which isn't always available and when it is present doesn't always work properly. If you do need to use pow2 textures due to lack of NP2C or need for dependent reads in a shader, you have to worry about border effects. VirtualDub doesn't handle this yet and that's why you can sometimes see artifacts along the right and bottom borders in Direct3D mode.
Oh, and by the way, vertical sync (vsync) in Direct3D windowed mode is a pain. You'll discover that letting Direct3D do it is non-viable unless you have a thread devoted to presentation, because otherwise Direct3D blocks your UI or decoding thread for a substantial portion of each frame. I think VMR9/EVR does this. I ended up timeslicing on the UI thread instead because I found Win32 input issues with multithreaded UI to be intractable.
Be careful about using the Effects framework. It's handy, but you need to carry around the boat anchor D3DX DLL to use it, and the only official redist methods are to either use a pared down DXSetup (huge) or web install. You're not permitted to just distribute the DLL side-by-side. I went the hard way of having a build tool compile the effect and reflect out the data; an easier but less powerful way is to compile raw shaders with fxc.
DirectDraw isn't getting dropped anytime soon, as way too much software is dependent on it. It has, however, been degraded in Windows Vista, as it now executes in software rendering only.
I took a look at the MPC-HC source code, and its Direct3D rendering looks substantially similar to the original MPC code. Be careful -- the original MPC had a problematic bicubic shader that showed artifacts at certain enlargement factors that were exact multiples, such as 5x (360 -> 1200). I believe it's due to rounding errors in the graphics hardware that cause the texture sampler and shader to see slightly different interpolator values. Current versions of VirtualDub use a different two-pass algorithm that is much less sensitive to error, but has the dubious distinction of being impossible in OpenGL (the shader reads the same texture twice with different filters).
Hue shifting is the one algorithm I don't have an easy shader equivalent for. You can use the straight if() method in ps2.0, but fill rate might be an issue on lower cards due to shader length. ps1.1-1.3 have no swizzling capability at all and ps1.4 is barely better, so for the algorithmic method you would likely have to have a series of shaders depending on how many sextant shifts are involved. Your best bet might actually be to do a full blown 3D lookup with texm3x3tex with everything baked in, including color space transform. I'd suggest looking into the sources for XBox Media Center as I think the authors did a lot of research on how best to do blits to the screen using the ~ps1.25 graphics hardware on that platform, including accuracy measurements.
Phaeron - 10 02 09 - 01:38
I'm not sure if MPlayer turns overlay off on purpose. The code is unchanged since before Vista came out. I think that just requesting an overlay turns off Aero, but you seem to disagree. I don't really know, because I only went through the code; I didn't write it. Anyway, I think that getting an overlay in a desktop-composited environment is not possible. HW overlay, as far as I know, is an exception to the rules like nothing else. For example, you can have a 24-bit RGB overlay on a 16-bit RGB desktop (if I'm not mistaken). Also, the overlay simply "cuts" a quad out of the desktop for its own usage and uses color keying.
Since in a composited environment every window is drawn to an offscreen surface, and then all those offscreen surfaces are composited by DWM in order to add transparency, have live thumbnails and so on, having a "cut off" portion of the screen for an overlay doesn't sound logical anyway. In Vista the only way to have a working overlay is to disable Aero and revert to the non-composited environment, where everything is drawn like before.
I may be wrong, of course.
"As I said, I recommend looking into the YCbCr conversion issue first, because that is by far the most important aspect and the main advantage given up by dropping overlays."
What do you mean? This happens in the hardware. I get an offscreen surface in either YV12 or YUY2 format, copy the frame into it and then I StretchRect it onto the RGB backbuffer. The actual conversion happens in the HW exactly on this StretchRect. I choose between YV12, YUY2 or RGB/BGR based on the colorspace the movie codec uses. I'm not sure about the overlay, but I think you can get an overlay directly in YV12, YUY2 or RGB and do the same thing, so I don't see any advantage here. The real advantage to using Direct3D is to render to a "compositable" render target, and this is the final RGB backbuffer.
Maybe you're talking about one issue I've heard of: that doing the conversion this way you lose some of the dynamic range, something like 16-235 instead of 0-255 values for each pixel. I'm not sure if this is valid only for VMR7/9 (which I'm not using) or not.
"Color conversion, brightness, contrast, and saturation are all linear transforms and as such can all be handled on the CPU side by matrix arithmetic without having to swap shaders."
That's the thing I want to skip. I want to run the final HW-converted RGB surface through vertex/pixel shaders and do the transform there. These transformations will happen only if the user changes the default brightness/contrast/etc. Otherwise no correction on the final image will be performed.
About the non-power-of-2 textures: why are they an issue? I'm dealing with surfaces (not textures) everywhere. The only place I use textures is the OSD, and yes, they are exactly L8. Can't I run the vertex/pixel shader code on the final backbuffer surface? For OSD I try to create a texture with the same dimensions as the backbuffer, but if there's no support for arbitrary texture sizes, I create the next smallest supported texture and just upsize it when doing StretchRect from the L8 texture to the RGB backbuffer surface.
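The fallback described here (drop to the nearest supported power-of-2 size below the target, then let StretchRect upscale) might look like the following sketch; the helper name is made up for illustration.

```c
#include <assert.h>

/* Largest power of two <= v: a sketch of the "next smallest texture"
   fallback for cards without arbitrary texture sizes. Rounding *up*
   to the next power of two and blitting from a subrectangle would
   avoid the upscaling quality loss, at the cost of extra memory. */
static unsigned prev_pow2(unsigned v)
{
    unsigned p = 1;
    while (p * 2u <= v)
        p *= 2u;
    return p;    /* for v == 0 this degenerates to 1 */
}
```

For a 720-pixel-wide backbuffer this yields a 512-wide texture, which then gets stretched back up on the blit, which is exactly where the OSD softness would come from.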
Vsync works just fine in the current code. I haven't seen any issues... I use D3DPRESENT_INTERVAL_ONE and... it works.
About the Effects framework... This thing with the DLL doesn't sound good. I'll investigate further when the time comes, but let me ask you a question: I'm still in my early days of Win32 programming, but can't I just expect to find the DLL on the client's system and load it dynamically with something like LoadLibraryA("d3d9.dll")? This question may be stupid, though.
"Be careful -- the original MPC had a problematic bicubic shader that showed artifacts at certain enlargement factors that were exact multiples, such as 5x (360 -> 1200)."
I don't have plans to use shaders for anything other than brightness/color correction. When I do StretchRect from the offscreen surface (with the movie's dimensions) to the backbuffer (the window's dimensions on the screen), the video card does a bilinear resize for me. BTW I can set the following values: D3DTEXF_NONE, D3DTEXF_POINT, D3DTEXF_LINEAR, D3DTEXF_ANISOTROPIC, D3DTEXF_PYRAMIDALQUAD, D3DTEXF_GAUSSIANQUAD. I use D3DTEXF_LINEAR. Does this sound familiar to you?
Why does VirtualDub use shaders to do its own resizing? Remember, I'm new to DirectX and video programming as a whole, but don't you use the same method I use (which doesn't require shaders)?
About the hue -- I'll look into it when I'm finished with the other ones, but thanks for pointing me to XBox Media Center ;)
Georgi Petrov (gogothebee) - 10 02 09 - 12:37
> I think that just requesting an overlay turns off Aero, but you seem to disagree.
Didn't happen on my GeForce Go 6800, although maybe Microsoft changed that in Vista SP1.
I don't know of any modern video card that supports RGB overlays -- they're basically all YCbCr only. I think it's supported in XPDM drivers mainly for DVD playback, as some DVD playback applications will refuse to run if an overlay is not present, and they can't be trivially screencapped.
A composited desktop doesn't have too much to do with overlay support. Yes, translucency doesn't work, but the active window area is opaque most of the time anyway. Consider that you can already do translucent windows even in XP using WS_EX_LAYERED. I believe it's actually the lack of overlay support in WDDM that's the issue.
> About the non power of 2 textures: why are they an issue? I'm dealing with surfaces (not textures) everywhere. The only place I use textures is the OSD and yes, they are exactly L8. Can't I run the vertex/pixel shader code on the final backbuffer surface? For OSD I try to create a texture with the same dimensions as the backbuffer, but if there's no support for arbitrary texture size, I create the next smallest possible texture and just upsize it when doing StretchRect from the L8 texture to the RGB backbuffer surface.
You can do that, of course. It means you will have to StretchRect() into an off-screen render target, which then means you have to deal with lost surfaces (the dreaded Alt+Tab problem). Note that fully arbitrary texture sizes aren't supported until either the GeForce 6 or Radeon HD2000 series and above, so there are a lot of cards that have at least some texture size restriction.
StretchRect() is basically the D3D9 API into the old DirectDraw blit path. There are very few things you can do with StretchRect() that you can't do by drawing polygons. One is copying from a render target surface to a render target texture, and another is converting a supersampled or multisampled surface to a non-AA surface. I've seen enough bugs on that route that I draw polygons whenever I can. Then again, I'm a 3D control freak.
> Vsync works just fine in the current code. I haven't seen any issues... I use D3DPRESENT_INTERVAL_ONE and... it works.
The danger is that the D3D runtime does this by polling the beam until it's at a good place to blit. At 30 fps, you probably won't notice the problem because your presenter will always be idle for at least an entire frame. At 60 fps, though, if you're doing this on the UI thread you may notice that you suddenly can't get input through for seconds at a time as the message pump is blocked while D3D is polling the beam and it does this almost constantly. I've been doing a lot of deinterlacing and frame rate doubling work recently, so I hit this problem pretty hard.
Note that this only applies to windowed mode. In full screen mode the runtime just queues a flip command, and you won't block as long as you don't exceed refresh rate for longer than the frame queue. If you're a stickler, though, you might need to monitor the present latency in that case to maintain A/V sync. I believe VMR9/EVR do this.
> I'm still in my early days of Win32 programming, but can't I just expect to find the DLL on the client's system and load it dynamically with something like LoadLibraryA("d3d9.dll")?
There's a different one for each version of the SDK, and you're not guaranteed that ANY are installed even if DirectX 9.0c itself is. Do you need D3DX9_23.DLL, D3DX9_24.DLL, 25, 26... 39?
Trust me, either avoid it or install it. You cannot depend on it being preinstalled.
> BTW I can set the following values: D3DTEXF_NONE, D3DTEXF_POINT, D3DTEXF_LINEAR, D3DTEXF_ANISOTROPIC, D3DTEXF_PYRAMIDALQUAD, D3DTEXF_GAUSSIANQUAD. I use D3DTEXF_LINEAR. Does this sound familiar to you?
D3DTEXF_LINEAR gets you bilinear filtering, which is the best you can get. Aniso and the -Quad ones are illegal to use with StretchRect(), and no hardware supports the 4-tap filters anyway.
> Why does VirtualDub use shaders to do its own resizing? Remember, I'm new to DirectX and video programming as a whole, but don't you use the same method I use (which doesn't require it)?
There are several reasons that VirtualDub uses shaders for YCbCr conversion. The first problem is format support, as the only standard D3D surface formats are YUY2 and UYVY, and the rest are in FOURCC land -- and I have no idea to what extent those are supported for surface-to-surface conversions since DXCapsViewer doesn't show those. YV12 and YVU9 have historically been pretty buggy at least with DirectDraw overlays. Second, you don't have control over the chroma sampling, and it's unfortunately fairly common for the hardware vendors to cheap out and use point sampling for vertical or even both axes in chroma. Third, and most seriously, you don't have control over the levels, as you guessed. In most cases you want 16-235 in YCbCr and 0-255 in RGB, but some video drivers do a straight conversion instead. By using a shader, I have exact control over the levels. In recent versions of VirtualDub I've even begun supporting higher color depths than 8bpc by converting to 16F and dithering to the screen, which reduces banding. Finally, you can't do bicubic without a shader.
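The levels point above can be illustrated with the textbook studio-range expansion (this is the standard formula, not VirtualDub's actual shader code):

```c
#include <assert.h>

/* Studio-range YCbCr puts black at Y=16 and white at Y=235, so a
   correct conversion must expand 16-235 luma to 0-255 RGB. A
   "straight" conversion that copies the code values leaves blacks
   gray and whites dim, which is the driver bug described above. */
static double expand_luma(double y)   { return (y - 16.0) * 255.0 / 219.0; }
static double straight_luma(double y) { return y; }
```

With the straight conversion, studio white ends up at 235 instead of 255, so the whole picture looks washed out; doing the conversion in a shader pins both endpoints exactly.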
Phaeron - 11 02 09 - 01:39
My observations mirror Georgi's in that Vista, since RTM, automatically disables composition whenever some application uses overlay. This includes ancient applications created well before Vista's first public preview releases. And then the overlay does work, and it's a true color-keyed overlay, which means stuff like setting the desktop to its color to display some video as the background works, as well as painting with the overlay color on other apps. Likewise, transparency affecting the key prevents the overlay from showing, etc...
I tested this back then with a Radeon 9600 and a GeForce 6600 and both exhibited this behavior. Now, with SP1 and a GF8800, it's still exactly the same.
I suspect DWM turning off is to prevent glitches (like transparent windows on top breaking the video, or bogus thumbnails/Flip3D). They went all out to prevent glitches like this, to such extremes as hiding the hardware mouse cursor and replacing it with an artificial one while dragging windows to prevent perceived lag...
As for VSync, I don't know if we're talking about the same thing, but with desktop composition active, the presentation interval doesn't make any difference for windowed apps - the application never waits and can render as many frames as it wants. Then, when the time to compose a frame comes, the most recent one is grabbed and gets displayed. So there's never tearing and the framerate is not limited (so it's possible not all the frames reach the screen).
John - 11 02 09 - 14:20
Yes, I agree with John about everything he wrote.
Phaeron, all your comments were really interesting. I'll write again in this blog when I start implementing what I have in mind. I think that this is an excellent discussion and many people will find very useful points in this blog. Let me take my time with pixel shaders, let me see what I can learn, and I'll be back ;)
Doing a bicubic resize as well as keeping the full 0-255 RGB range sounds sweet, and I'll see what I can do. For cards that don't support pixel/vertex shaders, I'll keep the current StretchRect conversion, but in the future I may add a shaders-only resize.
Just one question (I'll find the answer on my own eventually): how can I skip this D3DX DLL problem? By not using the Effects framework, or...? I'll see how I can "compile" my shader code and include it directly into my .c file in MPlayer, but I'll figure out how I should do everything.
I'm really excited about your blog! I found sooooo many interesting topics here!!! Keep up the good work!
Georgi Petrov (gogothebee) - 11 02 09 - 16:17
It might be that some other common DirectDraw call turns off the DWM -- I had this problem in the beta with GetSurfaceDesc() on the primary before it got fixed in RTM. Either that, or I was using an old and buggy NVIDIA driver. Not that I'm complaining, mind you, because IMO auto-disabling DWM is the right thing to do, and in my case I definitely want to avoid killing desktop composition instead of using overlay.
If you want to avoid the D3DX DLL problem, then yes, you need to avoid using the FX framework at runtime. You can still use it in the build system, though. You also need to avoid the other D3DX calls, but in practice the rest isn't all that interesting, especially if you're doing 2D/video.
To ditch D3DXFX, you need to understand what it does.
The Effects framework really consists of two halves: a compiler that converts HLSL source code to a binary effect, and a runtime that executes a binary effect on the D3D device. Getting rid of the compiler is just a matter of either using fxc.exe or a custom build tool based on ID3DXEffectCompiler. VirtualDub has an internal build tool called Asuka that does the latter, generating a source code file from an .fx file.
Replacing the runtime is more work. The runtime does four things: it unpacks a binary effect resource, it uploads shaders to the device, it sets render/sampler state, and it computes and sets shader constants. The last three are relatively straightforward calls in IDirect3DDevice9 (CreateVertexShader/CreatePixelShader, SetRenderState/SetSamplerState, and SetVertexShaderConstantF/SetPixelShaderConstantF). The first one is where you get into trouble. Having dissected part of the .fxo format, I can tell you that you most definitely don't want to try loading those manually, which means you will need to design and generate your own format.
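For what "design and generate your own format" could mean concretely, here is a toy sketch of a minimal container for compiled shader bytecode. Every name and field here is invented for illustration; VirtualDub's real format is certainly different.

```c
#include <string.h>

/* Toy stand-in for the undocumented .fxo layout: a fixed header
   followed (in a real file) by the shader bytecode blob and a
   constant table. Illustrative only. */
typedef struct {
    char     magic[4];       /* file identifier, e.g. "SHDR" */
    unsigned bytecode_len;   /* size of the shader bytecode blob */
    unsigned num_constants;  /* entries in the constant table */
} ShaderPkgHeader;

static size_t pkg_write_header(unsigned char *buf, unsigned bytecode_len,
                               unsigned num_constants) {
    ShaderPkgHeader h;
    memcpy(h.magic, "SHDR", 4);
    h.bytecode_len  = bytecode_len;
    h.num_constants = num_constants;
    memcpy(buf, &h, sizeof h);   /* header first; bytecode would follow */
    return sizeof h;
}

static ShaderPkgHeader pkg_read_header(const unsigned char *buf) {
    ShaderPkgHeader h;
    memcpy(&h, buf, sizeof h);
    return h;
}
```

The point is only that once the build tool has reflected shaders, constants, and states out of the effect, serializing them into a format you control is trivial compared to parsing .fxo.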
Shaders can be extracted from an effect through ID3DXBaseEffect::GetFunctionDesc(), and you can determine basic constant and texture-to-sampler mappings by passing the shader bytecode through D3DXGetShaderConstantTable(). Unfortunately, you cannot directly extract out preshaders, computed parameters/states, or render/sampler state. It is possible to reflect out render and sampler state if you create a fake Direct3D device and "run" effects on it, which is what VirtualDub's shader build process does.
In the end, though, this is likely overkill for what you are doing. I did this because I wanted to support ps1.1-1.4, which requires a lot of multi-pass shaders and complex constant setup. You are dealing with ps2.0+, so I'd recommend just hardcoding constants at registers like MPC does, i.e.
extern sampler src_texture : register(s0);
extern float4 src_size_params : register(c0);
...and just using fxc to compile raw shader object files. This way, it should also be relatively simple to make shaders that can also be compiled using Cg and used with OpenGL.
Phaeron - 13 02 09 - 00:06
About overlays: careful, many current cards don't support hardware overlays anymore. On Nvidia's side, the last card with hardware overlay support is the original GeForce 6800; all subsequent models (66xx to 6050, 7xxx, 8xxx) don't support hardware overlays anymore (according to the developers of Nouveau, the free accelerated X11 driver for Nvidia cards). On ATI's side, hardware overlays were dropped in the rs690 and r500 chips (the upper X1xxx Radeons).
Mitch 74 (link) - 16 02 09 - 10:56
That may be the case, but DirectDraw still reports a video overlay for my 8400GS-based Quadro NVS 140M. If the driver emulates the overlay in a way better than you could do through the userspace APIs, I'm not sure it matters whether the hardware actually supports it or not.
Phaeron - 17 02 09 - 01:45