Making a file format standard is hard work
There has been a lot of discussion lately over Microsoft's Office Open XML (OOXML) format and how it has been going through the ISO standardization process. Now, I'm not in the business of writing productivity software nor do I have any interest in doing so, but purely from a technical standpoint -- political issues aside -- I'd have to agree with detractors that OOXML is not a good standard. Underspecification bothers me the most. Not adequately specifying part of a standard results in ambiguity that can kill the utility of parts of a file format; once everyone starts making different errors in writing the same field, it can become impossible to discern the correct meaning. Having a tag such as "useWord97LineBreak" in a standard without an actual description of what Word 97 does is, of course, an egregious offense. However, I will say that trying to fully specify a file format isn't easy, and OOXML definitely wouldn't have been the first ISO standard to suffer from holes.
The reason is that writing a file format is, well, hard.
Let's take the simple example of storing a frame rate in a file. Because common frame rates are non-integral, and we want to maintain accuracy even over long durations, we'll store it in rational form, as the ratio of two 32-bit integers:
uint32_t frameRateNumerator;
uint32_t frameRateDenominator;
(This is, in fact, how AVI stores frame rates. It is also used with Direct3D 10.)
How many issues can arise with these two fields? Well:
- Are there minimums and maximums to the stored fraction? Are there certain profiles that can rely on restricted values, such as for mobile devices?
- Are there recommended values for common frame rates? (These can double as compliance tests.)
- Can the numerator be zero? This would mean a frame rate of zero.
- Can the denominator be zero? You can't divide by zero. What does it mean?
- If zero in either field is invalid, what should programs do? Should they reject it, automatically correct it to some value, or is it up to the implementation?
- What is the byte order of these fields, little-endian or big-endian?
- Must the fraction be stored in lowest terms? Is there any significance if it is not, and should an implementation reduce an unnormalized fraction? What algorithm is recommended for reducing fractions? (Finding one was a bit harder when you had to go to the library instead of doing a web search.)
- If an application approximates the fraction to a single value, what is the minimum recommended or required precision? Are there specific values that must always be represented exactly?
- Do these fields need to be consistent with other fields in the file? For instance, are there times when the same frame rate shows up multiple times in the file? If they are different, how are they reconciled?
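To make a few of those questions concrete, here is a minimal sketch in C of a reader that commits to one set of answers. Everything in it is an assumption made for illustration, not what any real format mandates: the names FrameRate, read_u32_le, gcd_u32 and parse_frame_rate are hypothetical, the fields are taken to be little-endian with the numerator first, a zero in either field is rejected, and the fraction is reduced with Euclid's algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the two fields discussed above. */
typedef struct {
    uint32_t numerator;
    uint32_t denominator;
} FrameRate;

/* Assemble a 32-bit value from four bytes, least significant byte first.
   Little-endian storage is an assumption of this sketch. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Greatest common divisor by Euclid's algorithm. */
static uint32_t gcd_u32(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Parse numerator then denominator from 8 bytes.
   Assumed policy: a zero in either field is invalid and is rejected
   (returns false); valid fractions are reduced to lowest terms. */
static bool parse_frame_rate(const uint8_t bytes[8], FrameRate *out) {
    assert(bytes != NULL && out != NULL);
    uint32_t num = read_u32_le(bytes);
    uint32_t den = read_u32_le(bytes + 4);
    if (num == 0 || den == 0)
        return false;
    uint32_t g = gcd_u32(num, den);
    out->numerator = num / g;
    out->denominator = den / g;
    return true;
}
```

With these choices, the bytes 60 EA 00 00 D2 07 00 00 (60000/2002) parse to 30000/1001. A spec that answered the questions differently, say big-endian fields or clamping instead of rejecting, would need a different reader, and the two declarations alone don't tell you which one to write.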
There are a number of bad outcomes that can arise from not answering these questions. One possibility is that applications commonly write 30/1 for NTSC and then interpret that on read as NTSC, even though the NTSC rate is actually 30000/1001, or about 29.97. Another possibility is that an application writes garbage into the frame rate fields and then ignores the values on read, because it works in a medium that already has a defined frame rate and not all programs validate or use the value on read. A third possibility is that everyone assumes the order is backwards and the odd program written by the person who actually reads the spec can't read everyone else's files. And yes, I've seen all of these kinds of mischief before.
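To put a number on the 30/1 versus NTSC confusion, here is a small sketch in C that computes when a given frame starts under each rate. The helper name frame_time_ms is hypothetical, and the NTSC rate is taken as the rational 30000/1001.

```c
#include <assert.h>
#include <stdint.h>

/* Start time of a frame in milliseconds, rounded down.
   The rate is num/den frames per second, so frame i starts at
   i * den / num seconds. The product is formed in 64 bits; very large
   frame indices combined with large denominators could still overflow,
   which this sketch does not guard against. */
static uint64_t frame_time_ms(uint64_t frame_index, uint32_t num, uint32_t den) {
    assert(num != 0);
    return frame_index * den * 1000u / num;
}
```

Frame 108000 starts at 3,600,000 ms if the rate is read as 30/1, and at 3,603,600 ms if it is read as 30000/1001. That is 3.6 seconds of drift per hour, which is more than enough to wreck audio sync.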
Good file formats are rare, but in my opinion, the Portable Network Graphics (PNG) specification is among the better ones. It uses clear language (must/must not/should/should not), it has rationales for various design decisions, and it attempts to advise what to do when dealing with non-compliance. For instance, when talking about converting samples between different bit depths, it describes the best case (linear mapping), an acceptable approximation (bit replication), and says what you should not do and why (bit shift left). That level of detail doesn't prevent all accidents, but at least it reduces them through awareness, and clarifies who is at fault when an interoperability problem occurs.
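The three sample-depth conversions mentioned above can be sketched in C as follows. The function names are hypothetical and this is my own reading of the three techniques rather than code from the PNG specification; it assumes unsigned samples, widening only, and depths of 1 to 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Best case: linear mapping with rounding,
   floor(in * max_out / max_in + 0.5), done in integer arithmetic. */
static uint32_t scale_linear(uint32_t in, int src_bits, int dst_bits) {
    assert(src_bits >= 1 && src_bits <= dst_bits && dst_bits <= 16);
    uint32_t max_in = (1u << src_bits) - 1;
    uint32_t max_out = (1u << dst_bits) - 1;
    return (in * max_out + max_in / 2) / max_in;
}

/* Acceptable approximation: shift the sample to the top of the wider
   field, then repeat its leading bits to fill the low-order bits. */
static uint32_t scale_replicate(uint32_t in, int src_bits, int dst_bits) {
    assert(src_bits >= 1 && src_bits <= dst_bits && dst_bits <= 16);
    uint32_t out = 0;
    int shift = dst_bits - src_bits;
    while (shift > 0) {
        out |= in << shift;
        shift -= src_bits;
    }
    out |= in >> -shift; /* remaining bits come from the top of the sample */
    return out;
}

/* What not to do: left shift with zero fill. The maximum input no
   longer maps to the maximum output, so white is not reproduced. */
static uint32_t scale_zero_fill(uint32_t in, int src_bits, int dst_bits) {
    assert(src_bits >= 1 && src_bits <= dst_bits && dst_bits <= 16);
    return in << (dst_bits - src_bits);
}
```

Going from 5 bits to 8, the maximum sample 31 becomes 255 under both linear mapping and bit replication, but only 248 under the plain shift, which is exactly the kind of quiet error a "should not, and here is why" clause heads off.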
Comments
OT. just wondering, is the ad in the upper right of your page connected with any work you have personally done?
nine - 11 09 07 - 05:38
A format like AVI is interesting from a different aspect. On Windows there is an automated codec locator and extension system; other platforms or applications reading AVI natively have to add support for each FOURCC and format struct separately, usually in the form of a software update. It shouldn't necessarily be that way, but it is. When someone designs a new file format, he uses things which tie it to his own environment more or less. It is hard to avoid, unless that person is educated enough to be familiar with all other platforms as well.
Gabest - 11 09 07 - 13:26
I agree that "useWord97LineBreak" is really bad, and ambiguities should really be removed, but on the other hand, writing extremely strict specifications could result in nobody using them.
Remember the EDI specifications for Electronic Data Interchange? They were sooooo strict and detailed that you needed a week's reading just to implement ONE document. A small number of companies "sold" the EDI technology at outrageous prices (and they were bad implementations).
See other detailed specifications (OWL?), which seem not to be getting anywhere.
zardoz - 11 09 07 - 14:00
@nine:
Sort of... I know the guys at TealPoint Software pretty well.
@zardoz:
Very good point. Conciseness is definitely also a valuable (and rare) trait in standards.
Phaeron - 11 09 07 - 23:10
> Underspecification bothers me the most.
Then you haven't looked into the format deeply :)
It makes little sense, until you check the 97-2003 format. The new one is truly a text dump of the old binary structures. It's really a parody. I'd like to know how it's implemented. Also, there are a lot of situations that "trigger" the old format, such as password protection.
All these xml-in-zip formats bother me. They are slow* and overcomplicated. And Adobe plans to do it with the successor to PDF too.
You might want to check an A-V container designed by people who implemented a lot of other ones:
http://svn.mplayerhq.hu/nut/docs/nut.txt..
The specification takes care of the example issue you mention there, in addition to many others. It's very strict and well defined, from track properties to interleaving and frame ordering.
* Excel 2007 even added a binary format where the values and formulas are binary instead of XML for performance...
John - 12 09 07 - 09:11
Here’s what annoys me about the OOXML debate. Ok, the format is flawed, but after 10 years of hearing people complain about Microsoft opening up their format, this is better than having them keep it closed. Microsoft is not under any obligation to open this up at all, and how long will it take for them to 3rd party format and implement it correctly (IE anyone)?
The fact that they’re trying to open it up to ISO and are going through the process is a massive improvement over where we were going from Office 95 to Office 97, when the binary format broke.
Trimbo (link) - 12 09 07 - 14:16
@John:
The NUT description does look pretty good.
@Trimbo:
Oh, don't get me wrong, as an information document it's way better than what Microsoft usually puts out. An ISO standard is not a white paper, though. People have to actually be able to use ISO standards to fully read and write a file format, and the proposed OOXML falls far short of that. It's one thing to have ambiguities, it's another to have entire holes in the format description.
Phaeron - 12 09 07 - 23:12
Avery, how does that NUT format deal with partial files (from bit-torrent, emule, etc)? Can it decode frames without a header (for example does a keyframe store width/height information)? If not, then it is yet another useless container format.
Igor (link) - 07 01 08 - 22:28
I don't know whether NUT does deal with partial files, but I think your condemnation of it for that reason reflects a narrow point of view. People who do professional video production work, for instance, don't really care about that, and those working with mobile devices wouldn't be willing to pay for that level of size overhead.
Phaeron - 07 01 08 - 22:39