
§ Making a file format standard is hard work

There has been a lot of discussion lately over Microsoft's Office Open XML (OOXML) format and how it has been going through the ISO standardization process. Now, I'm not in the business of writing productivity software, nor do I have any interest in doing so, but purely from a technical standpoint -- political issues aside -- I'd have to agree with the detractors that OOXML is not a good standard. Underspecification bothers me the most. Not adequately specifying part of a standard results in ambiguity that can kill the utility of parts of a file format; once everyone starts making different errors in writing the same field, it can become impossible to discern the correct meaning. Having a tag such as "useWord97LineBreak" in a standard without an actual description of what Word 97 does is, of course, an egregious offense. However, I will say that trying to fully specify a file format isn't easy, and OOXML definitely wouldn't have been the first ISO standard to suffer from holes.

The reason is that writing a file format is, well, hard.

Let's take the simple example of storing a frame rate in a file. Because some common frame rates are non-integral (NTSC video runs at 30000/1001, or about 29.97 frames per second) and we want to maintain accuracy even over long durations, we'll store it in rational form, as the ratio of two 32-bit integers:

uint32_t frameRateNumerator;
uint32_t frameRateDenominator;

(This is, in fact, how AVI stores frame rates, and Direct3D 10 uses the same representation.)
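To make the accuracy argument concrete, here is a minimal sketch, assuming the two fields above mean "frames per second = numerator / denominator" and that a zero in either field is invalid (both assumptions a real spec would have to state). The helper names `frame_rate_is_valid` and `frame_start_us` are hypothetical. It computes each frame's start time directly from the frame index using 64-bit integer math, so no rounding error builds up over a long file.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two fields from the text; assumed: frames per second = numerator / denominator. */
typedef struct {
    uint32_t frameRateNumerator;
    uint32_t frameRateDenominator;
} FrameRate;

/* Assumed rule: a zero in either field does not describe a usable frame rate. */
static bool frame_rate_is_valid(FrameRate r) {
    return r.frameRateNumerator != 0 && r.frameRateDenominator != 0;
}

/* Start time of frame n in microseconds, rounded down.
 * In seconds, frame n starts at n * den / num. Splitting that into whole
 * seconds plus a remainder keeps the intermediates inside 64 bits; the
 * final secs * 1000000 can still overflow for absurd rates (tiny num,
 * huge den), which this sketch does not check. */
static bool frame_start_us(FrameRate r, uint32_t n, uint64_t *out) {
    if (!frame_rate_is_valid(r))
        return false;
    uint64_t ticks = (uint64_t)n * r.frameRateDenominator; /* fits: both < 2^32 */
    uint64_t secs  = ticks / r.frameRateNumerator;
    uint64_t rem   = ticks % r.frameRateNumerator;         /* rem < 2^32 */
    *out = secs * 1000000u + rem * 1000000u / r.frameRateNumerator;
    return true;
}
```

At 30000/1001, frame 30000 lands on exactly 1001 seconds. Because every timestamp is derived from the frame index rather than by summing per-frame durations, the error never exceeds the final truncation to whole microseconds.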

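How many issues can arise with these two fields? Well: Which field is the numerator, and in what order are they stored? Is a zero in either field legal, and if not, what should a reader do with one? Must the ratio be exact, or may a writer round, for instance storing 30/1 for NTSC video that actually runs at 30000/1001? Is a writer required to store meaningful values at all, and is a reader required to validate or honor them?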

There are a number of bad outcomes that can arise from not answering these questions. One possibility is that applications commonly write 30/1 for NTSC and then interpret that on read as NTSC, even though NTSC is actually about 29.97 (exactly 30000/1001). Another possibility is that an application writes garbage into the frame rate fields and then ignores the values on read, because it works in a medium that already has a defined frame rate and not all programs validate or use the value on read. A third possibility is that everyone stores the fields in the wrong order, and the odd program written by the one person who actually read the spec can't read anyone else's files. And yes, I've seen all of these kinds of mischief before.
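To put a number on the first possibility, here is a minimal sketch (the helper names `duration_ms` and `ntsc_mislabel_drift_ms` are hypothetical) that computes how long the same run of frames lasts under each reading. 108,000 frames is exactly one hour at 30/1; at 30000/1001 the same frames last 3,603.6 seconds, so the two interpretations disagree by 3.6 seconds for every hour of video.

```c
#include <stdint.h>

/* Duration of frameCount frames at num/den frames per second, in
 * milliseconds, rounded to nearest. Assumes num != 0 and that
 * frameCount * den * 1000 fits in 64 bits. */
static uint64_t duration_ms(uint64_t frameCount, uint32_t num, uint32_t den) {
    uint64_t scaled = frameCount * den * 1000u;
    return (scaled + num / 2) / num;
}

/* How much longer the true NTSC duration (30000/1001 fps) is than the
 * duration implied by the common mislabel of 30/1. Never negative,
 * since 30000/1001 fps is slower than 30 fps. */
static uint64_t ntsc_mislabel_drift_ms(uint64_t frameCount) {
    return duration_ms(frameCount, 30000, 1001) - duration_ms(frameCount, 30, 1);
}
```

A 3.6-second slip per hour is easily enough to push audio visibly out of sync, which is why "close enough" is not an acceptable answer for a field like this.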

Good file formats are rare, but in my opinion, the Portable Network Graphics (PNG) specification is among the better ones. It uses clear language (must/must not/should/should not), it gives rationales for various design decisions, and it attempts to advise what to do when dealing with non-compliance. For instance, when talking about converting samples between different bit depths, it describes the best case (linear mapping), an acceptable approximation (bit replication), and says what you should not do and why (a plain left shift). That level of detail doesn't prevent all accidents, but it reduces them through awareness and makes clear who is at fault when an interoperability problem occurs.
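The three depth-conversion options are easy to compare side by side. This is a minimal sketch covering only up-scaling of an unsigned sample of 1 to 8 bits to 8 bits; the function names are illustrative and are not taken from the PNG specification.

```c
#include <stdint.h>

/* All three functions scale an unsigned sample v of `bits` bits (1..8) to 8 bits. */

/* Best case: linear mapping, out = round(v * 255 / maxIn). */
static uint8_t scale_linear(unsigned v, unsigned bits) {
    unsigned maxIn = (1u << bits) - 1u;
    return (uint8_t)((2u * v * 255u + maxIn) / (2u * maxIn));
}

/* Approximation: bit replication. Copy the sample into the top bits, then
 * repeatedly OR in a right-shifted copy until all 8 bits are filled. */
static uint8_t scale_replicate(unsigned v, unsigned bits) {
    unsigned out = v << (8u - bits);
    for (unsigned filled = bits; filled < 8u; filled *= 2u)
        out |= out >> filled;
    return (uint8_t)out;
}

/* What not to do: a plain left shift leaves the low bits zero, so the
 * maximum input (e.g. 31 at 5 bits) maps to 248 instead of 255. */
static uint8_t scale_shift(unsigned v, unsigned bits) {
    return (uint8_t)(v << (8u - bits));
}
```

For 5-bit input, replication matches linear mapping at both ends of the range but can be off by one in between (3 becomes 24 by replication and 25 by linear mapping). For depths below 8 bits, the plain shift is the only one of the three that can never produce full white.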

Comments

Comments posted:


OT: just wondering, is the ad in the upper right of your page connected with any work you have personally done?

nine - 11 09 07 - 05:38


A format like AVI is interesting from a different aspect. On Windows there is an automated codec locator and extension system; other platforms or applications reading AVI natively have to add support for each FourCC and format struct separately, usually in the form of a software update. It shouldn't necessarily be that way, but it is. When someone designs a new file format, he uses things that tie it more or less to his own environment. It is hard to avoid unless that person knows all the other platforms well.

Gabest - 11 09 07 - 13:26


I agree that "useWord97LineBreak" is really bad, and ambiguities should really be removed, but on the other hand, writing extremely strict specifications could result in nobody using them.

Remember the EDI specifications for Electronic Data Interchange? They were sooooo strict and detailed that you needed a week's reading just to implement ONE document. A small number of companies "sold" the EDI technology at outrageous prices (and they were bad implementations).

See other detailed specifications (OWL?), which seem not to be getting anywhere.

zardoz - 11 09 07 - 14:00


@nine:
Sort of... I know the guys at TealPoint Software pretty well.

@zardoz:
Very good point. Conciseness is definitely also a valuable (and rare) trait in standards.

Phaeron - 11 09 07 - 23:10


> Underspecification bothers me the most.

Then you haven't looked into the format deeply :)

It makes little sense, until you check the 97-2003 format. The new one is truly a text dump of the old binary structures. It's really a parody. I'd like to know how it's implemented. Also, there are a lot of situations that "trigger" the old format, such as password protection.

All these xml-in-zip formats bother me. They are slow* and overcomplicated. And Adobe plans to do it with the successor to PDF too.

You might want to check out an A/V container designed by people who have implemented a lot of other ones:

http://svn.mplayerhq.hu/nut/docs/nut.txt..

The specification takes care of the example issue you mention there, in addition to many others. It's very strict and well defined, from track properties to interleaving and frame ordering.

* Excel 2007 even added a binary format where the values and formulas are binary instead of XML for performance...

John - 12 09 07 - 09:11


Here’s what annoys me about the OOXML debate. OK, the format is flawed, but after 10 years of hearing people complain about Microsoft not opening up their formats, this is better than having them keep it closed. Microsoft is under no obligation to open this up at all, and how long would it take for them to adopt a third-party format and implement it correctly (IE, anyone?)?

The fact that they’re trying to open it up to ISO and are going through the process is a massive improvement over where we were going from Office 95 to Office 97, when the binary format broke.

Trimbo - 12 09 07 - 14:16


@John:
The NUT description does look pretty good.

@Trimbo:
Oh, don't get me wrong, as an information document it's way better than what Microsoft usually puts out. An ISO standard is not a white paper, though. People have to actually be able to use ISO standards to fully read and write a file format, and the proposed OOXML falls far short of that. It's one thing to have ambiguities, it's another to have entire holes in the format description.

Phaeron - 12 09 07 - 23:12


Avery, how does that NUT format deal with partial files (from bit-torrent, emule, etc)? Can it decode frames without a header (for example does a keyframe store width/height information)? If not, then it is yet another useless container format.

Igor - 07 01 08 - 22:28


I don't know whether NUT does deal with partial files, but I think your condemnation of it for that reason reflects a narrow point of view. People who do professional video production work, for instance, don't really care about that, and those working with mobile devices wouldn't be willing to pay for that level of size overhead.

Phaeron - 07 01 08 - 22:39
