§ JSON >> XML (at least for me)

A few days ago, in a similar mood to the one that caused me to start an Atari emulator, I decided to write my own XML parser.

I've had an increasing interest in language parsers ever since I got to the point of parsing algebraic infix expressions and simple C-like languages. I've written about XML annoyances before, but I don't actually have much occasion to work with XML at the code level, because:

And yet, one of the advantages of XML is that it keeps people from creating their own interchange formats, which are typically far more broken. Since I occasionally do need to import and export little bits of metadata, I wanted to see just how much would be involved in having a little XML parser on the side. It wouldn't need to be terribly fast, as we're talking about a couple of kilobytes of data at most being parsed on a fast CPU, but it would need to be small to be usable. And I just wanted to see if I could do it. So I sat down with the XML 1.0 spec, and started writing a parser.

I have to say, my opinion of XML has dropped several notches in the process (er, lower than it already was), and I'm convinced that we need a major revision or a replacement. I got as far as having a working non-validating, internal-subset-only parser that passed all of the applicable tests in the XML test suite, but after writing more than 2000 lines of code just for the parser and not having even started the DOM yet, I had already run into the following:

All of this adds up to a lot of flexibility, and thus overhead, that simply isn't necessary for most uses of XML that I've seen. For those of you who say "who cares, modern systems are fast," I'd like to remind you that every piece of complexity is a piece that can go wrong: an export/import failing, a parser glitch turning into an exploit, or a source of stability problems. This can be true even with a parser that is 100% compliant with the standard, if the parser does not have guards against runaway expansion or excessive recursion depth. It'd be so much easier if someone would just go through and strip down XML to an "embedded subset" that contains only what most programmers really think is XML and actually use, but I don't see this happening any time soon.
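To make the point about guards concrete, here is a minimal sketch of bounded entity expansion. It is not taken from the parser described in this post; the names `expand_entities` and `ExpansionLimitError` and the limits `max_depth` and `max_output` are all hypothetical, and the `&name;` syntax is only a simplified stand-in for real XML entity references.

```python
import re

# Simplified reference syntax: &name; (an assumption, not the full XML grammar).
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_.-]*);")


class ExpansionLimitError(ValueError):
    """Raised when entity expansion exceeds a configured guard."""


def expand_entities(text, entities, max_depth=8, max_output=10_000):
    """Expand &name; references, refusing recursive or runaway expansion."""
    budget = [max_output]  # characters we are still willing to emit

    def spend(n):
        budget[0] -= n
        if budget[0] < 0:
            raise ExpansionLimitError("expanded output too large")

    def expand(s, depth):
        # Guard 1: an entity that refers to itself would otherwise recurse forever.
        if depth > max_depth:
            raise ExpansionLimitError("entity nesting too deep")
        out = []
        pos = 0
        for m in ENTITY_REF.finditer(s):
            literal = s[pos:m.start()]
            spend(len(literal))  # Guard 2: cap the total output size.
            out.append(literal)
            name = m.group(1)
            if name not in entities:
                raise KeyError(f"undefined entity: {name}")
            out.append(expand(entities[name], depth + 1))
            pos = m.end()
        tail = s[pos:]
        spend(len(tail))
        out.append(tail)
        return "".join(out)

    return expand(text, 0)
```

With these two limits, a self-referencing definition such as `{"x": "&x;"}` fails on depth, while a chain of entities that each repeat the previous one ten times fails on the output budget long before it consumes much memory.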

So, in the end, I stopped working on the XML parser and started working on a JSON parser instead. First, it's so much easier to work off of a spec that essentially fits on one page and doesn't have spaghetti hyperlinks like a Choose Your Own Derivation Adventure book. Second, it's so much simpler. Names? Parsed just like strings, which can contain any character except unescaped double quotes, backslashes, and control codes. Entities? Just a reduced set of C-like escapes in strings, and thankfully sans octal. Comments? None. Processing instructions? None. Normalization? None. And as a bonus, it's ideal for serializing property sets or tables. The JSON parser and DOM combined were less than half the size of the XML parser, at under 1K lines, and took less than a day total to write. Half of that is just UTF-8/16/32 input code (surrogates suck).
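As an illustration of how small the string rules are, the following is a rough sketch of a JSON string-literal reader, including the surrogate-pair handling for `\u` escapes. It is not the parser from this post; `parse_json_string`, `read_hex4`, and `SIMPLE_ESCAPES` are hypothetical names, and the input is assumed to be already decoded to a Python `str`.

```python
# The complete set of single-character escapes JSON allows (no octal, no \x).
SIMPLE_ESCAPES = {
    '"': '"', "\\": "\\", "/": "/",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}


def read_hex4(text, pos):
    """Read exactly four hex digits at text[pos] and return their value."""
    digits = text[pos:pos + 4]
    if len(digits) != 4 or any(c not in "0123456789abcdefABCDEF" for c in digits):
        raise ValueError(f"bad \\u escape at offset {pos}")
    return int(digits, 16)


def parse_json_string(text, pos=0):
    """Parse a JSON string literal at text[pos]; return (value, next_pos)."""
    if text[pos:pos + 1] != '"':
        raise ValueError("expected opening quote")
    pos += 1
    out = []
    while True:
        if pos >= len(text):
            raise ValueError("unterminated string")
        ch = text[pos]
        pos += 1
        if ch == '"':
            return "".join(out), pos
        if ch < " ":
            raise ValueError("raw control character in string")
        if ch != "\\":
            out.append(ch)
            continue
        if pos >= len(text):
            raise ValueError("unterminated escape")
        esc = text[pos]
        pos += 1
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
        elif esc == "u":
            unit = read_hex4(text, pos)
            pos += 4
            # A high surrogate followed by \uXXXX with a low surrogate
            # encodes one code point above U+FFFF.
            if 0xD800 <= unit <= 0xDBFF and text[pos:pos + 2] == "\\u":
                low = read_hex4(text, pos + 2)
                if 0xDC00 <= low <= 0xDFFF:
                    unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                    pos += 6
            out.append(chr(unit))
        else:
            raise ValueError(f"invalid escape \\{esc}")
```

For example, `parse_json_string(r'"\ud83d\ude00"')` combines the two escapes into the single code point U+1F600, while an octal-style escape such as `\101` is rejected. This sketch passes unpaired surrogates through unchanged, which is one of several possible policies.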

To be fair, there are a few downsides to JSON, although IMO they're minor in comparison:

Still, JSON looks much more lightweight for interchange. I'm especially pleased that native parsing support is making it into the next round of browser versions, which hopefully will improve its uptake and therefore available tools.
