
January 30, 2005

The Open Data Format Initiative

Microsoft recently opened the Office XML file formats, as discussed on Slashdot. Robert Scoble describes Jean Paoli as "beaming" when he told him this.

This notion is also near and dear to my heart. Open data formats are important for many reasons, but the main one is that it's my data. When I write a file in Word, the fact that Word can lock the data into a format that no other program can read is a terrible thing. It prevents me from having complete access to the data. Instead of needing only to preserve the file to preserve the information, I also have to keep a running copy of Word, which means I need a computer and operating system that can run Word. People are aware of the issues with preserving data on CDs (even if they haven't done much about it), but the notion of preserving the software that can read the data, and the hardware that can run the software, is rarely mentioned.

My initial frustration about this occurred three years ago, when I realized that all my saved Outlook Express email was in a proprietary format, and I could not extract it. I wrote an article for a now-defunct website (I kept a copy from Google's cache), arguing that governments should adopt laws requiring all data on government computers to be in open formats. As I wrote then, "The rationale would be preservation: guaranteeing that government data does not become unusable if the application that reads it becomes unavailable. The assumption is that such a law would spur essentially all manufacturers to make their data compliant."

The next year, I created the Open Data Format Initiative website. At the time there was a push among both state and international governments to require open-source software usage. I wasn't working for Microsoft at the time, but I thought this was a false (and unattainable) goal. Using open-source software was a purchasing decision about price, quality, support, etc. that could be evaluated like any other. The real issue should be open data formats. That was the issue that should be enacted into law, to make sure it was considered where it might otherwise not have been.

I wrote an article for the ODFI website explaining "Why Open Data Format Laws Are Better Than Open Source Laws" which picked up a link from Slashdot. This led to some good feedback and a very active email alias, which clarified my thinking on what an open data format really was, and eventually led to my posting an updated sample Open Data Format bill, suitable for use by any government body that wanted to enact it.

I should point out that the origin of the format didn't matter. It was perfectly fine if the format was invented by a single company, as long as it was documented and freely usable by anyone. For example, there was a recent Slashdot discussion of the OpenDocument format, and one of the benefits listed is that it is supported by the OASIS and ISO standards groups. With ODFI, I didn't care a whit about who came up with a format, only what could be done with it. As this commentary summarizes it: "The difference between open 'standard' and open 'format' is profound. The former refers to something widely recognized or approved as a baseline model, where the latter focuses more unrestricted use. Microsoft’s move suggests that they do in fact recognize that the Office 2003 formats are not considered open standards, but could be considered open formats once licensing restrictions on use and modification are removed."

One of the key issues to emerge from the ODFI discussions, it turned out, was patents on a format, something I had not initially considered. That is why it is very important, as Microsoft explains in the FAQ, that the users of the Office XML format are granted a royalty-free license to any current or future patents on the schema. The Office XML format meets the definition of an open format that the ODFI mailing list arrived at.

So what happened with ODFI? As Technorati will tell you, the site hasn't been updated in 503 days. I just got too busy getting Find the Bug finished. And, in November 2003, I went back to work at Microsoft. I just didn't have the time to do ODFI when I needed to work during the day, finish my book in the evenings, and then start blogging here.

It wasn't that I felt uncomfortable running the website while working at Microsoft. In fact, I think that pushing for open data formats is a great strategy for Microsoft: first of all because it is just the "right thing" to do, but also because it blunts any attempts to pursue an "open-source for government" bill.

Now I would like to be able to say that I went back to Microsoft, worked the system from the inside, and now we are seeing the results. But it's not the case. I resolved to push for open data formats on any product I worked on, but Monad doesn't have any data formats (except some minor ones that are user-visible and therefore have to be open). I had no effect on Microsoft's decision to open Office XML. All I can hope is that ODFI influenced somebody who influenced somebody who influenced somebody (like perhaps this Massachusetts senator)...so maybe in a small way I did help nudge Microsoft in the right direction.

This is not the end of the battle, of course. For one thing, there are millions of documents stored in the old Office format, which Microsoft has to keep supporting as the default for a while. This format has been reverse-engineered, of course, but technically any software that reads it may be violating a patent (there used to be an external email alias at Microsoft that people could contact if they wanted official documentation on the Word binary file format for non-commercial use, but it is no longer active). Microsoft should open that format right away. Then I'll know the company really understands who my data belongs to.

Posted by AdamBa at January 30, 2005 10:16 PM


Why don't you just use the Word object model to convert those old documents to html?

Posted by: KB at January 31, 2005 09:34 AM