OPINION:
Proposal: The Open Data Format Initiative

Contributed by Adam Barr
osOpinion.com
January 29, 2002


I use Outlook Express as my e-mail client, and the other day I decided to back up my saved messages. The data files are in a proprietary format, so just copying them would have forced me to run Outlook Express in order to read them.

Worse, Outlook Express only exports data to Outlook or Exchange. Outlook, in turn, has its own proprietary format. It exports to various applications or to a text file, but the exported data is incomplete (for example, it does not save the date and time of the e-mail).

My e-mail data was trapped inside Outlook Express, and there was no easy way to get it out.

Avoiding Dependence

Outlook Express and Outlook -- while severely limited in how they export data -- can import mail messages from a wide variety of competing products. By the same token, competing products can import from them as well.

But I didn't want my dependence on Outlook Express replaced with dependence on another piece of software. I wanted the e-mail saved as plain text and the attachments saved as separate files.
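To make that goal concrete, here is a minimal sketch of the kind of export described above, written with Python's standard `email` library. It assumes the message is already available in the standard Internet message format; it does not read Outlook Express data files. The function name `export_message`, the output file name `message.txt`, and the demo message are all invented for illustration.

```python
import tempfile
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path


def export_message(raw: bytes, out_dir: Path) -> list:
    """Write one message as message.txt plus one file per attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Keep the basic headers, including the date and time of the e-mail.
    headers = "".join(
        f"{name}: {msg[name]}\n"
        for name in ("From", "To", "Date", "Subject")
        if msg[name] is not None
    )
    body = msg.get_body(preferencelist=("plain",))
    body_text = body.get_content() if body is not None else ""
    text_path = out_dir / "message.txt"
    text_path.write_text(headers + "\n" + body_text, encoding="utf-8")
    written = [text_path]

    for index, part in enumerate(msg.iter_attachments()):
        # Strip any directory components from the sender-supplied file name.
        name = Path(part.get_filename() or f"attachment-{index}.bin").name
        target = out_dir / name
        target.write_bytes(part.get_payload(decode=True) or b"")
        written.append(target)
    return written


# Demo with an invented message; a real export would loop over a mailbox.
demo = EmailMessage()
demo["From"] = "sender@example.com"
demo["To"] = "reader@example.com"
demo["Date"] = "Tue, 29 Jan 2002 10:00:00 -0800"
demo["Subject"] = "Backup test"
demo.set_content("Hello from a saved message.\n")
demo.add_attachment(b"\x00\x01\x02", maintype="application",
                    subtype="octet-stream", filename="data.bin")

with tempfile.TemporaryDirectory() as tmp:
    paths = export_message(bytes(demo), Path(tmp))
    exported_names = sorted(p.name for p in paths)
    exported_text = (Path(tmp) / "message.txt").read_text(encoding="utf-8")
    exported_attachment = (Path(tmp) / "data.bin").read_bytes()
```

The sketch only works because the message format it reads is publicly documented; nothing comparable can be written for a proprietary file without reverse-engineering it first.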

Eventually, I found some tools that claimed to read Outlook Express data files, but like competing e-mail clients, they depended on reverse-engineering the Outlook Express file format. Using them meant trusting that the reverse-engineering was 100 percent complete.

The annoying part was that none of this was data I was forbidden to read. Had I the time and inclination, I could have cut and pasted each message from Outlook Express into a text editor and saved it. This was my data, but I couldn't easily do what I wanted with it.

Proprietary Problems

The problem is not limited to just Outlook Express. Most programs -- word processors, spreadsheets, databases -- save user data in a proprietary format. They tend to be similarly generous with importing and stingy with exporting. Many offer the option to save in a standard, publicly defined format, but the saved files often do not preserve 100 percent of the formatting or data.

It's no mystery why every application wants to be a data sink; it's a competitive advantage to lock users into your proprietary format. But users are suffering, prevented from manipulating their own data as they see fit.

Worse, in the future, users may have access only to the data files, with no ability to run the program that originally created them. Today, programmers seeking to reverse-engineer the format of a particular application can use that application to save various pieces of data and observe the file that results.

If a future programmer had no documentation of the file format and only a small sample of data stored in that format, it might be impossible to extract the data.
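The save-and-observe approach can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the two byte strings are invented stand-ins for files saved by the same application before and after changing a single value, and the helper `changed_offsets` is a hypothetical name. Real formats are rarely this tidy.

```python
def changed_offsets(a: bytes, b: bytes) -> list:
    """Return the offsets at which two equal-length saves differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares files of equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Invented stand-ins for two saves that differ in one known value:
# the user changed a count from 3 to 4 and saved again.
save_one = b"TOY1" + bytes([3, 0]) + b"hello"
save_two = b"TOY1" + bytes([4, 0]) + b"hello"

# Offsets that changed are candidates for where the count is stored.
candidates = changed_offsets(save_one, save_two)
print(candidates)
```

The technique depends on being able to produce new saves at will. With only a handful of old files and no running application, there is nothing to vary and nothing to compare against.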

Simple Solutions

I propose a solution to this problem: the Open Data Format Initiative (ODFI).

ODFI could begin as a place to aggregate information and could offer programs for interpreting proprietary data formats. But long-term, it would have three goals:

  1. Convince software companies to publicly release complete, official documentation of any data format they use. Companies presumably have this information available internally; it is just a matter of making it public.

  2. Design a standard way of describing data formats (an "ODFI description") and a program to validate that a data file conforms to the ODFI description (known as being "ODFI compliant"). I envision something along the lines of Backus-Naur Form (http://burks.brighton.ac.uk/burks/foldoc/99/9.htm) with a validation program that, given an ODFI description and a data file, could give the meaning of any byte in the file.

  3. Work to pass laws requiring that all data stored on government computers be kept in ODFI-compliant data files. The rationale would be preservation: guaranteeing that government data does not become unusable if the application that reads it becomes unavailable. The assumption is that such a law would spur essentially all manufacturers to make their data ODFI compliant.
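As a rough illustration of the second goal, the sketch below pairs a tiny format description with a validator that reports the meaning of every byte in a file. Everything in it is an assumption made for the example: the description is a plain Python list rather than a BNF-style grammar, the "TOY1" format is invented, and the names `Field`, `TOY_DESCRIPTION`, and `annotate` are hypothetical. A real ODFI description language would need to handle variable-length and nested structures, which this does not.

```python
import struct
from dataclasses import dataclass


@dataclass
class Field:
    name: str                # human-readable meaning of these bytes
    fmt: str                 # struct format code, e.g. "<H" = little-endian uint16
    expected: object = None  # fixed value the field must hold, if any


# Hypothetical description of an invented toy format (not a real file type).
TOY_DESCRIPTION = [
    Field("magic number", "4s", b"TOY1"),
    Field("message count", "<H"),
    Field("saved at (seconds since 1970)", "<I"),
]


def annotate(description, data):
    """Return (offset, length, meaning, value) for every byte range in data.

    Raises ValueError if the data does not conform to the description.
    """
    rows, offset = [], 0
    for field in description:
        size = struct.calcsize(field.fmt)
        if offset + size > len(data):
            raise ValueError(f"file ends inside field {field.name!r}")
        (value,) = struct.unpack_from(field.fmt, data, offset)
        if field.expected is not None and value != field.expected:
            raise ValueError(
                f"{field.name!r} is {value!r}, expected {field.expected!r}"
            )
        rows.append((offset, size, field.name, value))
        offset += size
    if offset != len(data):
        raise ValueError(
            f"{len(data) - offset} bytes are not covered by the description"
        )
    return rows


# A 10-byte sample file in the toy format.
sample = struct.pack("<4sHI", b"TOY1", 3, 1012262400)
rows = annotate(TOY_DESCRIPTION, sample)
for offset, size, meaning, value in rows:
    print(f"bytes {offset}-{offset + size - 1}: {meaning} = {value!r}")
```

The output is a human-readable listing rather than a converted file. A file that is too short, too long, or carries the wrong magic number is rejected, which is the sense in which the sketch "validates" compliance.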

Just To Clarify

  1. The goal is not to create any new file formats. It is only to ensure that existing formats that store user data are publicly documented. It is also not meant to create automated programs to convert between different formats -- the output of the ODFI validation program would be human-readable, not machine-readable.

    It is certainly possible and expected that others would write conversion programs between semantically similar ODFI-compliant formats.

  2. ODFI compliance is not expected of non-user data, such as level descriptions for games.

  3. ODFI would be agnostic on open source vs. closed source, with the exception that an open source program might be considered to automatically satisfy goal #1 above, but not the other two.

  4. This is not targeting Microsoft (Nasdaq: MSFT). Every company would be asked to comply.

  5. ODFI is not meant to allow circumvention of encrypted data and therefore should not run afoul of the DMCA. The previous statement assumes that a proper encryption format depends on strong cryptography and not merely obscuring the details of the file format.



Author's background:
Adam Barr worked at Microsoft for over ten years before leaving in April 2000. His book about his time there, "Proudly Serving My Corporate Masters," was published in December 2000. He lives in Redmond, Washington, and can be reached at adamba@gte.net.
