
FUSDX Standard


To get right to the point, here is an example FUSDX file:

FUSDX 0.9b

[column names]
date
price
shortdescription
longdescription

[fields]
12-DEC-2002
27.93
red t-shirt
This red t-shirt is a perfect gift \n\ It also comes with a free tie.  

01-JAN-2003
15.79
blue pants
These pants are for the winter \n\ They also can be cleaned with bleach.

03-MAR-2004
8.44
oranges
These oranges taste good. \n\ They are from japan.

FUSDX Standard 0.9b (soon pushing ahead for 1.0 release!)

Todo

  • Make official PDF/HTML files containing more formal and specific standard details for software developers to follow.

  • Offer an API and DLL for people to use, with static linking object files for GCC/FPC compilers.

  • Offer code examples and demos for programmers in many languages, such as C, modern Pascal, C++, Smalltalk, PHP, Ruby, Java, Lisp, and many more.

  • Open a project website and bug reporting system with forums and mailing lists.

What is it

FUSDX is unlike CSV or XML in some ways, and similar in others. It is meant for storing square, database-like data.

FUSDX is superior to CSV because the data is much easier for a human to read at a glance and much easier for a computer to parse.

CSV files become unreadable quickly as more data is crammed into them, while FUSDX files remain readable. In CSV, the delimiters, enclosers, and line feeds distort and obfuscate the data. When a CSV cell contains carriage returns, the cell is spread across multiple lines, which distorts the entire file because it breaks the rows apart. In FUSDX, a cell is never spread across multiple lines. Instead, new lines are stored as the special embedded character sequence \n\.

FUSDX is meant for web services and data feeds, and common data storage. It can be used for data dumps and safe portable storage of database tables.

The \n\ signifies a NEW LINE. The blank lines signify a new row has been started. \n\ was chosen instead of \n for readability. The fields of each row are in blocks, whereas in CSV they are rammed together and obfuscated with commas (or other delimiters) and quotes (enclosers).
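The layout described above can be read with very little code. What follows is a minimal Python sketch, not an official parser. The name parse_fusdx is hypothetical. The sketch assumes that the header line has the form "FUSDX <version>" as in the example file, that spaces around \n\ belong to the cell, and that cells are never empty (the standard does not yet say how an empty cell would be written).

```python
NEWLINE_TOKEN = "\\n\\"  # the three characters \n\ as they appear in a file

SAMPLE = r"""FUSDX 0.9b

[column names]
date
price
shortdescription
longdescription

[fields]
12-DEC-2002
27.93
red t-shirt
This red t-shirt is a perfect gift \n\ It also comes with a free tie.

01-JAN-2003
15.79
blue pants
These pants are for the winter \n\ They also can be cleaned with bleach.
"""


def parse_fusdx(text):
    """Return (version, column_names, rows); each row is a dict keyed by column name."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty input")

    # Assumption: the first line is the word FUSDX followed by a version string.
    header = lines[0].split()
    if len(header) < 2 or header[0] != "FUSDX":
        raise ValueError("missing 'FUSDX <version>' header")
    version = " ".join(header[1:])

    # Group the remaining lines into blocks separated by blank lines.
    blocks, current = [], []
    for line in lines[1:]:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    if not blocks or blocks[0][0] != "[column names]":
        raise ValueError("missing [column names] section")
    columns = blocks[0][1:]

    if len(blocks) < 2 or blocks[1][0] != "[fields]":
        raise ValueError("missing [fields] section")
    # The first row shares a block with the [fields] marker line.
    row_blocks = [blocks[1][1:]] + blocks[2:]

    rows = []
    for cells in row_blocks:
        if not cells:
            continue
        if len(cells) != len(columns):
            raise ValueError("row has %d lines, expected %d" % (len(cells), len(columns)))
        # Turn each embedded \n\ back into a real new line.
        decoded = [cell.replace(NEWLINE_TOKEN, "\n") for cell in cells]
        rows.append(dict(zip(columns, decoded)))
    return version, columns, rows


version, columns, rows = parse_fusdx(SAMPLE)
print(version, columns)
print(rows[0]["longdescription"])
```

Each row comes back as a dictionary, so a cell is looked up by column name rather than by position.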

By opening up a FUSDX file, a person can immediately recognize the data and layout. There is no need even to use a GUI Excel-style viewer tool, since the data is easily readable in a text editor. Of course, Excel-style viewer tools should incorporate the FUSDX standard in the future as the format catches on.

FUSDX is superior to XML in cases where data does not need to be "described" with tags. For example, shopping cart dumps or web services rarely need obfuscated and verbose XML files, since they are just exporting square data from a database. XML is well suited for documents, where the data in between the tags is more dominant than the tags themselves. Many times, though, XML is overengineering, and people choose the poorer CSV format instead. In these cases, FUSDX solves a lot of the world's and the web's problems.

FUSDX is not meant to be an XML killer, but rather a different tool for a different job. It may be considered an XML killer however in cases where people are already using XML for the wrong job. And many of these cases exist.

Real world examples of XML abuse

...some rationale, and why FUSDX can solve many XML problems:

There is no need for an entire shopping cart dump, a Wikipedia dump, or an Amazon services dump to be in XML. Data feeds from shopping carts and similar services dump square database fields, not documents. A single Wikipedia article can be dumped using XML, but it is ridiculous to dump the entire Wikipedia database as one large XML file.

Converting a database into XML just for the sake of using "the latest XML technology to satisfy developers" is ludicrous, especially where the data is square and structured. Realistically, most web services draw their data from databases, not documents. Documents should be the end output of data, not the API that users connect to in order to grab data fields. If users are connecting to individual documents with lots of prose text in them (or verbose text between tags), then XML can in fact be suitable and elegant. In most cases, however, users connect to large dumps of data, not single pages with lots of prose on them.

As a bonus, FUSDX is also extremely efficient and easy to parse compared to XML. FUSDX is not a replacement for XML when XML really does suit the job. It is a different tool, one that we feel people have been missing and for which XML has in many cases been used as a stand-in.

Similar Formats

SDS (simple data storage) is a little text table system written by Vladimir Sibirov which has a similar format. However, FUSDX differs from SDS because FUSDX focuses on being a general purpose export format, while SDS contains some database features such as storing the last insert.

While SDS is being improved with more features to make it somewhat like a database, FUSDX is not meant for that. FUSDX is dedicated to exporting, just as an RSS/XML/CSV feed is dedicated to exporting and is not good for use as a database itself.

Is FUSDX a Database?

FUSDX is just a text storage standard. FUSDX is not a relational database itself, just as a CSV file is not a relational database. FUSDX is for exporting data. Those who choose to read and write FUSDX files as if they were a database would be making the same mistake as those who use CSV files as databases (a bad choice).

FUSDX files are not meant to be databases, rather they are just storage or export formats. One can export a database or a table to FUSDX and this is what it does well.

Tab/Semicolon/Comma Problems

Much data gets corrupted or misaligned when using CSV files that are delimited by tabs, commas, or semicolons. This problem plagues many developers who import or export data, because the delimiter has to be escaped whenever it appears inside a cell. FUSDX files do not suffer from this: since each cell sits on its own line, no delimiter character needs escaping, and the corruption/misalignment problem is eliminated. Tabs, commas, and semicolons can exist in FUSDX files without any problems.
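To make the difference concrete, the short Python sketch below writes the same row once with the standard library csv module and once as a FUSDX row block. The helper names csv_line and fusdx_block are made up for this illustration, and the FUSDX helper assumes the cells contain no real new lines (those would first have to be stored as \n\).

```python
import csv
import io


def csv_line(cells):
    """Write one row with the standard csv module and return the text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(cells)
    return buf.getvalue()


def fusdx_block(cells):
    """One cell per line, with a blank line closing the row block."""
    return "\n".join(cells) + "\n\n"


# A cell that contains quotes, a semicolon, a tab and commas.
cells = ["12-DEC-2002", "27.93", 'red "retro" t-shirt;\tsizes S, M, L']

print(csv_line(cells))     # the third cell is wrapped in quotes, inner quotes doubled
print(fusdx_block(cells))  # the third cell appears exactly as it was given
```

In the CSV output the reader has to undo the quoting to recover the cell. In the FUSDX block the cell is stored verbatim.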

Are Line Feeds Error Prone?

Suppose you download a FUSDX data feed from Yahoo or Google with three specified columns:
  date
  price
  descript
Say a human makes a mistake and inserts a carriage return into the cell data (fields):
  13-NOV-2007
  24.66
  This is a nice T-Shirt it has
  lots of interesting advantages, on this new line here.

  12-MAR-2007
  24.66
  This is a pair of pants, they are nice on this single line.
The parser then detects that the first row, from November 13th, contains four lines. It should contain only three, since the FUSDX schema definition at the top of the file declares three columns per row. So the parser stops and reports an error, or asks the user what to do.

It is easy to tell that an extra carriage return is inserted because the row should only contain three columns as defined at the top of the file. The number of columns is known before the cell/field parsing begins, since it is defined in the [column names] section at the top (consider this a schema or definition zone).
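The check described above amounts to counting the lines in each block. Here is a rough Python sketch of that idea. The function name find_bad_rows and its return shape are assumptions made for this example, and it expects only the text that follows the [fields] line.

```python
def find_bad_rows(fields_text, n_columns):
    """Return (first_line_number, line_count) for every row block
    whose number of lines differs from n_columns."""
    problems = []
    start, count = None, 0
    for lineno, line in enumerate(fields_text.splitlines(), start=1):
        if line.strip():
            if count == 0:
                start = lineno  # first line of a new row block
            count += 1
        else:
            # A blank line closes the current row block.
            if count and count != n_columns:
                problems.append((start, count))
            count = 0
    # The last block may not be followed by a blank line.
    if count and count != n_columns:
        problems.append((start, count))
    return problems


FIELDS = """13-NOV-2007
24.66
This is a nice T-Shirt it has
lots of interesting advantages, on this new line here.

12-MAR-2007
24.66
This is a pair of pants, they are nice on this single line.
"""

for first_line, count in find_bad_rows(FIELDS, 3):
    print("row starting at line %d has %d lines, expected 3" % (first_line, count))
```

A real tool could stop at the first problem or collect them all, as here, and show them to the user.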

The corrected file is:

  13-NOV-2007
  24.66
  This is a nice T-Shirt it has \n\ lots of interesting advantages

  12-MAR-2007
  24.66
  This is a pair of pants, they are nice.
It may be hard to debug a file if a horribly inexperienced human has gotten hold of a FUSDX file and inserted all sorts of carriage returns and text. However, this is a problem with any format. An XML file can have data in the wrong tags. CSV files can have misaligned cells or missing escape characters. CSV and XML have even worse problems, since a CSV or XML file becomes unreadable and obfuscated as it gets more complex. An XML file is obfuscated even when it holds little data. FUSDX files remain readable no matter how much data is added. No complexities exist in FUSDX, as it is designed on the KISS (keep it simple silly) principle.

Is FUSDX Ideal?

FUSDX is not perfect, since perfection does not exist. But it works, it is readable, and it is better than CSV. Even if a FUSDX file really got mucked up with carriage returns added by an erroneous human, you can still usually repair it easily, since all you have to do is find the problem area containing more lines than it should (5 lines in a row block that should only have 3 are easy to spot, due to the top to bottom beauty of a FUSDX file). This can be detected by the parser automatically, or in many cases just by a human looking at the file. The same cannot be said for CSV, due to all sorts of nesting/escaping complexities that render CSV files unreadable to humans.

Glossary/Definitions

  Field: a cell of data

  Cell: same as a field

  Column: like a column in a database (the vertical part containing cells)

  Row: like a row in a database (the horizontal part containing cells)

  FUSDX: Format for Universal Data eXchange

FUSDX Format Sketch

FUSDX VERSION X.Y.Z
// EMPTY LINE HERE
[column names]
COLUMN1_NAME
COLUMN2_NAME
COLUMN3_NAME
// EMPTY LINE HERE
[fields]
row1 column1 cell
row1 column2 cell
row1 column3 cell
// EMPTY LINE HERE
row2 column1 cell
row2 column2 cell
row2 column3 cell
// EMPTY LINE HERE
row3 column1 cell
row3 column2 cell
row3 column3 cell
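
Going the other way, a writer only has to emit the sections in the order sketched above. The following Python sketch is one possible way to do it, not an official API. The name dump_fusdx is hypothetical, the header follows the "FUSDX 0.9b" form of the example file, and empty cells are rejected because the sketch does not say how they would be stored.

```python
NEWLINE_TOKEN = "\\n\\"  # the three characters \n\ used for an embedded new line


def dump_fusdx(version, columns, rows):
    """Serialize rows (a list of lists of cell values) as FUSDX text."""
    out = ["FUSDX " + version, "", "[column names]"]
    out.extend(columns)
    out.append("")
    out.append("[fields]")
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError("row %d has %d cells, expected %d"
                             % (index, len(row), len(columns)))
        if index:
            out.append("")  # a blank line starts each new row
        for cell in row:
            cell = str(cell)
            if not cell.strip():
                # Assumption: an empty line would be mistaken for a row break.
                raise ValueError("empty cells are not covered by the sketch")
            # Normalize line endings, then store each new line as \n\.
            cell = cell.replace("\r\n", "\n").replace("\r", "\n")
            out.append(cell.replace("\n", NEWLINE_TOKEN))
    return "\n".join(out) + "\n"


text = dump_fusdx(
    "0.9b",
    ["date", "price", "descript"],
    [
        ["13-NOV-2007", 24.66, "This is a nice T-Shirt\nwith a second line"],
        ["12-MAR-2007", 24.66, "This is a pair of pants"],
    ],
)
print(text)
```

Because every real new line inside a cell is replaced before writing, each cell is guaranteed to occupy exactly one line of the output.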

Mailing List/Website

FUSDX will have an official website and mailing list. Please bookmark this page and check back. The mailing list links and website location will be placed on this page as soon as possible.
