Note: This site is
a bit older, personal views
may have changed.

Xml Is Annoying


-XML is annoying for many reasons. Not just verbosity alone. 
-XML is annoying because it's hard on the eyes.
-XML is annoying because it contains too many tags for things that could have been
 organized via squares or carriage returns.
-XML is annoying for the same reasons HTML is annoying. 
-XML is annoying because it contains too many less-than and greater-than symbols.
-XML is annoying because it is a longform, rather than a shortform.
-XML is annoying because it is claimed to be "easy to read for humans", while at the 
 same time in practice it is "hard to read for humans".
-XML is supposed to be human readable, but the verbosity is so hefty that it is not
 human readable. 

However, we'll give XML some respect: larger paragraphs and bigger snippets of text are easier to read in XML files than, say, smaller XML settings files. I noticed this when looking over FPDOC XML files. There is actually some XML in FPDOC files that I can read, because the snippets of text are larger. However, the fact remains that bigger files with bigger snippets of text may be better off in a database anyway. So it seems there is no winner: XML doesn't work for smaller files where INI or CFG files fit, and XML doesn't work for larger snippets of data because those should be in a database. Understand that this is not coming from a database weenie either.

I still feel, for example, that FPDOC could be managed via a database.. and that Lazarus LPI files could be done via CFG or INI files. If Lazarus LPI files were config files, what disadvantages would they have? What advantage does XML offer for config files? You don't have to be 'standards compliant' for config files. No one from a standards committee is going to spank you if you use nice looking and easy to read config files. Consider Apache config files, which are elegant and easy to read. Who's spanking Apache? No one.
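To make the settings-file comparison concrete, here is a minimal sketch that stores the same settings once as an INI file and once as XML, then loads both with Python's standard library. The section and key names are made up for illustration; they are not the real Lazarus LPI layout.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical project settings, invented for this example only.
INI_TEXT = """\
[project]
title = demo
main_unit = demo.pas

[compiler]
optimization = 2
debug_info = yes
"""

# The same settings written as XML (again, an assumed layout).
XML_TEXT = """\
<config>
  <project>
    <title>demo</title>
    <main_unit>demo.pas</main_unit>
  </project>
  <compiler>
    <optimization>2</optimization>
    <debug_info>yes</debug_info>
  </compiler>
</config>
"""


def load_ini(text):
    """Parse INI text into a dict of sections, each a dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def load_xml(text):
    """Parse the two-level XML layout above into the same dict shape."""
    root = ET.fromstring(text)
    return {section.tag: {item.tag: item.text for item in section}
            for section in root}


ini_settings = load_ini(INI_TEXT)
xml_settings = load_xml(XML_TEXT)

# Both formats carry identical settings; only the amount of markup differs.
print(ini_settings == xml_settings)
print(len(INI_TEXT), "characters of INI versus", len(XML_TEXT), "of XML")
```

Both loaders are a few lines long, so for flat settings like these the choice mostly comes down to which file is nicer to open in a text editor.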

The problem with XML is that people are using it for settings files, smaller snippets of data, data files, real time data feeds - but no one actually has proven that there is any advantage to using XML for these situations. XML is an alternative that does the same thing as config files and tab delimited feeds, usually.


XML is annoying because it's verbose and hard to read, and the verbosity seems to lessen the claimed benefits (it's supposed to be human readable, but the verbosity is so hefty that it is not easily human readable). Assembly code or Perl may be hard to read, but at least they offer you benefits. XML is hard to read, is in long form, and is slow to parse (not just because it's verbose, but because text is harder to extract from it from a programming perspective).
When you open up a spreadsheet or a database file which is small or large in size, it's easy to read - no matter what the size of the file. With XML, as the file grows, it becomes harder to read. XML is supposedly "safe" because it's so verbose. i.e. nothing could possibly get corrupted in our text file, since XML wraps everything up so safely, right? If we use this much safety around our data, surely the data will be extra safe now, right?
When you open up a source code file, hopefully it is easy to read in your programming language. With XML, this is not the case. From both the computer's perspective and the human's perspective, XML is hard to read, slow to parse, hard to organize, and a poor fit for storing great quantities of data. XML does get easier to read when there is more text, such as paragraphs, between tags.. but these paragraphs of text really should already be in a database! Just opening an XML file and trying to read it is quite disturbing when the file is small, or when it has lots of one-line text between tags. Since many XML files are hard for humans to read, XML should have some other advantage, like fast parsing speed? No, it's generally not that fast to parse. It is actually easier to write a parser for XML in some cases, since it's so dumb and simple with the opening and closing tags. Parsing the file is annoying though, because the verbosity of the file takes a lot of bandwidth and memory (on the internet, bandwidth and speed do count.. Google, for example, is fast, and everyone likes that). Even on today's new processors, XML uses a hefty amount of bandwidth.
 1. The dumber the text, the easier it is for
    humans to read.
    False.

 2. The more verbose the markup, the easier it must
    be for humans to read.
    False.

 3. CSV is not cross-platform. XML is.
    False.

Comments:
If CSV is not fit, solutions in the future such as (USDXFormat) will solve that problem, but CSV or pipe delimited files could still be used in many cases where XML is used currently.

The argument that "but INI files and config files are simple, XML is for more complicated data" is false, because with more complicated data XML generally just gets pushed to its limits, and square storage such as a spreadsheet/database/table fits the problem better.

The "but that's just a database weenie speaking" argument is false, because XML is generally used either in situations where huge chunks of data are being transferred, or in small chunk situations. And those are the perfect situations for a database, USDXFormat, and INI files.

The "But with XML, everything is XML. There are no INI files, config files. If everyone goes and makes their own file formats, then we have no common standard to follow. XML is consistent everywhere, as XML" argument is ludicrous. Then what's stopping you from using USDXFormat everywhere, even in place of INI files? (This can be done, but really it's not recommended.. because there are some situations where a standard INI file will work, and does work. And INI files generally are a standard format.) Plus, who says that using XML for everything text file storage related is a good thing? Why use HTML and text files at all then? Shouldn't they also be XML files? Different tools for different needs. No one is saying we all should use different INI file formats or USDXFormats. No, those are a standard too just like XML is a standard.


Misuse of XML:
XML is quite often used in situations where small, tiny chunks of data are being used. CoLinux used XML for a while but then reverted to a standard Linux-style config file. They realized that XML was harder to read and didn't help their users. Config files are generally a standard format that everyone across platforms knows, and in CoLinux's case, config files were more readable than XML files. Yet XML intends to be human readable and cross platform. True, XML is cross-platform and somewhat human readable, but config files are more human readable and also cross platform.

XML is often used when big chunks of data are being transferred and stored (example: Amazon Web Services). A CSV file might not perfectly fit the needs of Amazon Web Services, but isn't the data fairly consistent? That is, similar data repeated over and over again. So the XML becomes redundant.

Book description, author, price. 
Book description, author, price. 
Book description, author, price. 
etc. etc. etc. 
What is the reason that we need an XML file for the above?
This is the perfect application for a database style file, since the data is simple and square in structure.. not hierarchical.
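The redundancy can be measured with a small sketch. It writes the same book records as CSV and as XML, then counts how many characters in each file are markup rather than data. The records and field names are invented for illustration and have nothing to do with Amazon's actual feed format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up records; field names are assumptions for this example.
FIELDS = ["description", "author", "price"]
BOOKS = [
    {"description": "Pascal primer", "author": "A. Writer", "price": "19.95"},
    {"description": "Config files at work", "author": "B. Coder", "price": "24.50"},
    {"description": "Small databases", "author": "C. Admin", "price": "31.00"},
]


def to_csv(rows):
    """Serialize rows as CSV text with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize rows as XML, repeating every field name per record."""
    root = ET.Element("books")
    for row in rows:
        book = ET.SubElement(root, "book")
        for name in FIELDS:
            ET.SubElement(book, name).text = row[name]
    return ET.tostring(root, encoding="unicode")


def payload_size(rows):
    """Count only the characters that are actual data."""
    return sum(len(row[name]) for row in rows for name in FIELDS)


csv_text = to_csv(BOOKS)
xml_text = to_xml(BOOKS)
data_chars = payload_size(BOOKS)

# Everything beyond the raw data is structure: delimiters, header, or tags.
csv_overhead = len(csv_text) - data_chars
xml_overhead = len(xml_text) - data_chars
print("CSV overhead:", csv_overhead, "XML overhead:", xml_overhead)
```

CSV names each column once in the header, while the XML repeats every field name twice per record, so the gap grows with the number of rows.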
XML claims to be "easy to read" and yet it is also noted by many as "hard to read". This is an outright clash. Clashes are dangerous.

Rather, saying that XML is easy to read is a mistake due to humans assuming that the more verbose something is the easier it is to read.

XML is also hard for a computer system to read, i.e. slow.

"But today's processors are getting faster" has nothing to do with the problems with XML.

XML is annoying, not just slow and verbose. There is the annoyance of having to type out descriptions of data when you already knew what it was. And there is the annoyance of taking on the risk of making the data even harder to read than when it was in SQL, INI, CSV, binary, or config file format.

If XML is so great, why do we have things like clear, easy to modify config files in GNU/Linux? Is it possible that sometimes, in small and large applications, cfg files are a simpler format and easier to read and maintain? Yes.. we already know that. Yet we already see some application developers converting all their settings files over to XML. There is nothing wrong with using cfg files for one application and a database for another. It does not have to be unified to XML. Is it possible that in medium-large applications, a CSV file, embedded database, or FUSDX file would be an answer?

XML solves all X problems only if you believe so. Since it's so unstructured, it can appear to solve many problems. But a Swiss Army knife sometimes just doesn't have the torque to loosen a bolt that's stuck. The Swiss Army knife becomes a flimsy showpiece which manages to work as a corkscrew now and then, but is not a true tool or a true problem solver. And with this universal non-structured language, there also come serious side effects.

Would you rather space out a file by hitting the enter key, or by typing a tag every time you wished for a new line? That's another problem with XML: you can't design a dedicated editor program for XML, since XML tags are created on the fly. The tag style syntax of XML stays constant, but the tag names in the syntax are always changing depending on the project. A database doesn't use tags.. just squares. Editing a database is much easier than editing an XML file. You just plop your cursor into a square and edit it. Make a comment column in the database, and there you have your "tags". Yes, XML files are more unstructured and therefore may seem to offer numerous advantages.. but what is more important to you? Editing a neat and clean database, or a text based jungle of tags?

If "today's computers are getting faster" then why can't we make these computers smart enough to parse data that doesn't need to be verbose? Doesn't it make sense that if a computer is smarter and faster today, we should also have smarter and faster tools to help us think up better languages that read clean and precise? Our computers are capable of parsing INI files, config files, and FUSDX files, and of accessing databases.. they are smart enough to do all this. If computers were dumb and couldn't do all this, it might make sense to use something like XML. It would be wiser for people to say "today's computers are getting smarter and faster, and so is today's data". Not "today's computers are getting smarter and faster, but today's data is getting slower and dumber. So hardware sales are increasing."


From a Pascal perspective, XML is annoying for the same reason this WOULD BE annoying if this was enforced in the Pascal language:
< procedure > GetInfo < /procedure >
 < directive >begin< /directive >
  < Main Code block >showmessage('Application started'); 
 end;
The problems: your eyes already know it's "the main code block" because you were smart enough as a human being to realize this by its visual layout. The XML doesn't help the human use automatic visual layout pointers. With XML you have indentation and tags... so it gets to be a bit overkill. With Python, you have just indentation alone, so that gets to be underkill. Why not something in between? Compare XML to something like so:
 Data1 Info1;
 begin
  cat1('Application started');
  cat2('The tallest building');
 end;
We are just discussing shortforms here. Playing with shortforms and indentation, that is all. The problem with XML is that it is a longform, not a shortform. True, some shortforms are extreme and not easy to read.. such as a regular expression syntax, some C code, Perl code. But shortforms like CSV, Java, Pascal, Python, Smalltalk.. these all seem to have a nice ring to them in some sense. And they are not long forms, compared to XML. Both easy to read for the human, and the computer. But XML? It isn't.

So why do we need to have overly verbose tags wrapped around data, when the majority of data layout is already obvious? For example, data is squarish, blockish looking. Human beings have a high enough IQ to figure out that a line in a spreadsheet separates the data.


XML tries to define everything as something inside a tag. This is constraining, because everything must be inside a tag, no matter what. Languages that use different characters and reserved words throughout are much less constraining, and much easier to read. So now we agree that XML is constraining, even though some may be under the impression that it's a "universal Swiss Army knife". Constraining, but universal? Clashing..
Most likely people have adopted XML because it's something that works right now, and we haven't got a database solution which is slightly better than CSV. In cases of documents and paragraphs, CSV doesn't seem all that useful to people. But how often do we really use XML for document situations? (FPCDoc is an exception.) Isn't XML used mainly for data? People act as if we can't improve CSV. XML just happily replaces CSV, and there is nothing else to it? Well, sorry, but there must be a way to extend CSV, or create something similar to CSV but better. We can't just stop at XML and assume it's the solution to all of our data transfer problems.
Why do people choose XML?
 -CSV is not easy enough to edit in a text editor
  A more vertical solution would be easier, since people
  tend to read from top to bottom.
 -CSV has been known to have delimiter problems.
 -Database A isn't portable with Database B.
 -Some people have no clue what CSV or FUSDX is.
 -XML syntax is like HTML. People who like HTML should
  like XML. 
But we should note that if HTML is for documents, maybe XML is also best fit for documents, not necessarily data feeds and settings files. Sure, XML is more complex than HTML, but it still has restrictions similar to HTML's. You've got tags, and tags, and tags.. but nothing else. No reserved words, no language. Just tags. Has it been studied or proven that just tags, tags, and tags are the best way to get data from one place to another (that seems to be the main use of XML)? Can tags do and be everything? Surely our lives must be more complicated than just tags.. Tags can't be the ONLY thing we should base a text feed on. How about reserved words? How about file structure, and more reliance on visual indentation? Who has studied or proven that LOTS OF tags are necessarily our best solution? HTML may be wonderful from a parser writer's short term perspective.
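On the CSV delimiter problems mentioned in the list above: they usually show up when a field itself contains a comma or a quote and the file is split naively. Here is a minimal sketch, using invented sample rows, of how a quoting-aware reader and writer (Python's standard `csv` module) sidesteps the issue without any tags.

```python
import csv
import io

# Invented sample rows: one field contains a comma, another contains quotes.
ROWS = [
    ["title", "note"],
    ["Widgets, large", 'sold as "new"'],
]


def write_rows(rows):
    """Write rows as CSV text; the writer quotes fields only when needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def naive_split(line):
    """The error-prone approach: split on every comma, ignoring quotes."""
    return line.split(",")


text = write_rows(ROWS)

# A quoting-aware reader recovers the original two fields per row.
parsed = list(csv.reader(io.StringIO(text)))

# The naive split breaks the second row at the comma inside the field.
naive = [naive_split(line) for line in text.splitlines()]

print(parsed == ROWS)
print(len(naive[1]), "fields from the naive split, expected", len(ROWS[1]))
```

So the delimiter trouble is more a matter of how the file gets read and written than proof that delimited text cannot work.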

HTML and XML are fairly easy to parse, in a short term sense.. but in the long term, a slightly more complicated language or file format may benefit us greatly. Are there XML weenies out there who just use XML because the parser was easy for them to create, versus them having to create something like an SQL database file format? "It's so simple, just use XML" and "it's so complicated and proprietary.. Oracle and SQL". But doesn't the extremely complex parser pay off in the end? CSV is too simple, XML is too simple. Why not use a format that is more complicated than CSV and less tag-ish than XML, then? Not in existence yet?


Why stop at XML and CSV? We can exchange databases without using XML or CSV.
 -Databases are too LARGE and incompatible with each other.
 -XML is too tag-heavy, symbolic, and verbose.
So think of SMALL databases and text file dumps. Dumps that are slightly more advanced than CSV format, but not as outrageously redundant and verbose as XML. Think of low markup, easy to read, easy to parse files, with no incompatibilities between databases. Think of being able to embed a database into your application without having to ship the database software with the application. Think of a file format more readable and less error prone than CSV, with fewer delimiter errors, and less verbose and less tagged than XML.
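One way to picture a small embedded database with a plain text dump is SQLite, which is used here only as a stand-in: this article does not name a specific product, and this is not the FUSDX or USDXFormat design. The sketch below builds an in-memory database with Python's bundled `sqlite3` module, dumps it to text, and rebuilds a copy from that text. The table and rows are invented.

```python
import sqlite3

# Invented sample rows for illustration.
SAMPLE_BOOKS = [
    ("Pascal primer", "A. Writer", 19.95),
    ("Small databases", "C. Admin", 24.5),
]

QUERY = "SELECT description, author, price FROM books ORDER BY price"


def build_database():
    """Create an in-memory database; no separate server is involved."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE books (description TEXT, author TEXT, price REAL)"
    )
    connection.executemany("INSERT INTO books VALUES (?, ?, ?)", SAMPLE_BOOKS)
    connection.commit()
    return connection


def dump_as_text(connection):
    """Produce a plain text dump (SQL statements) of the whole database."""
    return "\n".join(connection.iterdump())


def restore(dump_text):
    """Rebuild a fresh database from the text dump."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(dump_text)
    return connection


source = build_database()
dump_text = dump_as_text(source)
copy = restore(dump_text)

rows = copy.execute(QUERY).fetchall()
print(dump_text)
print(rows == source.execute(QUERY).fetchall())
```

The dump is ordinary text that can be read in an editor or sent over the wire, and the receiving side needs only the same small library to load it again.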

There are FUSDX, CSV files, INI files, and embedded databases.. which don't claim to be the answer to everything, like XML does. CSV isn't extended enough, so something needs to take over. And XML still may be used too.. maybe for situations like FPDOC, where you are describing documents. There may even be a use for XML after all. But then again, if FPDOC could be done via a database, why not use that database? At least there are options.


See also: XmlSucks, XML vs Embedded Database, USDXFormat.
