Note: This site is a bit older, personal views may have changed.

Google Does Not Get Relational


Google (or at least many of its top employees) doesn't understand the relational model.

http://labs.google.com/papers/sawzall.html

No no no! This confuses the logical relational model with physical limitations. I wish I could send this to Fabian Pascal so that he could have a go at the above naivety, but his dbdebunk.com website and email address are currently not very active or responding to requests.

The above shot is a picture from Google's Sawzall PDF, which is full of ignorance (not to mention also full of brilliance, but critical analysis must start now!).

Traditional file systems couldn't store as much as GFS (Google File System) can. Does that mean we do not use files at Google because traditional file system techniques don't work? No. The file system techniques are still valid (a file is just a model).

The relational model (or a relational technique) does not define how physically large a database can be, nor does it define how many computers a relational database can span. The Google authors do not separate physical from logical concerns in the paper.

Relational surely can't work for them, because relational has physical limits! Right? Nope. Nope. Nope. Nope. Relational techniques do and will in fact work on massive data sets - it's just the currently available products that may fail, just as current file systems on personal computers would fail for Google's needs!

Rob Pike et al. are implying that the relational model and relational techniques have some physical limitation - which is pure BS. The products, some of them, do have limits, but not the actual relational TECHNIQUES.

If anything, Google should be the smart ones who realize that they need a distributed Google relational database (GRD) in addition to a distributed Google File System (GFS).

Traditional Database Techniques?

Pike et al say:
"These large datasets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database."

Ahh, if only Pike and friends would stop confusing the relational model with physical implementation! The logical relational database does not define whether or not one gigabyte or twenty thousand petabytes of data will fit in it! That is a physical implementation detail.

If the staff at Google had a fricking clue about data management, clusters of Google relational data (GRD) could be a solution to their big data problem.

Existing crappy products that only allow 34 clustered database servers (or whatever other maximum limits they have) are something smart staff at Google should work around, just as they worked around the limited file system when they created GFS (Google File System). The relational model doesn't define whether data can be stored on 400 or 50,000 clusters.

Everything Was a File Though!

That is so 1970s!

An arrogant, stubborn attitude such as "everything is just a file" is exactly the reason the relational model is being reinvented in GFS (Google File System) and Sawzall/MapReduce.

A Solution To Google's Problem

Humorous, but also serious.

How many times shall I repeat it: the relational model and relational techniques do not define whether or not data will fit in them, as Rob Pike and friends seem to think. A relational database (custom made by a company like Google) could span as many computers as they want, similar to how GFS (Google File System) spans multiple machines.
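To make the logical vs physical point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how Google or any real product does it: the `ShardedRelation` class, the hash-based placement, and the `third.txt` row are all made up for this example, and each in-memory SQLite database merely stands in for a separate machine. The point is that the same logical query gives the same answer whether the relation lives on one "machine" or four.

```python
import sqlite3
import zlib


class ShardedRelation:
    """One logical 'files' relation stored across several physical databases.

    Hypothetical sketch: each in-memory SQLite database stands in for a
    separate machine.
    """

    def __init__(self, shard_count):
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for shard in self.shards:
            shard.execute(
                "CREATE TABLE files (file_name TEXT PRIMARY KEY, size INTEGER)"
            )

    def _shard_for(self, file_name):
        # Stable hash, so a given key always lands on the same shard.
        index = zlib.crc32(file_name.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def insert(self, file_name, size):
        self._shard_for(file_name).execute(
            "INSERT INTO files VALUES (?, ?)", (file_name, size)
        )

    def select(self, where_sql="1", params=()):
        # Send the same query to every shard and merge the partial results.
        # where_sql is trusted text written by the caller of this sketch.
        rows = []
        for shard in self.shards:
            rows.extend(
                shard.execute(
                    f"SELECT file_name, size FROM files WHERE {where_sql}", params
                )
            )
        return sorted(rows)


def load(shard_count):
    relation = ShardedRelation(shard_count)
    # The first two rows come from the file listing in this article;
    # third.txt is an invented extra row.
    for name, size in [("foobar.txt", 8923), ("another.txt", 21604), ("third.txt", 512)]:
        relation.insert(name, size)
    return relation


one_machine = load(1).select("size > ?", (1000,))
four_machines = load(4).select("size > ?", (1000,))
```

The caller of `select` never says where a row is stored. How many shards there are is a physical detail hidden behind the relation.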

I suspect Rob Pike et al. are stuck in their old mode of physical file thinking from Bell Labs and Unix. Rob Pike worked at Bell Labs for many years, where everything was a file. That was a nice abstraction - but Rob (and friends) have to get over this arrogant "everything is a file that can be grepped" nature.

You see, files are just poor database entries!

Date, extension, file name. 
Date, extension, file name. 
See a pattern?
Consider the following structure in a typical file system:

FileName      Date         Size
-----------   ----------   -----
foobar.txt    09/08/2004   8923
another.txt   07/03/2003   21604

What do we have there? A relational variable (table) holding descriptions of several files. But it is not an extensible relvar (table). It's a poor relvar (table). What does the data relate to? Well, there are files that have dates and sizes, there is the file name, and the file content is located somewhere. It's just a poor form of a database, with limited query abilities on the size, date, file name, etc.
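As a rough sketch of what treating that listing as a real relvar could look like, the snippet below loads the same two rows into an in-memory SQLite table and queries them. The table name `files` and the column names are my own choices for the example, and the dates are kept as plain strings exactly as they appear in the listing.

```python
import sqlite3

# Hypothetical table and column names; the two rows are the ones listed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (file_name TEXT PRIMARY KEY, date TEXT, size INTEGER)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("foobar.txt", "09/08/2004", 8923),
        ("another.txt", "07/03/2003", 21604),
    ],
)

# Ask a question of the relation instead of scanning a directory listing.
large_files = conn.execute(
    "SELECT file_name FROM files WHERE size > ? ORDER BY file_name", (10000,)
).fetchall()

total_size = conn.execute("SELECT SUM(size) FROM files").fetchone()[0]
```

Here `large_files` holds only `another.txt` and `total_size` is the sum of both sizes. A bare directory listing cannot answer either question without extra tooling.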

A proper query language is what: grep? Rob thinks so, because of his Unix roots. But think about using grep in addition to a query language (which a relational database (GRD) would offer). Using both - grep on your result set along with a query language - is of course possible, not just grep alone.
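Here is a small sketch of using both together, assuming a hypothetical `files` table that also stores each file's content. The function `query_then_grep`, the sample contents, and the `third.txt` row are invented for illustration. The query language narrows the rows first, and the regular expression (the "grep") runs only on what is left.

```python
import re
import sqlite3


def query_then_grep(conn, min_size, pattern):
    """Narrow rows with SQL first, then regex-match only those rows."""
    regex = re.compile(pattern)
    rows = conn.execute(
        "SELECT file_name, content FROM files WHERE size >= ? ORDER BY file_name",
        (min_size,),
    )
    return [name for name, content in rows if regex.search(content)]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE files (file_name TEXT PRIMARY KEY, size INTEGER, content TEXT)"
)
# Sizes are just stored attributes here; the contents are made-up sample text.
db.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("foobar.txt", 8923, "error: disk full"),
        ("another.txt", 21604, "all good"),
        ("third.txt", 30000, "error: timeout"),
    ],
)

matches = query_then_grep(db, 10000, r"^error")
```

`foobar.txt` also contains "error", but the size predicate removes it before the regex ever looks at it. That is the advantage over grepping everything.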

Repeat it Once More

What we are seeing over and over again, in history, and will see in the future, I repeat: a file system used for data storage is just a poor reinvention of a relational database, with very limited attributes (columns): date, size, time. Then people add after-the-fact hacks like grep to scan the entire file system in their own inefficient way, and then customize it and optimize it and end up with a poor reinvention of the relational model marketed as some other product: Google Grep, Google File System, or whatever it may be.

Wouldn't it be nice if we could expand our file system to contain more than just a date stamp, file name, and extension when we need it? What if we needed a few other structured attributes (columns) such as a "parse count", a "processed already" boolean, a "filtered count", and a "last analyzed date"? If we really want a powerful file system, the file system needs to become a database. This will be really hard for Pike and friends to understand though, due to their "everything is a file" nature.
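A minimal sketch of that kind of extension, again with SQLite standing in for the "file system that became a database". The column names are my own spellings of the attributes mentioned above, and the analysis date in the update is a made-up value.

```python
import sqlite3

fs = sqlite3.connect(":memory:")
fs.execute(
    "CREATE TABLE files (file_name TEXT PRIMARY KEY, date TEXT, size INTEGER)"
)
fs.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("foobar.txt", "09/08/2004", 8923),
        ("another.txt", "07/03/2003", 21604),
    ],
)

# Extend the relvar with new attributes; existing rows pick up the defaults.
for ddl in (
    "ALTER TABLE files ADD COLUMN parse_count INTEGER NOT NULL DEFAULT 0",
    "ALTER TABLE files ADD COLUMN processed INTEGER NOT NULL DEFAULT 0",
    "ALTER TABLE files ADD COLUMN filtered_count INTEGER NOT NULL DEFAULT 0",
    "ALTER TABLE files ADD COLUMN last_analyzed TEXT",
):
    fs.execute(ddl)

# Record that one file has been parsed (the date is a hypothetical value).
fs.execute(
    "UPDATE files SET parse_count = parse_count + 1, processed = 1, "
    "last_analyzed = ? WHERE file_name = ?",
    ("2004-09-09", "foobar.txt"),
)

# Which files still need processing?
pending = fs.execute(
    "SELECT file_name FROM files WHERE processed = 0 ORDER BY file_name"
).fetchall()
```

The old rows keep working after the new columns are added, and "which files have not been processed yet" becomes a one-line query instead of a custom scanning script.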

It reminds me of the ones who still think in "Standard Pascal" or "Basic Language (goto line number)" since those habits were ingrained in them at an early age. Whatever people grow up on, they seem to hold with them. Being stuck in this mode that "everything is just a file that can be grepped" is bad, I'm afraid to say.

Repeat It Again

This "everything is a file" ingraining they had while growing up on Unix and at Bell Labs just causes them to reinvent the relational database over and over and over again.

See my solution to their problem:

http://z505.com/images/relational-parallel-sop.png

Pike and friends simply don't understand data management essentials. They only understand C programming and Unix (and a couple of other things). Whether they realize it or not, all data management reinventions, including an extended file system that spans several machines, are just reinventing the relational model under a different name - without all the advantages of the relational model (until 10 years later, when they have re-implemented every relational feature in their file system and continue to call it a file system).

Compliments, but Criticism

Pike and friends are smart - but they aren't smart enough. If they'd study the relational model they would realize that relations are not about physical storage - they are a logical issue. Whether or not the cheaply available database products have physical limits has nothing to do with the limitations of the relational model! The relational model only discusses relations and how these relations are useful to us - not physical storage limits! It is Google's (or whoever's) job to bust the physical limits of existing products.

Rob Pike and friends have posted a straw man article for me to burn down fairly quickly. And I will mention once again that their article is not pure nonsense - it is brilliant, but they are not brilliant enough. That is why I am criticizing their PDF. I am simply pointing out that databases can be improved to span several servers too, and that the physical limitation Pike talks about has nothing to do with the relational model or with traditional database techniques such as query languages!
