9 January 2012
As soon as we started work on NoSQL Distilled we were faced with a tricky conundrum: what are we writing about? What exactly is a NoSQL database? There's no strong definition of the concept out there, no trademarks, no standards group, not even a manifesto.
The term originally surfaced at an informal meetup on June 11, 2009 in San Francisco, organized by Johan Oskarsson. At the session there were presentations from Voldemort, Cassandra, Dynomite, HBase, Hypertable, CouchDB, and MongoDB. The term caught on rapidly, and few people would argue that only the databases mentioned at that meeting should be called NoSQL.
Indeed there's often a twist in the name itself: many advocates of NoSQL say that it does not mean a "no" to SQL, but rather Not Only SQL. On this point I think it's useful to separate an individual database from the kind of ecosystem that NoSQL advocates see as the future. When we say "x is a NoSQL database", I think it's silly to interpret NoSQL as "Not Only", because that would render the term meaningless: you could then reasonably argue that SQL Server, say, is a NoSQL database. So I think it's best to say a "NoSQL database" is a "no-sql" database. You should separately interpret the NoSQL ecosystem as a "not only", although I prefer the term PolyglotPersistence for this usage.
Even with this matter out of the way, it's still not easy to define a NoSQL database. Does any database that doesn't use SQL qualify? How about older database technologies such as IMS or MUMPS? How about a relational system that didn't have SQL (such as the early Ingres)? What happens if someone manages to bolt a SQL interface onto one of the original septet?
So for our book we took the view that NoSQL refers to a particular rush of recent databases. Some characteristics are common amongst these databases, but none are definitional.
- Not using the relational model (nor the SQL language)
- Open source
- Designed to run on large clusters
- Based on the needs of 21st-century web properties
- No schema, allowing fields to be added to any record without controls
While I'm used to the blurry lines of definitions in the software industry, I confess my heart sinks at yet another one. But the important thing is that these databases provide an important addition to the way we'll be building applications in the next couple of decades. A lack of a clear definition will be no more than a gnat bite on their future successes.
1: If we take the "not-only" interpretation, then we should write "NOSQL" rather than "NoSQL". I almost always see it written as "NoSQL".