Non-RDBMS: is it what web applications need?
January 6, 2009
Google, with its BigTable, and Amazon, with S3, certainly paved the way for this trend.
Nowadays there are many emerging distributed file storage systems and key-value databases.
To name a few of those:
- HadoopFS (HDFS) – Hadoop’s distributed file system, written in Java.
- Hypertable – based on Google’s BigTable.
- MemcacheDB – its benchmark claims 18868 writes per second. It uses BerkeleyDB and is based on Memcache. I will definitely keep an eye on this one.
- Ringo – another key-value store.
What would be the advantages and disadvantages?
Fast writes are certainly one of the big advantages of these systems. On top of that, they are built to handle high concurrency by being simpler than an RDBMS.
Not having the overhead of a SQL parser is another obvious advantage, but it immediately brings a disadvantage: it is hard to produce varied reports without SQL. I would imagine that many reporting attributes would have to be stored as part of the data so that a reducer() function can be applied to them. That requires a lot of forward thinking.
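To make the reporting point concrete, here is a minimal sketch in Python of what "storing reporting attributes as part of the data" might look like. The record layout, the key names, and the `mapper`/`views_per_month` helpers are all made up for illustration; a plain dict stands in for the key-value store, and none of this is the API of any system listed above.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical records as they might sit in a key-value store.
# "month" is stored up front purely so that a report can group on it later;
# without SQL there is no way to derive it at query time.
store = {
    "post:1": {"title": "Hello", "author": "ann", "month": "2009-01", "views": 120},
    "post:2": {"title": "NoSQL", "author": "bob", "month": "2009-01", "views": 300},
    "post:3": {"title": "Again", "author": "ann", "month": "2008-12", "views": 80},
}

def mapper(record):
    """Emit a (group, value) pair from the attributes stored with the record."""
    return (record["month"], record["views"])

def reducer(totals, pair):
    """Fold one (month, views) pair into the running totals."""
    month, views = pair
    totals[month] += views
    return totals

def views_per_month(store):
    """Roughly what SELECT month, SUM(views) ... GROUP BY month would give."""
    pairs = map(mapper, store.values())
    return dict(reduce(reducer, pairs, defaultdict(int)))

report = views_per_month(store)
```

If a later report needs to group by something that was never stored, say the weekday of publication, every record has to be rewritten first. That is the forward thinking mentioned above.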
I would not count the “not having SQL steepens the learning curve” argument as a disadvantage. Most of these storage systems expose simple API calls for programmers to use.
What kinds of applications would benefit from such a storage mechanism?
Photo and video sharing sites are a no-brainer: each media file can be keyed by a hash of its filename and stored in one of these distributed storage systems.
All sorts of publishing applications, such as blogs, wikis, or forums, can use the same technique.
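The keying idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the node names, the choice of SHA-1, the modulo placement rule, and the `Cluster` class, which uses in-memory dicts in place of real storage servers.

```python
import hashlib

# Hypothetical storage nodes; in practice these would be separate servers.
NODES = ["storage-a", "storage-b", "storage-c"]

def media_key(filename):
    """Derive a fixed-length key from the filename (SHA-1 hex digest)."""
    return hashlib.sha1(filename.encode("utf-8")).hexdigest()

def pick_node(key, nodes):
    """Choose a node by treating the hex key as a number, modulo node count."""
    return nodes[int(key, 16) % len(nodes)]

class Cluster:
    """Toy stand-in for a distributed store: one dict per node."""

    def __init__(self, nodes):
        self.names = list(nodes)
        self.nodes = {name: {} for name in self.names}

    def put(self, filename, data):
        key = media_key(filename)
        self.nodes[pick_node(key, self.names)][key] = data
        return key

    def get(self, filename):
        key = media_key(filename)
        return self.nodes[pick_node(key, self.names)].get(key)

cluster = Cluster(NODES)
key = cluster.put("holiday/beach.jpg", b"...jpeg bytes...")
```

A blog post or wiki page would be stored the same way, with its slug or title in place of the filename. Note that simple modulo placement moves most keys when a node is added or removed; it is used here only to keep the sketch short.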