It’s just data

Dare Takes a Look at CouchDB

Dare Obasanjo: Recently I took a look at CouchDB because I saw it favorably mentioned by Sam Ruby and when Sam says some technology is interesting, he’s always right

Dare’s review of CouchDB is worth a read.  (Update: so are Assaf Arkin's and Damien Katz's responses)  He gets more things right than wrong.  And he doesn’t get things wrong so much as he has a tendency to make unqualified statements that need to be qualified.  Like statements that things that are interesting to me tend to be interesting to Dare (but to my kids?  Not so much).  Another example:

One thing that not so interesting is that editing documents is lockless and utilizes optimistic concurrency which means more work for clients.

That’s definitely a statement that requires qualification.  Perhaps one like this one:

Document oriented database work well for semi-structured data where each item is mostly independent and is often processed or retrieved in isolation.

While that is a good qualification, it errs on the side of being a bit too restrictive, particularly when Dare follows up with:

However there are also lots of Web applications that are about managing heavily structured, highly interrelated data (e.g. sites that heavily utilize tagging or social networking) where the document-centric model doesn’t quite fit.

Prior to the web, most hypertext theory centered around bidirectional links and fixed schemas.  Approaches that wouldn’t scale and don’t easily evolve.  By contrast, the web is made up of sites that independently update so that Dare can post to his web site without requiring anything like a lock that would affect me posting to mine.  Both of our sites enable comments, so there is a limited ability for others to post things, but this tends to operate at such a low rate (dozens of updates per day) that optimistic concurrency isn’t much of an issue.

And yet search engines like Google and approaches like map/reduce show that such sites can be reasonably indexed.
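To make the “more work for clients” side of optimistic concurrency concrete, here is a minimal sketch of a lockless read-modify-write loop. It is not CouchDB's API: the `DocumentStore` class, the integer revision counter, and the `add_comment` helper are all hypothetical names made up for illustration.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class DocumentStore:
    """Toy in-memory store (hypothetical); each document carries a revision."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        # Unknown documents are reported as revision 0 with no body.
        return self._docs.get(doc_id, (0, None))

    def put(self, doc_id, expected_rev, body):
        # No lock is held between get() and put(); the write succeeds only
        # if nobody else has updated the document in the meantime.
        current_rev, _ = self._docs.get(doc_id, (0, None))
        if current_rev != expected_rev:
            raise ConflictError(
                f"{doc_id}: expected rev {expected_rev}, found {current_rev}"
            )
        self._docs[doc_id] = (current_rev + 1, body)
        return current_rev + 1


def add_comment(store, doc_id, comment, retries=3):
    """Read, modify, write; on conflict, re-read and try again."""
    for _ in range(retries):
        rev, body = store.get(doc_id)
        comments = list(body or [])
        comments.append(comment)
        try:
            return store.put(doc_id, rev, comments)
        except ConflictError:
            continue  # someone else won the race; retry with fresh data
    raise ConflictError(f"{doc_id}: gave up after {retries} attempts")
```

At a few dozen updates per day the retry branch will almost never run, which is why the extra client work is usually a small price.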

Concrete example, from the social networking space.  My Facebook profile could be viewed as a document.  One with one primary author and limited abilities for others to modify it.  And yet things like the News Feed could easily be produced by a map/reduce job.  In parallel.  Across a large cluster of commodity machines and in a highly scalable manner.
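A rough sketch of how such a feed could fall out of a map/reduce pass over profile documents. The document fields (`owner`, `friends`, `updates`) and the function names are assumptions for illustration and say nothing about how Facebook actually does it; the grouping step runs in a single process here, where a real job would spread the map calls across machines.

```python
from collections import defaultdict

# Hypothetical profile documents: one primary author each.
profiles = [
    {"owner": "sam", "friends": ["dare"],
     "updates": [(2, "posted about CouchDB")]},
    {"owner": "dare", "friends": ["sam"],
     "updates": [(1, "reviewed CouchDB"), (3, "replied to Sam")]},
]


def map_profile(profile):
    """Emit (reader, entry) for every friend who should see each update."""
    for timestamp, text in profile["updates"]:
        for friend in profile["friends"]:
            yield friend, (timestamp, profile["owner"], text)


def reduce_feed(entries, limit=10):
    """Reduce one reader's entries to a feed: newest first, truncated."""
    return sorted(entries, reverse=True)[:limit]


def build_feeds(profiles):
    grouped = defaultdict(list)
    # Each profile is mapped independently, so this loop parallelizes.
    for profile in profiles:
        for reader, entry in map_profile(profile):
            grouped[reader].append(entry)
    return {reader: reduce_feed(entries) for reader, entries in grouped.items()}


feeds = build_feeds(profiles)
```

Note that no profile is ever locked or modified: the feed is derived data that can be recomputed at any time.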

To get a perspective on why this is important, consider that I started looking at this from the other side.  What happens when your application grows so large that you have no choice but to massively employ techniques like sharding?  What do you have to give up?  What do you need to add back in in order to mitigate the loss of the things you give up?

At a certain point, referential integrity has to be given up.  Scale a bit further, and even the notion of a relation in the relational database sense of the word starts to break down.  To cope, you denormalize a bit, not so much for performance reasons (though that’s important too), but as a self defense mechanism so that the pieces of data that you do have have enough context to be meaningful.
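As a small illustration of denormalizing for context, the sketch below copies the department's name into the employee document so the document still means something on a shard that has never seen the department record. The field names (`dept_id`, `department`) and the `denormalize` helper are hypothetical.

```python
def denormalize(employee, departments):
    """Return a copy of the employee with its department context embedded."""
    dept = departments[employee["dept_id"]]
    doc = dict(employee)  # leave the caller's record untouched
    doc["department"] = {"id": employee["dept_id"], "name": dept["name"]}
    del doc["dept_id"]    # the bare foreign key is no longer needed
    return doc


departments = {"d1": {"name": "Research"}}
employee = {"_id": "sam", "dept_id": "d1"}
standalone = denormalize(employee, departments)
```

The cost is the usual one: if the department is renamed, every embedded copy is stale until something rewrites it.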

What replaces a Department table in a typical Company/Employees database (or one that identifies a Group in a Facebook-like application) when faced with the prospect of mega-sharding?  The CouchDB answer is views, ones that are computed by map/reduce jobs that essentially extract (or map) “tags” and “social relations” from profile documents and reduce them into documents in their own right.
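A minimal sketch of that idea, written in Python rather than in CouchDB's own view language: the map step pulls group memberships out of profile documents, and the reduce step turns each group into a document of its own. The `groups` field, the `group:` id prefix, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def map_groups(doc):
    """Emit one (group, member id) pair per group named in a profile."""
    for group in doc.get("groups", []):
        yield group, doc["_id"]


def reduce_group(group, members):
    """Fold the emitted members into a standalone group document."""
    return {"_id": f"group:{group}", "members": sorted(members)}


def build_view(docs):
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_groups(doc):
            emitted[key].append(value)
    # One derived document per distinct key, in key order.
    return [reduce_group(key, values) for key, values in sorted(emitted.items())]


docs = [
    {"_id": "sam", "groups": ["couchdb", "rest"]},
    {"_id": "dare", "groups": ["couchdb"]},
    {"_id": "damien"},  # no groups: contributes nothing to the view
]
view = build_view(docs)
```

There is no Group table to keep consistent; the group documents exist only as a computed view over the profiles.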

This leads to

although focusing on JSON instead of XML makes it buzzword compliant

Dare, you say this like it was a bad thing :-).  Do you really want to continue to program with Circles, Triangles, and Rectangles?  Or would you rather your program looks something like this?

And then to:

and is definitely not a replacement/evolution of relational databases

What I have come to realize is that the very things that make J2EE and Relational Databases suitable for Enterprise scale applications are the very things that act as road bumps on workgroup scale and on web scale applications.  Simply put, relational databases will get squeezed on both sides.

Footnote: as I was writing this, I saw Chuck Vose's take.  His first bet takes some of my thinking to its logical conclusion, but he doesn’t yet see what I see in CouchDB.  Perhaps this post will help shed some light on why I think CouchDB is in line with my other bets.  His second bet goes off the rails [heh] a bit with:

And I realize that this is all possible in the REST model, but it makes the controllers obscene sometimes.

I’d like to put forward another possibility.  While DHH is enamored of REST (and I deserve a small bit of the “blame” for that), his views on “stored procedures” are widely known (search for “Choose a single layer of cleverness”).  Perhaps the map/reduce abstraction might just cause him to give a little on the latter in order to maintain the former.

A closing thought: couch.ini talks about a “JsServer”, but in reality any language that can evaluate a view, read from stdin, write to stdout, and produce and consume JSON could be used.
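To show how little such a server needs, here is a sketch of the stdin/stdout loop in Python. The wire format is an assumption made up for illustration (one JSON document per input line, one JSON array of rows per output line) and is not CouchDB's actual view server protocol; `example_view`, `serve`, and the `tags` field are hypothetical names.

```python
import json
import sys


def example_view(doc):
    """Hypothetical view: one [tag, document id] row per tag."""
    return [[tag, doc.get("_id")] for tag in doc.get("tags", [])]


def serve(view, infile=sys.stdin, outfile=sys.stdout):
    """Evaluate `view` on each JSON document read; write the rows as JSON."""
    for line in infile:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between documents
        rows = view(json.loads(line))
        outfile.write(json.dumps(rows) + "\n")
        outfile.flush()  # the database on the other end is waiting for a reply
```

Calling `serve(example_view)` with no other arguments runs the loop against the real stdin and stdout; passing file-like objects instead makes it easy to exercise without a database attached.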


Some Thoughts on CouchDB and Relational Databases

Some Thoughts on CouchDB and Relational Databases And Sam Ruby's response is good too...

Excerpt from Application Error at

[from topfunky] Sam Ruby: Dare Takes a Look at CouchDB

[link]...

Excerpt from del.icio.us/network/kevinmarsh at

Sam Ruby - Dare Takes a Look at CouchDB : "What happens when your application grows so large that you have no choice but to massively employ techniques like sharding? What do you have to give up? What do you need to add back in in order to mitigate...

Excerpt from Tim's Weblog at

Sam Ruby responds to Dare Obasanjo on CouchDB

[link] [more]...

Excerpt from reddit.com: programming - newest submissions at

The very things that make J2EE and Relational Databases suitable for Enterprise scale applications are the very things that act as road bumps on workgroup scale and on web scale applications.  Simply put, relational databases will get squeezed on both sides.

They won’t.  Fact is, apart from Alphora Dataphor they haven’t even arrived.  But even non-relational SQL won’t get squeezed so fast, and when it does, it will be by proper relational systems.  It takes a data model to substitute a data model, not only an application-ſpecific technology implementation.

Note: I manually marked the link above as "nofollow" and changed it to point to Google's "interstitial" page.  Is it spam, or is it real?  You decide. — Sam Ruby

Posted by Leandro Guimarães Faria Corcete DUTRA at

Sam Ruby: Dare Takes a Look at CouchDB

[link]...

Excerpt from del.icio.us/tag/couchdb at

System overload

Erlang is highly concurrent. Damien is not. CouchDb has been getting tons of interest. I can barely keep up with it. I wish I weren’t so busy so I could actually respond to some of the stuff people are......

Excerpt from Damien Katz at

Some Thoughts on CouchDB and Relational Databases

Some Thoughts on CouchDB and Relational Databases And Sam Ruby's response is good too (thanks johan)...

Excerpt from Kiyo's Tumblr at

Thoughtstack

... [more]

Trackback from Eighty-Twenty at

Oh crap, I just invented Prolog

Following a link trail that started with a discussion of CouchDB , I just found this old comment posted by Bill de hÓra : Perhaps we can go top down - write smart analysers to dynamically denorm data based on usage patterns; indeed database...

Excerpt from Messages not Models at

links for 2007-09-15

RubyForge: Ruvi: Project Info VIM clone in Ruby ONLamp.com — An Introduction to Erlang (tags: erlang programming tutorial) Handwriting on the Sky - The xUnit Paradox (tags: testing api) Stevey’s Home Page - The Emacs Problem (tags: lisp...

Excerpt from Mike Does Tech at

Bigbig Linkdump for Sept 17, 2007

Prison Planet: 9/11 First Responder Heard WTC 7 Demolition Countdown Stevey’s Home Page: The Emacs Problem  Andy Matuschak: Getting Started with Cocoa: a Friendlier Approach ONLamp.com: An Introduction to Erlang Haskell-cafe: MonadGL -...

Excerpt from have browser, will travel at

Quote of the Day

At a certain point, referential integrity has to be given up. Scale a bit further, and even the notion of a relation in the relational database sense of the word starts to break down. To cope, you denormalize a bit, not so much for performance...

Excerpt from Cafe con Leche XML News and Resources at

Database indexes are less useful than you think

An index helps you find an item without scanning all of the data. David DeWitt has made comments opposing index-light systems such as MapReduce, SimpleDB, and CouchDB. David DeWitt failed to tell us about your schemas falling apart as you scale up....

Excerpt from Daniel Lemire's blog at

Sam Ruby: Dare Takes a Look at CouchDB

Sam Ruby: Dare Takes a Look at CouchDB : "What replaces a Department table in a typical Company/Employees database (or one that identifies a Group in a Facebook-like application) when faced with the prospects of mega-sharding? The CouchDB answer is...

Excerpt from Lagado Notebook at
