The relational model is just a way to represent reality. It happens to have some very useful properties, such as closure over many useful operations — but it’s a purely logical model of reality. You can implement relational operations using hash joins, MapReduce, or pen and paper.
So, right away, it’s meaningless to talk about the scalability of the relational model. Given a particular question, it might be difficult to break it down into bite-sized pieces and distribute it to N worker nodes. But going with MapReduce doesn’t solve that scalability problem — it just means that you will have a hard time defining a useful map or reduce operation, or you will have to change the question into something easier to answer.
There may exist scalability problems in:
- SQL, which defines requirements outside the scope of the relational model, such as ACID properties and transactional semantics.
- Traditional architectures and implementations of SQL, such as the “table is a file” equivalence, lack of sophisticated types, etc.
- Particular implementations of SQL — e.g. “MySQL can’t do it, so the relational model doesn’t scale”.
Why are these distinctions important? As with many debates, terminology confusion is at the core, and prevents us from dealing with the problems directly. If SQL is defined in a way that causes scalability problems, we need to identify precisely those requirements that cause a problem, so that we can proceed forward without losing all that has been gained. If the traditional architectures are not suitable for some important use-cases, they need to be adapted. If some particular implementations are not suitable, developers need to switch or demand that it be made competitive.
The NoSQL movement (or at least the hype surrounding it) is far too disorganized to make real progress. Usually, incremental progress is best; and sometimes a fresh start is best, after drawing on years of lessons learned. But it’s never a good idea to start over with complete disregard for the past. For instance, an article from Digg starts off great:
The fundamental problem is endemic to the relational database mindset, which places the burden of computation on reads rather than writes.
That’s good because he blames it on the mindset not the model, and then identifies a specific problem. But then the article completely falls flat:
Computing the intersection with a JOIN is much too slow in MySQL, so we have to do it in PHP.
A join is faster in PHP than MySQL? Why bother even discussing SQL versus NoSQL if your particular implementation of SQL — MySQL — can’t even do a hash join, the exact operation that you need? Particularly when almost every other implementation can (including PostgreSQL)? That kind of reasoning won’t lead to solutions.
So, where do we go from here?
- Separate the SQL model from the other requirements (some of which may limit scalability) when discussing improvements.
- Improve the SQL model (my readers know that I’ve criticized SQL’s logical problems many times in the past).
- Improve the implementations of SQL, particularly how tables are physically stored.
- If you’re particularly ambitious, come up with a relational alternative to SQL that takes into account what’s been learned after decades of SQL and can become the next general-purpose DBMS language.