Database for a Zoo: the problem and the solution

Let’s say you’re operating a zoo, and you have this simple constraint:

You can put many animals of the same type into a single cage; or distribute them among many cages; but you cannot mix animals of different types within a single cage.

This rule prevents, for example, assigning a zebra to live in the same cage as a lion. Simple, right?

How do you enforce it? Any ideas yet? Keep reading: I will present a solution that uses a generalization of the standard UNIQUE constraint.

(Don’t dismiss the problem too quickly. As with most simple-sounding problems, it’s a fairly general problem with many applications.)

Continue reading

Why PostgreSQL Already Has Query Hints

This is a counterpoint to Josh’s recent post: Why PostgreSQL Doesn’t Have Query Hints. I don’t really disagree, except that I think that there are many different definitions of “hints” floating around, leading to a lot of confusion. I could subtitle this post “More Terminology Confusion” after my previous entry.

So, let’s pick a reasonable definition: “hints are some mechanism to influence the SQL planner to choose a better plan”. Why did I choose that definition? Because it’s the actual use case. If a user encounters a bad plan, or an unstable plan, they need a way to get it to choose a better plan. There’s plenty of room to argue about the right way to do that and the wrong way, but almost every DBMS allows some form of hints. Including PostgreSQL.

Continue reading

Temporal PostgreSQL Roadmap

Why are temporal extensions in PostgreSQL important? Quite simply, managing time data is one of the most common requirements, and current general-purpose database systems don’t provide us with the basic tools to do it. Every general-purpose DBMS falls short both in terms of usability and performance when trying to manage temporal data.

What is already done?

Continue reading

Scalability and the Relational Model

The relational model is just a way to represent reality. It happens to have some very useful properties, such as closure over many useful operations — but it’s a purely logical model of reality. You can implement relational operations using hash joins, MapReduce, or pen and paper.

So, right away, it’s meaningless to talk about the scalability of the relational model. Given a particular question, it might be difficult to break it down into bite-sized pieces and distribute it to N worker nodes. But going with MapReduce doesn’t solve that scalability problem — it just means that you will have a hard time defining a useful map or reduce operation, or you will have to change the question into something easier to answer.

Continue reading

Temporal Keys, Part 1

“Schedule conflict” — it’s one of the simplest and most common constraints for business or any other organization. One person cannot be in two places at the same time; and in many cases a only a single person can use a given resource at any time.

Does your database system know anything about a schedule conflict? Should it?

Constraints are always enforced at some point, it’s just a matter of when, how, and the cost of correcting the situation.

Continue reading

What is the deal with NULLs?

A recent thread on pgsql-hackers warrants some more extensive discussion. In the past, I’ve criticized NULL semantics, but in this post I’d just like to explain some corner cases that I think you’ll find interesting, and try to straighten out some myths and misconceptions.

First off, I’m strictly discussing SQL NULL here. SQL NULL is peculiar in a number of ways, and the general excuse for this is that there is a need to represent “missing information” — which may be true. But there are lots of ways to represent missing information, as I pointed out in a previous post, and SQL’s approach to missing information is, well, “unique”.

Continue reading