Let’s say you’re operating a zoo, and you have this simple constraint:
You can put many animals of the same type into a single cage; or distribute them among many cages; but you cannot mix animals of different types within a single cage.
This rule prevents, for example, assigning a zebra to live in the same cage as a lion. Simple, right?
How do you enforce it? Any ideas yet? Keep reading: I will present a solution that uses a generalization of the standard UNIQUE constraint.
(Don’t dismiss the problem too quickly. As with most simple-sounding problems, it’s a fairly general problem with many applications.)
First of all, let me say that, in one sense, it’s easy to solve: see if there are any animals already assigned to the cage, and if so, make sure they are the same type. That has two problems:
- You have to remember to do that each time. It’s extra code to maintain, possibly an extra round-trip, slightly annoying, and won’t work unless all access to the database goes through that code path.
- More subtly, the pattern read, decide what to write, write is prone to race conditions when another process writes after you read and before you write. Without excessive locking, solving this is hard to get right — and likely to pass tests during development before failing in production.
[ Aside: if you use true serializability in PostgreSQL 9.1, that completely solves problem #2, but problem #1 remains. ]
Those are exactly the kinds of problems that a DBMS is meant to solve. But what to do? Unique indexes don’t seem to solve the problem very directly, and neither do foreign keys. I believe that they can be combined to solve the problem by using two unique indexes, a foreign key, and an extra table, but that sounds painful (perhaps someone else has a simpler way to accomplish this with SQL standard features?). Row locking and triggers might be an alternative, but also not a very clean solution.
A better solution exists in PostgreSQL 9.1 using Exclusion Constraints (Exclusion Constraints were introduced in 9.0, but this solution requires the slightly-more-powerful version in 9.1). If you have never seen an Exclusion Constraint before, I suggest reading a previous post of mine.
Exclusion Constraints have the following semantics (copied from documentation link above):
The EXCLUDE clause defines an exclusion constraint, which guarantees that if any two rows are compared on the specified column(s) or expression(s) using the specified operator(s), not all of these comparisons will return TRUE. If all of the specified operators test for equality, this is equivalent to a UNIQUE constraint…
First, as a prerequisite, we need to install btree_gist
into our database (make sure you have the contrib package itself installed first):
CREATE EXTENSION btree_gist;
Now, we can use an exclude constraint like so:
CREATE TABLE zoo ( animal_name TEXT, animal_type TEXT, cage INTEGER, UNIQUE (animal_name), EXCLUDE USING gist (animal_type WITH <>, cage WITH =) );
Working from the definition above, what does this exclusion constraint mean? If any two tuples in the relation are ever compared (let’s call these TupleA and TupleB), then the following will never evaluate to TRUE:
TupleA.animal_type <> TupleB.animal_type AND TupleA.cage = TupleB.cage
[ Observe how this would be equivalent to a UNIQUE constraint if both operators were "=". The trick is that we can use a different operator -- in this case, "<>" (not equals). ]
Results:
=> insert into zoo values('Zap', 'zebra', 1); INSERT 0 1 => insert into zoo values('Larry', 'lion', 2); INSERT 0 1 => insert into zoo values('Zachary', 'zebra', 1); INSERT 0 1 => insert into zoo values('Zeta', 'zebra', 2); ERROR: conflicting key value violates exclusion constraint "zoo_animal_type_cage_excl" DETAIL: Key (animal_type, cage)=(zebra, 2) conflicts with existing key (animal_type, cage)=(lion, 2). => insert into zoo values('Zeta', 'zebra', 3); INSERT 0 1 => insert into zoo values('Lenny', 'lion', 2); INSERT 0 1 => insert into zoo values('Lance', 'lion', 1); ERROR: conflicting key value violates exclusion constraint "zoo_animal_type_cage_excl" DETAIL: Key (animal_type, cage)=(lion, 1) conflicts with existing key (animal_type, cage)=(zebra, 1). => select * from zoo order by cage;
animal_name | animal_type | cage -------------+-------------+------ Zap | zebra | 1 Zachary | zebra | 1 Larry | lion | 2 Lenny | lion | 2 Zeta | zebra | 3 (5 rows)
And that is precisely the constraint that we need to enforce!
- The constraint is declarative so you don’t have to deal with different access paths to the database or different versions of the code. Merely the fact that the constraint exists means that PostgreSQL will guarantee it.
- The constraint is also immune from race conditions — as are all EXCLUDE constraints — because again, PostgreSQL guarantees it.
Those are nice properties to have, and if used properly, will simplify the overall application complexity and improve robustness.
I believe (if it were available) CREATE ASSERTION would do the job.
Essentially, we want to make sure the imaginary table of invalid cages where animals run amock regardless of species, begin eating each other, and hilarity ensues, is empty.
Giving us
CREATE ASSERTION no_invalid_cages
CHECK(
(SELECT COUNT(*) FROM
(
SELECT cage
FROM zoo
GROUP BY cage
HAVING COUNT(DISTINCT animal_type) > 1
) AS invalid_cages
) = 0
);
*sigh*. It was nicely laid out, but all the tabs have been removed.
Another solution that would be possible if sub-selects were allowed as table constraints. But this is another feature not implements. However, I believe that index based solutions are superior.
Perhaps in the future once exclusion constraints are fully implements and “rounded-out”, it be nice if a mechanism could be implemented that could translate ASSERTION constraints or TABLE sub-Select constraints into exclusion constraints. This way PostgreSQL can add these two additional features to its SQL feature portfolio.
“Perhaps in the future once exclusion constraints are fully implements”
I hope they are, two releases of them now. I am working on some other things around exclusion constraints, like Range Types, but Exclusion Constraints themselves should be reliable.
“it be nice if a mechanism could be implemented that could translate ASSERTION constraints or TABLE sub-Select constraints into exclusion constraints”
Although exclusion constraints are more general than unique, they are still explicitly a special case (more special than Assert or CHECK with subselects). Exclusion constraints are, essentially, the special case of constraints that can be enforced by searching for conflicting tuples (which can be implemented with efficient searches, like an index search).
The most general constraints need something more like serializable snapshot isolation, because the test might be for some very complex violation that wouldn’t be found just by looking for a certain class of tuples that conflict.
Yes, true.
A little bit more difficult to specify, and probably more difficult to make it efficient though.
So what changed between 9.0 and 9.1, w.r.t. exclusion constraints? The particular bit of documentation linked above looks the same (not that it should be expected to have changed; it’s just that it was the one place I immediately thought to look).
My understanding is that exclusion constraints rely upon the features on a single index type. In 9.0 there wasn’t a contrib module for the new index type btree_gist, so you were either limited to the operations of simple btree indexes or gist indexes. And there wasn’t a way for building multi-column constraints that used different types of indexes.
btree_gist has been around for a while, but you’re right that the 9.0 version wasn’t enough: the 9.1 version is the first to searching on “<>“, so the 9.0 version would not have worked for this example. That’s also the reason that I couldn’t use btree directly (it still doens’t allow searches on <>).
It’s also true that you can’t mix and match index types.
And, as I mention below, there’s the overly-conservative error check in 9.0 that was removed in 9.1.
There was an extra error check that we left in 9.0 that was (intentionally) more conservative than required, with the idea that it might give more information if someone ran into a bug.
The extra check threw an error if the enforcement mechanism, while looking for conflicting tuples, didn’t even find the tuple being currently inserted. The underlying assumption there is that, if there is an exclusion constraint, then a tuple will always conflict with itself. However, that’s not true for cases such as this “zoo” example, or for empty ranges in a schedule conflict example.
We considered trying to document that somehow, but since the limitation was removed in 9.1, the cost of carrying around such a caveat seemed to outweigh the benefits.
Hi Jeff I believe that your Exlusion Contraints are the Most Innovative new capability in any rdbms over the few years and in Postgres since Window Functions in v 8.4 – Particularly in conjunction with your Period Datatype (such a shame range types did not make 9.1). We could not find a way to Exclude Period Gaps in v 9 – Does enabling opertor in 9.1 now make this possible ?
I’m glad you found it useful!
To prevent “gaps” is a little tricky, but perhaps it can be done with triggers, row-level locks, and exclusion constraints. In a way, it’s kind of like a foreign key into the same table. The tricky part is the initial condition for a group where there is no other period to “connect” with.
I’ll think about it a little more and write up a blog post about it.
Reminds me of REFUSE rules in the original postgres (not postgresql) rule system.
Interesting. Did that allow constraints that would work even with concurrent updates?