Why are temporal extensions in PostgreSQL important? Quite simply, managing time data is one of the most common requirements, and current general-purpose database systems don’t provide us with the basic tools to do it. Every general-purpose DBMS falls short both in terms of usability and performance when trying to manage temporal data.
What is already done?
- PERIOD data type, which can represent anchored intervals of time; that is, a chunk of time with a definite beginning and a definite end (in contrast to a SQL INTERVAL, which is not anchored to any specific beginning or end time).
- Critical for usability because it acts as a set of time, so you can easily test for containment and other operations without using awkward constructs like BETWEEN or lots of comparisons (and keeping track of inclusivity/exclusivity of boundary points).
- Critical for performance because you can index the values for efficient “contains” and “overlaps” queries (among others).
- Temporal Keys (called Exclusion Constraints, and will be available in the next release of PostgreSQL, 9.0), which can enforce the constraint that no two periods of time (usually for a given resource, like a person) overlap. See the documentation (look for the word “EXCLUDE”), and see my previous articles (part 1 and part 2) on the subject.
- Critical for usability to avoid procedural, error-prone hacks to enforce the constraint with triggers or by splitting time into big chunks.
- Critical for performance because it performs comparably to a UNIQUE index, unlike the other procedural hacks which are generally too slow to use for most real systems.
What needs to be done?
- Range Types — Aside from PERIOD, which is based on TIMESTAMPTZ, it would also be useful to have very similar types based on, for example, DATE. It doesn’t stop there, so the natural conclusion is to generalize PERIOD into “range types” which could be based on almost any subtype.
- Range Keys, Foreign Range Keys — If Range Types are known to the Postgres engine, that means that we can have syntactic sugar for range keys (like temporal keys, except for any range type), etc., that would internally use Exclusion Constraints.
- Range Join — If Range Types are known to the Postgres engine, there could be syntactic sugar for a “range join,” that is, a join based on “overlaps” rather than “equals”. More importantly, there could be a new join type, a Range Merge Join, that could perform this join efficiently (without a Range Merge Join, a range join would always be a nested loop join).
- Simple table logs — The ability to easily create an effective “audit log” or similar trigger-based table log, that can record changes and be efficiently queried for historical state or state changes.
I’ll be speaking on this subject (specifically, the new Exclusion Constraints feature) in the upcoming PostgreSQL Conference EAST 2010 (my talk description) in Philadelphia later this month and PGCon 2010 (my talk description) in Ottawa this May. In the past, these conferences and others have been a great place to get ideas and help me move the temporal features forward.
The existing features have been picking up a little steam lately. The temporal-general mailing list has some traffic now — fairly low, but enough that others contribute to the discussions, which is a great start. I’ve also received some great feedback from a number of people, including the folks at PGX. There’s still a ways to go before we have all the features we want, but progress is being made.
I have seen a very interesting presentation from Magnus Hagander (at FOSDEM 2010 Brussels) about this great feature – and i hope, 9.1 will contains this great feature.
I wish you success!
Regards, Andreas
One lesser feature that I believe may be useful would be range aggregates. Rather than simply summing the durations of a range for a group, it would be nice to get a set of continuous ranges for the group.
By “continuous ranges,” do you mean something that knows that “melts” two or more overlapping, non-identical intervals into their union? That sounds really neat!
Yes, either overlapping or touching (is overlapping the same a touching?). Having this would make be a really nice feature that (to me) would simply range queries.
Richard,
You can install chronosDB over top of Jeff’s period data type. ChronosDB does things like union, minus and intersect non-contiguous sets (arrays of periods)
Not sure if this falls into TEMPORAL data types, but I’ve found a need for what I call a FUZZY DATE. For example, a blog post may have a precise date, like 2010/03/09, and you may know the precise date of a person’s birth or some other event. However, some events or dates are less precise, e.g., a monthly magazine may have a publication date of March 2010, a quarterly journal may be dated Winter 2010, a book may have simply have a year as copyright date, and sometimes you may only know a historical date only approximately, e.g., circa 1502. You’d still like to order the events/date in some reasonable manner or search for events that took place in March 2010 and find the journal, the magazine and the blog post.
This can be modeled in a way similar to numbers with only a few significant figures: truncation.
For example, you can use a CHECK constraint like “CHECK(the_month = date_trunc(‘month’, the_month))”. This might be a good use for domains.
Fuzzy date, fuzzy interval, Allen’s algebra,…,etc. This is a big and separate topic. I’m thinking about hdate data type for historical data and looks like it should looks like,
{date1,date2, pdate1,pdate2,…,pdateN}, where date1,date2 – are dates to represent fuzzy date and pdates are probable dates, ordered by reliability. Example: in the beginning of 20-th century, but most probable in 1925 or 1927 year. Then, we could store this like: {01-01-1900,31-12-1999, 01-01-1925,31-12-1925,01-01-1927,31-13-1927}. pdates here represents as two dates, since they are fuzzy.