Databases and Data Types

Hypothetically, if I were to use a much simpler database (let’s say SQLite or BerkeleyDB) for a project, rather than a powerful RDBMS like PostgreSQL, what would be the first thing that I would miss from PostgreSQL? Not “miss” in a theoretical, long term sense, but “miss” in that it would cause a practical impediment to actually constructing even a prototype application?

Would it be triggers? Certainly not. Server side functions? Those would certainly be missed, although may projects go along fine without them. SQL’s relational algebra and calculus that make a relational database relational? Maybe, that would certainly be preferable to procedural code using BerkeleyDB, but I don’t think this would be the first thing I’d miss (digression: actually, it wouldn’t be missed until later, when I was actually trying to debug a difficult problem or look at the data for analysis).

The first thing I’d miss, very specifically, would be a TIMESTAMPTZ (or similar) datatype. Everything else pales in comparison to the lack of this single datatype. SQLite and BerkeleyDB both really just store text/binary data, and don’t really have datatypes. What’s strange is that even numbers are probably not as important of a type. How often do I find myself sorting by numerical value, or selecting records where some numerical value falls within some range? It happens, but not nearly as often as performing the same operations with time data. I don’t think that I’m unique in this, either.

This makes me think that types are very important to consider during development. And these ideas are a big part of what motivated me to work on the “t_interval” type (a project I’m working on with Selena and grzm), something that I think is needed in all RDBMSs.

I’ve spent a significant amount of time (especially lately) thinking about data types, not just for databases but also for languages. I consider SQLite or BerkeleyDB to be analogous to assembly language: very simple, just pushes bytes around. It’s very simple to do simple things, you can pretty much draw a picture of exactly what’s happening (usually the picture would consist of a lot of boxes and arrows). It’s much harder, however, to actually write complex systems in assembly, because it’s harder to prove that the assumptions you’re making later in the software are actually true. The exact same thing is true of simple databases: I put the bytes in, I get them back out; simple, right? But any complex software built on a “simple” database usually becomes a mess of inconsistent data, and the meaning of the data usually drifts with time. The complexity of PostgreSQL can really simplify things.

Back to PostgreSQL: writing your own data type is one of the most interesting things that you can do with PostgreSQL (aside from working on the source code itself). PostgreSQL treats your data type the same way that it treats an internal data type, and that’s very important. The data type you write is not a second-class citizen, and you can make a “myinteger” type that’s every bit as good as the built-in integer type. While constructing a data type, you need to interact with C functions, types, operators, operator classes, input parsing, and text representation. You might even want to build the routines necessary for B-Tree, GiST, or GIN indexing. And in the process, you learn a lot about PostgreSQL. More than just learning PostgreSQL internals, you also learn a lot about types in general, and how your application can make better use of types. If you’re a postgresql power user who wants to know how postgres works in greater depth, I highly recommend building a data type.

Comments are closed.