Building SQL Strings Dynamically, in 2011

I saw a recent post Avoid Smart Logic for Conditional WHERE Clauses which actually recommended, “the best solution is to build the SQL statement dynamically—only with the required filters and bind parameters”. Ordinarily I appreciate that author’s posts, but this time I think that he let confusion run amok, as can be seen in a thread on reddit.

To dispel that confusion: parameterized queries don’t have any plausible downsides; always use them in applications. Saved plans have trade-offs; use them sometimes, and only if you understand the trade-offs.

When query parameters are conflated with saved plans, it’s creates FUD about SQL systems because it mixes the fear around SQL injection with the mysticism around the SQL optimizer. Such confusion about the layers of a SQL system are a big part of the reason that some developers move to the deceptive simplicity of NoSQL systems (I say “deceptive” here because it often just moves an even greater complexity into the application — but that’s another topic).

The confusion started with this query from the original article:

SELECT first_name, last_name, subsidiary_id, employee_id
FROM employees
WHERE ( subsidiary_id    = :sub_id OR :sub_id IS NULL )
  AND ( employee_id      = :emp_id OR :emp_id IS NULL )
  AND ( UPPER(last_name) = :name   OR :name   IS NULL )

[ Aside: In PostgreSQL those parameters should be $1, $2, and $3; but that's not relevant to this discussion. ]

The idea is that one such query can be used for several types of searches. If you want to ignore one of those WHERE conditions, you just pass a NULL as one of the parameters, and it makes one side of the OR always TRUE, thus the condition might as well not be there. So, each condition can either be there and have one argument (restricting the results of the query), or be ignored by passing a NULL argument; thus effectively giving you 8 queries from one SQL string. By eliminating the need to use different SQL strings depending on which conditions you want to use, you reduce the opportunity for error.

The problem is that the article says this kind of query is a problem. The reasoning goes something like this:

  1. Using bind parameters forces the plan to be saved and reused for multiple queries.
  2. When a plan is saved for multiple queries, the planner doesn’t have the actual argument values.
  3. Because the planner doesn’t have the actual argument values, the “x IS NULL” conditions aren’t constant at plan time, and therefore the planner isn’t able to simplify the conditions (e.g., if one condition is always TRUE, just remove it).
  4. Therefore it makes a bad plan.

However, #1 is simply untrue, at least in PostgreSQL. PostgreSQL can save the plan, but you don’t have to. See the documentation for PQexecParams. Here’s an example in ruby using the “pg” gem (EDIT: Note: this does not use any magic query-building behind the scenes, it uses a protocol level feature in the PostgreSQL server to bind the arguments):

require 'rubygems'
require 'pg'

conn = PGconn.connect("dbname=postgres")

conn.exec("CREATE TABLE foo(i int)")
conn.exec("INSERT INTO foo SELECT generate_series(1,10000)")
conn.exec("CREATE INDEX foo_idx ON foo (i)")
conn.exec("ANALYZE foo")

# Insert using parameters. Planner sees the real arguments, so it will
# make the same plan as if you inlined them into the SQL string. In
# this case, 3 is not NULL, so it is simplified to just "WHERE i = 3",
# and it will choose to use an index on "i" for a fast search.
res = conn.exec("explain SELECT * FROM foo WHERE i = $1 OR $1 IS NULL", [3])
res.each{ |r| puts r['QUERY PLAN'] }

# Now, the argument is NULL, so the condition is always true, and
# removed completely. It will surely choose a sequential scan.
res = conn.exec("explain SELECT * FROM foo WHERE i = $1 OR $1 IS NULL", [nil])
res.each{ |r| puts r['QUERY PLAN'] }

# Saves the plan. It doesn't know whether the argument is NULL or not
# yet (because the arguments aren't provided yet), so the plan might
# not be good.
conn.prepare("myplan", "SELECT * FROM foo WHERE i = $1 OR $1 IS NULL")

# We can execute this with:
res = conn.exec_prepared("myplan",[3])
puts res.to_a.length
res = conn.exec_prepared("myplan",[nil])
puts res.to_a.length

# But to see the plan, we have to use the SQL string form so that we
# can use EXPLAIN. This plan should use an index, but because we're
# using a saved plan, it doesn't know to use the index. Also notice
# that it wasn't able to simplify the conditions away like it did for
# the sequential scan without the saved plan.
res = conn.exec("explain execute myplan(3)")
res.each{ |r| puts r['QUERY PLAN'] }

# ...and use the same plan again, even with different argument.
res = conn.exec("explain execute myplan(NULL)")
res.each{ |r| puts r['QUERY PLAN'] }

conn.exec("DROP TABLE foo")

See? If you know what you are doing, and want to save a plan, then save it. If not, do the simple thing, and PostgreSQL will have the information it needs to make a good plan.

My next article will be a simple introduction to database system architecture that will hopefully make SQL a little less mystical.

6 thoughts on “Building SQL Strings Dynamically, in 2011

  1. – (I say “deceptive” here because it often just moves an even greater complexity into the application — but that’s another topic).

    Well, it’s the main topic!! Coders want an infinite employment plan, and this is their main weapon.

    So far as statement caching/saving/fooing is concerned, each engine does this its own way. Some save the entire query, parameters and all, and only reuse when a subsequent is an exact match. Others save the text, with “slots” for the parameters, and reuse if the text(s) match.

    BTW, prepared statements are not a function of the engine, but the engine side of jdbc (or equivalent) driver. Prepared statement reuse is a separate issue from the engine’s reuse of statements. In engines with which I am familiar (PG being the least familiar), prepared statements are attached to the session/connection.

    “Prepared statements only last for the duration of the current database session”

  2. The problem I run into with this construct is with functions because they store their plans without me asking. I wish there were a way to say “re-parse this function every time it’s called”.

    • If you use EXECUTE rather than PERFORM, it won’t save the plan.

      However, I agree that saving the plan should be more explicit.

      • There are two problems with that: it clutters the syntax (even though dollar quoting helps), and it doesn’t work at all in pure sql functions.