700 queries for one page load?

My employer is evaluating RT. So, we have a test instance set up, of course, and it’s in a working state. The application is written in Perl with the Mason framework, it has support for many databases (including PostgreSQL), and generally has a decent feature set.

We need to migrate from our previous system, so we need to import the data into RT. RT’s tools and APIs do not meet our importing needs, so we look into the schema to do the import directly. I decide the easiest way to figure out how the schema fits together is to just turn on query logging. So, we reload the page, and all of a sudden a deluge of output appears in our “tail -f”. At first I thought it was just the normal number of queries generated by some abstraction layer, and we could dig through them. But it was about 700 queries.

Turns out that RT serves even the static content through Mason (at least when doing the easy install), and it decides to generate around 19 database queries to retrieve each piece of static content. We can solve that fairly easily by making sure that Mason doesn’t handle the static content. The dynamic content generates about 100 queries still, but from 700 down to 100 is about an order of magnitude improvement.

Additionally, as you might expect from this type of application, the schema is pretty bad. It’s basically a lot of auto-generated id references all over the place pointing through poorly-named tables, a lot of “type” attributes (that hold the name of an application object class), and some bizarre organization. For instance, to associate a comment with a ticket, you have to join through the “transactions” table (a log of operations that were performed). There are also no foreign keys, at all, which is just wonderful when trying to figure out the schema.

There are some simple, well established best practices for database development that are easy to find. It’s not even hard to support multiple databases, you can just write separate data access code for each database you want to support. Yet application developers who know nothing about data management decide they can do it better. RT is a perfect example of this. I’m not saying all developers are like this, but I think there’s a much greater tendency for developers to re-invent data management than, say, invent a new memory management scheme, or invent a new filesystem API, or a new sorting algorithm.

6 thoughts on “700 queries for one page load?

  1. 700 eh? That sounds…wrong. And like a prefetch or a caching layer isn’t working right. Would you be up for helping us triage this on rt-devel@lists.bestpractical.com? I’m not going to pretend that RT is a paragon of relational design, but that sounds like something is broken. And we do like to fix broken things ;-)

    Best,
    Jesse (RT is generally considered to be all my fault)

    • I’ll post a report to the list on Monday.

      As I said most of that was because the static content was also causing database access, which we can avoid (bringing us down to about 100 queries).

      Sorry for the harsh words in the post, but needless to say I was stunned by the sheer volume of database queries. Even 100 queries is a lot for one page load.

      • The harsh words were completely justified. Generally you’re shooting for 0 or 1 queries per page load, and 100 is ludicrously too many.

        • 0 or 1 query per page load? Why don’t you go back to the magic land with the faeries whence you obviously came?

          100 is on the high side, yes. I’d say 20 and up would be performed for a page performing some work on a very well written system.

          I used to work somewhere where we had around 8-900 queries per page load. They were all lightning fast mySQL selects, and not causing any sort of a bottleneck

          • I’m sure if you get creative enough, you can think of some kind of use for many queries to load one page.

            But 100 queries for one page load is a red flag. At a minimum you should do a double-take on a number like that, and re-examine the design that lead to that outcome.

            The fact is that many of those RT queries are *duplicates*. I believe many of the rest are just an artifact of poor design.

            Good design indicates that you issue about one query for each different type of information that you need.

          • It’s not faerie-land when Postgres has this amazing ability to return useful serialized data structures at each row and column. :-)