Dave Schweisguth in a Bottle

How many meanings of that can you think of?

Avoiding extra queries in ActiveRecord 3

leave a comment »

ActiveRecord 3 introduces a new query interface based on Arel. The new query methods are more succinct than the old :conditions etc. syntax, and they’re easier to combine into complex queries. Scopes (formerly known as named scopes) are still present and useful, but there’s much less difference between a scope and a simple method that returns a query than there was before. Along with these carrots encouraging you to rewrite all your queries there is the stick of deprecation of the old syntax in Rails 3.1 and removal in Rails 3.2, so most people migrate to the new interface when they migrate to Rails 3.

Migrating GWW to the new interface went smoothly, but when I was reviewing some pages for performance recently I noticed ActiveRecord issuing some extra queries that I didn’t expect, and that weren’t necessary. Fortunately, once noticed, they were easy to eliminate.

The extra queries came about because the new query interface issues SQL lazily. If you step through this method in a debugger while watching your test log,

class Photo
  def self.print_unfound_titles
    unfound_photos = where :game_status => 'unfound'
    puts (unfound_photos.map &:title).join ", "

you’ll see that unfound_photos is an ActiveRecord::Relation rather than an Array, and it issues an SQL query only at the last possible moment, when map is called.

Lazy evaluation allows you to pass around queries and add to them without running them until you’re ready. But, as in real life, sometimes laziness results in doing more work in the long run. GWW had several situations like this

class PhotosController
  def map
    @photos = Photos.order :dateadded
    first_dateadded = @photos.first.dateadded
    @photos.each { |photo| photo[:color] = color_gradient first_dateadded, photo.dateadded }

in which it got a list of something from the database and made some use of the first or last item before iterating over the list. In Rails 3, this issued two queries:

  Photo Load (0.5ms)  SELECT `photos`.* FROM `photos` ORDER BY dateadded LIMIT 1
  Photo Load (0.3ms)  SELECT `photos`.* FROM `photos` ORDER BY dateadded

(Note that ActiveRecord finders don’t belong in controllers. I wrote that here to illustrate my point succinctly.)

The old Rails 2 query, Photos.all :order => :dateadded, had returned an Array, and .first simply returned the first element. In Rails 3, order returned a Relation and .first returned a second, refined Relation. Calling .dateadded caused the second Relation to issue a SQL query, and then iterating over @photos caused the first Relation to issue a second SQL query.

Having noticed the problem, the solution was simple: force the first Relation to run its SQL right away

class PhotosController
  def map
    @photos = order(:dateadded).to_a
    first_dateadded = @photos.first.dateadded
    @photos.each { |photo| photo[:color] = color_gradient first_dateadded, photo.dateadded }

thus turning .first into an array reference again and eliminating the LIMIT 1 query.

You can also fully load a Relation by calling .reload on it, and .reload returns self so you can chain method calls, but I find that .to_a better expresses that loading (not reloading) is taking place.

There are quite a few situations in which ActiveRecord loads a Relation only partially, and in which you might want to tell it to load fully. (I say “fully” rather than “eagerly” because the latter refers to the separate decision of whether to load associations.) Looking at ActiveRecord::FinderMethods, we see that some forms of .first load fully and some do not. .last generally works like .first, except of course for its lastness.

A not-yet-loaded Relation will also issue a reduced query when you call .size, .empty? or the blockless forms of .any? or .many?. These methods run a COUNT(*) query. (Oddly, .length always runs the full query; this seems like a bug.) As before, if you’re going to need the full query results anyway, you can eliminate the reduced query by calling .to_a on the Relation right away.

Although it’s good to know the kinds of methods that result in reduced queries, so that you have an intuition about when it might help to eliminate them, it’s probably counterproductive to try to learn all the details of the cases in which ActiveRecord might issue such a query. Furthermore, the implementation of any of those methods might change in future versions. One tactic would be to call .to_a in any case where an application first looks at some property of a query method’s return value and then uses the full query results, but that would be cargo cult programming. Instead, do what you were probably already doing: if the performance of a web page or other ActiveRecord-using process matters, periodically review the SQL that it issues and take whatever steps are necessary to eliminate redundant queries and address inefficient ones. There are plenty of reasons to do that besides the extra LIMIT 1 and COUNT(*) queries discussed here.


Written by dschweisguth

May 1, 2011 at 10:33

Posted in Programming, Rails, Ruby

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: