Active Record Query Optimization Tips 🏃♂️🏃♂️🏃♂️
Simple But Effective Ways To Improve Active Record Query Performance
1. Eager Loading With includes
to Avoid n+1 Queries
Use includes
to avoid n+1
queries. For example, if there are n
rows in the users
table, the following ActiveRecord
code would generate n+1
queries:
# Assuming User has the following association
class User
has_many :posts
end
# Running this activerecord code
users = User.all
users.each do |user|
user.posts
end
# Will generate n+1 queries, where n is the number of users, namely
# SELECT * FROM users;
# SELECT * FROM posts WHERE posts.user_id = 1;
# SELECT * FROM posts WHERE posts.user_id = 2;
# SELECT * FROM posts WHERE posts.user_id = 2;
# .....
# SELECT * FROM posts WHERE posts.user_id = n;
However, using includes
will only run 2 queries. One to load all the users
, and another to load all posts
associated with those users
.
# Eager-load all users' posts with `includes`
users = User.includes(:posts)
users.each do |user|
user.posts
end
# This will only run 2 queries.
# SELECT * FROM users;
# SELECT * FROM posts WHERE posts.user_ids IN (1,2,3,4,....,n);
2. Use find_each
When Loading a Large Amount of Records
This next tip will actually increase the number of queries made, but it’s meant to reduce memory use when loading a huge amount of records.
# Selecting all users
User.all.each do |user|
notify(user)
end
# Generates this query:
# SELECT * FROM users;
# Selecting all users with `find_each` will query by batches of size
# determined by the :batch_size option (defaults to 1000)
User.all.find_each(batch_size: 5000) do |user|
notify(user)
end
# Generates the queries
# SELECT * FROM users ORDER BY users.id ASC LIMIT 5000;
# SELECT * FROM users WHERE users.id > 5000 ORDER BY users.id ASC LIMIT 5000;
# SELECT * FROM users WHERE users.id > 10000 ORDER BY users.id ASC LIMIT 5000;
# .....
Rather than loading a huge amount of records into memory for processing (and possibly crashing the server if there are too many records) find_each
loads one batch at a time, and garbage collects them after use
3. Select
ing And Pluck
-ing Only Attributes We Need
Sometimes we don’t need all the model attributes, but only a few. Use pluck
to query the needed attribute(s) and return them as an array of strings instead of an ActiveRecord_Relation
. However, if an ActiveRecord_Relation
is needed , use select
instead.
# Suppose we load all user records, instantiate them as `User` objects
# only to select one attribute. This is wasteful since we select all
# columns from the database and load them all into memory.
user_emails = User.where(status: "active").map(&:email)
# SELECT * FROM users;
# Instead, we can use `pluck` to modify the query to select only the
# column(s) we need. It also returns the attributes as an array of strings
# instead of `User` objects, reducing memory use.
user_emails = User.where(status: "active").pluck(:email)
# Select users.email FROM users;
# If an `ActiveRecord_Relation` of `User` objects is still needed,
# then use `select`. This returns the result as an `ActiveRecord_Relation`
# of `User` objects but with only the requested attribute(s) in each object.
user_emails = User.where(status: "active").select(:email)
# Select users.email FROM users;
# Only if you really need ALL the attributes of a model, should you not use
# `pluck` or `select`.
users = User.where(status: "active")
4. Check existence of record with exists?
instead of present?
Sometimes, we only care to know if a record exists, but don’t need to actually do anything with that record. Use exists?
instead of present?
.
# If we care to know a record exists, but actually don't
# need to do anything with the record...
if User.where(email: "teemo@shrooms.net").present?
puts "There is a user with email address teemo@shrooms.net"
else
puts "There is no user with email address teemo@shrooms.net"
end
# then using `present?` is wasteful, as it selects all the column
# and loads the object into memory.
# SELECT * FROM users WHERE email` = 'teemo@shrooms.net';
# Instead, use `exists?`
if User.where(email: "teemo@shrooms.net").exists?
puts "There is a user with email address teemo@shrooms.net"
else
puts "There is no user with email address teemo@shrooms.net"
end
# The advantage is the corresponding query limits only to 1 record
# and does not select any columns.
# SELECT 1 FROM users WHERE email` = 'teemo@shrooms.net' LIMIT 1;
5. Prefer ActiveRecord::Relation#size
over ActiveRecord::Calculations#count
Assuming users
is an ActiveRecord::Relation
, both users.count
and users.size
return the number of users. However, users.count
will always run the query SELECT COUNT(*) FROM users WHERE ...
even if the relation has already been loaded. On the other hand, size is smarter in that it will call length
on the relation if it already has been loaded (saving us a database query) but calls count
if the relation has not been loaded.
From the ActiveRecord
source code:
# File activerecord/lib/active_record/relation.rb, line 210
def size
loaded? ? @records.length : count(:all)
end
6. Bulk Delete With delete_all
When destroying a large number of records that do not require Active Record callbacks (which are almost considered an anti-pattern now) to be invoked, use delete_all
or instead of invoking destroy
on individual objects.
# Deleting a large amount of records by invoking `destroy` on
# each record results in a single query per record.
deadbeat_users = User.where(payment_status: "deadbeat")
deadbeat_users.each do |deadbeat|
deadbeat.destroy
end
# DELETE FROM users WHERE id = 13;
# DELETE FROM users WHERE id = 42;
# DELETE FROM users WHERE id = 49;
# DELETE FROM users WHERE id = 420;
# DELETE FROM users WHERE id = 666;
# ...
# Use `delete_all` to batch delete these records with one query,
# if we do not need `after_destroy` callbacks to be invoked. Note
# AR callbacks are pretty much an antipattern these days.
deadbeat_users.delete_all
# DELETE FROM users WHERE users.status = 'deadbeat';
7. Bulk Creation
For those who didn’t realize that ActiveRecord::Base#create
can accept an array of hashes, it can and it will run one query instead of n
, but only if the underlying database engine supports batch INSERT
.
# When creating a lot of records from an array of hashes
new_users = [
{name: "Teemo", email: "teemo@shrooms.net"},
{name: "Heim", email: "heim@turrets.com"},
{name: "Annie", email: "annie@tibbers.org" },
...
{name: "Volli", email: "volli@woods.biz" },
]
# Do not create each one individually as it results in one query, per record.
new_users.each do |user_hash|
User.create(user_hash)
end
# INSERT INTO users (name, email) VALUES ('Teemo', 'teemo@shrooms.net');
# INSERT INTO users (name, email) VALUES ('Heim', 'heim@turrets.com');
# INSERT INTO users (name, email) VALUES ('Annie', 'annie@tibbers.org');
# ...
# INSERT INTO users (name, email) VALUES ('Volli', 'volli@woods.biz');
# Instead, pass the array into ActiveRecord::Base#create, which will create
# all the records in one query if the database supports this feature.
User.create(new_users)
# INSERT INTO users (name, email)
# VALUES
# ('Teemo', 'teemo@shrooms.net'),
# ('Heim', 'heim@turrets.com'),
# ('Annie', 'annie@tibbers.org'),
# ....
# ('Volli', 'volli@woods.biz');
8. Bulk Update
Similarly, there is also a bulk_update
command that can update multiple records in one query, assuming the underlying database supports this.
awesome_users = User.where(payment_status: "on_time")
# Instead of updating each individual record
deadbeat_users.each do |user|
user.update(awesome: true)
end
# Just do one bulk update
User.where(payment_status: "on_time").update_all(awesome: true)
9. Sometimes In-memory Operations Are Better Than Querying
Sometimes, it’s better to actually use in-memory selection instead of querying. For example, suppose we want to select from an array of emails those which aren’t in the database yet.
# Suppose we want to select from these n emails those that
# are not already in the `users` table.
emails = [
"teemo@shrooms.net",
"heim@turrets.org",
"annie@tibbers.gov",
...
"volli@jungle.com"
]
new_emails = []
emails.each do |email|
unless User.where(email: email).exists?
new_emails << email
end
end
# The above will run n queries, one for each email
# SELECT 1 FROM users WHERE email` = 'teemo@shrooms.net' LIMIT 1;
# SELECT 1 FROM users WHERE email` = 'heim@turrets.org' LIMIT 1;
# SELECT 1 FROM users WHERE email` = 'annie@tibbers.gov' LIMIT 1;
# ...
# SELECT 1 FROM users WHERE email` = 'volli@jungle.net' LIMIT 1;
# Instead, let's grab all the emails from the database first in one query,
# then filter in memory and store in a Set (since we want to only
# check for inclusion).
existing_emails = Set.new(User.pluck(:email))
# SELECT users.email from users;
new_emails = []
emails.each do |email|
unless existing_emails.include?(email)
new_emails << email
end
end
view raw
Note this example is essentially the opposite of the find_each
example earlier. Here, we are reducing the number of queries, but increasing the memory use. In the find_each
example, we are increasing the number of queries but reducing the memory use. As such, there is no absolute rule for whether it’s more important to reduce the number of queries or reduce memory use. It depends on the situation, so exercise good judgment.
10. Bullet Gem
The bullet gem is a helpful tool for determining whether or not you need to eager-load but also if you are unnecessarily eager-loading. To use, simply enable it in the test
environment
# config/environments/test.rb
config.after_initialize do
Bullet.enable = true
Bullet.bullet_logger = true
Bullet.raise = true # raise an error if n+1 query occurs
end
Then run all your specs: bundle exec rspec
and watch for warnings on n+1
query issues.
11. Benchmarking Your Changes
To see the results of your query optimizations, use the Ruby’s Benchmark.realtime
to time them.
require "benchmark"
user = User.find_by_name("Teemo")
# See the time it takes to count the comments of a post in an n+1 query
puts Benchmark.realtime { user.posts.each { |post| puts post.comments.count } }
# See the time it takes to count the comments of a post with eager-loading
# Should be faster
puts Benchmark.realtime { user.includes(:posts).posts.each { |post| puts post.comments.count } }
But note that databases like MySQL will often cache queries in memory so if you run the same query twice, the second time will always be faster. As such, be sure to clear the cache query between benchmarks. In MySQL, you can do this by running this in the MySQL console:
RESET QUERY CACHE;
12. Don’t Forget to Create Indexes
Indexes on table columns will increase retrieval speed at the expense of write speed as well as using more disk space, so use them wisely.
Feel free to leave comments, questions, suggestions, corrections below.