How to speed up Ruby on Rails updates using PostgreSQL
While working with Ruby on Rails, I sometimes need to update a set of fields very quickly. Here are four different ways to speed up your updates.
#1 The simple update
Let’s assume that you are updating the last time a user logged in. For that, most of the time the following code works well:
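```ruby
# A minimal sketch, assuming a User model with a last_logged_in_at timestamp
# column and a current_user helper (as in Devise or a similar auth setup).
current_user.touch(:last_logged_in_at)
# touch issues a single UPDATE, skips validations, and bumps updated_at as well.
```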
#2 Updating multiple items at a time
In most situations the statement above works fine, but what if you need to update a lot of items at once?
Say that you have a user who is currently checking out, and you need to mark the purchased items at the time of the transaction. Using touch in a loop might be a little slow and has potential race conditions, so you might want to resort to:
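```ruby
# A sketch assuming checkout_items is an ActiveRecord relation of the items
# being purchased and that the table has a purchased_at column.
checkout_items.update_all(purchased_at: Time.current)
# update_all generates one UPDATE ... WHERE ... statement instead of one query per item.
```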
I assume that checkout_items contains all the items that you want to update. The code above avoids the race condition and runs as a single atomic UPDATE.
A word of caution: this will overwrite any data currently in the purchased_at field, so you may want to structure your code to avoid that scenario.
#3 Updating a set of records based on other records
At other times, you might be migrating data either in a rake task or as part of a migration. In that case, you may need to update a set of fields based on another set of fields. This kind of operation can take minutes or hours depending on your data set, and using straight SQL is your best bet at speeding things up.
In this example, we want to save the user’s card UUID at the time of purchase. We already have the card information stored in the users table; we just need to copy it into the transactions table. Here is the setup and the SQL to do it.
Your schema would look something like this:
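```ruby
# db/schema.rb excerpt (shape only; the table and column names are assumptions
# chosen for this example).
create_table "users", force: :cascade do |t|
  t.string   "card_uuid"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
end

create_table "transactions", force: :cascade do |t|
  t.bigint   "user_id"
  t.string   "card_uuid"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
end
```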
Use this code to generate it:
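```ruby
# A minimal migration sketch to produce the schema above (the Rails version in
# the brackets is an assumption; adjust it to your app).
class CreateUsersAndTransactions < ActiveRecord::Migration[6.0]
  def change
    create_table :users do |t|
      t.string :card_uuid
      t.timestamps
    end

    create_table :transactions do |t|
      t.references :user
      t.string :card_uuid
      t.timestamps
    end
  end
end
```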
Generate some data:
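```ruby
# Seed some sample rows to try the update against (assumes User and Transaction
# models backed by the tables above).
5.times do
  user = User.create!(card_uuid: SecureRandom.uuid)
  20.times { Transaction.create!(user_id: user.id) }
end
```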
Run this code to update card_uuid on all the transactions:
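```ruby
# One UPDATE ... FROM statement copies card_uuid from users onto every matching
# transaction in a single pass (PostgreSQL syntax; names as assumed above).
ActiveRecord::Base.connection.execute(<<~SQL)
  UPDATE transactions
  SET    card_uuid = users.card_uuid
  FROM   users
  WHERE  users.id = transactions.user_id
SQL
```

PostgreSQL’s UPDATE ... FROM joins the two tables inside the update itself, so the whole backfill happens server-side in one statement rather than one query per row.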
#4 Updating records based on stored IDs
Lastly, you might be performing a complex ETL process. Because of the data available at the time, you happen to have a set of IDs stored in a text field in your database, and you need to update the records they reference. I’ve only seen this with in-memory systems such as RabbitMQ or Redis, where data is precomputed in memory and then dumped to a relational database for analysis.
Say that you are building an ad tracking platform. You know the last time a site was considered as a potential candidate for a redirect. This logic, along with an elaborate set of rules, lives in a Node.js server. When you migrate the data from the memory store to your relational database, you already have the set of IDs for the websites that were considered. All you need to do now is store the site_redirects and update last_considered_at on each of the site records.
In this case, we might assume your database table is called site_redirects and the field containing your IDs is sites_considered_ids.
Your schema looks like this:
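```ruby
# db/schema.rb excerpt (shape only; sites_considered_ids holds comma-separated
# IDs as text, per the example above).
create_table "sites", force: :cascade do |t|
  t.string   "url"
  t.datetime "last_considered_at"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
end

create_table "site_redirects", force: :cascade do |t|
  t.text     "sites_considered_ids"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
end
```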
Run this code to generate the above schema:
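```ruby
# A minimal migration sketch matching the schema above (the Rails version in
# the brackets is an assumption).
class CreateSitesAndSiteRedirects < ActiveRecord::Migration[6.0]
  def change
    create_table :sites do |t|
      t.string :url
      t.datetime :last_considered_at
      t.timestamps
    end

    create_table :site_redirects do |t|
      t.text :sites_considered_ids
      t.timestamps
    end
  end
end
```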
Create your records like this:
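```ruby
# Seed a handful of sites and one redirect row that stores their IDs as
# comma-separated text (assumes Site and SiteRedirect models exist).
sites = Array.new(5) { |i| Site.create!(url: "https://example-#{i}.com") }
SiteRedirect.create!(sites_considered_ids: sites.map(&:id).join(","))
```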
At the time of your ETL, you would be running something like this:
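```ruby
# One way to match the stored IDs: string_to_array turns the comma-separated
# text back into an array, and the ANY clause matches every site in that list,
# all inside a single UPDATE (PostgreSQL syntax; names as assumed above).
ActiveRecord::Base.connection.execute(<<~SQL)
  UPDATE sites
  SET    last_considered_at = site_redirects.created_at
  FROM   site_redirects
  WHERE  sites.id = ANY (string_to_array(site_redirects.sites_considered_ids, ',')::bigint[])
SQL
```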
The last statement is something that I find amazing! It updates last_considered_at on all the sites based on when the site_redirects record was created.
Usually, a large number of SQL calls would be performed to accomplish the above, which is very slow. Here you are doing it in a single SQL call. Pretty amazing!
I hope that you have enjoyed this post. If you have any comments, please post them below.