Stop guessing, SQL joins explained
Ahh, joins. The point where you start to realize the true power of the relational database, sort of like Darth Vader and the dark side, but less dark. Let’s start with the example we will be following.
Imagine that you are back in high school. You are walking down the hall and you spot your true love. Everything slows down, your heart starts to race and you can’t help but stare at the person. After 2 weeks of thinking about it, you finally decide to ask them out on a date. To your surprise, they agree! Wow this is amazing! I guess wishes do come true.
To avoid messing up the first date, you ask all your friends to give suggestions on what to do on your first date. So, you make a little tiny survey and send it out to your friends. After a couple of days, you check the results.
Oh, this is exciting I wonder what the suggestions are going to be like!
Your database tables look like this:
Table “user”
Column | Type | Modifiers |
---|---|---|
id | integer | Primary key not null |
first_name | text | |
last_name | text |
Table “activities”
Column | Type | Modifiers |
---|---|---|
id | integer | Primary key not null |
name | text | |
user_id | integer |
Let’s start with looking at some of the data:
Users table
id | first_name | last_name |
---|---|---|
1 | Pam | Beesly |
2 | Dwight | Schrute |
3 | Jim | Halpert |
4 | Michael | Scott |
5 | Andy | Bernard |
I know what you’re thinking, are these people from “The Office”, and yes, yes they are.
The response are coming in super fast and they are hilarious! Just check out some of these suggestions:
id | name | user_id |
---|---|---|
1 | Sky diving | 1 |
2 | Eating ice cream until your stomach hurts | 2 |
3 | Watching old episodes of star trek the next generation | 1 |
4 | Long walks on the beach looking into each others eyes | 3 |
5 | Desserts with lots of peanuts | 2 |
6 | Rock climbing | 3 |
7 | Biking race | 1 |
8 | Eating ice cream until your stomach hurts | 1 |
9 | French desserts | |
10 | Swimming with the whales |
Amazing! There are a few things that you might have noticed.
- There seems to be a bug in the code. Some of the suggestions came from users that are not entered into the user table. i.e. the user_id column is blank
- Two people registered, but never answered the question. (Michael and Andy)
Great now that we have our data, we want to know who sent what response. To get that we will need to a join, an inner join I might add.
Now I know what your thinking, I have never joined two tables together. How can I possibly know if it’s correct? Don’t worry, I will show you step by step so that we are all on the same page.
Venn diagrams and Pancakes
Before we go on, we want to understand why things are called they way they are. So that we can better understand their purpose. Here’s an analogy.
Have you ever tried to make two pancakes in a single pan at the same time? Sometimes, even though you try really hard, they end up merging together! Well Venn diagrams are a lot like two pancakes in a pan the just happen to merge together.
Inner joins & Pancakes
Now, one pancake represents a list of your friends. The other pancake represents the suggestions on a first date also known as activities. The inner join is where you put names and associate them to the activities, that’s the delicious center. You essentially will know which friend suggested what.
This means that we will only get data that is common between the two data sets. You will not get response that don’t have names and names that don’t have responses. Some might call this inner part of the pancake. Haha, get it, inner for inner join. Wow.
Beautifully delicious, isn’t it?
Let’s look at this in SQL:
Key areas to focus on would be “INNER JOIN” key words. This is what joins the two tables together, followed by the table we are joining “activities”. Also, we need to tell it which columns to join together, we use the keyword “ON” for this. In this case, activities.user_id and users.id.
After executing the SQL statement, we get the following data:
id | first_name | last_name | id | name | user_id |
---|---|---|---|---|---|
1 | Pam | Beesly | 1 | Sky diving | 1 |
2 | Dwight | Schrute | 2 | Eating ice cream until your stomach hurts | 2 |
1 | Pam | Beesly | 3 | Watching old episodes of star trek the next generation | 1 |
3 | Jim | Halpert | 4 | Long walks on the beach looking into each others eyes | 3 |
2 | Dwight | Schrute | 5 | Desserts with lots of peanuts | 2 |
3 | Jim | Halpert | 6 | Rock climbing | 3 |
1 | Pam | Beesly | 7 | Biking race | 1 |
1 | Pam | Beesly | 8 | Eating ice cream until your stomach hurts | 1 |
Few of things to notice:
- I have ordered the data based on activity id to make it easier to follow.
- The data contains only people that have answered the question.
- Some of the people have answered question multiple times, that means they appear multiple times.
- Activities that did not contain a user id are excluded.
So an inner join just gets you what is common between the two data sets and repeats where necessary.
Left join (aka Left outer join) & Pancakes
Suppose that you need to know all the people that registered, regardless if they answered a question or not. So, since the left pancake represents all the people, that have registered, you want the whole left side. But wait there is more! You also want the delicious center because that contains the association between the responses and the people who did answer.
Assuming that we start with the users table, aka the people who registered, we get every person regardless if they answered the question or not. This compared to the inner join, where every person listed has answered the question.
Let’s jump back into the SQL to see what this looks like.
Holy guacamole! This looks exactly the same as the “INNER JOIN” except that we have the keywords “LEFT JOIN” to make a left join. Good eye, Batman! Let’s take a look at our data:
id | first_name | last_name | id | name | user_id |
---|---|---|---|---|---|
1 | Pam | Beesly | 1 | Sky diving | 1 |
2 | Dwight | Schrute | 2 | Eating ice cream until your stomach hurts | 2 |
1 | Pam | Beesly | 3 | Watching old episodes of star trek the next generation | 1 |
3 | Jim | Halpert | 4 | Long walks on the beach looking into each others eyes | 3 |
2 | Dwight | Schrute | 5 | Desserts with lots of peanuts | 2 |
3 | Jim | Halpert | 6 | Rock climbing | 3 |
1 | Pam | Beesly | 7 | Biking race | 1 |
1 | Pam | Beesly | 8 | Eating ice cream until your stomach hurts | 1 |
4 | Michael | Scott | |||
5 | Andy | Bernard |
Couple of things to notice:
- Michael and Andy pop up because with a left join it keeps all the data that is contained in the leftmost table, meaning the table that’s mentioned first in the SQL.
- Notice that not all the activities are listed. We don’t see activities that are not assigned to a user.
So in a left join, we always get all the data from the left table, in this case the user table.
Left table | Right table | ||
---|---|---|---|
SELECT * FROM | users | LEFT JOIN | activities on |
Right join (aka right outer join) & Pancakes
Ok, what if you wanted to get all the responses, regardless of they have a person attached to them or not. That’s a right join! You want the right pancake which represents all the suggested activities, plus the delicious center which represents the names associated to those responses. Mmmm Yummy!
Brace yourself, here comes the SQL!
Wow, did you see that. Three SQL statements that look exactly the same just went by. It must be a glitch in the Matrix!
Nope! The syntax is very similar, only thing that changed is that we use “RIGHT JOIN” instead of “INNER JOIN” or “LEFT JOIN”. Our “ON” keyword still tells us what columns we are joining on, activities.user_id and users.id. Let’s look at our data:
id | first_name | last_name | id | name | user_id |
---|---|---|---|---|---|
1 | Pam | Beesly | 1 | Sky diving | 1 |
2 | Dwight | Schrute | 2 | Eating ice cream until your stomach hurts | 2 |
1 | Pam | Beesly | 3 | Watching old episodes of star trek the next generation | 1 |
3 | Jim | Halpert | 4 | Long walks on the beach looking into each others eyes | 3 |
2 | Dwight | Schrute | 5 | Desserts with lots of peanuts | 2 |
3 | Jim | Halpert | 6 | Rock climbing | 3 |
1 | Pam | Beesly | 7 | Biking race | 1 |
1 | Pam | Beesly | 8 | Eating ice cream until your stomach hurts | 1 |
9 | French desserts | ||||
10 | Swimming with the whales |
Couple of things to notice:
- We got all the activities but not all the people who registered. In this case Michael and Andy are missing.
- We did get such responses such as “swimming with the whales” and “French desserts”, which were entered by non-registered people.
So in a right join, which is the other side of the table that we are joining to, we always get all the data from the right table. In this case the activities table.
Left table | Right table | ||
---|---|---|---|
SELECT * FROM | users | RIGHT JOIN | activities on |
How to remember left from right?
To remember left join from right join, just ask yourself:
What table has all the data I want?
Left table | Right table | ||
---|---|---|---|
SELECT * FROM | users | [LEFT OR RIGHT] JOIN | activities on |
In this case, if it’s the users table It’s a left join. If it’s the activities table, it’s a right join.
Full outer join (all the pancakes)
Finally, you may want all the data. So, you want all the activities and all the people who have registered regardless whether the activity has a name or if the person registered and never answered the question. That’s the whole pancake, left, right and the center. Every sweet delicious bite!
Let’s jump back into the SQL.
As with all the SQL statements that came before this one, the only thing that changed is “full outer join” instead of “left join”,”right join”, or “inner join”.
Let’s look at out data:
id | first_name | last_name | id | name | user_id |
---|---|---|---|---|---|
1 | Pam | Beesly | 1 | Sky diving | 1 |
10 | Swimming with the whales | ||||
2 | Dwight | Schrute | 2 | Eating ice cream until your stomach hurts | 2 |
1 | Pam | Beesly | 3 | Watching old episodes of star trek the next generation | 1 |
3 | Jim | Halpert | 4 | Long walks on the beach looking into each others eyes | 3 |
2 | Dwight | Schrute | 5 | Desserts with lots of peanuts | 2 |
3 | Jim | Halpert | 6 | Rock climbing | 3 |
1 | Pam | Beesly | 7 | Biking race | 1 |
1 | Pam | Beesly | 8 | Eating ice cream until your stomach hurts | 1 |
9 | French desserts | ||||
5 | Andy | Bernard | |||
4 | Michael | Scott |
Couple of things to notice:
- We get all the responses. “Swimming with the whales”, “French desserts”
- Also, “Michael” and “Andy” are also here even though they did not answer the question.
The outer join lets us see all the data regardless if it’s not included in any of the tables that it’s joined to.
Final thoughts
Well I hope that settles that. Thanks for taking the time to read this post. Leave a comment if you have any questions.
In the next post, I’ll be talking about subqueries. Sign up and you will get the very next one in your inbox.
Until next time.