Queries with the word “not” in it often signify that we need to carefully think about our logic. Consider these information needs.
- Find the names of sailors that have not reserved a boat
- Find the names of red boats that have not been reserved
- Find the names of sailors that have not reserved a red boat.
- Find the names of boats that are not red.
All of these information needs have the word “not” in them, but it’s used in two different senses. Think back to the ERD for this database and recall that the reserves table represents the many-to-many relationship between sailors and boats. Sailors that have reserved boats or boats that have been reserved are said to participate in the relationship. Sailors that have not reserved a boat do not participate in the relationship, and boats that have never been reserved also do not participate. The first three information needs are about “not reserved” and they are ultimately asking for results that do not participate in the relationship.
We have two ways to find non-participating items: NOT IN and OUTER JOIN
If the first three information needs are about non-participation and will require our attention, what about the last one? It has the word “not”, but it concerned with a value. It’s just asking us to use not equals.
SELECT sname FROM Sailors S INNER JOIN Reserves R ON S.sid = R.sid INNER JOIN Boats B ON R.bid = B.bid WHERE color <> 'red'
When we see the word “not” in an information need, we must be cautious. Sometimes the SQL will be more involved than others. The fourth one was the simplest. The first three are more complex
NOT IN
We’ve so far seen the word “not” in our SQL with the NOT IN operator, but we haven’t used it. To find sailors that have not reserved a boat (information need #1 above), we can rewrite the information need to indicate a value not in a list is required.
Rewritten information need: Find the names of sailors that are not in the list of sailors that have reserved a boat.
Just as with intersection queries we must create a list of sid’s to represent sailors.
SELECT sname FROM Sailors WHERE sid NOT IN (SELECT sid FROM Reserves)
The second information need is very similar if we rewrite it to contain not in.
Rewritten information need: Find the names of red boats that are not in the list of boats that have been reserved
Just as with sailors, we need to generate a list of boat id’s for the inner query. We must return boats that are red AND are not in that list of ids. So the WHERE clause will have two tests in it–one for color and one for non-participation with NOT IN.
SELECT bname FROM Boats WHERE color = 'red' AND bid NOT IN (SELECT bid FROM Reserves)
Now we may deal with information need #3, finding sailors that have not reserved a red boat. There’s nothing new about this one. We can rewrite the information need either literally or in our imagination and then translate that into SQL.
Rewritten information need: Find the names of sailors that are not in the list of sailors that have reserved a red boat.
We need to generate the list of id’s of sailors that have reserved red boats. For this list, we must use the Reserves table and the Boats table because we need sid’s of sailors with reservations and colors of boats.
SELECT sname FROM Sailors WHERE sid NOT IN (SELECT sid FROM Reserves R INNER JOIN Boats B ON R.bid = B.bid WHERE color = 'red')
Outer Join
There is another way to write each of these non-participation queries that was mentioned above. We can use special type of join called an OUTER JOIN. So far we have seen the INNER JOIN operation which combines two tables on the left and right side of INNER JOIN by matching up rows from each table that satisfy the join condition. The rows satisfying the join condition from either table are the rows that participate in the join. Any rows from either table that don’t show up in the result do not participatel they are non-participating rows. An OUTER JOIN is a similar operation that can be thought of as “INNER JOIN + non-participating rows”.
Information need: Find the names of sailors that have reserved only red boats.
This one doesn’t have “not” in it, but it’s still very similar because “only red” also means “not any other colors”. We have to recognize that the above query returns sailors who also have no reservations (because they are not in the list of sailors who have reserved red boats), so if we modify the outer query to only check for sailors with reservations not being in the list, we will have it:
SELECT sname FROM Sailors S INNER JOIN Reserves R1 ON S.sid = R1.sid WHERE S.sid NOT IN (SELECT sid FROM Reserves R2 INNER JOIN Boats B ON R2.bid = B.bid WHERE B.color <> 'red')
At this point, we can speed through queries like “Find sailors by name that have reserved a green boat but not a red boat.” We specialize the outer query to deal with reservations for green boats and keep the inner query the same:
SELECT sname FROM Sailors S INNER JOIN Reserves R1 ON S.sid = R1.sid INNER JOIN Boats B1ON R1.bid = B1.bid WHERE B1.color = 'green' AND S.sid NOT IN (SELECT sid FROM Reserves R2 INNER JOIN Boats B2 ON R2.bid = B2.bid WHERE B.color <> 'red')
There’s another kind of query where the word may not be immediately obvious, but it’s lurking in the background. Here’s an example: “Find the names of all sailors, and if a sailor has reserved a boat, also include the ids of every boat they have reserved.” Uh-oh. This ones kind of tricky. We’re seemingly being asked for two things here. One is to return all sailors, even those that have not reserved a boat. The other is to match up sailors with the ids of boats they’ve reserved. The sailors that have reserved boats participate in the join with reserves, but those that never have do not. Here we need both kinds of sailors.
There’s a special group of operations called an OUTER JOINs that return both participating and non-participating rows. There are in fact three of these OUTER JOINs, left outer joins (LEFT JOIN), right outer joins (RIGHT JOIN), and full outer joins (not actually available in the DBMS we use). What they all have in common is that they give us the INNER JOIN results we’re used to by now plus information about the rows that do not participate in joins.
Remember that the FROM clause with an INNER JOIN looks like
FROM LeftTable INNER JOIN RightTable ON ...
The different kinds of outer join depend on whether the table on the left, the right, or both the table on the left and the one on the right are treated specially. In a LEFT JOIN, the table on the left (the first table) is treated differently. Every row from that table will be in the query results whether or not it participates in the join. The syntax is otherwise just like INNER JOIN with the join condition following the ON keyword.
SELECT * FROM Sailors S LEFT JOIN Reserves R ON S.sid = R.sid
sid | sname | rating | age | sid | bid | date |
---|---|---|---|---|---|---|
22 | Dustin | 7 | 45.0 | 22 | 101 | Nov. 29, 2012 |
22 | Dustin | 7 | 45.0 | 22 | 102 | Nov. 29, 2012 |
22 | Dustin | 7 | 45.0 | 22 | 103 | Nov. 29, 2012 |
22 | Dustin | 7 | 45.0 | 22 | 104 | Nov. 29, 2012 |
29 | Brutus | 1 | 33.0 | NULL | NULL | NULL |
31 | Lubber | 8 | 55.5 | 31 | 102 | Nov. 29, 2012 |
31 | Lubber | 8 | 55.5 | 31 | 103 | Nov. 29, 2012 |
31 | Lubber | 8 | 55.5 | 31 | 104 | Nov. 29, 2012 |
32 | Andy | 8 | 25.5 | NULL | NULL | NULL |
58 | Rusty | 10 | 35.0 | NULL | NULL | NULL |
64 | Horatio | 7 | 35.0 | 64 | 101 | Nov. 29, 2012 |
64 | Horatio | 7 | 35.0 | 64 | 102 | Nov. 29, 2012 |
71 | Zorba | 10 | 16.0 | NULL | NULL | NULL |
74 | Horatio | 9 | 35.0 | 74 | 103 | Nov. 29, 2012 |
85 | Art | 3 | 25.5 | NULL | NULL | NULL |
95 | Bob | 3 | 63.5 | NULL | NULL | NULL |
A RIGHT JOIN is similar except the non-participating rows from the right table (the second table) are included along with the INNER JOIN results. In a full outer join, non-participating rows from both tables would be included. Not every database includes all three joins. MySQL, for instance includes LEFT JOIN and RIGHT JOIN but not a full outer join, and Microsoft Access includes only LEFT JOIN. The lack of RIGHT JOIN is not a serious issue, though, for one just need to remember to always write the “special” table first and use a LEFT JOIN whenever one of these outer joins is needed.
If you’re keeping up, you’ll go “wait a minute. If the row from a table doesn’t participate in a join, then there’s no row from the other table to match it up with. What about the columns from the other table that appear in the result?” You’re right about that, and that’s why the outer join is special. SQL creates null values for every column in the other table to “fill in” the missing spaces in the results returned by an outer join. So, for instance, for Sailors LEFT JOIN Reserves, all three Reserves table columns would be full of NULL values for Sailors that have not reserved a boat.
This property gives us another way to write non-participation queries.
Information Need: Find the names of sailors that have not reserved a boat.
SELECT sname FROM Sailors S LEFT JOIN Reserves R ON S.sid = R.sid WHERE R.sid IS NULL
In this query above, we could have used any column from Reserves to test for NULL. Using the foreign key in the join condition is a good choice, though, for it makes our intention clear to only retrieve non-participating rows from the Sailors table.
Information Need: Find the names of all sailors, and if a sailor has reserved a boat, also include the ids of every boat they have reserved.
So back to the problematic information need that started the coverage of outer joins. We can use an outer join to construct queries where you are asked for multiple pieces of information, but some of those are “optional” and may not appear in the answer. The names of sailors are mandatory but the ids of boats are optional. We need to include both those columns in the SELECT clause, and the optional ones may sometimes have NULL values. That’s OK. Since we want all the sailors, participating and non-participating ones, we don’t need to filter by participation in the WHERE clause. In fact, we don’t need to filter at all, so the WHERE clause can be omitted.
SELECT sname, bid FROM Sailors S LEFT JOIN Reserves R ON S.sid = R.sid
Before you read further, go ahead and try to write SQL for this next one where the id of boats has been changed to the names.
Information Need: Find the names of all sailors, and if a sailor has reserved a boat, also include the names of every boat they have reserved.
The obvious approach is to recognize the boat names are in the Boats table and add an INNER JOIN with Boats, too, because that’s what we’re used to doing.
SELECT sname, bid FROM Sailors S LEFT JOIN Reserves R ON S.sid = R.sid INNER JOIN Boats B ON R.bid = B.bid
If you execute this query, you’ll see that only the participating sailors (those that have reserved a boat) apper. The reason for this is that rows for non-participating sailors will have a NULL value for R.bid, and the join condition can’t be satisfied when R.bid IS NULL. So only rows with non-NULL R.bid values can evaluate to TRUE. This is subtle, but it’s important to understand that joining multiple tables when outer joins are involved requires care. The solution is to first join the Reserves and Boats tables to ensure every reservation can be matched up with its boat. The table resulting from that join can then be outer joined with the Sailors table. There are two ways to make this happen. One is to add () to the query above to ensure the INNER JOIN occurs first. This addition of () is analogous to writing 2 + (3 + 4) on a calculator to force the addition of 3 and 4 to happen before the first term gets added.
SELECT sname, bname FROM Sailors S LEFT JOIN (Reserves R INNER JOIN Boats B ON R.bid = B.bid) ON S.sid = R.sid
The other solution is to remember that RIGHT JOIN is a thing. We can put the Sailors table after the join operators that way and not have to worry about ().
SELECT sname, bname FROM Reserves R INNER JOIN Boats B ON R.bid = B.bid RIGHT JOIN Sailors S ON R.sid = S.sid