Queries with the word “not” in it often signify that we need to carefully think about our logic. Consider these information needs.

  1. Find the names of sailors that have not reserved a boat
  2. Find the names of red boats that have not been reserved
  3. Find the names of sailors that have not reserved a red boat.
  4. Find the names of boats that are not red.

All of these information needs have the word “not” in them, but it’s used in two different senses. Think back to the ERD for this database and recall that the reserves table represents the many-to-many relationship between sailors and boats. Sailors that have reserved boats or boats that have been reserved are said to participate in the relationship. Sailors that have not reserved a boat do not participate in the relationship, and boats that have never been reserved also do not participate. The first three information needs are about “not reserved” and they are ultimately asking for results that do not participate in the relationship.

We have two ways to find non-participating items: NOT IN and OUTER JOIN

If the first three information needs are about non-participation and will require our attention, what about the last one? It has the word “not”, but it concerned with a value. It’s just asking us to use not equals.

SELECT sname
FROM Sailors S INNER JOIN Reserves R ON S.sid = R.sid 
   INNER JOIN Boats B ON R.bid = B.bid
WHERE color <> 'red'

When we see the word “not” in an information need, we must be cautious. Sometimes the SQL will be more involved than others. The fourth one was the simplest. The first three are more complex

NOT IN

We’ve so far seen the word “not” in our SQL with the NOT IN operator, but we haven’t used it. To find sailors that have not reserved a boat (information need #1 above), we can rewrite the information need to indicate a value not in a list is required.

Rewritten information need: Find the names of sailors that are not in the list of sailors that have reserved a boat.

Just as with intersection queries we must create a list of sid’s to represent sailors.

SELECT sname
FROM Sailors
WHERE sid NOT IN (SELECT sid
                                    FROM Reserves)

The second information need is very similar if we rewrite it to contain not in.

Rewritten information need: Find the names of red boats that are not in the list of boats that have been reserved

Just as with sailors, we need to generate a list of boat id’s for the inner query. We must return boats that are red AND are not in that list of ids. So the WHERE clause will have two tests in it–one for color and one for non-participation with NOT IN.

SELECT bname
FROM Boats
WHERE color = 'red' AND bid NOT IN (SELECT bid
                                                                   FROM Reserves)

Now we may deal with information need #3, finding sailors that have not reserved a red boat. There’s nothing new about this one. We can rewrite the information need either literally or in our imagination and then translate that into SQL.

Rewritten information need: Find the names of sailors that are not in the list of sailors that have reserved a red boat.

We need to generate the list of id’s of sailors that have reserved red boats. For this list, we must use the Reserves table and the Boats table because we need sid’s of sailors with reservations and colors of boats.

SELECT sname
FROM Sailors
WHERE sid NOT IN (SELECT sid
                                   FROM Reserves R INNER JOIN Boats B
                                      ON R.bid = B.bid
                                   WHERE color = 'red')

Outer Join

There is another way to write each of these non-participation queries that was mentioned above. We can use special type of join called an OUTER JOIN. So far we have seen the INNER JOIN operation which combines two tables on the left and right side of INNER JOIN by matching up rows from each table that satisfy the join condition. The rows satisfying the join condition from either table are the rows that participate in the join. Any rows from either table that don’t show up in the result do not participatel they are non-participating rows. An OUTER JOIN is a similar operation that can be thought of as “INNER JOIN + non-participating rows”.

Information need: Find the names of sailors that have reserved only red boats.

This one doesn’t have “not” in it, but it’s still very similar because “only red” also means “not any other colors”. We have to recognize that the above query returns sailors who also have no reservations (because they are not in the list of sailors who have reserved red boats), so if we modify the outer query to only check for sailors with reservations not being in the list, we will have it:

SELECT sname
FROM Sailors S INNER JOIN Reserves R1 ON S.sid = R1.sid
WHERE S.sid NOT IN (SELECT sid
                                      FROM Reserves R2 INNER JOIN Boats B 
                                         ON R2.bid = B.bid
                                      WHERE B.color <> 'red')

At this point, we can speed through queries like “Find sailors by name that have reserved a green boat but not a red boat.” We specialize the outer query to deal with reservations for green boats and keep the inner query the same:

SELECT sname
FROM Sailors S INNER JOIN Reserves R1 ON S.sid = R1.sid
   INNER JOIN Boats B1ON R1.bid = B1.bid 
WHERE B1.color = 'green' AND 
   S.sid NOT IN (SELECT sid
                           FROM Reserves R2 INNER JOIN Boats B2 
                              ON R2.bid = B2.bid
                           WHERE B.color <> 'red')

There’s another kind of query where the word may not be immediately obvious, but it’s lurking in the background. Here’s an example: “Find the names of all sailors, and if a sailor has reserved a boat, also include the ids of every boat they have reserved.” Uh-oh. This ones kind of tricky. We’re seemingly being asked for two things here. One is to return all sailors, even those that have not reserved a boat. The other is to match up sailors with the ids of boats they’ve reserved. The sailors that have reserved boats participate in the join with reserves, but those that never have do not. Here we need both kinds of sailors.

There’s a special group of operations called an OUTER JOINs that return both participating and non-participating rows. There are in fact three of these OUTER JOINs, left outer joins (LEFT JOIN), right outer joins (RIGHT JOIN), and full outer joins (not actually available in the DBMS we use). What they all have in common is that they give us the INNER JOIN results we’re used to by now plus information about the rows that do not participate in joins.

Remember that the FROM clause with an INNER JOIN looks like

FROM LeftTable INNER JOIN RightTable ON ...

The different kinds of outer join depend on whether the table on the left, the right, or both the table on the left and the one on the right are treated specially. In a LEFT JOIN, the table on the left (the first table) is treated differently. Every row from that table will be in the query results whether or not it participates in the join. The syntax is otherwise just like INNER JOIN with the join condition following the ON keyword.

SELECT *
FROM Sailors S LEFT JOIN Reserves R ON S.sid = R.sid
sidsnameratingagesidbiddate
22Dustin745.022101Nov. 29, 2012
22Dustin745.022102Nov. 29, 2012
22Dustin745.022103Nov. 29, 2012
22Dustin745.022104Nov. 29, 2012
29Brutus133.0NULLNULLNULL
31Lubber855.531102Nov. 29, 2012
31Lubber855.531103Nov. 29, 2012
31Lubber855.531104Nov. 29, 2012
32Andy825.5NULLNULLNULL
58Rusty1035.0NULLNULLNULL
64Horatio735.064101Nov. 29, 2012
64Horatio735.064102Nov. 29, 2012
71Zorba1016.0NULLNULLNULL
74Horatio935.074103Nov. 29, 2012
85Art325.5NULLNULLNULL
95Bob363.5NULLNULLNULL
The results of Sailors LEFT JOIN Reserves include all sailors that have reserved a boat along with those that haven’t.

A RIGHT JOIN is similar except the non-participating rows from the right table (the second table) are included along with the INNER JOIN results. In a full outer join, non-participating rows from both tables would be included. Not every database includes all three joins. MySQL, for instance includes LEFT JOIN and RIGHT JOIN but not a full outer join, and Microsoft Access includes only LEFT JOIN. The lack of RIGHT JOIN is not a serious issue, though, for one just need to remember to always write the “special” table first and use a LEFT JOIN whenever one of these outer joins is needed.

If you’re keeping up, you’ll go “wait a minute. If the row from a table doesn’t participate in a join, then there’s no row from the other table to match it up with. What about the columns from the other table that appear in the result?” You’re right about that, and that’s why the outer join is special. SQL creates null values for every column in the other table to “fill in” the missing spaces in the results returned by an outer join. So, for instance, for Sailors LEFT JOIN Reserves, all three Reserves table columns would be full of NULL values for Sailors that have not reserved a boat.

This property gives us another way to write non-participation queries.

Information Need: Find the names of sailors that have not reserved a boat.

SELECT sname
FROM Sailors S LEFT JOIN Reserves R ON S.sid = R.sid
WHERE R.sid IS NULL

In this query above, we could have used any column from Reserves to test for NULL. Using the foreign key in the join condition is a good choice, though, for it makes our intention clear to only retrieve non-participating rows from the Sailors table.

Information Need: Find the names of all sailors, and if a sailor has reserved a boat, also include the ids of every boat they have reserved.

So back to the problematic information need that started the coverage of outer joins. We can use an outer join to construct queries where you are asked for multiple pieces of information, but some of those are “optional” and may not appear in the answer. The names of sailors are mandatory but the ids of boats are optional. We need to include both those columns in the SELECT clause, and the optional ones may sometimes have NULL values. That’s OK. Since we want all the sailors, participating and non-participating ones, we don’t need to filter by participation in the WHERE clause. In fact, we don’t need to filter at all, so the WHERE clause can be omitted.

SELECT sname, bid
FROM Sailors S LEFT JOIN Reserves R ON S.sid = R.sid

Before you read further, go ahead and try to write SQL for this next one where the id of boats has been changed to the names.

Information Need: Find the names of all sailors, and if a sailor has reserved a boat, also include the names of every boat they have reserved.

The obvious approach is to recognize the boat names are in the Boats table and add an INNER JOIN with Boats, too, because that’s what we’re used to doing.

SELECT sname, bid
FROM Sailors S LEFT JOIN Reserves R ON S.sid = R.sid
   INNER JOIN Boats B ON R.bid = B.bid

If you execute this query, you’ll see that only the participating sailors (those that have reserved a boat) apper. The reason for this is that rows for non-participating sailors will have a NULL value for R.bid, and the join condition can’t be satisfied when R.bid IS NULL. So only rows with non-NULL R.bid values can evaluate to TRUE. This is subtle, but it’s important to understand that joining multiple tables when outer joins are involved requires care. The solution is to first join the Reserves and Boats tables to ensure every reservation can be matched up with its boat. The table resulting from that join can then be outer joined with the Sailors table. There are two ways to make this happen. One is to add () to the query above to ensure the INNER JOIN occurs first. This addition of () is analogous to writing 2 + (3 + 4) on a calculator to force the addition of 3 and 4 to happen before the first term gets added.

SELECT sname, bname
FROM Sailors S LEFT JOIN (Reserves R INNER JOIN Boats B ON R.bid = B.bid) ON S.sid = R.sid

The other solution is to remember that RIGHT JOIN is a thing. We can put the Sailors table after the join operators that way and not have to worry about ().

SELECT sname, bname
FROM Reserves R INNER JOIN Boats B ON R.bid = B.bid 
   RIGHT JOIN Sailors S ON R.sid = S.sid