The “Sailors Reserve Boats” ERD and database we’ve been using in these notes is from the textbook Database Management Systems by Ramakrishnan and Gehrke. That database stores information about sailors at a boating club who take different boats out sailing at different times.

The Sailor Reserves Boat ERD

We’ve seen the schema diagram for this database before, too. We can see how the many-to-many Reserves relationship becomes the Reserves table in the database. Note that each line in the ERD representing the Sailor entity and the Boat entity in the reservation becomes a foreign key in the schema diagram. Each foreign key references (points to) a primary key, resulting in a key pairing or primary key-foreign key pairing. This pairing is how relationships are indicated between tables in a database.

Schema diagram for the Sailors Reserve Boats database

To complete the tour of the database, we can look at each table to see the data stored in this instance of it. First is the Boats table where each row is a record storing information about the available boats. The output shows how the table would be presented if the SQL was typed in directly to the MySQL DBMS.

mysql> SELECT *
    -> FROM Boats;
+-----+-----------+-------+
| bid | bname     | color |
+-----+-----------+-------+
| 101 | Interlake | blue  |
| 102 | Interlake | red   |
| 103 | Clipper   | green |
| 104 | Marine    | red   |
+-----+-----------+-------+
4 rows in set (0.00 sec)

Here is the Sailors table in a similar format.

mysql> SELECT * 
    -> FROM Sailors;
+-----+---------+--------+------+
| sid | sname   | rating | age  |
+-----+---------+--------+------+
|  22 | Dustin  |      7 |   45 |
|  29 | Brutus  |      1 |   33 |
|  31 | Lubber  |      8 | 55.5 |
|  32 | Andy    |      8 | 25.5 |
|  58 | Rusty   |     10 |   35 |
|  64 | Horatio |      7 |   35 |
|  71 | Zorba   |     10 |   16 |
|  74 | Horatio |      9 |   35 |
|  85 | Art     |      3 | 25.5 |
|  95 | Bob     |      3 | 63.5 |
+-----+---------+--------+------+
10 rows in set (0.00 sec)

If we try to keep track of which sailor reserved or used which boat at which time, we don’t have enough information between the Sailors and Boats tables. We also need to record information of the form (sailor, boat, time). In other words, a sailor (row from the Sailors table), a boat (row from the Boats table), and the time define a reservation of a boat by a sailor. This reservation is a third record that the database must manage. Each kind of records is stored in its own table, so we end up with that third table Reserves that represents these reservations.

mysql> SELECT * 
    -> FROM Reserves;
+-----+-----+------------+
| sid | bid | date       |
+-----+-----+------------+
|  22 | 101 | 2012-11-29 |
|  22 | 102 | 2012-11-29 |
|  22 | 103 | 2012-11-29 |
|  22 | 104 | 2012-11-29 |
|  31 | 102 | 2012-11-29 |
|  31 | 103 | 2012-11-29 |
|  31 | 104 | 2012-11-29 |
|  64 | 101 | 2012-11-29 |
|  64 | 102 | 2012-11-29 |
|  74 | 103 | 2012-11-29 |
+-----+-----+------------+
10 rows in set (0.00 sec)

Queries requiring more than one table

Information Need: What are the names of Sailors that have reserved boats?

The Sailors table is clearly needed for this query, for it is the only table with information about sailors names. A quick look at the data above should convince you that not every sailor has reserved a boat yet, so some sailors shouldn’t appear in the answer. The only way to tell whether a sailor has reserved a boat is to see if that sailor appears in a row of the reserves table. So we have to use two tables in this query. We can ask whether we need the third table, Boats, in this query, and we can hopefully convince ourselves we don’t require any information specific to the Boats table (name & color). We also don’t even care which boat (which bid) the sailor reserved. Just whether or not they reserved at least one boat.

Having established we need the Sailors and Reserves table for this query, we can note that there is a relationship (key pairing) between these two tables in the schema diagram. This relationship indicates how we match up a sailor with their reservation. We want the same sid value to appear from each table. Below are two rows, one for each table, corresponding to a sailor and one of their reservations. The rows match up, correspond, or “glue together” because both are for the same sid, which uniquely represents any and every sailor since it is the primary key of the Sailors table. It’s in the Reserves table as a foreign key. It exists in that table to refer to exactly one row in the Sailors table.

mysql> SELECT * 
    -> FROM Sailors
    -> WHERE sid = 74;
mysql> SELECT * FROM Sailors WHERE sid = 74;
+-----+---------+--------+------+
| sid | sname   | rating | age  |
+-----+---------+--------+------+
|  74 | Horatio |      9 |   35 |
+-----+---------+--------+------+
1 row in set (0.00 sec)


mysql> SELECT *
    -> FROM Reserves
    -> WHERE sid = 74;
+-----+-----+------------+
| sid | bid | date       |
+-----+-----+------------+
|  74 | 103 | 2012-11-29 |
+-----+-----+------------+
1 row in set (0.00 sec)

The two rows above can be matched up meaningfully to tell us that Horatio, age 35, reserved boat 103 on Nov. 29, 2011. If we wanted more information about this reservation, we would also match up the row from Reserves with the row for Boat 103 from the Boats table to find out the boat name and color. Here we do not have to do that since the information need does not require any information from the third table.

Looking at the Sailors and Reserves tables above, we could create all the matches by hand and answer the information need that way. That’s tedious, and it would not be practical for larger tables. Of course we need the DBMS to compute this for us. The two rows that were glued together above represent a single row in a table the DBMS will build for us to compute the answer to the query. This table is a temporary one that only exists while the query result is being computed, so it’s not part of the schema. Below is what it looks like.

+-----+---------+--------+------+-----+-----+------------+
| sid | sname   | rating | age  | sid | bid | date       |
+-----+---------+--------+------+-----+-----+------------+
|  22 | Dustin  |      7 |   45 |  22 | 101 | 2012-11-29 |
|  22 | Dustin  |      7 |   45 |  22 | 102 | 2012-11-29 |
|  22 | Dustin  |      7 |   45 |  22 | 103 | 2012-11-29 |
|  22 | Dustin  |      7 |   45 |  22 | 104 | 2012-11-29 |
|  31 | Lubber  |      8 | 55.5 |  31 | 102 | 2012-11-29 |
|  31 | Lubber  |      8 | 55.5 |  31 | 103 | 2012-11-29 |
|  31 | Lubber  |      8 | 55.5 |  31 | 104 | 2012-11-29 |
|  64 | Horatio |      7 |   35 |  64 | 101 | 2012-11-29 |
|  64 | Horatio |      7 |   35 |  64 | 102 | 2012-11-29 |
|  74 | Horatio |      9 |   35 |  74 | 103 | 2012-11-29 |
+-----+---------+--------+------+-----+-----+------------+
10 rows in set (0.00 sec)

The operation used to compute this special table is called an INNER JOIN. An INNER JOIN is a way to combine two tables by matching up one row from each table based on a join conditon–a TRUE/FALSE test that uses values from each row, keeping those combinations of rows that pass the test as TRUE. In this example, the join condition between the INNER JOIN of the Sailors and Reserves table is that a Sailors row and a Reserves row match up if and only if both rows are for the same sailor. If and only if both rows have equal values in their sid columns.

This join condition is using the primary key column of the Sailors table and the sid foreign key column of the Reserves table. These two columns are the ones in the Sailors-Reserves relationship with the primary key foreign key pairing. INNER JOINs are the SQL operation that combines tables based on their key pairing relationships. 99.9% of the time, INNER JOINs will take place on a key pairing. This knowledge is invaluable. If you know the key pairings and figure out which tables are needed in a query, there will be no guesswork writing the INNER JOINs. If you neglect this concept, then every INNER JOIN will be a struggle because you won’t have the correct thought process for solving the mini-problem that each query presents.

Note that the same Sailor row can appear more than once in the INNER JOIN table shown above, indicating that that sailor has made multiple reservations. Also note that the sid column appears twice in the results, once for each table that it belongs to. Note also that the total number of columns is the number of columns in the Sailors table plus the number from the Reserves table. That is, all the possible columns are retained when gluing together or matching up rows in this way. Finally, note that the number of rows in the INNER JOIN table is the same as in the table with the foreign key–the Reserves table. Each foreign key row matches up with exactly one Sailor and not the other way around.

Going back to the original information need, we note that we have built a table containing Sailors with their matching reservations. We have created a table with the rows we need, and we don’t need to filter the rows any more. The WHERE clause can be left out of our query. We only need to return names, though, so we need to make sure we only specify that column in the SELECT clause. The only new syntax we need to learn is the FROM clause where the INNER JOIN happens (where the new table is built to use in the query).

SELECT sname 
FROM Sailors INNER JOIN Reserves ON Sailors.sid = Reserves.sid

Because INNER JOIN requires two tables (it’s what we earlier in Java called a binary operator), we write one table name on the left side and one on the right side. After the right-hand table, we add the keyword ON and then follow that with the join condition. The join condition is a test that is TRUE or FALSE similar to what we have seen for the WHERE clause. Here we want the same sailor from each table, and since both column names are called sid, we have to say which table each sid comes from. We do that by putting the name of the table followed by a dot. We can always specify exactly which table a column comes from

SELECT Sailors.sname
FROM Sailors INNER JOIN Reserves ON Sailors.sid = Reserves.sid

Some SQL programmers always do this, and some never do. In this course it’s up to you. It’s required to state which table a column is in when the column name is ambiguous, i.e. it could be in more than one table. This required indication of the table name to resolve ambiguous column names is called qualification. When it’s unique/unambiguous like sname, it’s optional to specify the table it came from.

Aliases for tables

Typing long table names repeatedly in a query can get tedious, and longer queries are generally harder to read and think about than shorter one. It’s possible to rename a table in the FROM clause with an alias and use that alias to refer to the table every else. The way to create the alias is to place it right after the table name in FROM as shown in the first example below. It’s also common to use the AS keyword to indicate “here comes an alias”, but AS is optional. I tend to not use it in my own SQL; the choice is yours.

SELECT S.sname
FROM Sailors S INNER JOIN Reserves R ON S.sid = R.sid;

-- the AS keyword is optional, here is the same example including it
SELECT S.sname
FROM Sailors AS S INNER JOIN Reserves AS R ON S.sid = R.sid;

How INNER JOIN is used in SELECT FROM WHERE

The process the database uses to build a result table from a SELECT FROM WHERE query is:

  1. Specify a single table (FROM clause)
  2. Filter rows (WHERE clause)
  3. Return columns (SELECT clause)

The thought process for using more than one table in a query is that we use the separate, existing tables in the database to build a new table we need that will contain our query results. We may have to remove some of the rows from that new table, and we may have to remove columns from that table to generate our results. The main way to build a new table from two existing tables is to INNER JOIN those two tables. An INNER JOIN is an operation that starts with two tables and matches up rows between the two tables. A join condition is a test that is used to determine which rows can be matched up. The third table that is the result of the INNER JOIN will contain all the columns from both tables. Now that we have more than one table, we can be more general and accurate about the process:

  1. A single table with all the info needed for the query result may not exist in the database. It may be a combination of one or more tables based on their relationships in the database. If it’s a single table, specify it. If the info is in more than one table, INNER JOIN those tables based on the relationships (primary key-foreign key pairings) between tables. (FROM clause)
  2. Filter rows (WHERE clause)
  3. Return columns (SELECT clause)

Joining more than two tables

Information need: What are the sailor names and boat names for each reservation?

We need the Sailors table for sname. We need the Boats table for bname. At this point, it’s tempting to try FROM Sailors INNER JOIN Boats, but that won’t work because we don’t have any columns in common to use for the join condition. Said another way, there’s no arrow between Sailors and Boats in the schema diagram, so we don’t have a key pairing to use for the join. We have to use the Reserves table to match up sailors with their reservation, and we need to use the Reserves table to match up boats with their reservation. Sailors INNER JOIN Reserves involves the Reserves.sid/Sailors.sid key pairing. Boats INNER JOIN Reserves will involve a Reserves.bid/Boats.big key pairing. It’s tempting to think we’ll have to use the Reserves table twice, once for each join, but that’s not how it works. We’ve already seen that Sailors INNER JOIN Reserves will give us a table that has all the columns from Reserves, so it will have a bid column in it.

  1. Sailors INNER JOIN Reserves is a table.
  2. That table has a bid column in it.
  3. INNER JOIN requires two tables. The two tables don’t have to be single tables in our database. They can be tables we have built in the query.
  4. So we build a table using INNER JOIN and then INNER JOIN that one with Boats–(Sailors INNER JOIN Reserves) INNER JOIN Boats

The () in #4 above are meant to suggest the way we write arithmetic sometimes. 3 + 4 + 5 can be written as (3 + 4) + 5 to emphasize that adding three numbers consists of two steps: 1. add two numbers and 2. add the third number to the intermediate result. INNER JOIN is the same way. We can join two tables and then join the third table to the intermediate result. The syntax is

SELECT S.sname, B.bname
FROM Sailors S INNER JOIN Reserves R ON S.sid = R.sid 
   INNER JOIN Boats B ON R.bid = B.bid;

The original tables in the first INNER JOIN still keep their identity in the query, so we can still refer to R and S separately even though they are always combined as “S INNER JOIN R”; this is just a convenience that SQL gives us. To aid in readability, I put the second INNER JOIN on a new line and indented it, but this choice is only cosmetic. It also emphasizes that the first INNER JOIN builds one table, and then that newly-built table is joined with the third table in the query.

This section concludes by showing the full table resulting from joining all three tables together and keeping all the columns.

mysql> SELECT * 
    ->FROM Sailors S INNER JOIN Reserves R ON S.sid = R.sid 
    ->  INNER JOIN Boats B ON R.bid = B.bid;
+-----+---------+--------+------+-----+-----+------------+-----+-----------+-------+
| sid | sname   | rating | age  | sid | bid | date       | bid | bname     | color |
+-----+---------+--------+------+-----+-----+------------+-----+-----------+-------+
|  22 | Dustin  |      7 |   45 |  22 | 101 | 2012-11-29 | 101 | Interlake | blue  |
|  22 | Dustin  |      7 |   45 |  22 | 102 | 2012-11-29 | 102 | Interlake | red   |
|  22 | Dustin  |      7 |   45 |  22 | 103 | 2012-11-29 | 103 | Clipper   | green |
|  22 | Dustin  |      7 |   45 |  22 | 104 | 2012-11-29 | 104 | Marine    | red   |
|  31 | Lubber  |      8 | 55.5 |  31 | 102 | 2012-11-29 | 102 | Interlake | red   |
|  31 | Lubber  |      8 | 55.5 |  31 | 103 | 2012-11-29 | 103 | Clipper   | green |
|  31 | Lubber  |      8 | 55.5 |  31 | 104 | 2012-11-29 | 104 | Marine    | red   |
|  64 | Horatio |      7 |   35 |  64 | 101 | 2012-11-29 | 101 | Interlake | blue  |
|  64 | Horatio |      7 |   35 |  64 | 102 | 2012-11-29 | 102 | Interlake | red   |
|  74 | Horatio |      9 |   35 |  74 | 103 | 2012-11-29 | 103 | Clipper   | green |
+-----+---------+--------+------+-----+-----+------------+-----+-----------+-------+
10 rows in set (0.00 sec)

Exercises

What colors boats have been reserved?
SELECT color
FROM Boats B INNER JOIN Reserves R ON B.bid = R.bid
What are the names of sailors that have reserved red boats?
SELECT sname
FROM Sailors S INNER JOIN Reserves R ON R.sid = S.sid INNER JOIN Boats B ON R.bid = B.bid
WHERE color = 'red'
What are the id’s of Boats that have been reserved?

Be careful with this one! You don’t need the Boats table here because the bid is already in the Reserves table. So this is a single-table query instead of an INNER JOIN one.

SELECT bid
FROM Reserves