Red/Blue Boat Queries
Now let’s focus on information needs 3 and 4 from the previous notes having to do with sailors reserving red or blue boats in number 3 and red and blue boats in number 4.
- Find the names of all sailors that have reserved a red boat OR a blue boat.
- Find the names of all sailors that have reserved a red boat AND a blue boat.
To come to grips with the difference between these, consider the table below and work out the results of each of the correct answers to 3 and 4.
sid | sname | color |
---|---|---|
22 | Dustin | blue |
22 | Dustin | red |
22 | Dustin | green |
31 | Lubber | red |
31 | Lubber | green |
64 | Horatio | blue |
64 | Horatio | red |
74 | Horatio | green |
For information need 3 (red or blue boats), a sailor who has reserved only 1 of the 2 colors will appear in the results. So Dustin, who has reserved both colors, appears along with Lubber and Horatio (sid 64) who have only reserved 1 color each. Think about how to write this query in SQL and then verify you did it correctly.
Find the names of all sailors that have reserved a red boat OR a blue boat.
This one is not too bad. We can conceptually think we need names of sailors and colors of boats. We will need the sailors and boats table. They have no relationship (key pairing) between them, so we must join each with the Reserves table in order to build a table containing information about each sailor and a bot they have reserved. From that table, we can keep reservations where the color of the boat is red or the color of the boat is blue. Of these reservations, we just need to return the name of the sailor involved.
SELECT DISTINCT sname FROM Sailors S INNER JOIN Reserves R ON S.sid = R.sid INNER JOIN Boats B ON B.bid = R.bid WHERE color = 'red' OR color = 'blue'
Many students want to write this as “WHERE color = ‘red’ OR ‘blue'”, but that doesn’t work. That’s a common pitfall to avoid. Each condition on either side of OR needs be a comparison of the form column name compared to value.
It’s tempting to many students to take the SQL above and change OR to AND in order to write SQL that generates the results needed for #4 (sailors who have reserved both a red boat and a blue boat). The problem is that color holds an atomic value (because all database tables are in 1NF), so if color = ‘red’ is TRUE, color = ‘blue’ will be FALSE and vice versa.In other words, color = ‘red’ AND color = ‘blue’ will always be false, and we may as have WHERE FALSE for our WHERE clause. Our description of how WHERE worked indicated we only receive rows in the result that have the WHERE clause evaluate to TRUE, so with WHERE color = ‘red’ AND color = ‘blue’, we will always get back 0 rows, the empty table. We have to learn another way to write this query.
First, let’s learn about the IN operator which will turn out to be a new tool we need to write the query.
The IN operator
Before we figure out why we can’t write the second query just as simply, let’s see another way to write this query. It seems silly and/or stupid to have to test twice for the same attribute having one value or another. Wouldn’t it be great do that in one fell swoop? SQL gives us a way to search for a value in a list–we can use the IN keyword. It’s syntax is value IN tuple, and in the WHERE clause, we usually use the name of a column as a stand-in for the value of that column in each particular row: column name IN tuple. Tuple is a vocabulary word from when we were learning the relational model. Recall that it describes a collection or list of values like a row in a table or a record. You can substitute list most anytime tuple appears below if that matches your mental model better; they are essentially interchangeable. We can hard-code a list or tuple by putting values in () separated by commas. For the two colors, we can use (‘blue’,’red’) as a tuple of strings. The query would look like
SELECT S.sname FROM Sailors S INNER JOIN Reserves R ON S.sid = R.sid INNER JOIN Boats B ON R.bid = B.bid WHERE B.color IN ('red','blue')
When we’re using IN, we can imagine that we’ve rewritten the information need a little bit. Instead of “Find the names of all sailors that have reserved a red boat OR a blue boat.” we’re solving it as “Find the names of all sailors who have reserved a boat whose color is in the list ‘red’, ‘blue’.” Most any query that needs IN to be written can be reworded with some concept of “in a list” or “not in a list”. For a list of just two values it seems like overkill; using IN isn’t really any better than two comparisons separated by OR. But if we were checking for lots of values, using IN can be a real time saver.
Information need: Find the names of all sailors that have reserved boats that are red or yellow or green or brown or scarlet or black or ochre or peach or ruby or olive or violet or fawn or lilac or gold or chocolate or mauve.
SELECT sname FROM WHERE color IN ('red', 'yellow', 'green', 'brown', 'scarlet', 'black', 'ochre', 'peach', 'ruby', 'olive', 'violet', 'fawn', 'lilac', 'gold', 'chocolate', 'mauve')
To summarize, the IN operator is used as value IN tuple and evaluates to TRUE if value appears at least one time in the tuple or list. It is logically equivalent to value = ANY tuple, and some students discover that. However, = ANY is not idiomatic SQL, and it should be avoided. Use IN instead like the rest of the SQL world. There is also a NOT IN operator that behaves very similarly. The syntax is value NOT IN tuple, and it evaluates to TRUE if the value is not in the list or tuple.
What does ‘red’ IN (‘red’,’blue’,NULL) evaluate to?
‘red’ is in the tuple, so it evaluates to TRUE. Remember IN means “equal to at least on item in the tuple”
What does ‘red’ NOT IN (‘green’,’blue’) evaluate to?
‘red’ is not in the tuple, so it evaluates to TRUE. Remember NOT IN means “not equal to any values in the tuple”.
What does ‘red’ IN (‘green’,’blue’,NULL) evaluate to?
‘red’ is not equal to any values in the tuple, but the NULL value represents an unknown value. It may be ‘red’, or it may be something else. So it is impossible to conclude red is or is not in the list. The value cannot be TRUE, and the value cannot be FALSE. It evaluates to UNKNOWN. This is important! Using IN with tuples that may contain NULL is the biggest ‘gotcha’ to watch out for when working with NULL values in I308!
What does ‘red’ NOT IN (‘green’,’blue’,NULL) evaluate to?
‘red’ is not equal to any values in the tuple, but the NULL value represents an unknown value. It may be ‘red’, or it may be something else. So it is impossible to conclude red is or is not in the list. The value cannot be TRUE, and the value cannot be FALSE. It evaluates to UNKNOWN.
What does ‘red’ NOT IN (‘red’,’blue’,NULL) evaluate to?
‘red’ is in the tuple, so it doesn’t matter that NULL is also in that list. Since there is an exact match with the value, the value is in the tuple and the expression evaluates to FALSE.
As a rule of thumb, IN and NOT IN are logical opposites whenever there is an exact match. So if value IN tuple is TRUE because of an exact match, then value NOT IN tuple will be FALSE for the same value and tuple. If there is not an exact match AND the tuple contains no NULL values, then they remain logical opposites. If there is a NULL value and no exact match in the tuple, then they both evaluate to UKNOWN. Interestingly, since NOT UNKNOWN = UNKNOWN, they are still logical opposites, but this is a little bit counterintuitive at first.
Creating tuples or lists dynamically
Now, let’s look finally look in detail at writing SQL for the information need #4, the names of sailors that reserved both a red and a blue boat. The main SQL concept we need is that
a single column table can be used as a tuple or list of values
For nested queries we’ve seen two ways to create them:
- A 1×1 table is equivalent to the single value stored in that table
- An Mx1 table (1 column, many rows) is equivalent to the list of values stored in that column.
It’s easy to generate a tuple of values directly from the database. Anytime we have only one column in the SELECT clause, we can generate the list of values from that column.
SELECT color FROM Boats
generates the tuple (‘red’,’green’,’blue’). If you run this query, you may see the colors appear in a different order depending on the state of the database. The order doesn’t matter in a list of values since we’re looking for matches anywhere in the list.
To write the red and blue boat query, let’s rewrite the information need to make clear how we can search for a value in a list with it.
Original Information Need: Find the names of all sailors that have reserved a red boat AND a blue boat.
Rewritten Information Need: Find the names of all sailors that have reserved a red boat and are in the list of sailors that have reserved a blue boat.
We can write SQL for this query and solve the problem. The only detail we need to be able to manage is generating a list of sailors that have reserved a particular color of boat. To generate a list, we need to create a single-column table. Since it’s a list of sailors, we need to generate a list of values that uniquely identify sailors. We can’t generate a list of names, we must generate a list of ids because sid is a primary key of the Sailors table but sname is not. Below is the list of sailors that have reserved a blue boat.
SELECT BlueR.sid FROM Reserves BlueR INNER JOIN Boats BlueB ON BlueR.bid = BlueB.bid WHERE BlueB.color = 'blue'
The table aliases are a little longer than usual to remind us that these tables are all for sailors that have reservations for blue boats. Note that we don’t need to use the sailors table to generate this list because the Reserves table contains those id’s, too.
Now that we have the list of blue boat sailors, we can finally write down the full query. It’s helpful to use table alias names for red boats just like we used for blue boats above. The first part of the rewritten information need is “Find the names of all sailors that have reserved a red boat”
SELECT sname FROM Sailors S INNER JOIN Reserves RedR ON S.sid = RedR.sid INNER JOIN Boats RedB ON RedR.bid = RedB.bid WHERE RedB.color = 'red'
The second part of the information need is “and are in the list of sailors that have reserved a blue boat.” We need to add our blue-boat list of sailors to the query now. Here is how we think of the next step
SELECT sname FROM Sailors S INNER JOIN Reserves RedR ON S.sid = RedR.sid INNER JOIN Boats RedB ON RedR.bid = RedB.bid WHERE RedB.color = 'red' AND sid IN (list of blue boat sailors)
and then we finally substitute in the inner SELECT FROM WHERE query that generates the list of sailors.
SELECT sname FROM Sailors S INNER JOIN Reserves RedR ON S.sid = RedR.sid INNER JOIN Boats RedB ON RedR.bid = RedB.bid WHERE RedB.color = 'red' AND sid IN (SELECT BlueR.sid FROM Reserves BlueR INNER JOIN Boats BlueB ON BlueR.bid = BlueB.bid WHERE BlueB.color = 'blue')
Intersection Queries
The query we just wrote down is an example of an intersection query. We could once again rewrite the information need as “Find the names in the intersection of all sailors that reserved red boats with all those that reserved blue boats.” By separately generating two lists, one for red-boat sailors and one for blue-boat sailors, we are finding sailors in both lists (the intersection of the two lists) when we use the AND sid IN “connector” to indicate the sailors we want in our result should appear in both lists of sailors.