So far, we’ve worked with single table queries and with multiple table queries where we joined two tables together on a primary key-foreign key pairing using INNER JOIN in the FROM clause. These kinds of queries answer the information needs we’ve encountered so far, but a lot of information needs are more complicated. Consider these two information needs for the Sailors Reserve Boats database?
- Find the names of all sailors with a rating less than 7.
- Find the names of all sailors with a rating less than Dustin’s.
One of these we know how to write in SQL, and one of these we don’t. Can you tall which is which? Likewise, consider these two:
- Find the names of all sailors that have reserved a red boat OR a blue boat.
- Find the names of all sailors that have reserved a red boat AND a blue boat.
Once again we’ve only seen enough SQL to answer one of these. The only difference between these two queries is the OR and the AND they each have.
We should be able to write down SQL for #1 and #3 immediately. The second ones of each pair (#2 and #4) are complicated enough that they will dominate a lot of the SQL we look at for the remainder of the semester. We’ll start with the second one and devote quite a bit of time to it before getting back to the last one in the next set of notes.
So, without further ado, let’s write down the answer to the first one. Remember our general game plan so far for writing down SQL for a query:
- Figure out which tables are involved. Put them in the FROM clause using INNER JOIN to combine multiple tables.
- Figure out which conditions are filtering the rows we return. Those get added to the WHERE clause.
- Figure out which attributes are involved. Put them in the SELECT clause.
We can carry out 1-3 in any order mentally, but the database always determines the query results in the order given above. A single table is built based on the information in the FROM clause, and the WHERE clause is used to filter rows in that table. Finally the SELECT clause determines which columns are in the final result set that is the response to the information need.
Dustin’s Rating
With that recap, let’s write down SQL for the first information need above. Try it on your own as a refresher before reviewing the solution.
Find the names of all sailors with a rating less than 7.
There’s not a lot that needs to written about this one. We only need to use the sname and rating columns, and those are only in the Sailors table. Our FROM clause is simple. The rating less than 7 is used to determine which sailors (rows from the Sailors table) we get in our results. That gives us the WHERE clause. We only need the sname column in our final result, so the SELECT column is finalized, too.
SELECT sname FROM Sailors WHERE rating < 7
The difference between information need 1 and information need 2 is that instead of a rating of 7’s, we need to use Dustin’s rating. Many students want to find Dustin’s rating from the Sailors table and just use that in the query. Since Dustin’s rating happens to be 7 (what a coincidence!), students’ thinking is that the SQL above is the solution. There are two problems with that logic.
- In a different instance of the database, the sailor named Dustin may have a different rating
- Even if Dustin’s rating is 7 now, it may not be 7 later.
To write a general query, we have to use Dustin’s rating from the database at the instant the query is executed. We’re not allowed to hard-code values in when those values are subject to change or subject to being incorrect. Even though Dustin’s rating is 7 right now, the two informations needs are distinct because in the first one, 7 will never change in that information need. In the second one the value of Dustin’s rating may well change.
Writing the Dustin’s Rating Query
To get started writing the query that satisfies the information need #2 “Find the names of all sailors with a rating less than Dustin’s.”, let’s first write a related query.
Find Dustin’s rating.
We know how to do this one. Here’s the answer:
SELECT rating FROM Sailors WHERE sname = 'Dustin'
Because there’s only 1 sailor named Dustin, this query will return 1 row. Because we only have one column in the SELECT clause, it will only return 1 column. Although we think about the answer to this information need as a single number, 7, the database gives us that result in a 1×1 (1 row x 1 column) table that holds that value in its only cell.
A 1×1 table in SQL can be used as if it were the value stored in the only cell in that table.
With this knowledge, we can now tackle the Dustin’s rating query. First, let’s remember the SQL for the first information need “Find the names of all sailors with a rating less than 7”
SELECT sname FROM Sailors WHERE rating < (7)
The only thing I did different was put () around 7. The thing we need to do now is replace that single value with a SELECT FROM WHERE query that generates a 1×1 table containing that value. Here’s the process one more time:
- Start by writing a query with a hard-coded value. If you don’t know the actual, correct value, who cares? Just make one up.
- Put parentheses around that value in the WHERE clause
- Replace the value in ()’s with a SELECT FROM WHERE query that returns the value you need as a 1×1 table. A 1×1 table can be used as a value in a query, and SELECT FROM WHERE queries can be used to create 1×1 tables.
Following that procedure, I’ve already done 1 and 2, so now I’m going to follow step 3, and replace 7 with a SELECT FROM WHERE query that returns 1 row and 1 column. it’s easy to guarantee I get 1 column back; I just have one column in the SELECT clause. Here because there’s only one row with sname Dustin in the Sailors table, we get the 1 row back.
SELECT sname FROM Sailors WHERE rating < (SELECT rating FROM Sailors WHERE sname = 'Dustin')
This actually work, and it returns the correct Sailors. Go try it! This is our first example of a nested query, which is a term meaning a SELECT FROM WHERE query that has 1 or more other SELECT FROM WHERE queries “nested” inside it. We have to write nested queries most anytime we are answering an information need that depends on value(s) already stored in the database like Dustin’s rating here. The beginning SELECT FROM WHERE query is called the outer query. Any other ones we add that are nested inside it are called inner queries. Inner queries are used to compute values or tables that the outer query needs to use. If we first write the outer query with hard-coded values, then we can replace those values with inner queries. This is a good approach to carry out literally at first. With more practice, you may start to recognize the patterns that nested queries solve, and then you can write them down without any intermediate steps using hard-coded values.
Nested queries can be confusing at first because I used the Sailors table twice. They’re distinct in the query, and you can think of them as two different copies of the sailors table–one in the outer query and one in the inner one. If you want to make that more explicit, you can add aliases:
SELECT sname FROM Sailors S1 WHERE rating < (SELECT rating FROM Sailors S2 WHERE sname = 'Dustin')
but the result of the query is the same in either case. SQL automatically treats the inner query as its own scope separate from the outer query. In fact, since they are independent, we can think of this as a two step process. The inner query gets evaluated first to yield the value of 7, and then the outer query gets executed using that value. This is actually how the database optimizes queries like this, for there’s no need to compute the inner query for every row of the outer query. (Remember we previously described the WHERE clause as doing the same thing for each row in the table built from FROM.) Instead the inner query can be computed once and its value saved to be used by the outer one. Queries in which this optimization is possible are called uncorrelated nested queries. These are the most common kinds of nested queries, and a more precise definition of uncorrelated nested query is one in which the inner query does not depend on the outer query. The inner query uses no values from the outer query. The opposite kind of nested query is called a correlated nested query, but we only see these kinds of queries a little bit in I308 and later in the course.
When writing nested queries, it’s good to let the indentation remind you which parts belong to the outer and inner queries, so take care with formatting these. Add whitespaces to keep each SELECT FROM WHERE query aligned as best you can.
Horatio’s rating
Now, let’s do the same thing to find all the sailors whose rating is greater than Horatio’s? Well, it doesn’t work. The database gives us the following error message. Note that ‘subquery‘ is a synonym for inner query.
The DBMS returned the following error message: (1242, ‘Subquery returns more than 1 row’)
Why did it do that? Because there are two Horatios in our instance of the sailor’s database. rating > (…) is a comparison between two values, and it thus expects the inner query to return a 1 x 1 table, but with two Horatios, we return a 2 x 1 table. What do we do in this situation? Well, SQL to the rescue! There are two keywords ANY and ALL which we can put between the comparison operator (> in this example) and the inner query. If we put ALL, then the comparison must be TRUE for ALL values in the list generated by the inner query in order for the condition in WHERE to evaluate to TRUE. If we use ANY (or its synonym SOME in some databases), then the comparison must be TRUE for at least one value in the list in order for the condition to be TRUE.
All that’s a little abstract, so let’s see a few examples to make it more precise.
What is the result of 2 < ANY (1, 3)
In this example, the () are used to hold a tuple, a list of values. We can’t use something like 2 < (1,3) in SQL or math because we don’t know how to compare 2 to the two different values in any meaningful way. With ANY or ALL, we can though. With ANY, we compare 2 against all the values individually and see there values. So 2 < 1 is FALSE and 2 < 3 is TRUE. Or we can think of it as the tuple (FALSE, TRUE). So the rule for ANY is that the comparison is TRUE if any of the comparisons are TRUE. Since one of the comparisons is, 2 < ANY (1,3) is TRUE.
What is the result of ‘Lubber’ > ALL (‘Horatio’, ‘Dustin’)
Here we’re comparing strings and using ALL instead of ANY. The result will be TRUE only if all the individual comparisons are TRUE. Since we’re comparing strings, we’re consider alphabetical order. ‘B’ > ‘A’ and ‘Brutus’ > ‘Albus’ because the first one is greater than (comes later in abc order) than the second one.
So for the two comparisons in this one, Lubber comes after Horatio and after Dustin. Each comparison is TRUE, or they’re (TRUE, TRUE) as a tuple. Since all the comparisons are TRUE, the result is TRUE.
So let’s use these ideas in a query now.
Information need: Find the names of sailors whose ratings are less than some sailor name Horatio.
The word “some” in the information need tells us the sailors in the result don’t have to have a rating less than ALL Horatios. They have to have a rating less than at least one sailor named Horatio, so we can write this as an ANY query.
SELECT sname FROM Sailors WHERE rating < ANY (SELECT rating FROM Sailors WHERE sname = 'Horatio')