Now that we’ve seen tables, we can begin to carry out one of the most important database tasks–querying the database. Query means to ask a question, and when we ask the database a question, it gives us any appropriate data it has stored as the answer. We’ll call the question asked the information need, although we won’t always write it so it’s grammatically a question. Sometimes it will be a statement. For example, the statement “Find the number of sailors.” is an equivalent information need to the question “How many sailors are there?”
Although the information need or the question is a plain English sentence in our class, the answer is not. Databases return tables as responses to queries. The SQL query language allows us to describe the table we need that matches the information need, and the database management system fills in that table we describe with the data it contains that matches the description.
Let’s take a look at some examples of the first (information need) and last (result table) parts of the query process without worrying about the SQL we would have to write to get the result. A good exercise is to look back at the Sailors table below and make sure you can see what the answer to each information need should be before expanding it to reveal the result.
sid | sname | rating | age |
---|---|---|---|
22 | Dustin | 7 | 45.0 |
29 | Brutus | 1 | 33.0 |
31 | Lubber | 8 | 55.5 |
32 | Andy | 8 | 25.5 |
58 | Rusty | 10 | 35.0 |
64 | Horatio | 7 | 35.0 |
71 | Zorba | 10 | 16.0 |
74 | Horatio | 9 | 35.0 |
85 | Art | 3 | 25.5 |
95 | Bob | 3 | 63.5 |
information Need: Find all information about all sailors
All information means all columns are needed, and all sailors means all rows because each row is a sailor. So the result for this information need should be the entire table above.
Information Need: Find the names and ages of all sailors.
sname | age |
---|---|
Dustin | 45.0 |
Brutus | 33.0 |
Lubber | 55.5 |
Andy | 25.5 |
Rusty | 35.0 |
Horatio | 35.0 |
Zorba | 16.0 |
Horatio | 35.0 |
Art | 25.5 |
Bob | 63.5 |
Information Need: Find all information about all sailors whose rating is 7.
sid | sname | rating | age |
---|---|---|---|
22 | Dustin | 7 | 45.0 |
64 | Horatio | 7 | 35.0 |
Information Need: Find the names and ages of sailors whose rating is 7.
sname | age |
---|---|
Dustin | 45.0 |
Horatio | 35.0 |
These 4 illustrate all the possible ways in which a results table can be built. A query can:
- Return an entire table (all rows, all columns)
- Return a subset of a table’s columns and all rows (some columns, all rows)
- Return all columns but a subset of the rows (all columns, some rows)
- Return a subset of the columns for a subset of the rows (some columns, some rows)
The SQL query language gives us ways to describe the information need that essentially involve us telling the database what kind of table to construct and return to us in response. When you read an information need and are aware of what the results should look like (which rows, which columns), it’s significantly easier to write an SQL query than when you’re flying blind.
The basic SQL query statement is the SELECT FROM WHERE statement. Each of the parts SELECT, FROM, and WHERE are known as clauses (“the SELECT clause”, “the FROM clause”, “the WHERE clause”). There are other optional clauses we’ll see later, too. Here is a skeleton of what it looks like:
SELECT comma-separated list of column name(s) FROM table [WHERE row condition is true]
The [] above are a common convention when describing programming language syntax (the rules that must be followed for your program to be valid). They mean [stuff inside here is optional]. So the WHERE clause is optional, but it’s important enough to be included in the name (people say “SELECT FROM WHERE” queries much more often than they say “SELECT FROM” queries) and most queries include the WHERE clause anyway.
How to write a SELECT FROM WHERE query
Information Need: Find all information about all sailors.
Result: All rows and all columns from the Sailors table
Step 1: Figure out which table(s) you need to build the result. Tables go in the FROM clause.
For the example information needs shown above, the Sailors table contained all the information needed to generate the response. Queries where all the information is contained in a single table are unsurprisingly called single-table queries, and that’s what we’ll learn first. For a single table query, once you know which table contains the information needed for the result, you use that table’s name in the FROM clause:
SELECT FROM Sailors WHERE
Step 2: Figure out which column(s) you need in the result. They go in the SELECT clause.
If you need all the column names, then you can use a wildcard character * to mean “all the columns”. So for the first example information need “Find all information about all sailors.” we know that all rows and all columns of the sailors table are needed. Since all columns are needed, we can use SELECT * and since we’re using the Sailors table, we can use FROM sailors. So far we have
SELECT * FROM Sailors
Step 3. The WHERE clause is optional. If you need all rows (you don’t need to filter rows), then you’re done.
In fact, that’s all we need here because when we want all the rows from a table in our response, we omit the WHERE clause. The purpose of the WHERE clause is to filter out rows so that only some of them make the cut and appear in the response. So the query above is our first successful SELECT FROM WHERE query even though it has no WHERE clause.
It’s also possible to write this one as
SELECT sid, sname, age, rating FROM Sailors
but this form should not be recommended because we’ve seen above that tables can change in a database. If the Sailors table is modified so that column names change, columns are added, or columns are removed, then this query would no longer return all information about all sailors. It’s safer to use the wildcard because it will always work for this table. SQL queries are often saved and ran over and over again, so writing them to be robust and failsafe is a necessity.
Information Need: Find the names and ages of all sailors.
Result: All rows, only the sname and age from the Sailors table
Step 1: Figure out which table(s) you need to build the result. Tables go in the FROM clause.
Just like the previous information need, we are using the Sailors table.
SELECT FROM Sailors WHERE
Step 2: Figure out which column(s) you need in the result. They go in the SELECT clause.
The main difference with this one is we have to specify the columns we need by name. Since the column names go in a comma-separated list in the SELECT clause, this will work:
SELECT sname, age FROM Sailors WHERE
Step 3. The WHERE clause is optional. If you need all rows, then you’re done.
SELECT sname, age FROM Sailors
The WHERE Clause
The WHERE clause is complex enough to get its own section. We’ve already seen that its purpose is to filter rows so only some rows in a table appear in the result. The way it filters rows is to deal with 1 row at a time and test the row using the data in that row. If the test is passed, then the row is in the result. If the test is failed, then it is not. The test in the both the sample information needs above is for sailors with a rating of 7. Note that this test is based on the values stored in the record for each sailor. If the sailor’s rating is equal to 7, they are in the answer. If the rating is not, the sailor is not. So for all ten sailors in the table, the database would go row-by-row (i.e. sailor by sailor because each row is a record) and compare the value in the rating column to 7. If they are equal, that row is added to the result table. If they are not equal, that row is not added to the result table. Note also that even though there are 10 sailors, we can use the column name to represent the value in that column for each sailor. Column names are the building blocks of the WHERE clause.
Information Need: Find all information about all sailors whose rating is 7.
Result: Only 2 rows for the sailors with a rating of 7, all columns from the Sailors table
Step 1: Figure out which table(s) you need to build the result. Tables go in the FROM clause.
Step 2: Figure out which column(s) you need in the result. They go in the SELECT clause.
These two steps should be old hat by now. We can use the column wildcard for all columns FROM the Sailors table.
SELECT * FROM Sailors WHERE
Step 3. The WHERE clause is optional. If you need all rows, then you’re done.
We only need certain rows. We move on to Step 4 for the first time!
Step 4. What condition determines which rows you keep for your result? That’s your WHERE clause.
Like the introduction to this section describes, the rating column is the one whose data is used to filter rows. If the value in the rating column is equal to 7, we keep that row for our result. If it is not equal to 7, we do not keep that row. Let’s write it like Java pseudocode:
start with an empty table to hold the result for row in the Sailors table: let rating be the value stored in the rating column for this row if rating equals 7 add the row to the result else do not add the row to the result
That loop over rows shows that whether or not a row is in our result boils down to a TRUE/FALSE test. Rows that evaluate to TRUE for that test are in our result, and those that do not are not. That’s how the WHERE clause works. We put the TRUE/FALSE test in the WHERE clause. We use the name of a column as a shorthand for the value stored in that column for each row. The DBMS loops overs rows and uses the values in the columns in that row to do the TRUE/FALSE test, adding the rows that pass the test to our result. So we can now finish our first SELECT FROM WHERE query with a WHERE clause by recognizing the TRUE/FALSE test is rating = 7. Note that to test for equality in SQL we only use one equals sign in contrast to Java where we use two equals signs.
SELECT * FROM Sailors WHERE rating = 7
The text “rating = 7” is called a comparison. The column name is used to stand in the for the value stored in that column for each row. Because the rating is either equal to 7 or it is not in the Sailors table, the comparison will be either TRUE or FALSE. Those are written in ALL CAPS to emphasize they are part of SQL. The database will go row-by-row through the table in the FROM clause and evaluate the test/comparison in the WHERE clause. Those that evaluate to true are added to the result table. The equals sign in the comparison is called the comparison operator. There are many comparison operators in SQL, but we’ll only use a few of them in this course.
Operator | Operation |
---|---|
= | Equals |
<> | Not Equals |
> | Greater than |
< | Less than |
>= | Greater than or equals |
<= | Less than or equals |
Comparison operators are called binary operators because they have two operands. Operand here means “value in the comparison”. We have to have two values to compare them, and binary means two. Either operand can be a column name, in which case the value in each row is substituted in for the column name. Either operand can also be a constant value independent of the row, like 7 is here. So “rating = 7” or “7 = rating” will both work fine because either operand can be the column name. The one we saw first where the column name appears on the left of the comparison is the typical way to write this, though. Likewise “7 = 8” would be valid because either or both operands can be constants.
Although 7 = 8 is valid, why isn’t it useful to compare two constants?
The table above indicates the comparison operators work for other ordered types, too. In addition to INTEGERs (whole numbers), SQL includes FLOATs (decimal numbers), and CHAR (strings or words) types. All of these are ordered because INTEGER and FLOAT are both numbers and have numeric ordering and CHAR types have alphabetic ordering. The only thing different about CHAR types is that we need to enclose the value in single or double quotes.
Information need: Find the rating of all sailors named Horatio.
We need to use the Sailors table again, we only want the rating column, and we only want certain sailors (rows) back. We write:
SELECT rating FROM Sailors WHERE sname = 'Horatio'
or
SELECT rating FROM Sailors WHERE sname = "Horatio"
Boolean Logical Operators and Order of Operations
Information Need: Find the names of all sailor under 30 with a rating of 7.
In this one we need two things to be TRUE about any sailors we get back in the result table. They need to have a rating of 7 AND their age needs be under (less than, <) 30. In the previous examples we only had one comparison in the WHERE clause, but now we need more than one. INFO students have programmed before and are likely fine with this information need. Not all HIM students may have seen these concepts, so this sections provides some more details.
We can combine multiple comparisons with the Boolean logical operators AND or OR. Either one expects something that can be TRUE or FALSE (like a comparison) on each side of the operator. The definition of AND is that AND returns TRUE if and only if the value on both sides is TRUE. AND returns FALSE otherwise. OR is more flexible and returns TRUE if any value on either side is TRUE. OR only returns FALSE when both values are FALSE. This behavior can be summarized in the truth tables below.
X | Y | X AND Y | X OR Y |
TRUE | TRUE | TRUE | TRUE |
TRUE | FALSE | FALSE | TRUE |
FALSE | TRUE | FALSE | TRUE |
FALSE | FALSE | FALSE | FALSE |
In a query, we don’t use the constant values TRUE and FALSE. Instead we use comparisons or combinations of comparisons. X and Y are just abstract symbols above that we can interpret as “value of a comparison”. So if X is “rating = 7” and Y is “age < 30” we end up with
SELECT sname FROM Sailors WHERE rating = 7 AND age < 30
where we had to use AND because the information need implicitly stated that both tests needed to be satisfied.
A lot of students want to surround each comparison in (). Don’t do that! It makes your query harder to read. It’s such a bad pratice, that I won’t even type out what it looks like; simply note that no () appear in the query.
Exercises
Try to write SQL for each of these information needs. Expand each one to see the solution.
Find the ages of all Sailors that are not named Dustin.
SELECT age FROM Sailors WHERE sname <> 'Dustin'
Find the names and ratings of all Sailors that have a rating below 9 or an age of 25 or more.
SELECT sname, rating FROM Sailors WHERE rating < 9 OR age >= 25
We’ve seen the basic SELECT FROM WHERE syntax that lets us build a result set in response to an information need. The result set is another name for the table returned by a SELECT FROM WHERE query. Here we will see some ways to customize that result set by computing new values from existing data and controlling the order and/or amount of data.