We’ve seen that a single record is a collection of related attributes, each of which contains values. We’ve also seen that storing groups of records is more typical, and that the records may have relationships between themselves. So a single record is like a set or list of values where each value is named or labeled by its attribute:
It’s easy to take this picture of a single record and extend it to multiple records. When we do that we end up with this:
In database terminology, there’s a fancy name for this object–a table. A table is generically something that has rows and columns where any particular row and column define a cell that can hold a value. You can create these kinds of tables in MS Word or soup them up with formulas and action in an Excel spreadsheet. Many databases store data in tables, and most databases that do this are called relational databases. We’ve seen the terminology relational data model before; we’re learning about the building blocks of relational databases, tables, right now. After learning about tables, we’ll deal with the full relational model.
In a relational database, a table consists of rows and columns, but it’s more structured than that. Each row is a record, so all the values in a row are related to each other because they all belong to the same record. The table itself will have a name, and the attribute names will act like column headings in the table. Each “cell” in the table denotes a particular attribute’s value (labeled by the column name) for the record in that row. Cell is in quotes in the previous sentence because it’s not really database terminology. It’s spreadsheet terminology that has become commonly-used with generic tables, too, but not so much with database tables.
There are other sets of vocabulary that are commonly used with database tables. They are summarized in the table (how ironic) below.
Table | Row | Column |
Record | Value | |
Relation | Tuple | Attribute |
Entity Set | Entity | Attribute |
Here is a database table as shown in the web app we’ll use this semester. With a view like this it’s easy to see the columns names and values for each record in the table.
Here is the same data as viewed from the console (terminal) using the popular database MySQL. It’s more old-timey looking, but it has the same information in it. One thing this view illustrates which will be very important in i308 is that databases return information only in response to precisely-written statements. The one here is “SELECT * FROM Sailors”. A database administrator using a DBMS directly may type those statements in while a program or web app like we looked at the first day of the course would send exactly the same statement to the database in a message in order to request its response. We’ll see much more about these statements in the next section.
mysql> SELECT * -> FROM Sailors; +-----+---------+--------+------+ | sid | sname | rating | age | +-----+---------+--------+------+ | 22 | Dustin | 7 | 45 | | 29 | Brutus | 1 | 33 | | 31 | Lubber | 8 | 55.5 | | 32 | Andy | 8 | 25.5 | | 58 | Rusty | 10 | 35 | | 64 | Horatio | 7 | 35 | | 71 | Zorba | 10 | 16 | | 74 | Horatio | 9 | 35 | | 85 | Art | 3 | 25.5 | | 95 | Bob | 3 | 63.5 | +-----+---------+--------+------+ 10 rows in set (0.00 sec)
Database Queries
Now that we’ve seen tables, we can begin to carry out one of the most important database tasks–querying the database. Query means to ask a question, and when we ask the database a question, it gives us data as the answer. We’ll call the question asked the information need, although we won’t always write it so it’s grammatically a question. Sometimes it will be a statement. For example, the statement “Find the number of sailors.” is an equivalent information need to the question “How many sailors are there?”
I308 would be a lot easier if we could give the database the English-language information need directly, but we can’t. Relational databases understand a language called SQL (pronounced like the 3 letters and NOT pronounced like “sequel”) which is short for the Structured Query Language. It contains different parts called sublanguages, the three most relevant of which are shown below. “Sublanguage” makes them sound more complicated than they are. Each sublanguage consists of only a few statements, so it’s basically just a way to categorize the different things SQL includes to let the database user create, change, or query tables. SQL is a very old language, and a common convention is to put SQL in ALL CAPS, as you can see in the table, too.
SQL sublanguage | Purpose | SQL statements |
DDL – Data Definition Language | Define table names, column names, and constraints Add, change, or remove column names and/or constraints | CREATE TABLE ALTER TABLE |
DML – Data Manipulation Language | Add data to tables Remove data from tables Change data in tables | INSERT INTO DELETE FROM UPDATE |
Query Language | Return responses to information needs | SELECT FROM WHERE |
More specifically, databases return tables as responses to queries. Based on the SQL statements column in the table above, getting information from a database involves:
- Determining the information need in plain English
- Converting or translating that need to an SQL SELECT FROM WHERE statement
- The database executes the SELECT FROM WHERE query
- A table is returned in response to the query
Before we see this in action, let’s take a look at some examples of the first (information need) and last (result table) parts of that process without worry about the SQL yet. A good exercise is to look back at the table above and make sure you can see what the answer should be before revealing it below.
Information Need: Find all information about all sailors.
Information Need: Find the names and ages of all sailors.
Information Need: Find all information about all sailors whose rating is 7.
Information Need: Find the names and ages of sailors whose rating is 7.
These 4 illustrate all the possible ways in which a results table can be built. A query can:
- Return an entire table (all rows, all columns)
- Return a subset of a table’s columns and all rows (some columns, all rows)
- Return all columns but a subset of the rows (all columns, some rows)
- Return a subset of the columns for a subset of the rows (some columns, some rows)
The SQL query language gives us ways to describe the information need that essentially involve us telling the database what kind of table to construct and return to us in response. When you read an information need and are aware of what the results should look like (which rows, which columns), it’s significantly easier to write an SQL query than when you’re flying blind.
The sublanguage table above shows that the basic SQL query statement is SELECT FROM WHERE. Each of the parts SELECT, FROM, and WHERE are known as clauses (“the SELECT clause”, “the FROM clause”, “the WHERE clause”). There are other optional clauses we’ll see later, too. Here is a skeleton of what it looks like:
SELECT comma-separated list of column name(s) FROM table [WHERE row condition is true]
The [] above are a common convention when describing programming language syntax (the rules that must be followed for your program to be valid). They mean [stuff inside here is optional]. So the WHERE clause is optional, but it’s important enough to be included in the name (people say “SELECT FROM WHERE” queries much more often than they say “SELECT FROM” queries) and most queries include the WHERE clause anyway.
For the example information needs shown above, the Sailors table contained all the information needed to generate the response. Queries where all the information is contained in a single table are unsurprisingly called single-table queries, and that’s what we’ll learn first. For a single table query, once you know which table contains the information needed for the result, you use that table’s name in the FROM clause:
SELECT FROM Sailors WHERE
If you need all the column names, then you can use a wildcard character * to mean “all the columns”. So for the first example information need “Find all information about all sailors.” we know that all rows and all columns of the sailors table are needed. Since all columns are needed, we can use SELECT * and since we’re using the Sailors table, we can use FROM sailors. So far we have
SELECT * FROM Sailors
In fact, that’s all we need here because when we want all the rows from a table in our response, we omit the WHERE clause. The purpose of the WHERE clause is to filter out rows so that only some of them make the cut and appear in the response. So the query above is our first successful SELECT FROM WHERE query even though it has no WHERE clause.
It’s also possible to write this one as
SELECT sname, age, rating FROM Sailors
but this form should not be recommended because we’ve seen above that tables can change in a database. If the Sailors table is modified so that column names change, columns are added, or columns are removed, then this query would no longer return all information about all sailors. It’s safer to use the wildcard because it will always work for this table. SQL queries are often saved and ran over and over again, so writing them to be robust and failsafe is a necessity.
The second information need was “Find the names and ages of all sailors.” We noted it also returned all rows but only some columns. Since we still don’t need the WHERE clause (all rows returned, so no WHERE clause), we’ve also seen enough to write this query. Since the column names go in a comma-separated list in the SELECT clause as shown above, this will work:
SELECT sname, age FROM Sailors
The WHERE Clause
The WHERE clause is complex enough to get its own section. We’ve already seen that its purpose is to filter rows so only some rows in a table appear in the result. The way it filters rows is to deal with 1 row at a time and test the row using the data in that row. If the test is passed, then the row is in the result. If the test is failed, then it is not. The test in the both the sample information needs above is for sailors with a rating of 7. Note that this test is based on the values stored in the record for each sailor. If the sailor’s rating is equal to 7, they are in the answer. If the rating is not, the sailor is not. So for all ten sailors in the table, the database would go row-by-row (i.e. sailor by sailor because each row is a record) and compare the value in the rating column to 7. If they are equal, that row is added to the result table. If they are not equal, that row is not added to the result table. Note also that even though there are 10 sailors, we can use the column name to represent the value in that column for each sailor. Column names are the building blocks of the WHERE clause.
SELECT * FROM Sailors WHERE rating = 7
The text “rating = 7” is called a comparison. The column name is used to stand in the for the value stored in that column for each row. Because the rating is either equal to 7 or it is not, the comparison will be either TRUE or FALSE. Those are written in ALL CAPS to emphasize they are part of SQL. The database will go row-by-row through the table in the FROM clause and evaluate the test/comparison in the WHERE clause. Those that evaluate to true are added to the result table. The equals sign in the comparison is called the comparison operator. There are many comparison operators in SQL, but we’ll only use a few of them in I308.
Operator | Operation |
---|---|
= | Equals |
<> | Not Equals |
> | Greater than |
< | Less than |
>= | Greater than or equals |
<= | Less than or equals |
Comparison operators are called binary operators because they have two operands. Operand here means “value in the comparison”. We have to have two values to compare them, and binary means two. Either operand can be a column name, in which case the value in each row is substituted in for the column name. Either operand can also be a constant value independent of the row, like 7 is here. So “rating = 7” or “7 = rating” will both work fine because either operand can be the column name. The one we saw first where the column name appears on the left of the comparison is the typical way to write this, though. Likewise “7 = 8” would be valid because either or both operands can be constants.
Although 7 = 8 is valid, why isn’t it useful to compare two constants?
The table above incidates the comparison operators work for other ordered types, too. In addition to INTEGERs (whole numbers), SQL includes FLOATs (decimal numbers), and CHAR (strings or words) types. All of these are ordered because INTEGER and FLOAT are both numbers and have numeric ordering and CHAR types have alphabetic ordering. The only thing different about CHAR types is that we need to enclose the value in single or double quotes.
Information need: Find the rating of all sailors named Horatio.
We only want the rating column, and we only want certain sailors (rows) back. We write:
SELECT rating FROM Sailors WHERE sname = 'Horatio'
or
SELECT rating FROM Sailors WHERE sname = "Horatio"
Boolean Logical Operators and Order of Operations
Information Need: Find the names of all sailor under 30 with a rating of 7.
In this one we need two things to be TRUE about any sailors we get back in the result table. They need to have a rating of 7 AND their age needs be under (less than, <) 30. In the previous examples we only had one comparison in the WHERE clause, but now we need more than one. INFO students have programmed before and are likely fine with this information need. Not all HIM students may have seen these concepts, so this sections provides some more details.
We can combine multiple comparisons with the Boolean logical operators AND or OR. Either one expects something that can be TRUE or FALSE (like a comparison) on each side of the operator. The definition of AND is that AND returns TRUE if and only if the value on both sides is TRUE. AND returns FALSE otherwise. OR is more flexible and returns TRUE if any value on either side is TRUE. OR only returns FALSE when both values are FALSE. This behavior can be summarized in the truth tables below.
X | Y | X AND Y | X OR Y |
TRUE | TRUE | TRUE | TRUE |
TRUE | FALSE | FALSE | TRUE |
FALSE | TRUE | FALSE | TRUE |
FALSE | FALSE | FALSE | FALSE |
In a query, we don’t use the constant values TRUE and FALSE. Instead we use comparisons or combinations of comparisons. X and Y are just abstract symbols above that we can interpret as “value of a comparison”. So if X is “rating = 7” and Y is “age < 30” we end up with
SELECT sname FROM Sailors WHERE rating = 7 AND age < 30
where we had to use AND because the information need implicitly stated that both tests needed to be satisfied.
A lot of students want to surround each comparison in (). Don’t do that! It makes your query harder to read. It’s such a bad pratice, that I won’t even type out what it looks like; simply note that no () appear in the query.
Exercises
Try to write SQL for each of these information needs. Expand each one to see the solution.
Find the names and ratings of all Sailors that have a rating below 9 or an age of 25 or more.
SELECT sname, rating FROM Sailors WHERE rating < 9 OR age >= 25
Find the ages of all Sailors that are not named Dustin.
SELECT age FROM Sailors WHERE sname <> 'Dustin'