Let’s look at some columns from the Member table from the Golf Club database. Every golfer seems to have a MemberID and LastName. Not every golfer has a handicap or coach. Attributes with missing data or no values in a database hold a special value named NULL.

MemberIDLastNameHandicapCoach
118McKenzie30153
138Stone30NULL
153Nolan11NULL
176BranchNULLNULL
178BeckNULLNULL
Some tables have empty cells or missing data represented by NULL values

These NULL values can mean different things depending on where they appear. For the Handicap column it may mean the value hasn’t been entered yet or that the golfer is not keeping score and cannot have their handicap calculated. NULL values in the handicap column aren’t really meaningful. For the Coach column, a NULL value means something more precise; a golfer with a NULL in the Coach column does not have a coach. Remember also that Coach is a FOREIGN KEY referencing MemberID, the PRIMARY KEY of this table. Not every golfer has a coach (partial participation we saw way back with the ERD for the golf club), so the NULL values reflect that partial participation in the FOREIGN KEY column. If every golfer was required to have a coach (total participation) we would not allow these NULL values in this column; we would have to add a constraint when creating the database that requires values in each row of the Coach column.

It’s not safe to assume that every relationship with partial participation results in a FOREIGN KEY that can be NULL. The “Enters” relationship in “Member Enters Tournament” is a many-to-many relationship with partial participation. Not every golfer must enter a tournament, and not every tournament must have entries associated with it. Because of the many-to-many cardinality, this relationship in the ERD became a table in the table. Every row in that Entry table reflects a connection (relationship) between a member and a tournament. It doesn’t make sense for either the MemberID or the TourID FOREIGN KEYs to be NULL in this case. There’s no concept of a connection if both values are not present. A table resulting from a many-to-many relationship cannot have NULL FOREIGN KEY values because they would break the referential integrity of the relationship. Every row must refer to participants in the relationship; values cannot be missing.

Using NULL in SELECT FROM WHERE

If our information need is “find the ID of golfers whose handicap is known” or equivalently “find the ID of golfers with a handicap”, we need to filter rows to remove those golfers whose Handicap value is NULL. Or said another, we want to return golfers whose handicap column does not have a NULL in it. It’s tempting to use = and <> but that doesn’t work. In SELECT FROM WHERE, we use IS NULL and IS NOT NULL to check for NULL values in the WHERE clause. The query that answers the information need is

SELECT MemberID
FROM Member
WHERE Handicap IS NOT NULL

The opposite information need is “find the ID of golfers who do not have a handicap”. For this one we still need to filter rows, but we want to return the rows we did not return for the first one. We need to return the rows that do have NULL values:

SELECT MemberID
FROM Member
WHERE Handicap IS NULL

Note carefully that you use IS NOT NULL to find golfers that have handicaps and IS NULL to find golfers who do NOT have handicaps. The NOT is often the opposite of what students expect if they think through the problem too quickly.

We can write the same kinds of IS and IS NOT NULL queries using the Coach column. Try these out on your own first.

Find the last names of golfers that do not have coaches.
SELECT LastName
FROM Member
WHERE Coach IS NULL

Here we wanted golfers without a coach, so we need rows with a NULL value in the Coach column. We test for NULL instead of NOT NULL.

Find the last names of golfers that have coaches.
SELECT LastName
FROM Member
WHERE Coach IS NOT NULL

Here we wanted golfers with a coach. These golfers will have their coach’s memberID in their coach column, so we need rows with an actual value (NOT a NULL value) in the Coach column. We test for IS NOT NULL in the WHERE clause.

Three-valued logic

Above we saw the right way to write queries testing for IS NULL or IS NOT NULL. Why can’t we use = or <> to check? Go to our database front end and try out queries on the Member table with WHERE Coach = NULL or WHERE Coach <> NULL. They don’t work! They both return the empty table instead of the correct list of golfers. Never use = or <> when testing for NULL.

We’ve seen that the only rows we get back in a query that contains the optional WHERE clause are rows that evaluate to TRUE using the logical condition in the WHERE clause. So Coach = NULL must never evaluate to TRUE for any row. But also Coach <> NULL must never evaluate to TRUE for any row. If neither case evaluates to TRUE, then neither case evaluates to FALSE either. What’s going on?

The what’s going on is that databases use three-valued logic. TRUE and FALSE are only 2 of 3 possible outcomes of a comparison. The third outcome is UNKNOWN; the database doesn’t know if a comparison is TRUE or FALSE. A NULL value in column means the value for that column is not known. It may be because there is no participation as with the Coach column’s NULLs, or it may be that the data have simply not been entered yet. In either case, the value is UNKNOWN. If we have a test like WHERE Coach = 153, that test will evaluate to TRUE for any rows (like the first row in the Member table) that have a value of 153. It will evaluate to FALSE for any rows that have a different numeric value in the Coach column. It will evalulate to UKNOWN for rows that have a NULL value in the Coach column. The database takes the point of view that the value might be 153, but it also might not. The database can’t tell because the value is unknown, so the result is UNKNOWN. UNKNOWN is not the same as TRUE, so we do not get rows back that have a NULL value for Coach and we do not get rows back that have a value different from 153. Here is an illustration:

MemberIDCoachCoach = NULLCoach <> NULLCoach = 153Coach <> 153Coach IS NULLCoach IS NOT NULL
461235UUFTFT
415235UUFTFT
331153UUTFFT
235153UUTFFT
487NULLUUUUTF
469NULLUUUUTF
Possible comparisons in three-valued logic. T represents TRUE, F False, and U UNKNOWN. Only the IS and IS NOT comparisons yield simple results. The other comparisons can yield UNKNOWN when NULL values are possible.

The information above is pretty technical and precise. We don’t always have to think that precisely about NULL values. If we compare column values against a specific value with = or <>, we just don’t get rows with NULL back in our results. If we need to specifically find which columns do or do not have NULL values, then we must use IS NULL or IS NOT NULL in the WHERE clause instead of = or <>.

When we have multiple comparisons in the WHERE clause with AND or OR and NULL values are possible, we need to be a little bit careful. That third possibility complicates things. Remember we have seen how AND and OR behave in earlier notes that showed their “truth tables”. With three-valued logic, the truth tables are really a little bit more complicated.

The truth tables for three-valued logic are shown below. The way we reason about these is that an UNKNOWN value is not known. It may be TRUE or it may be FALSE. Since either outcome is possible, we have to ask how it affects the logical operation. A test like NOT UKNOWN would be TRUE if the UNKNOWN value were FALSE, but we just don’t know whether it is or not. So the value of the test is UNKNOWN. With AND, the result is only TRUE if both values are TRUE. If we have TRUE AND UNKNOWN or UNKNOWN AND TRUE, it’s possible the result could be TRUE or FALSE depending on what the UNKNOWN test is. We can’t tell the result, so the result is UNKNOWN. If we have FALSE AND UNKNOWN or UNKNOWN and FALSE, it doesn’t matter what the result of the UNKNOWN test is. The one FALSE is enough to guarantee a FALSE result. Similar reasoning can be used for OR, and you can work that out on your own.

XYNOT XX AND YX OR Y
TRUETRUEFALSETRUETRUE
TRUEFALSEFALSEFALSETRUE
TRUEUNKNOWNFALSEUNKNOWNTRUE
FALSETRUETRUEFALSETRUE
FALSEFALSETRUEFALSEFALSE
FALSEUNKNOWNTRUEFALSEUNKNOWN
UNKNOWNTRUEUNKNOWNUNKNOWNTRUE
UNKNOWNFALSEUNKNOWNFALSEUNKNOWN
UNKNOWNUNKNOWNUNKNOWNUNKNOWNUNKNOWN
Truth tables for three-valued logic. Focus on the bottom three rows since they summarize the new behavior with UNKNOWN values.

The good news is that in I308 NULL values mixing with ANDs or ORs won’t cause any problems. That’s not always the case. This pagehttps://learnsql.com/blog/understanding-use-null-sql/ shows an example where problems arise when computing with NULL values in a table. Later we’ll see other subtle problem that can arise with NULL, but right now the main thing to remember is to use IS NULL or IS NOT NULL to test their values.

NOT NULL constraints

When creating a table in a database, it’s possible to specify column-by-column whether NULL values are allowed in a column. PRIMARY KEY columns don’t need these specification because they can never be NULL. If the column(s) in the PRIMARY KEY must be unique, you need a value to check that uniqueness. Non-key columns by default can have NULL values unless you specify a NOT NULL. The reason NOT NULL is the default is because it’s the least restrictive. Constraints should be specified and not applied by default. If we required that every golfer have a MemberID, FirstName, and LastName while any other column could have NULL values, we only have to constraint FirstName and LastName. All the columns with NULL allowed need no extra constraints, and the PRIMARY KEY constraint on MemberID already enforces “no NULLs allowed in this column”.

CREATE TABLE Member (
  MemberID INT PRIMARY KEY,
  FirstName VARCHAR(20) NOT NULL,
  LastName VARCHAR(20) NOT NULL,
  Handicap INT,
  Coach INT,
  -- ... other columns omitted for brevity
  FOREIGN KEY (Coach) REFERENCES (MemberID)
);