We’ve seen the INNER JOIN before where two tables are “combined” based on a primary-foreign key pairing. The foreign key references the primary key, and the primary key uniquely identifies an entire row. In that way, a row containing the foreign key is combined with the row containing the primary key. In the Sailors Reserve Boats database, that works out to be possible only between the Sailors/Reserves pairing and the Boats/Reserves pairing. Let’s switch gears and look at the golf club database since it has a little more potential for different pairings. In particular, it has the three tables representing Members Enter Tournaments from the ERD, and it has that nifty property that Members Coach other Members. Members Enter Tournaments is exactly the same as Sailors Reserve Boats, so any query we can write for sailors and boats can be rewritten for golfers and tournaments. That doesn’t mean we can find the color of a tournament a golfer enters the same we can find the color of a boat. It means we join the tables together in the same way, filter by row (WHERE clause), and then return the necessary columns (SELECT) for the final answer. They’re both many-to-many relationships in either database, so they’re “structurally” or conceptually the same.
A special case of the INNER JOIN is a self join. As far as I know, SQL doesn’t have a keyword for it, so it’s written lower case. When the foreign key references a primary key in the same table, we have to use two copies of the table for the join, so the table is joined with itself. We shouldn’t be scared of two copies of the same table now. We’ve seen it with nested and intersection queries before. If our query is “Find the names of all coaches,” we’re in a bit of a pickle. We can pretty easily find the names of all members, but coaches is a little tougher. To match up a golfer with a coach we use the coach foreign key and join it with the memberID primary key of a second copy of the same Member table. Let’s call one copy S for students and C for coaches, so our query would be something like
SELECT * FROM Member S INNER JOIN Member C ON S.coach = C.MemberID
where this would give us all the information about golfers combined with their coaches. Notice that students have the coach property and we use the list of coaches for the primary key. Now we can deduce that S.LastName gives us the names of students, and C.LastName gives us the names of coaches. So our query would be something like
SELECT C.FirstName, C.LastName FROM Member S INNER JOIN Member C ON S.Coach = C.MemberID
That’s pretty much all their is to a self join, but relationships-to-self (unary relationships from the ERD) in databases lead to other queries. For instance, “Find the names of golfers with coaches.” Not every golfer has a coach, so this query is really asking us for which golfers (which rows in the Member table) participate in the self join above. We actually don’t need the second table for coaches to answer that question. If a golfer has a coach, their coach attribute has a value, if not, it is NULL. We are being asked which golfers do not have a NULL there.
SELECT FirstName, LastName FROM Member WHERE Coach IS NOT NULL
That’s all! What about the opposite, “Find the names of golfers that do not have coaches.”
SELECT FirstName, LastName FROM Member WHERE Coach IS NULL
In either case, this last query asks us for golfers that do NOT participate in that self join. We’ve seen that there’s another special way we can get that information using a LEFT JOIN. So, if we do our self-join query above but replace INNER JOIN with LEFT JOIN we have
SELECT S.LastName AS StudentName, C.LastName AS CoachName FROM Member S LEFT JOIN Member C ON S.Coach = C.MemberID
and get the result
StudentName | CoachName |
---|---|
McKenzie | Nolan |
Stone | NULL |
Nolan | NULL |
Branch | NULL |
Beck | NULL |
Burton | Nolan |
Cooper | Nolan |
Spence | NULL |
Olson | NULL |
Pollard | Cooper |
Sexton | Cooper |
Wilcox | NULL |
Schmidt | Nolan |
Bridges | Cooper |
Young | NULL |
Gilmore | Nolan |
Taylor | Cooper |
Reed | Cooper |
Willis | NULL |
Kent | NULL |
which illustrates how we get all the golfers in the list as students, and only those golfers with coaches have a CoachName value appearing in the answer.