{"id":860,"date":"2024-05-24T15:02:12","date_gmt":"2024-05-24T15:02:12","guid":{"rendered":"https:\/\/enter77.ius.edu\/cjkimmer\/?page_id=860"},"modified":"2024-05-24T15:13:54","modified_gmt":"2024-05-24T15:13:54","slug":"c112-aggregation-and-computation-in-the-select-clause","status":"publish","type":"page","link":"https:\/\/enter77.ius.edu\/cjkimmer\/c112-aggregation-and-computation-in-the-select-clause\/","title":{"rendered":"C112: Aggregation and Computation in the SELECT clause"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Computing new values<\/h3>\n\n\n\n<p>In the SELECT clause we can compute new values from existing data. As a reminder we&#8217;ve seen that the SELECT clause determines which column we return in the query result. The columns come from the table that we build using the FROM and WHERE clauses. More specifically, in a query<\/p>\n\n\n\n<ol>\n<li>The FROM clause specifies the table we will use<\/li>\n\n\n\n<li>The WHERE clause filters rows from that table by keeping only ones that meet certain conditions<\/li>\n\n\n\n<li>The SELECT clause is used to specify which columns should be returned<\/li>\n<\/ol>\n\n\n\n<p>The third step above is worded very generally. We&#8217;ve seen that we can specify columns by name (or by the * wildcard) to return in the result set, but we can also make new columns in the SELECT clause using existing columns. We can create these new columns using arithmetical operations like +, -, *, and \/ for add, subtract, multiply, and divide, respectively. Let&#8217;s imagine in the Sailors table that the best sailors are the youngest ones with a higher rating, and we want to quantify that by dividing the rating by the age (because dividing the same rating by the smaller age will yield a bigger rating than dividing by the age of an older sailor). We can do by<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT rating\/age\nFROM Sailors<\/pre>\n\n\n\n<p>and if we would like to name that column, we can do so as well:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT rating\/age AS SailorYears\nFROM Sailors<\/pre>\n\n\n\n<p>That&#8217;s pretty much it for customizing in the SELECT clause. We can mix and match new columns with existing one by separating with commas, and we can renames columns one by one with AS:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT rating\/age AS SailorYears, age, sname AS SailorAge\nFROM Sailors\nWHERE age &lt; 40<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Aggregation Queries<\/h2>\n\n\n\n<p>To aggregate is to combine many things into one thing. In databases, there are tools known as <strong>aggregate functions<\/strong> or <strong>aggregation functions<\/strong> that aggregate a column of data; they combine many rows into one row for a particular column or group of columns. We&#8217;ve seen before that a column of data is interchangeable with a list or tuple of values, and that correspondence gives us a good way to give examples of aggregation. If we have a column of grades with values (80.0, 90.0, 95.0, 100.0), there are several ways to aggregate those many values into one value. We can find the largest value in the list (100.0) or the smallest (80.0). We could also add up the values (365.0), count them (4), or compute their average (365.0\/4.0 = 91.25). These are all examples of common aggregation functions we may use in this course.<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table><thead><tr><th>Aggregate Function<\/th><\/tr><\/thead><tbody><tr><td>COUNT(<em>column<\/em>) or COUNT(*)<\/td><\/tr><tr><td>SUM(<em>column<\/em>)<\/td><\/tr><tr><td>MIN(<em>column<\/em>)<\/td><\/tr><tr><td>MAX(<em>column<\/em>)<\/td><\/tr><tr><td>AVG(<em>column<\/em>)<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">These are the five aggregation functions we&#8217;re responsible for memorizing in this course<\/figcaption><\/figure>\n\n\n\n<p>Although we only deal with these five to learn the concept of aggregation, there are <a rel=\"noreferrer noopener\" href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/aggregate-functions.html\" target=\"_blank\">many other available ones<\/a> in a typical DBMS. Aggregation queries are queries that use aggregation functions to roll up a table of data into a single (aggregate) row of data, so it\u2019s a way to simplify information or compute new information that depends on 1 or more rows in a table. All semester long we&#8217;ve seen that the FROM and WHERE clause build a table that gets passed to the SELECT clause. Aggregate functions appear in the SELECT clause and condense the information in that table into a single row.<\/p>\n\n\n\n<p>For instance, in the Sailors Reserving Boats database, each sailor has a rating, but together ALL sailors have an average rating. The average rating depends on all the rows in the sailor table, so it\u2019s an aggregate quantity. If we use the Sailors table in the FROM clause but omit the WHERE clause, then the SELECT clause will receive all the rows in the Sailors table.  To compute it we write in SQL<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT AVG(rating)\nFROM Sailors<\/pre>\n\n\n\n<p>and see a single number (6.6 as a 1 x 1 table) returned to us with the answer. Note that we passed the name of a single column to the aggregate function AVG. All 5 accept a single column name as their argument. The only one that accepts anything else is COUNT() which also accepts the * wildcard. More on this later. <\/p>\n\n\n\n<p>We can use aggregation with all the SQL bells and whistles we\u2019re used to: \u201cFind the average rating of sailors older than 20 years old.\u201d In this example, since we&#8217;ll need a WHERE clause, the SELECT clause does not receive the entire Sailors table. It only receives certain rows. The SELECT clause computes the average of whatever data it receives, so when it receives only some ratings instead of all ratings, we should expect a different average value from the query.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT AVG(rating)\nFROM Sailors\nWHERE age &lt; 20<\/pre>\n\n\n\n<p>Indeed in this case the average rating is returned in a 1&#215;1 table as 10.0, which is greater than the 6.6 average of all sailors&#8217; rating. <\/p>\n\n\n\n<p>One thing to be careful of with aggregation is that&nbsp;<strong>you can\u2019t write a query like<\/strong><\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT sname, AVG(rating)\nFROM Sailors<\/pre>\n\n\n\n<p>ever because there are many sname values in the sailors table, but only one average rating. You can\u2019t make a table in SQL that has many rows in the first (sname) column and a single row in the next column. It\u2019s not shaped like a table!<\/p>\n\n\n\n<p><strong>Information Need<\/strong>: \u201cFind the number of sailors.\u201d <\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT COUNT(*)\nFROM Sailors<\/pre>\n\n\n\n<p>We just care how many rows are in the table because this table holds records for Sailors. Each row is a sailor. We don\u2019t even need to care about columns. We can thus use * with COUNT instead of the name of any particular column. The interpretation of COUNT(*) is &#8220;count rows&#8221; without worrying about the values in a particular column. Because we do not have to extract those values and work with them, COUNT(*) can be faster or more efficient, but really it&#8217;s just a good practice to use COUNT(*) whenever you know that it&#8217;s indeed the rows you&#8217;re counting and not particular values. <\/p>\n\n\n\n<p><strong>Information Need: <\/strong>\u201cFind the number of sailors\u2019 names.\u201d <\/p>\n\n\n\n<p>For this one, we need to be a little careful, for we have to worry about the duplicate snames like \u2018Horatio\u2019. If we just COUNT(*) we would count Horatio twice, and if we COUNT(sname), we count Horatio twice. We can add the keyword DISTINCT before the column name to remove duplicates:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT COUNT(DISTINCT sname)\nFROM Sailors<\/pre>\n\n\n\n<p>This one is straightforward<\/p>\n\n\n\n<p><strong>Information Need:<\/strong> \u201cFind the highest rating for any sailor.\u201d<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT MAX(rating)\nFROM Sailors<\/pre>\n\n\n\n<p>Aggregation queries return a single row, guaranteed. If we have a single column in that single row, it\u2019s a 1\u00d71 table, so we can just use rating = and then put a single-column aggregation query in the () to generate a single value!<\/p>\n\n\n\n<p>Notice I was careful above to say that aggregation returns a single row. We can make multiple-column aggregation queries easily: \u201cFind the average, highest, and lowest ratings for all sailors under 20 years old.\u201d<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"sql\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">SELECT AVG(rating) AS avgRating, MAX(rating) as highRating, MIN(rating) AS lowRating\nFROM Sailors\nWHERE age &lt; 20<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Computing new values In the SELECT clause we can compute new values from existing data. As a reminder we&#8217;ve seen that the SELECT clause determines which column we return in &#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"featured_image_src":null,"featured_image_src_square":null,"_links":{"self":[{"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/pages\/860"}],"collection":[{"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/comments?post=860"}],"version-history":[{"count":3,"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/pages\/860\/revisions"}],"predecessor-version":[{"id":865,"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/pages\/860\/revisions\/865"}],"wp:attachment":[{"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/media?parent=860"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}