{"id":30,"date":"2020-08-16T22:11:10","date_gmt":"2020-08-16T22:11:10","guid":{"rendered":"https:\/\/enter77.ius.edu\/cjkimmer\/?page_id=30"},"modified":"2020-08-31T20:17:11","modified_gmt":"2020-08-31T20:17:11","slug":"records-and-data-models","status":"publish","type":"page","link":"https:\/\/enter77.ius.edu\/cjkimmer\/records-and-data-models\/","title":{"rendered":"Records and Data Models"},"content":{"rendered":"\n<p>If an application is using a DBMS for persistence, the app and the database must be able to communicate with each other. This communication can be categorized as 4 different kinds of operations:<\/p>\n\n\n\n<ul><li><strong>CREATE<\/strong> &#8211; add new information <\/li><li><strong>READ <\/strong>&#8211; retrieve information<\/li><li><strong>UPDATE<\/strong> &#8211; change existing information<\/li><li><strong>DELETE <\/strong>&#8211; remove information<\/li><\/ul>\n\n\n\n<p>Many apps do nothing but variations on these 4 things, and these kinds of apps are called <strong>CRUD <\/strong>apps. If you are an INFO major and become a software developer, your future is likely to be writing these apps until the end of time!<\/p>\n\n\n\n<p>Canvas can be thought of as a CRUD app. If you access the I308 Canvas site to hand in an assignment, Canvas will READ the I308 course material to show you the assignment page. Canvas will CREATE a new assignment submission when you hand in. If you hand in a new version later, canvas will UPDATE your submission, and so on. <\/p>\n\n\n\n<p>Other apps act the same way. Billable events (the most important kind of events) in a medical office have <a rel=\"noreferrer noopener\" href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/Rules\/Billable_Specific_Codes#:~:text=The%20following%2072%2C186%20ICD%2D10,of%20specificity%20under%20each%20code.\" target=\"_blank\">ICD-10 codes<\/a> that describe the nature of the event. Part of a patient&#8217;s record may include a list of these events which can be updated or deleted as the situation warrants. Any display of the record on the screen is a read of the record, and new patients or visits to the medical office correspond to the creation of new records in the system.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Records and Relationships<\/h3>\n\n\n\n<p>To be able to do these wonderful CRUD things, an app must have an idea of what kind of data it will keep track of. Canvas has a notion of users, courses, assignments, assignment submissions, comments, pages, &#8230; Each one of these objects must be stored in a database, and each one of these objects will likely consists of a collection of related information. A user will have a username, password, email address, etc. A course will have a course number, section number, title, semester, &#8230; Assignments will have a title, content, due date, &#8230; Such groups of related information is called a <strong>record<\/strong>, and records are one of the fundamental concepts in a database.  The items of information that are related are often called <strong>attributes<\/strong>. In the Canvas user example, we would say there is a username attribute, password attribute, etc.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/enter77.ius.edu\/cjkimmer\/wp-content\/uploads\/2020\/08\/IMG_0177.png\" alt=\"\" class=\"wp-image-100\" width=\"500\"\/><figcaption>A record is a collection of related attributes. Each of these records has the same attributes (username, SSN, etc.) but may hold different values for each one.<\/figcaption><\/figure>\n\n\n\n<p>Moreover, there may be <strong>relationships<\/strong> that connect records together. An assignment is related to a course (because different courses have different assignments). An assignment submission is related to the assignment and to the user who submitted it. Relationships are connections between these objects. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Three Views of Data &amp; Data Models<\/h3>\n\n\n\n<p>A description of an app&#8217;s information needs in terms of records and relationships is called a <strong>data model<\/strong>. There are different ways to describe the data model at different levels of detail, and these different ways are often referred to as &#8220;<strong>views of data<\/strong>&#8221; or as a specific kind of data model. Typically, three different levels of detail are used:<\/p>\n\n\n\n<ol><li>The <strong>conceptual view<\/strong> of data with its corresponding <strong>conceptual data model<\/strong>. This is the least detailed, highest-level view.<\/li><li>The <strong>logical view<\/strong> of data with a corresponding <strong>logical data model<\/strong> describes the way a DBMS presents the data to the app or a user. Different DBMS&#8217;s use different logical data models.<\/li><li>A <strong>physical view<\/strong> of data or a <strong>physical data model<\/strong> provides the lowest-level, most detailed view of data.<\/li><\/ol>\n\n\n\n<p>In other INFO courses like I210 or I211, two programming design approaches are considered: top-down and bottom-up. <strong>Top down design<\/strong> means to start with the big picture first and gradually incorporate more details as a plan is developed. <strong>Bottom-up design<\/strong> is the opposite approach. It means to begin programming right away without a clear plan in place and let the big-picture strategy work itself out. Top-down design is considered a best practice while bottom-up design is not. Similarly with <strong>data modeling<\/strong> (the process of determining what information and organization needs to stored), it&#8217;s considered a best practice to develop a strategic approach and worry about the details last. The details involve actually storing the data on the computer, so it&#8217;s typical to first write down the information needs and develop a description of what information will be stored. This first part of the process is conceptual data modeling. Later in the course, we&#8217;ll create <strong>Entity-Relationship Diagrams<\/strong> that are a conceptual description of the data.<\/p>\n\n\n\n<p>Once the conceptual description is created, it&#8217;s appropriate to consider a lower-level of detail. There are many different kinds of databases, and each usually has a supported way of organizing the data. This way of organizing doesn&#8217;t necessarily describe how the computer will store the data. It instead describes how the programmer, developer, or records specialist can think about the data and communicate with the DBMS. This view of data is the logical level, and it is described in more detail in the next section.<\/p>\n\n\n\n<p>Many different DBMSs may use the same logical view of the data, but they may take radically different approaches to actually storing it. The details of how it&#8217;s actually stored comprise the physical view of the data. For instance, SQLite is a single-file database. All the records are stored in one file (up to a limit of 1 PB or so) in SQLite. Most other DBMSs take a different approach and store records in multiple files and often optimize this storage depending on the server environment in which they are running. All these use the same logical model (one called the relational model) while difering in their physical details. <\/p>\n\n\n\n<p>An advantage of the three-view approach is that each level is independent of the other. The same conceptual data model can be converted into different logical data models without affecting the conceptual view. Similarly, the same logical data model can be implemented with many different physical models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Logical Data Models<\/h3>\n\n\n\n<p>We will learn about the four logical data models listed below in this course, but we will only deal with the first three right now, saving the fourth one for later:<\/p>\n\n\n\n<ol><li>The <strong>flat model<\/strong> is a data model with one kind of record where there is no particular relationship between the records.<\/li><li>A <strong>hierarchical data model<\/strong> is characterized by having a single kind of record with a parent-child (one-to-many) relationship between records.<\/li><li>The <strong>network data model<\/strong> allows any record to be related to any other, and this is called a many-to-many relationship.<\/li><li>The <strong>relational data model<\/strong> is the model used by most modern databases. It is sufficiently important and complicated that we will define it separately later in the course. <em>Most of I308 involves learning about the relational data model and its consequences.<\/em><\/li><\/ol>\n\n\n\n<p>The order given above is roughly chronological. The flat model is the oldest and the relational model is the newest. The relational model is not that new, though, for it was invented in the late 1960&#8217;s. Each of the first three was used in early DBMS&#8217;s that were developed in the 1960&#8217;s.<\/p>\n\n\n\n<p>To distinguish between the first three kinds of data models, we can note that they each differ in terms of the relationships that are allowed between records. If we identify the record used in the data model and the allowed kind of relationship, then we can determine which data model is being used.  To start off, if there is no relationship between the records, then it&#8217;s called the flat model. When there is a relationship between the records, we need to determine which kind it is.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/enter77.ius.edu\/cjkimmer\/wp-content\/uploads\/2020\/08\/IMG_0176.png\" alt=\"\" class=\"wp-image-101\" width=\"500\"\/><figcaption>Schematic showing the flat (no relationship between records), hierarchical (parent-child or one-to-many), network (many-to-many) data models<\/figcaption><\/figure>\n\n\n\n<p>A relationship always involves connections between pairs of records. The two kinds of relationships described above (one-to-many and many-to-many) differ in their <strong>cardinality<\/strong> (how many records are involved). A one-to-many relationship is also often called a parent-child relationship. Parent and child here is used in the computing context and not the context of a family of humans. In computing, there can be only parent, but the parent can have many children. One parent can have many children, but a child can have at most one parent. The &#8220;at most one&#8221; in the previous sentence is because hierarchies (also called <strong>trees<\/strong> in computing) have one special node called the root that begins the whole thing. In the Hiearchical diagram above, record 1 is the root node, and the records are organized to show the root at the top, the children of the root one level beneath, and the grandchildren a further level down. <\/p>\n\n\n\n<p>A hierarchical data model is the same because not every record must have a parent, but if a record does, it can have only one&#8211;a record can have at most one parent. A parent record in the hierarchical data model can have many children, so a hierarchical data model looks like a tree if it has one root record or a <strong>forest<\/strong> if it has more than one. This is the definition of a one to many relationship between records. <\/p>\n\n\n\n<p>The network data model has a more general kind of relationship. Any record can be connected to any other, and any record can be connected to more than one record. &#8220;Many&#8221; in database-speak means &#8220;more than 1&#8221;, so a many-to-many relationship is one where any record can be connected to more than one record. Since there is no &#8220;at most one parent&#8221; requirement, a network data model doesn&#8217;t have to have a root. In the diagram above, there is no possible way to move the records around while maintaining the connections between them to show that a parent-child relationship holds for each record, so it must be a network. In real life situations, one would know what constraints the relationship must obey, and from those constraints one can determine what kind of relationship is possible between the records. Some examples are shown at the end of these notes.<\/p>\n\n\n\n<p>Although hierarchical and network data models are allowed to have relationships between records, they don&#8217;t have to. So either of those models can be used to describe a set of records without relationships; i.e. they are both general enough to incorporate the flat data model. Remember the flat data model is the oldest, so the hierarchical data model was an improvement on it that was &#8220;backwards compatible&#8221; and could still hold all the original kinds of data. <\/p>\n\n\n\n<p>In the same sense, a network data model can be used to hold a hierarchy, for any hierarchy is a kind of network. In other words, the hierarchical data model generalizes the flat one, and the network data model generalizes the hierarchical one. Later we&#8217;ll see that the relational data model is the most general one of all.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Exercises<\/h3>\n\n\n\n<div class=\"wp-block-atomic-blocks-ab-accordion ab-block-accordion\"><details><summary class=\"ab-accordion-title\">We are storing data about employees including name, SSN, date of birth, and address. What logical data model is appropriate?<\/summary><div class=\"ab-accordion-text\">\n<p>Since there is no indication that employee records have any kind of connection, there is no relationship between records. The flat model is appropriate.<\/p>\n<\/div><\/details><\/div>\n\n\n\n<div class=\"wp-block-atomic-blocks-ab-accordion ab-block-accordion\"><details><summary class=\"ab-accordion-title\">We are storing data about employees including name, SSN, date of birth, address, and supervisor. A supervisor is an employee that several employees may report to. An employee has at most one supervisor. What logical data model is appropriate?<\/summary><div class=\"ab-accordion-text\">\n<p>The supervisor supervises many employees. An employee has at most one supervisor. The relationship described is a parent-child one, so the hierarchical data model is most appropriate.<\/p>\n<\/div><\/details><\/div>\n\n\n\n<div class=\"wp-block-atomic-blocks-ab-accordion ab-block-accordion\"><details><summary class=\"ab-accordion-title\">We are storing data about employees including name, SSN, date of birth, address, and projects. An employee works on projects with other employees. An employee may work on more than one project, and multiple employees tend to work on the same project. What logical data model is appropriate?<\/summary><div class=\"ab-accordion-text\">\n<p>Any two employees may be connected by virtue of working on a project together, so the network data model works best for this many-to-many relationship.<\/p>\n<\/div><\/details><\/div>\n","protected":false},"excerpt":{"rendered":"<p>If an application is using a DBMS for persistence, the app and the database must be able to communicate with each other. This communication can be categorized as 4 different &#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"featured_image_src":null,"featured_image_src_square":null,"_links":{"self":[{"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/pages\/30"}],"collection":[{"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/comments?post=30"}],"version-history":[{"count":18,"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/pages\/30\/revisions"}],"predecessor-version":[{"id":282,"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/pages\/30\/revisions\/282"}],"wp:attachment":[{"href":"https:\/\/enter77.ius.edu\/cjkimmer\/wp-json\/wp\/v2\/media?parent=30"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}