In this class, we just finished a thorough introduction to Java, and now we will work gaining a similarly complete introduction to databases. Before we leave Java behind for good, we can look back at how our ability to manage data in a program progressed. At first, we only used variables that could hold one value at a time, but we progressed to being able to use arrays which could hold multiple values. Once we had arrays, we could manage many values at once, and we could do more complicated number-crunching and data analytics like recording histograms, keeping track of collections of values, and so on.
If we keep on writing more sophisticated Java programs and approach the information needs of programs in the real world, we will start to run into these kinds of issues:
- Our Java array only holds data of the same type. So we can keep track of a list of integers just fine, but keeping track of multiple integers, Strings, and doubles all at once is more of a challenge.
- When our Java program stops running, our data is gone. To be able to re-use our data the next time the program runs, we’d have to come up with a way to store it.
- If the power goes out while our Java program is running, our data would vanish, too, unless we’re very careful every time we store it.
- If we want to operate on the same data as another program, we have to be very careful coordinating how everyone accesses the same data.
We started to address problem 1 above by learning about classes and objects. We can write classes to store multiple types of data about instance variables, and problem solved! The other issues above are much more complicated though. To have persistence, the ability for our data to be available from one run of our program to another, we have to do more work. The common quick-and-dirty programming approach to manage persistence is to use files to save, modify, and load data. If that data is shared between programs, then the different programs also have to agree on the file format used to store that data, and that can get complicated, too. It’s not a lot of work to save data in a file in Java, but as soon as the programmer worries about more subtle problems like the power going out while saving data, dealing with files becomes much more complicated. Programs have to be very careful to protect the integrity of the data so it doesn’t get corrupted when disasters or simpler errors occur.
A final, separate observation about data is that it’s important. Aside from what program(s) are creating, using, or storing that data, businesses or other organizations recognize that data is valuable. They may like to learn from that data separately from the other programs using it, or they may like to carefully store it to protect it for future use. When we start to think about collections of data in this way, we arrive at the concept of a database.
Database Management Systems
The definition above indicates that the word database usually implies the collection of data is stored on a computer. A common computing solution that allows for persistence of data, guarantees disasters like power outages won’t corrupt it, and protects the value invested in data is to let one sophisticated program handle all these responsibilities. This program that manages and guarantees the integrity of the data is the Database Management System (DBMS). Most of the time DBMSs are referred to as “databases” or just “the database”.
This last part of C112 is all about databases and database management systems. We will cover how databases store information, and more importantly, we will learn how to design them and how to retrieve data from them. The table below shows a few commonly used DBMSs that you will run across if you work in software development or database systems:
DBMS | Interesting INFO |
---|---|
DB2 | Made by IBM who is one of the earliest vendors of DBMSs |
Oracle | Oracle is IBMs primary competitor for enterprise DBMSs |
MySQL | A popular free database that is now owned by Oracle |
MariaDB | A fork of MySQL that originated when Oracle took ownership of MySQL |
PostgresSQL | Another popular open source DBMS |
Microsoft SQL Server | Microsoft’s DBMS for enterprise systems |
Microsoft Access | Microsoft’s DBMS for personal and small-business use. |
SQLite | Last but not at all least is another free DBMS. SQLite differs from the others in being dead simple to deploy making it ideal for quick projects while still scaling well to more complex ones. |
Using a DBMS to centralize management of important data provides a lot of built-in advantages:
- The DBMS guarantees data integrity. Disasters will not corrupt the data, and changing or deleting data in a transaction with the database is guaranteed. These guarantees are called the ACID guarantees, and you can read more about them here if you’re interested.
- Multiple programs can access the database, and the DBMS will coordinate and synchronize all this access.
- There is little need to deal with file formats or worrying too much about how the data is stored. The DBMS will manage the details and provide a relatively simple yet incredibly powerful interface for interacting with the data.
Although there is more than one kind of DBMS, the vast majority of them are called relational databases, and they use the relational data model to organize the data. The Structured Query Language (SQL) provides a common way for external programs to communicate with the DBMS, no matter what programming language the external app happens to be written in. Moreover, each of the DBMSs support very similar dialects of SQL, so it’s relatively straightforward to interact with all of them the same way without worrying too much about the details of each DBMS.
In C112, we will learn about the relational data model and the basics of SQL. In addition to these details that allow us to understand and use DBMSs, we will learn how to carry out data modeling to understand, describe, and design how we might store the information we care about in a database.