Everyone has seen objects and classes before in I210 & I211. Those courses teach Java, which is an object-oriented programming (OOP) language. Python is object-oriented, too, but it differs from Java in some important ways.
We’ve already seen collections (lists, sets, tuples, and dictionaries) in Python. A collection is a group of related data, and each different collection type provides its own rules for accessing that data (lists by index, dictionaries by key, etc.). There are other ways to group data together in a program, and there are other ways to define rules for how data can be operated on. Creating a class lets us design a solution for doing either of these things in our own program. Our first definition for a class in an OOP language like Python is that a class is a collection of data and methods (functions) that operate on that data. The methods provide the rules for accessing or transforming that data. We’ve seen that list is a class in Python. Objects of type list can be accessed by index ([]), but we can also transform the list using .append(). These are two examples of how the list class defines the ways in which we can interact with the data it manages. The data managed by a list object is the collection of objects–the items in the list.
A class is a collection of data and functions that operate on that data.
In the same way, str is a class in Python. An object of type str holds the data that we typically think of as the value of a str variable. But it’s really an object because it provides us with methods like .lower() and .split() that work on that data and return new data to us as their result. A str object is different from a list object in one important way. We can modify the data stored in a list by adding, removing, or changing elements. Lists are mutable because they can be changed. String objects are immutable because their value doesn’t change. If we assign a new value to a string variable, we are creating a new str object instead of changing the existing one. Java strings have the same property.
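The difference between mutable lists and immutable strings can be seen in a short sketch. The variable names and sample values here are made up for illustration.

```python
# Lists are mutable: the same object changes in place.
words = ["search", "engine"]
alias = words                # a second name for the same list object
words.append("index")
print(alias)                 # ['search', 'engine', 'index']

# Strings are immutable: methods return a new str object.
title = "Information Retrieval"
lowered = title.lower()
print(title)                 # Information Retrieval (unchanged)
print(lowered)               # information retrieval

# Assigning a new value binds the name to a different object.
original = title
title = title + " 101"
print(original)              # Information Retrieval
print(title is original)     # False
```

The list changed even when viewed through a second name, while the original string was never altered; only the name title was pointed at a new object.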
All the language above refers to str objects as “objects of type str” and list objects as “objects of type list”. This notion gives us a second definition for a class. A class is a user-defined datatype. We can create our own classes that act just like the built-in types in Python. We can change our objects’ data (if they’re mutable), and we can invoke methods on our objects just like invoking .append() on a list. If we can customize the types in our search engine to be specific to the domain of information retrieval, then our program can be easier to read, write, and think about because the types match the concepts of the domain. This conceptual mapping is a strength of object-oriented programming approaches.
A class is a user-defined datatype.
If a class is a type, an object is a variable of that type. Said more formally, an object is an instance of a class. A class is a description; it describes what kind of data belongs in the class and what kind of functions can operate on that data. The class lays down the rules. To use those rules, we usually need an actual example that holds actual data, and then we can invoke the functions to do stuff with that data. An object is that concrete example of the class’ description in action. Different objects of the same class (same user-defined type) may hold different data, but they all obey the same rules for operating on that data. The differing data illustrates that each is a different example, a different instance, of the same basic class description. For example, we’ve already seen that we might need 100 different str objects to hold the text of a 100-document corpus. Each of the 100 objects is a str; they all provide .split() and .lower() methods, but each holds and manages its own data. Each is a different instance of the str class.
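As a small illustration of instances sharing one class, here is a sketch with a two-document corpus standing in for the 100-document one. The document text is invented for the example.

```python
# A tiny stand-in corpus; each element is a separate str instance.
corpus = ["The quick brown fox", "Jumps over the lazy dog"]

# Every instance offers the same methods but holds its own data.
first_tokens = corpus[0].lower().split()
second_tokens = corpus[1].lower().split()

print(isinstance(corpus[0], str), isinstance(corpus[1], str))  # True True
print(first_tokens)    # ['the', 'quick', 'brown', 'fox']
print(second_tokens)   # ['jumps', 'over', 'the', 'lazy', 'dog']
```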
Creating our own classes and objects in Python
As with most syntax in Python, declaring classes and instantiating objects in Python is somewhat easier than it is in more verbose languages like Java, C#, or C++. The class keyword is followed by the declared name of the class then a : which we should now recognize will indicate the beginning of a semantically-indented block. In that block we can define the class’ member functions and member data. These parts of the class are of course the data and methods operating on that data the object can manage. In Java they are called instance methods and instance data. In contrast to the Python programming style we’ve seen so far where underscore-separated lowercase is the norm, CamelCase with an initially capitalized first letter is the norm for a class name, and we should follow that style guideline in I427.
    class MyFirstPythonClass:
        def __init__(self):  # member functions have an implicit first argument self
            # create member data for the class. self represents the object
            # on which this member function is invoked
            self.my_data = "I exist. I am an object!"

        def append(self, extra_data):
            self.my_data += extra_data

        def show_my_data(self):
            return self.my_data

    my_object = MyFirstPythonClass()
    print(my_object.show_my_data())  # prints "I exist. I am an object!"
    my_object.append(" Big deal!")
    print(my_object.show_my_data())  # prints "I exist. I am an object! Big deal!"
This code demonstrates the idea of related functions and data in a class. The constructor (__init__()) lets us initialize data the class can hold, and other functions (the member functions) in the class can access that data. Because member functions can accept arguments, they can operate on data passed in to them as well as on the data that is defined by the class. We can see that with the .append() function above, which modifies the object’s data but does not return a value. (It could both modify the data and return a value if we needed it to, but here it doesn’t need to.)
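To make the last point concrete, here is a minimal sketch of a method that both modifies the object’s data and returns a value. The class Notes and the method append_and_report are hypothetical names invented for this illustration; they are not part of the class above.

```python
class Notes:  # hypothetical class for illustration
    def __init__(self):
        self.my_data = "I exist."

    def append(self, extra_data):
        # modifies the object's data; returns nothing (None)
        self.my_data += extra_data

    def append_and_report(self, extra_data):
        # modifies the object's data AND returns the new length
        self.my_data += extra_data
        return len(self.my_data)

n = Notes()
result = n.append(" Big deal!")
print(result)                      # None
length = n.append_and_report(" Really.")
print(length)                      # 26
print(n.my_data)                   # I exist. Big deal! Really.
```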
When we invoke a method on an object like .append() or .show_my_data(), we need to know which object invoked the method so that the correct data is used. In Java, there is a this keyword that represents the object on which the instance methods are being invoked. In Python, self is typically used to represent that idea. You can see it in action with self.my_data being used throughout the class. In Java, this is optional, but in Python self is required. If we just used my_data instead, it would be a local variable in each function rather than the data that belongs to the object. Python also requires that we list self as the first argument to every function in a class that an object can invoke. So a member function that takes no arguments like .show_my_data() is actually declared with one argument, self. Python automatically populates a value for self when the method is invoked. In my_object.show_my_data(), my_object is the value stored in self. Finally, note that the name self is a convention. Any non-reserved word works; we could use this just like in Java. Don’t do it. self is a Python convention, and it’s important to follow Python coding conventions because 1) people who know the conventions will find your code easier to digest, 2) people hiring for Python jobs will think you know Python, and 3) them’s the rules in I427. You’re not allowed to learn bad grammar when learning a new language; you’re not allowed to learn bad coding practices when learning a new programming language.
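The role of self can be sketched with a small hypothetical class (Greeter, greet, and rename_locally are names made up for this example). It shows that calling a method on an object is equivalent to passing that object as the first argument, and that a bare name without self. is only a local variable.

```python
class Greeter:  # hypothetical class for illustration
    def __init__(self, name):
        self.name = name           # member data, attached to the object

    def greet(self):
        return "Hello, " + self.name

    def rename_locally(self, new_name):
        name = new_name            # local variable only; self.name is untouched
        return name

g = Greeter("Ada")
print(g.greet())                   # Hello, Ada
print(Greeter.greet(g))            # Hello, Ada (g is passed explicitly as self)

g.rename_locally("Bob")
print(g.name)                      # Ada
```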
As far as syntax goes, there are some other key features to note when declaring a class. The entire body of the class is indented; this is analogous to the class definition in Java being inside a pair of {}. Member functions are preceded by the def keyword just like the non-member functions we’ve seen so far. As mentioned above, member functions in a class in Python have a first argument named self that represents the object invoking the function (compare to this in Java). After that first argument, function arguments are listed with or without default values (which we’ll see soon). Since Python is dynamically typed, we do not have to declare member data in a formal fashion like in Java or C++. We instead simply assign to a name that belongs to self (i.e. follows self. as in the sample code). Member data can be created in any member function. It doesn’t have to be the constructor as shown here, but it should be, so that those reading your code can see at a glance what kind of data the class manages.
Python’s trusting, dynamic nature has the interesting consequence that objects of the same class may have different member data. If two objects are instantiated and one has a member function invoked that contains the assignment self.just_my_data = 4 while the other does not have that member function invoked, only one object will contain member data called just_my_data. Dynamically-typed languages are complicated! You can imagine that instantiating different data in different methods can make it confusing to keep track of everything. A good practice is to initialize all member data in one place rather than “as needed” throughout your class. If a value cannot be assigned at initialization time, initialize it to None to indicate it holds no data yet.
Initialize all member data in one place in your class–in the constructor function.
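The situation described above can be reproduced in a few lines. This is only a sketch; the class names Loose and Tidy and the method set_extra are invented for the example, while just_my_data comes from the text.

```python
class Loose:  # hypothetical: creates member data "as needed"
    def set_extra(self):
        self.just_my_data = 4

a = Loose()
b = Loose()
a.set_extra()                          # only a gets the member data
print(hasattr(a, "just_my_data"))      # True
print(hasattr(b, "just_my_data"))      # False

class Tidy:  # hypothetical: initializes everything in the constructor
    def __init__(self):
        self.just_my_data = None       # no value yet, but the name exists

    def set_extra(self):
        self.just_my_data = 4

c = Tidy()
print(c.just_my_data)                  # None
c.set_extra()
print(c.just_my_data)                  # 4
```

With the second style, every object of the class is guaranteed to have the same member data, so code that reads just_my_data never fails with an AttributeError.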
Destructors
Some classes will do complicated things like manage connections to database servers. Generally, these connections are valuable resources that are somewhat “expensive” to create and maintain, and the number of them the server will support may be limited. As a consequence, it’s good for that connection to be closed whenever the object is no longer in use. In a similar vein, an object that holds a lot of data may use too much memory, so its data should be cleared from memory when not in use. Destructors are methods that get called when an object is destroyed, and a class may use this part of the code to “clean up after itself”. In Python the destructor is called __del__().
    class DatabaseConnection:
        def __init__(self):
            self.con = # ... create database connection here
            self.con.open()  # open that connection

        # whatever member functions we need go here

        def __del__(self):
            self.con.close()  # close it to clean up after ourselves

    my_db = DatabaseConnection()  # construct an instance of the class
    # do whatever we need to do with it
    del my_db  # delete it! This will call the destructor!
In the example above, we use del to explicitly delete the object. Because my_db was the only reference to that object, the destructor gets called and the object closes its connection. We don’t always have to delete objects manually. If an object is created in a function, then it may be deleted when the function returns to its caller if the object is not returned by that function. In this case, the destructor is called when the end of the scope of the object is reached. Finally, note that our first class did not have a destructor while the second one did. They are optional. Constructors are, too. Classes have default ones that simply do nothing. If we don’t need them, we don’t declare them and just use the default ones.
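The scope-based case can be sketched in runnable form. FakeConnection, the events list, and use_connection are stand-ins invented for this example in place of a real database connection. The timing shown relies on the standard CPython interpreter, which destroys an object as soon as its last reference goes away; other Python implementations may run the destructor later.

```python
events = []  # records what happens, in order (for illustration only)

class FakeConnection:  # hypothetical stand-in for a real database connection
    def open(self):
        events.append("open")

    def close(self):
        events.append("close")

class DatabaseConnection:
    def __init__(self):
        self.con = FakeConnection()
        self.con.open()

    def __del__(self):
        self.con.close()           # clean up when the object is destroyed

def use_connection():
    db = DatabaseConnection()      # db exists only inside this function
    return "done"                  # db is not returned, so it goes out of scope

use_connection()
print(events)                      # in CPython: ['open', 'close']
```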