We’ve seen that you do not need to know about object-oriented (OO) programming to write useful tools in Python. However, Python is an OO language, and knowing something about how object orientation works and how it is implemented in Python will help you understand many features of the language. It is sometimes useful to write your own programs in an OO style. More commonly, you will use a library that provides an OO interface. Knowing how to write your own OO interface will help you make use of other people’s designs.

In this session you will explore how to make use of modules from the standard library that expose OO interfaces. You will then modify the rainfall analysis code you wrote on Tuesday so that each weather station’s data is represented by an object. This will allow you to carry out more complex analyses without writing excessively complex code. First we will look at some objects from the standard library.

Exercise 1: Random numbers and dates – objects from the outside

It is often useful to be able to generate (pseudo-) random numbers for use in numerical programs. These can be used for Monte-Carlo simulation or error analysis. Random number generators can be modelled as objects. The way that numbers are generated is hidden in a "black box". Methods are provided to set up (seed) the generator and return a random number drawn from some distribution.

Write a simple Python program to calculate the average of 100 random numbers drawn from the range [0.0, 1.0). You will need to import the random module, create an instance of the random.Random class, use the instance’s seed() method to set up the generator, and then use the random() method to generate each random number. In this case none of the class constructor, seed() and random() needs any arguments.
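
The calls involved can be sketched as follows. This is only one possible shape for such a program, written for Python 3; the variable names and the seed value 42 are arbitrary choices made for illustration (seed() can equally be called with no argument).

```python
import random

# Create a generator object; how it produces numbers is hidden from us.
generator = random.Random()

# Seed the generator. A fixed value (an arbitrary choice here) makes the
# sequence repeatable; calling seed() with no argument also works.
generator.seed(42)

# Draw 100 numbers from [0.0, 1.0) and average them.
samples = [generator.random() for _ in range(100)]
average = sum(samples) / len(samples)
print(average)
```

Seeding a second Random instance with the same value reproduces the same sequence, which is useful when debugging numerical code.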

Note

Remember that instances of any class look like variables, and that methods are called by appending a dot and the method name (with arguments in brackets) to the instance. That is, a new instance of the class Thing is created by the code this_thing = Thing(constructor_argument). Methods and attributes of the instance this_thing can then be accessed as this_thing.method(method_argument) and this_thing.attribute.

Another class provided by the standard library creates date objects. This is available from the datetime module. Unlike the random number class, the date class constructor accepts arguments, specifically three integers giving the year, month and day of the date to be represented. Start up a Python interpreter and import the datetime module. Create two date objects, one representing February 3, 1700 and one representing October 4, 1701. What is the day of the week of these two dates? You can use the weekday() method to find out (this returns 0 for Monday, 1 for Tuesday and so on up to 6 for Sunday). How many days are there between the two dates? (Note that arithmetic with date objects returns a timedelta object, which has the attribute days.) Repeat the exercise for the same days in February 2000 and October 2001. Is the number of days between the two dates the same? If not, why not?
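
The calls involved can be sketched with a different set of dates, so that the questions above are left for you to answer. The dates below are arbitrary examples and the code is written for Python 3.

```python
from datetime import date

# The constructor takes the year, month and day as integers.
new_year = date(2024, 1, 1)

# weekday() counts from 0 (Monday) to 6 (Sunday).
print(new_year.weekday())  # 0: 1 January 2024 was a Monday

# Subtracting one date from another gives a timedelta object; its days
# attribute holds the whole number of days between the two dates.
february_2023 = date(2023, 3, 1) - date(2023, 2, 1)
february_2024 = date(2024, 3, 1) - date(2024, 2, 1)
print(february_2023.days)  # 28
print(february_2024.days)  # 29
```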

Exercise 2: Compressed data input – using objects for flexibility

Data files are often compressed, and it helps if your analysis code can read from (and write to) compressed files directly. As well as being convenient (you avoid explicitly decompressing and recompressing files before and after you run the analysis script), this can improve execution time, which is sometimes limited just by the process of getting data to and from the disk.

As we have seen, Python’s built-in file type is a class, like all other data types. An instance of this type is returned by the built-in open() function, and each instance supports a range of methods, including iteration. The standard library includes the gzip module, which implements the GzipFile class with methods that mimic most of those provided by the built-in file type. There is also a bz2 module, which does the same for bzip2 compression.

The directory metoffice_data contains more Met Office data like that you worked with on Tuesday. However, some files have been compressed using either bzip2 or gzip. Modify your final data file reading function so that it accepts compressed files automatically. You can assume that compressed files will always have the .gz or .bz2 file extension.

Note

The most obvious OO way to instantiate an object to represent a compressed file is to make direct use of the GzipFile class as follows:

import gzip
f = gzip.GzipFile('filename', 'rb')
# Do something with f
f.close()

However, this is different to the way that Python generates built-in file objects using the open() function. A more natural approach is probably to use the gzip module’s open() function thus:

import gzip
f = gzip.open('filename', 'rb')
# Do something with f
f.close()

In this approach, gzip.open() is essentially a convenience wrapper around the GzipFile class constructor. In Python 2 the bz2 module lacks an equivalent function, so you will have to use the BZ2File class directly (Python 3.3 and later also provide bz2.open()).
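
Because the object returned by gzip.open() behaves much like a built-in file object, it can be iterated over line by line. The following is a small sketch written for Python 3: the file name example.txt.gz and its contents are made up for illustration, and the "wt" and "rt" text modes are a Python 3 feature.

```python
import gzip
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")

# Write two lines of text to a gzip-compressed file.
with gzip.open(path, "wt") as f:
    f.write("first line\n")
    f.write("second line\n")

# Read the lines back by iterating, just as with an ordinary file.
# The with statement closes the file when the block ends.
with gzip.open(path, "rt") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```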

Does your modified code work with uncompressed files as well as files compressed with both gzip and bzip2?

Tip

You will need to decide whether the filename argument refers to a compressed file. One way to do this is to use string methods to examine the file name. A better (easier and more portable) approach is to use the splitext() function from the os.path module. This takes a string as an argument and returns a tuple whose second item is the final file extension (.txt, .gz or .bz2).
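
A minimal sketch of this approach, written for Python 3, is shown below. The helper name open_data_file is hypothetical and not part of the course code, and wrapping the compressed file in io.TextIOWrapper so that it yields text rather than bytes is a design choice of this sketch.

```python
import bz2
import gzip
import io
import os.path


def open_data_file(filename):
    """Return a text-mode file object for a plain, .gz or .bz2 file.

    open_data_file is a hypothetical helper name used for illustration.
    """
    # splitext() returns (root, extension); the extension includes the dot.
    extension = os.path.splitext(filename)[1]
    if extension == ".gz":
        return io.TextIOWrapper(gzip.open(filename, "rb"))
    if extension == ".bz2":
        return io.TextIOWrapper(bz2.BZ2File(filename, "rb"))
    # Anything else is treated as an uncompressed text file.
    return open(filename, "r")


# Only the final extension is split off.
print(os.path.splitext("station.txt.gz"))
```

Whichever branch is taken, the caller gets back an object that can be iterated over line by line, so the rest of a reading function does not need to know whether the file was compressed.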

Exercise 3: Rainfall – an object from the inside

The "dictionary of lists" data structure you used to hold rainfall data in Practical 2 works, but is unsatisfactory in a number of ways. The functions you wrote were tied to the data structure, so knowledge of that structure ended up spread around your code, making refactoring difficult. It would also sometimes be useful to generate a different "view" of the data, say a list of the rainfall for each January in the data set, to make it easy to calculate the average rainfall in that month. In this exercise you should create a Rainfall class to represent a weather station and the rainfall it has measured.

Note

Remember that the way to define a class in Python is to use the class statement, defining functions for methods and variables for attributes. Methods take an extra first argument, conventionally named self, and you need an __init__ method, which is called when you create an instance. The basic code looks like this:

class Thing:
    def __init__(self, constructor_argument):
        # These values are created
        # when the class is instantiated.
        self.attribute = constructor_argument / 7.65
        self.counter = 0

    def method(self, method_argument):
        # Add the argument to the running total and report it.
        self.counter = self.counter + method_argument
        print(self.counter)

The first step is to create the bare bones of your Rainfall class reusing some of the code you wrote on Tuesday. Start off with the class definition and __init__ method. __init__ should take an argument specifying the filename where data will be read from. What other argument is needed? Give your class attributes for the weather station name, its latitude and longitude. Store the data away inside the class as a dictionary of lists.

Now add methods. An annual_rainfall method could return a dictionary with years as the keys and the total measured rainfall in that year as the values.
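
As a rough sketch of what such a method might look like, the example below uses a stand-in class that is given its data directly rather than reading it from a file. The class name RainfallSketch, the layout of the dictionary (each year mapped to a list of 12 monthly totals) and the numbers are all assumptions made for illustration; your own data structure may differ.

```python
class RainfallSketch:
    """Hypothetical stand-in for the Rainfall class in this exercise."""

    def __init__(self, name, monthly_rainfall):
        self.name = name
        # Assumed layout: {year: [12 monthly totals]}.
        self.monthly_rainfall = monthly_rainfall

    def annual_rainfall(self):
        """Return {year: total rainfall measured in that year}."""
        return {year: sum(months)
                for year, months in self.monthly_rainfall.items()}


# Made-up numbers, for illustration only.
station = RainfallSketch("Example", {
    1990: [10.0] * 12,
    1991: [5.0] * 6 + [15.0] * 6,
})
print(station.annual_rainfall())
```

Code that calls annual_rainfall() does not need to know how the monthly values are stored, which is the point of hiding the data structure inside the class.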

  • Can you use your new class to calculate the distance between pairs of weather stations?

  • Can you calculate the difference in average annual rainfall for two weather stations?
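
The exercise does not say how the distance should be computed. One common choice, assumed here, is the haversine great-circle formula on a spherical Earth, applied to latitudes and longitudes in decimal degrees. The names great_circle_distance and EARTH_RADIUS_KM are hypothetical, and the radius is an approximate mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean radius; an assumption of this sketch


def great_circle_distance(lat1, lon1, lat2, lon2):
    """Return the haversine distance in km between two points.

    Latitudes and longitudes are given in decimal degrees.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111 km.
print(great_circle_distance(0.0, 0.0, 1.0, 0.0))
```

In a class-based design this calculation could become a method that takes another station object as its argument and reads the latitude and longitude attributes of both.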

You are going to need a more flexible way of extracting subsets of the rainfall information from your Rainfall objects. Create a new method called list_data() that returns two lists: a list of floating point numbers giving the rainfall in each month, and a list of date objects giving the first day of the month for each entry in the first list.

Add three optional arguments to the list_data() method. The start and end arguments should take date objects and, if they are given, no data before start or after end should be returned. The third argument, months, should be a list of month numbers so that only a subset of the months is returned. Once this has been done, you should be able to extract the rainfall data for the winters in Camborne during the 1990s by doing:

from datetime import date

# Read data from file and set up object
r = Rainfall("cambornedate.txt.gz")
# Extract winter (Dec, Jan, Feb) data for the 1990s
rain, months = r.list_data(start=date(1990, 1, 1), end=date(1999, 12, 31), months=[12, 1, 2])

  • Is spring wetter than autumn in Camborne?

  • Are winters in Cambridge drier than summers in Yeovilton?
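
The filtering logic behind list_data() can be sketched as a standalone function. This is only an illustration: it assumes the data are held as a dictionary mapping each year to a list of 12 monthly totals, which may not match your own structure, and the data values are made up. In your class the same logic would live in a method and read the data from self.

```python
from datetime import date


def list_data(monthly_rainfall, start=None, end=None, months=None):
    """Return (rainfall, dates) filtered by date range and month.

    monthly_rainfall is assumed to be {year: [12 monthly totals]}.
    """
    rainfall, dates = [], []
    for year in sorted(monthly_rainfall):
        for index, value in enumerate(monthly_rainfall[year]):
            first_day = date(year, index + 1, 1)
            # Skip months outside the requested range or month subset.
            if start is not None and first_day < start:
                continue
            if end is not None and first_day > end:
                continue
            if months is not None and first_day.month not in months:
                continue
            rainfall.append(value)
            dates.append(first_day)
    return rainfall, dates


# Made-up data: each month's rainfall equals its month number.
data = {year: [float(m) for m in range(1, 13)] for year in (1989, 1990, 1991)}
rain, when = list_data(data, start=date(1990, 1, 1), end=date(1990, 12, 31),
                       months=[12, 1, 2])
print(rain)
print(when)
```

Using None as the default for each optional argument lets the function tell "no filter requested" apart from any real date or list of months.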

Finally, you may want to re-examine how the Rainfall class you have constructed handles missing data. So far the simple approach of representing missing months as having zero rainfall seems to have worked. Is there a better built-in type (with a single value) that could be used? Can you modify your Rainfall class to handle missing data more correctly?