Practical 4: NumPy, SciPy and Matplotlib

In the previous three practicals you have been introduced to the fundamentals of the Python language and begun to learn how to use it to create tools for data analysis and simulation. In this practical you will look at three closely related extension modules – modules that are not part of the Python standard library but are commonly used when writing software for science, engineering and mathematics. SciPy provides a many tools for scientific programming. These include a wide range of ready to use functions for statistics, optimisation and minimisation, numerical integration, curve fitting, linear algebra, Fourier analysis, image and signal processing, and more. SciPy makes use facilities provided by NumPy. Chief amongst these is a high performance data object called an array. This is multi-dimensional and has elements of a fixed type, corresponding to a fixed storage size in memory. This property makes operations on the whole array much more efficient than operations on the built in Python data types. Many of the internals of NumPy and SciPy are implemented in C (with some Fortran) and this also yields high performance. NumPy and SciPy are often used with Matplotlib, a graphing library for Python. Data drawn on graphs in Matplotlib is provided in the form of NumPy array objects (but these can sometimes be generated internally by Matplotlib from other data types).

In this practical you will first familiarise yourself with the array object and some of the fundamental NumPy methods and functions. You will then learn how to generate simple graphs and maps based on the arrays. In the second part of the practical you will make use of your Rainfall class from practical 3 along with functions from SciPy to calculate and graph some properties of the historical rainfall record in the south of Britain.

Some aspects of this practical need extra Python modules installed beyond those which make up the standard library. In particular, the exercises make use of NumPy, SciPy, Matplotlib and Basemap. If you are using your own computer basic installation instructions can be found here.

Exercise 1: The `array` object

Start a Python interpreter and import the numpy module. I generally find it easer to rename the module np by opting to import numpy as np.

If l = [1.0, 2.3, 4.74, 7] what is type(l)? What about type(l[0]) and type(l[3])?
NumPy arrays can be constructed using the array class. If a=np.array(l), what is a?
What is type(a)? What about type(a[0]) and type(a[3])? How do arrays differ from lists?
Arrays can also be returned from other functions. If b = np.zeros((4)), and c=np.ones((5)) what are b and c? What are type(b) and type(c)?
Are NumPy arrays mutable? Can the types of the elements be changed? Can you change c[2] to be equal to 3 and c[4] to be equal to the string "one"?
What are np.ones((3,3)) and np.ones((3,3,3,3))? What are the .size and .shape of these objects?
Can you iterate through a NumPy array? What is i in each iteration of for i in np.ones((3,3)):?
What is a + np.ones((4))?
What is np.ones((4))*3? What about (np.ones((3,3))*3)*(np.ones((3,3))*2)? Is this matrix multiplication or not?
What is a[0:2]? What about a[0::2]? Does array indexing work in the same way as list and string indexing?

If you have used Matlab, you should find that NumPy arrays remind you of Matlab arrays. There are some important differences (square brackets rather than parenthesises, element wise operations rather than operators from linear algebra). There is even a short guide to NumPy for Matlab users.

If theta = np.arange(0, 2*np.pi, 0.1), what is theta?
What is np.sin(theta)? Assign np.sin(theta) and np.cos(theta) to the variables sintheta and costheta.
We want to plot some graphs, so import pylab.
Create a plot by using the pylab.plot() function thus: pylab.plot(theta, sintheta).
Display the graph with pylab.show().
Close the window. Draw the graph of $\cos(\theta)$.
Can you put both lines on the same graph?

Matplotlib has two main interfaces. One, which we will examine later, takes an object-oriented approach with everything in a graph being modelled as various types of object. This makes it easy for your application to reuse bits of code when drawing graphs – you can produce a graphical representation of something that can sometimes be placed together with other graphs in a figure. The second interface is provided by the pylab module (part of Matplotlib). This takes a state-machine approach to plotting graphs that is similar to the approach used by Matlab.

Exercise 2: Maps

You will find a simple Python script called draw_marble.py in the practical4 directory. Run it. What is displayed? What city is marked? The code is reproduced with some notes below.

from mpl_toolkits.basemap import Basemap 
import matplotlib.pyplot as plt
import numpy as np

# Set up the basemap
m = Basemap(projection='nsper',lon_0=-10,lat_0=40,resolution='l',
           satellite_height=15000000) 

# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.),'white') 
m.drawmeridians(np.arange(0.,420.,60.),'white')
# Draw the image
m.bluemarble()

# Plot a city
x, y = m(-2.58, 51.54) 
m.plot(x,y,'wo')

# Show the image.
plt.show()

	We need to import pyplot, numpy and basemap.
	This creates a new instance of the Basemap object. The arguments define the type of map to produce. `"nsper"` stands for "near sided perspective" (the view a satellite would see located over `lon_0`, `lat_0` and at a height of `satellite_height`). The instance `m` knows how to convert from geographical position to a location on the image that will be produced.
	This section adds lines of latitude and longitude and the background image. Notice that the image is distorted to correctly fit over the map. It is possible to distort any image to correctly fit over any map projection – you just need to know the geographical positions of the corners of the image.
	This section plots a point on the map. We first need to convert from geographical coordinates to a location on the plotting plane. When used as a function, the `Basemap` instance `m` can do this. The two arguments are the longitude and latitude of a location and it returns the x and y coordinates of the point to be drawn. These can be used as arguments to locate the point
	Displaying the resulting map works the same as displaying any graph. Note that you can also put the figure in an image file using the `plt.savefig()` function giving the file name as an argument.

Modify the program in order to:

Plot an additional city on the map. For example, Paris is located at 48.86 degrees N and 2.35 degrees E.
Change the map projection. The Robinson projection (projection="robin") was once used for global maps. The arguments of the constructor change with the projection type. The Robinson projection only needs lon_0, the longitude at the centre of the map, to be specified

The file uki_map.py produces a basemap of the British Isles. Using this file as a template, write a Python script that reads weather station data files (provided as arguments), reads them, and plots the location of the weather stations on the map. You can make use of the modules you wrote in Practical 2 or the Rainfall class you wrote in Practical 3 to help. The map plotting should be moved into a subroutine (what should the arguments be?) and the resulting file should be importable as a module.

Modify your program to accept any number of arguments corresponding to a list of files of weather station data files. Plot all weather stations on the map. Change the limits of the map to only show the region where you have data.

It seems to rain more in Bristol than in London. If you cannot see the Pennines from Old Trafford you know it’s raining, if you can see the hills you know it’s going to rain. Is the west really wetter than the east? You can generate a contour plot to try and find out.

NOTE:

Basemap objects have contour() and contourf() methods for adding line contours and filled contours of geographical data on a regular grid or a general irregular triangulation. Both methods have three compulsory arguments (which must be numpy array objects and take an optional argument tri=True which must be provided if you want Matplotlib to generate the contours from unstructured data. The first three arguments are arrays of x and y positions of each point (converted from the geographical coordinates of the weather stations using the Basemap object as a function) and a list of values (the rainfall at each weather station).

Using uki_map.py as a template, create a script that accepts a month and a list of weather station data files as arguments and plots a contour map of the average rainfall across Britain in the chosen month. You should avoid colouring the continents and oceans (remove the fillcontinents and drawmapboundary lines).

Exercise 3: Time series data and statistics

As well as simple x-y plots (see Exercise 2) Matplotlib is also able to plot time series data. The key to doing this is to provide a list of date objects for one of the axes of the graph. These are then appropriately spaced. In this exercise you will plot a graph of two time series representing the rainfall in, say, June at Yeovilton and Cambridge between 1980 and 2010. You will then look to see if the data is correlated (are wet summers in Yeovilton also wet summers in Cambridge) using the scipy.stats module. In doing this, you will take a look at the object oriented interface to Matplotlib.

Write a script that accepts five arguments, one to specify the month of interest, one to specify year to start the time series analysis (start the analysis on the 1 January of that year), one to specify the end year of the analysis (end on the 31 December of that year) and two arguments providing the location of the weather station data files. The script should produce a plot of the rainfall on the chosen month for each year of interest. Try to use the Matplotlib object oriented interface.

NOTE:

Matplotlib’s OO interface breaks graphs down into a number of components. The two most important are the Figure and the Axes. A Figure represents the graph paper while the Axes represent the area where a graph will be plotted. A Figure can have more than one set of Axes. Assuming you have imported matplotlib.pyplot as plt, a new figure can be generated using the function plt.figure() that accepts an (optional) figure name and returns a figure object. This object has several methods that can be used to add axes. The most useful is probably .add_subplot(111) (the numbers specify how to arrange multiple axes in the figure). An axes instance has methods equivalent to most of the graphing functions provided by Matplotlib. One useful way of using the OO interface is to write a method for your data class that accepts an empty axis as an argument and adda the data to the graph and returns the axis so that it can be plotted, but the basic way to plot a graph is shown below.

fig = plt.figure()
ax = fig.add_subplot(111)
# Maybe accept ax as an argument of a function
ax.plot(x_data, y_data)
ax.set_ylabel('Y axis name (units)')
ax.set_xlabel('X axis name (units)')
# Maybe return ax
fig.autofmt_xdate() # If you have dates.
plt.show()

As far as I can see it seems that monthly rainfall is correlated across the country. Wet months tend to be wet everywhere. One way to test this is to calculate the correlation coefficient between two time series. The scipy.stats module has functions to do this. Use the scipy.stats.pearsonr function (which takes two numpy arrays as arguments) to calculate the Pearson correlation coefficent and confidence interval for the correlation. Is the correlation positive? Is it significant?

Practical 4: NumPy, SciPy and Matplotlib

Exercise 1: The array object

Exercise 2: Maps

Exercise 3: Time series data and statistics

Exercise 1: The `array` object