Plotting spatio-temporal data with Python

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • How can I create maps with python?

Objectives
  • Learn how to plot spatio-temporal data with python.

This tutorial is based from the basemap tutorial.

What is matplotlib? Why should we learn to use it?

While python offers a large range of python packages for plotting spatio-temporal data, we will focus here on the most generic python interface to create maps. Most of other python packages used for plotting spatio-temporal data are based on matplotlib.

The main principle of matplotlib

First, matplotlib has two user interfaces:

When plotting with matplotlib, it is very important to know and understand that there are two approaches even though the reasons of this dual approach is outside the scope of this lesson.

When searching for help on the internet, you will find solutions where the two approaches are mixed which may be very confusing for you. The best for new codes is to stick to the object-oriented interface and this is what we will show in this lesson.

Key Point

New matplotlib users should learn and use the object oriented interface.

What is a Figure and an Axes?

The main idea with object-oriented programming is to have objects that one can apply functions and actions on, and no object or program states should be global (such as the MATLAB-like API). The real advantage of this approach becomes apparent when more than one figure is created, or when a figure contains more than one subplot as we will see in this tutorial.

To use the object-oriented API we need to create a new figure instance and store it in a variable; we call this variable fig but you can choose any name. Actually this is what we will need to do when we create more than one figure!

A figure can contain several plots (also called sub-plots) so the first thing to do is to choose how many sub-plots your figure will contain. This is done with the add_subplot method of an object of type figure:

 

The resulting figure is:

figures and axes

Here we created one subplot and one axes only. We will give another example afterwards to explain the arguments of the function add_subplot in more details.

Then we use the ax object to customize our subplot and finally plot data. For instance, we can add:

x
 

figures and axes

Simple plots - Timeseries

Our goal is to plot a timeserie for North Atlantic Oscillation (NAO) index. We take first sample data i.e. NAO indices for 10 days only from the 1st of June 2001 to the 10th of June 2001:

 

Then we plot with matplotlib:

 

figures and axes

date formatting

You can fully control how your dates appear in your plot:

 

figures and axes

DateFormatter: use strftime() format strings

What is basemap?

Basemap is an extension of matplotlib; one of the most common plotting library for Python. You may have heard or will hear about other python packages for plotting spatio-temporal data (for instance pandas, geopandas, pynio & pyngl, pyqgis, plotly, bokeh, cartopy, iris, scikit-learn, seaborn, etc.); many of them are using matplotlib/basemap underneath for plotting and are data specific. We think it is important to understand matplotlib/basemap and learn how to use it effectively in your scientific workflow. At the end of the workshop, we will talk about other packages to visualize your data and we hope that this lesson will make it easier.

Create simple maps with basemap

As explained in lesson Intro to Coordinate Reference Systems & Spatial Projections, the projection you choose for plotting your maps is very important and depends on what you want to visualize and over which region of the globe.

Plotting raster and vector data with basemap

plotting raster data

In our first example, we wish to plot the Vorticity fields from ECMWF ERA-Interim reanalysis for summer 2001 (June, July and August). ECMWF ERA-Interim is a global atmospheric reanalysis from 1979, continuously updated in real time and this dataset is freely available. However, you would need to register to fully access their public dataset. If you wish to download your own set of data in python (various parameters and dates) you can install the python package called ECMWF WEB-API. This part is out of scope of this tutorial and we provide a sample data set in netCDF format EI_VO_Summer2001.nc.

Inspect EI_VO_Summer2001.nc (check metadata):

 

What can you say about this file?

Solution

This file contains one variable only called Vorticity (relative), in s**-1, stored on a lat/lon gaussian grid for one level only and 368 times (UNLIMITED dimension which means we can potentialy add more times).


Let’s plot it on a world map:

 

figures and axes

And to zoom over a user-defined area:

 

figures and axes


Overlay another field

We can read another field “Mean Sea level pressure” and overlay it on top of the preceding plot:

 

figures and axes

Inset locators

We will zoom on a chosen area on our map and plot this additional map on the top right of our figure. The location of the zoomed box can be specified with an integer value:

 

Then we can make the same plot as before but for the zoomed area:

 

We can also mark our zoomed area in the original plot:

 

track plot with zoomed area

Challenge

We would like to add another plot on top of the previous figure in the zoomed area. The file tracks_20010601.csv contains storm tracks stored as a list of locations (longitude, latitude) and associated date/time.

  1. Add these locations using scatter or plot methods and label each point with its corresponding time (0h, 6h, 12h, 18h).
    datetime,lon,lat
    2001-06-01 00:00:00,-29.241394,61.386189
    2001-06-01 06:00:00,-23.598511,60.872208
    2001-06-01 12:00:00,-15.4021,59.676353
    2001-06-01 18:00:00,-7.580597,57.688072
    

This file can be read with the pandas python package:

 

Solution

track plot with zoomed area

Overlay GeoTIFF

A simple way to plot a GeoTIFF image and eventually overlay additional field/information is to use the same projection as the GeoTIFF file:

 

geotiff plot on cylindrical map

And if we would like to use pcolormesh instead of imshow:

 

geotiff plot on cylindrical map

As we can see on the figure above, using the projection of the GeoTIFF file is not always a good approach. To choose a better projection, we need to convert the coordinates taking the projection of our raster (here GeoTIFF) as the source and the projection defined in our Basemap object as the target coordinate system.

 

geotiff plot on mercator map

Remark

You can use the Pandas library to do read a wide range of data formats (including netCDF) and for instance to do statistics.

  • Pandas is a widely-used Python library for statistics, particularly on tabular data.
  • Borrows many features from R’s dataframes: * A 2-dimenstional table whose columns have names and potentially have different data types.
  • In our preceding example, we used it to load tracks_20010601.csv
  • Read a Comma Separate Values (CSV) data file with pandas.read_csv. * Argument is the name of the file to be read. * Assign result to a variable to store the data that was read. For more on pandas, see the Software Carpentry lesson Plotting and Programming in Python.

Plotting vector data (shapefile)

We will see how to visualize GeoJSON file in our last chapter when publishing on the web. Here we learn how to plot shapefile data only.

Shapefiles can be complex to visualize depending on what type of shapefile (points, lines, etc.). However, matplotlib has a very simple way to handle shapefiles with lines or polygons. Let’s take back an example from our chapter on data formats.

 

vector plot

We had nothing to do but reading the shapefile with readshapefile method from a Basemap object. The first argument is the shapefile name with its full path (but without .shp file extension) and the second argument is the name (to be found in the file). You can customize your plot, for instance, by specifying the color and line width:

 

vector plot

Often, this approach is not flexible enough…

Let’s take an example:

 

vector plot

Source: http://www.diva-gis.org/gdata

For instance, if we wish to fill polygons with different colors depending on the county, we can go through our shapefile and choose the color for each county. In our shapefile NOR_adm county can be identified by their names (NAME_1) or their associated identifier (ID_1):

 

Now we can assign a color for each county and make a plot where ploygons are filled according to the county name:

 

vector plot

or using gal ogr:

 

vector plot

Save your plots

When using a jupyter notebook, it is easy to save your figures:

However, when running a python script or when you have many figures to generate and save, the best is to use savefig:

 

To change the format, simply change the extension like so:

 

Optimize your scientific workflow: isolate computations and plotting

If you need to perform “heavy” computations prior to plotting, it is a good idea to separate the computational part from the plotting part and for instance, create separate python scripts for the data processing part and plotting. The best to save information from the computation scripts to pass it to the plotting script is to use “standard” formats, either text files or netCDF, HDF, GEO-TIFF or shapefiles for storing spatio-temporal data. The advantage of doing this is that it is easier to tweak the plotting script without re-running the computation every time.

Summary

  • Keep in mind the graphic below from the matplotlib faq to remember the terminology of a matplotlib plot i.e. what is a Figure and an Axes.

  • Always use the object-oriented interface. Get in the habit of using it from the start of your analysis.
  • For some inspiration, check out the matplotlib example gallery which includes the source code required to generate each example.
  • Get colornames from matplotlib here.

Key Points

  • matplotlib and basemap