This is an intermediate-level course that teaches you how to use Python for creating charts, plots, animations, and maps.
The code examples in this class use a variety of datasets. All the required datasets and Jupyter notebooks are supplied to you in the python_dataviz.zip
file. Unzip this file to a directory - preferably to the <home folder>/Downloads/python_dataviz/
folder.
Not enrolled in our instructor-led class but want to work through the material on your own? Get free access to the data package
This class requires installing Anaconda and creating a new environment named `python_dataviz
. Please review the Anaconda Installation Guide for step-by-step instructions.
Once you have created and activate the python_dataviz
environment, install the following packages from the conda-forge
channel.
geopandas
rasterio
matplotlib
jupyterlab
xarray
contextily
cartopy
Open the command prompt/terminal and run the following conda commands for installation.
conda create --name python_dataviz
conda activate python_dataviz
conda install -c conda-forge geopandas rasterio matplotlib jupyterlab xarray contextily cartopy -y
Open the notebook named 01_matplotlib_basics.ipynb
.
Before we start using matplotlib
inside a Jupyter notebook, it is useful to set the matplotlib backend to inline
. This setting makes the matplotlib graphs included in your notebook, next to the code. Learn more about different plotting backends.
We use the magic function %matplotlib
to achieve this.
import os
%matplotlib inline
import matplotlib.pyplot as plt
It is important to understand the 2 matplotlib objects
We create a figure and a single subplot. Specifying 1 row and 1 column for the subplots()
function create a figure and an axes within it. Even if we have a single plot in the figure, it is useful to use tthis logic of intialization so it is consistent across different scripts.
= plt.subplots(1, 1)
fig, ax 5,5)
fig.set_size_inches( plt.show()
First, let’s learn how to plot a single point using matplotlib. Let’s say we want to display a point at the coordinate (0.5, 0.5).
= (0.5, 0.5) point
We display the point using the plot()
function. The plot()
function expects at least 2 arguments, first one being one or more x coordinates and the second one being one or more y coordinates. Remember that once a plot is displayed using plt.show()
, it displays the plot and empties the figure. So you’ll have to create it again.
Reference: matplotlib.pyplot.plot
= plt.subplots(1, 1)
fig, ax 5,5)
fig.set_size_inches(0], point[1], color='green', marker='o')
ax.plot(point[ plt.show()
Note: Understanding *args
and **kwargs
Python functions accept 2 types of arguments. - Non Keyword Arguments: These are referred as *args
. When the number of arguments that a function takes is not fixed, it is specified as *args
. In the function plot()
above, you can specify 1 argument, 2 arguments or even 6 arguments and the function will respond accordingly. - Keyword Arguments: These are referred as **kwargs
. These are specified as key=value
pairs and usually used to specify optional parameters. These should always be specified after the non-keyword arguments. The color='green'
in the plot()
function is a keyboard argument.
One problematic area for plotting geospatial data using matplotlib is that geospatial data is typically represented as a list of x and y coordinates. Let’s say we want to plot the following 3 points defined as a list of (x,y) tuples.
= [(0.1, 0.5), (0.5, 0.5), (0.9, 0.5)] points
But to plot it, matplotlib
require 2 separate lists or x and y coordinates. Here we can use the zip()
function to create list of x and y coordinates.
= zip(*points)
x, y print(x)
print(y)
(0.1, 0.5, 0.9)
(0.5, 0.5, 0.5)
Now these can be plotted using the plot()
method. We specify the 3rd argument to the function is the fmt
string - specifying the format of the symbols. It can be used to specify a string describing the symbol as [marker][line][color]
sequence. In the following example, we specify it as og
- meaning [round markers][no line][green color]
= plt.subplots(1, 1)
fig, ax 5,5)
fig.set_size_inches('og')
ax.plot(x, y, plt.show()
You can save the figure using the savefig()
method. Remember to save the figure before calling plt.show()
otherwise the figure would be empty.
= plt.subplots(1, 1)
fig, ax 5,5)
fig.set_size_inches('og')
ax.plot(x, y,
= 'output'
output_folder = os.path.join(output_folder, 'simple.png')
output_path
plt.savefig(output_path)
plt.show()
Matplotlib provides many specialized functions for different types of plots. scatter()
for Scatter Charts, bar()
for Bar Charts and so on. You can use them directly, but in practice they are used via higher-level libraries like pandas
. In the next section, we will see how to create such charts.
Change the plot below to display the markers in a triangle shape.
Run all previous cells before attempting the exercise.
= plt.subplots(1, 1)
fig, ax 5,5)
fig.set_size_inches('og')
ax.plot(x, y, plt.show()
Open the notebook named 02_creating_charts.ipynb
.
Pandas allows you to read structured datasets and visualize them using the plot()
method. By default, Pandas uses matplotlib
to create the plots.
In this notebook, we will take work with open dataset of crime in London.
import pandas as pd
import os
import glob
%matplotlib inline
import matplotlib.pyplot as plt
We have 12 different CSV files containing crime data for each month of 2020. We can use the glob
module to find all files matching a pattern.
= 'data'
data_pkg_path = 'crime'
folder = '2020-*.csv'
file_pattern = os.path.join(data_pkg_path, folder, file_pattern)
file_path_pattern
= []
file_list for file in glob.glob(file_path_pattern):
file)
file_list.append( file_list
['data/crime/2020-05-metropolitan-street.csv',
'data/crime/2020-12-metropolitan-street.csv',
'data/crime/2020-01-metropolitan-street.csv',
'data/crime/2020-02-metropolitan-street.csv',
'data/crime/2020-11-metropolitan-street.csv',
'data/crime/2020-06-metropolitan-street.csv',
'data/crime/2020-09-metropolitan-street.csv',
'data/crime/2020-04-metropolitan-street.csv',
'data/crime/2020-08-metropolitan-street.csv',
'data/crime/2020-07-metropolitan-street.csv',
'data/crime/2020-10-metropolitan-street.csv',
'data/crime/2020-03-metropolitan-street.csv']
It will be helpful to merge all these files into a single dataframe. We can use pd.concat()
to merge a list of dataframes.
= []
dataframe_list
for file in file_list:
= pd.read_csv(file)
df
dataframe_list.append(df)
= pd.concat(dataframe_list) merged_df
Let’s create a pie-chart showing the distribution of different types of crime. Pandas groupby()
function allows us to calculate group statistics.
= merged_df.groupby('Crime type').size()
type_counts type_counts
Crime type
Anti-social behaviour 415105
Bicycle theft 23517
Burglary 61044
Criminal damage and arson 50923
Drugs 51629
Other crime 10066
Other theft 81924
Possession of weapons 5763
Public order 53458
Robbery 27269
Shoplifting 34588
Theft from the person 31084
Vehicle crime 108344
Violence and sexual offences 227208
dtype: int64
We now uses the plot()
method to create the chart. This method is a wrapper around matplotlib
and can accept supported arguments from it.
Reference: pandas.DataFrame.plot
= plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(='pie', ax=ax, wedgeprops={'linewidth': 1.0, 'edgecolor': 'white'}, label='')
type_counts.plot(kind
plt.tight_layout()'Crime Types', fontsize = 18)
plt.title(
plt.show()
We can also chart the trend of crime over the year. For this, let’s group the data by month.
= merged_df.groupby('Month').size()
monthly_counts monthly_counts
Month
2020-01 90979
2020-02 86984
2020-03 87409
2020-04 109951
2020-05 114008
2020-06 100198
2020-07 103657
2020-08 104782
2020-09 99633
2020-10 99471
2020-11 96914
2020-12 87936
dtype: int64
= plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(='bar', ax=ax)
monthly_counts.plot(kind plt.show()
We can make the chart more informating by stacking the chart with information about crime type.
= merged_df.groupby(['Month', 'Crime type']).size()
counts_by_type counts_by_type
Month Crime type
2020-01 Anti-social behaviour 17548
Bicycle theft 1172
Burglary 6889
Criminal damage and arson 4374
Drugs 4282
...
2020-12 Robbery 2021
Shoplifting 2690
Theft from the person 3075
Vehicle crime 7758
Violence and sexual offences 17836
Length: 168, dtype: int64
The result is not in a suitable format for plotting. We call unstack()
to create a dataframe.
= counts_by_type.unstack()
counts_df counts_df
Crime type | Anti-social behaviour | Bicycle theft | Burglary | Criminal damage and arson | Drugs | Other crime | Other theft | Possession of weapons | Public order | Robbery | Shoplifting | Theft from the person | Vehicle crime | Violence and sexual offences |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Month | ||||||||||||||
2020-01 | 17548 | 1172 | 6889 | 4374 | 4282 | 832 | 9497 | 562 | 4025 | 3263 | 3853 | 4256 | 11975 | 18451 |
2020-02 | 16975 | 1044 | 6216 | 4220 | 3818 | 757 | 9729 | 452 | 3842 | 3152 | 3845 | 4570 | 10405 | 17959 |
2020-03 | 23014 | 1078 | 5362 | 4392 | 3657 | 813 | 7531 | 483 | 3966 | 2711 | 2996 | 3414 | 9621 | 18371 |
2020-04 | 62763 | 1060 | 3661 | 3496 | 4978 | 751 | 3884 | 460 | 3464 | 1101 | 1691 | 677 | 6327 | 15638 |
2020-05 | 58502 | 1768 | 3886 | 3906 | 6427 | 823 | 4443 | 533 | 4250 | 1293 | 1956 | 795 | 7277 | 18149 |
2020-06 | 39584 | 2548 | 4320 | 4353 | 4665 | 882 | 5387 | 463 | 4966 | 1705 | 2400 | 1194 | 8102 | 19629 |
2020-07 | 35588 | 2833 | 4928 | 4692 | 4569 | 892 | 6977 | 453 | 5584 | 2168 | 3099 | 2072 | 8811 | 20991 |
2020-08 | 35842 | 3019 | 4995 | 4710 | 3534 | 780 | 7647 | 451 | 5490 | 2530 | 3006 | 2542 | 8919 | 21317 |
2020-09 | 30863 | 3078 | 5195 | 4274 | 3541 | 964 | 7516 | 503 | 5167 | 2599 | 3060 | 2696 | 9829 | 20348 |
2020-10 | 31186 | 2619 | 5618 | 4214 | 4124 | 812 | 7248 | 500 | 4577 | 2440 | 3222 | 3225 | 10148 | 19538 |
2020-11 | 33863 | 1985 | 5209 | 4205 | 4410 | 981 | 5734 | 511 | 4239 | 2286 | 2770 | 2568 | 9172 | 18981 |
2020-12 | 29377 | 1313 | 4765 | 4087 | 3624 | 779 | 6331 | 392 | 3888 | 2021 | 2690 | 3075 | 7758 | 17836 |
Now we can create the stacked bar chart. Instead of the default legend, we create a horizontal legend with a frame using the legend()
function.
= plt.subplots(1, 1)
fig, ax 20,10)
fig.set_size_inches(='bar', stacked=True, ax=ax, colormap='tab20')
counts_df.plot(kind='upper center', ncol=5, frameon=True, bbox_to_anchor=(0.5, 1.1), fancybox=True, shadow=True)
plt.legend(loc'right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['Year', size = 15)
plt.xlabel('Number of Incidents', size = 15)
plt.ylabel('Crime in London (2020)', size = 18, y=1.1)
plt.title(= 'output'
output_folder = os.path.join(output_folder, 'stacked_chart.jpg')
output_path
plt.savefig(output_path) plt.show()
Plot the trend of Bicycle thefts as a line chart.
Hint: Select the column ‘Bicycle theft’ from the counts_df
dataframe and use the plot()
function on the result.
Open the notebook named 03_creating_maps.ipynb
.
Similar to Pandas, GeoPandas has a plot()
method that can plot geospatial data using Matplotlib.
We will work with census data to create a choropleth map of population density. We will start with a shapefile of census tracts, and join it with tabular data to get a GeoDataframe with census tract geometry and correponding populations.
import geopandas as gpd
import os
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
Read the census tracts shapefile as a GeoDataframe
= 'data'
data_pkg_path = 'census'
folder = 'tl_2019_06_tract.shp'
filename = os.path.join(data_pkg_path, folder, filename)
file_path = gpd.read_file(file_path) tracts
The plot()
method will render the data to a plot.
Reference: geopandas.GeoDataFrame.plot
= plt.subplots(1, 1)
fig, ax 10,10)
fig.set_size_inches(=ax)
tracts.plot(ax plt.show()
The shapefile does not have any demographic data columns. But we have a CSV file containing a variety of population statistics for each tract. We read this file as a Pandas DataFrame.
= 'data'
data_pkg_path = 'census'
folder = 'ACSST5Y2019.S0101_data.csv'
filename = os.path.join(data_pkg_path, folder, filename)
file_path = pd.read_csv(file_path, skiprows=[1]) table
To join this DataFrame with the GeoDataFrame, we need a column with unique identifiers. We use the GEOID
column and process the id so they match exactly in both datasets.
= table[['GEO_ID','NAME', 'S0101_C01_001E']]
filtered = filtered.rename(columns = {'S0101_C01_001E': 'Population', 'GEO_ID': 'GEOID'})
filtered
'GEOID'] = filtered.GEOID.str[-11:] filtered[
Finally, we do a table join using the merge
method.
= tracts.merge(filtered, on='GEOID') gdf
For creating a choropleth map, we must normalize the population counts. US Census Bureau recommends calculating the population density by dividing the total population by the land area. The original shapefile contains a column ALAND
with the land area in square kilometers. Using it, we compute a new column density
containing the persons per square kilometer.
'density'] = 1e6*gdf['Population']/gdf['ALAND'] gdf[
For a choropleth map, we can specify a color ramp and classification scheme.
References: - Matplotlib Colormaps - Mapclassify Classification Schemes
= plt.subplots(1, 1)
fig, ax 10,10)
fig.set_size_inches(=ax, column='density', cmap='RdYlGn_r', scheme='quantiles', k=10, legend=True)
gdf.plot(ax
ax.set_axis_off() plt.show()
Instead of the class breaks being determined by the classification scheme, we can manually specify the ranges.
= plt.subplots(1, 1)
fig, ax 10,10)
fig.set_size_inches(=ax, column='density', cmap='RdYlGn_r', scheme='User_Defined',
gdf.plot(ax=True, classification_kwds=dict(bins=[1,10,25,50,100, 250, 500, 1000, 5000]))
legend
ax.set_axis_off()'California Population Density (2019)', size = 18)
plt.title(
= 'output'
output_folder = os.path.join(output_folder, 'california_pop.png')
output_path =300)
plt.savefig(output_path, dpi plt.show()
Plot the census tracts geodataframe tracts
with just outlines and no fill color.
Hint: The styling options as specified as keyword arguments to the plot()
function. Set the facecolor
option to 'none'
for no fills. Check the style_kwds parameter of the plot()
method for more details.
= plt.subplots(1, 1)
fig, ax 10,10)
fig.set_size_inches(=ax)
tracts.plot(ax plt.show()
Open the notebook named 04_using_basemaps.ipynb
.
Creating geospatial visualizations oftern require overlaying your data on a basemap. Contextily is a package that allows you to fetch various basemaps from the internet and ad them to your plot as static images.
We will learn how to take a shapefile showing the path of the 2017 Solar Eclipse and create a map with a topographic map.
import contextily as cx
import geopandas as gpd
import os
%matplotlib inline
import matplotlib.pyplot as plt
= 'data'
data_pkg_path = 'eclipse'
folder = os.path.join(data_pkg_path, folder, 'upath17.shp')
upath_file = gpd.read_file(upath_file) path_gdf
We can show the GeoDataFrame using the plot()
method.
= plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(=ax)
path_gdf.plot(ax plt.show()
The visualization is not useful as it is missing context. We want to overlay this on a basemap to understand where the eclipse was visible from. We can choose from a variety of basemap tiles. There are over 200 basemap styles included in the library. Let’s see them using the providers
property.
= cx.providers providers
For overlaying the eclipse path, let’s use the OpenTopoMap basemap. We need to specify a CRS for the map. For now, let’s use the CRS of the original shapefile.
= plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(=ax, facecolor='#cccccc', edgecolor='#969696', alpha=0.5)
path_gdf.plot(ax
=path_gdf.crs, source=cx.providers.OpenTopoMap)
cx.add_basemap(ax, crs plt.show()
The web tiles for the basemap are in the Web Mercator CRS (EPSG:3857). When you request them in a different CRS, they are warped to the requested CRS. This may cause the labels to not be legible in some cases. Instead, we can request the tiles in their original CRS and reproject our data layer to its CRS.
= path_gdf.to_crs('epsg:3857')
path_reprojected = plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(=ax, facecolor='#cccccc', edgecolor='#969696', alpha=0.5)
path_reprojected.plot(ax
=path_reprojected.crs, source=cx.providers.OpenTopoMap)
cx.add_basemap(ax, crs
ax.set_axis_off()'2017 Total Solar Eclipse Path', size = 18)
plt.title(
= 'output'
output_folder = os.path.join(output_folder, 'eclipse_path.png')
output_path =300)
plt.savefig(output_path, dpi plt.show()
Instead of the OpenTopoMap, create a visualization using the Toner basemap from Stamen.
= plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(=ax, facecolor='#cccccc', edgecolor='#969696', alpha=0.5)
path_gdf.plot(ax
=path_gdf.crs, source=cx.providers.OpenTopoMap)
cx.add_basemap(ax, crs plt.show()
Open the notebook named 05_animation_basics.ipynb
.
This notebook shows how to use the FuncAnimation
function from the matplotlib.animation
module to create animated plots.
import os
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter
Let’s understand the basics of matplotlib animation with a simple example. We will define 3 positions and we will create an animation of a point moving between them.
= [(0.1, 0.5), (0.5, 0.5), (0.9, 0.5)] points
Then we use the FuncAnimation
class which makes an animation by repeatedly calling a function and saving the output as a frame in the animation.
We need to define a function that takes the frame number and generates a plot from it. Here we define a function animation
that takes the frame index and creates a plot from the point at the same index in the points
list. So at frame 0, it will display the first point, frame 1 the second point and so on.
= plt.subplots(1, 1)
fig, ax 5,5)
fig.set_size_inches(=[0, 0.03, 1, 0.95])
fig.tight_layout(rect
def animate(i):
ax.clear()# Get the point from the points list at index i
= points[i]
point # Plot that point using the x and y coordinates
0], point[1], color='green',
ax.plot(point[='original', marker='o')
label# Set the x and y axis to display a fixed range
0, 1])
ax.set_xlim([0, 1])
ax.set_ylim([= FuncAnimation(fig, animate, frames=len(points),
ani =500, repeat=True)
interval plt.close()
The animation is now contained in the ani
object. We can call save()
and save the result as an animated GIF. We need to specify a writer
that supports the output format.
= 'output'
output_folder = os.path.join(output_folder, 'simple_animation.gif')
output_path =300,
ani.save(output_path, dpi=PillowWriter(fps=1)) writer
We can also use the to_jshtml()
function to create an HTML representation of the animation and display in a Jupyter notebook.
from IPython.display import HTML
#HTML(ani.to_jshtml())
Open the notebook named 06_animating_maps.ipynb
.
We can now apply the same technique to animate the path of the solar eclipse.
import contextily as cx
import geopandas as gpd
import os
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter
= 'data'
data_pkg_path = 'eclipse'
folder = os.path.join(data_pkg_path, folder, 'upath17.shp')
upath_file = gpd.read_file(upath_file)
path_gdf = os.path.join(data_pkg_path, folder, 'w_umbra17_1m.shp')
umbra_file = gpd.read_file(umbra_file) umbra_gdf
= path_gdf.to_crs('epsg:3857')
path_reprojected = umbra_gdf.to_crs('epsg:3857') umbra_reprojected
= plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(=ax, facecolor='#cccccc', edgecolor='#969696', alpha=0.5)
path_reprojected.plot(ax=ax, facecolor='none', edgecolor='#636363', alpha=0.5)
umbra_reprojected.plot(ax
=path_reprojected.crs, source=cx.providers.OpenTopoMap)
cx.add_basemap(ax, crs
ax.set_axis_off()'2017 Total Solar Eclipse Path', size = 18)
plt.title(
plt.show()
= path_reprojected.geometry.unary_union
path_boundary = umbra_reprojected[umbra_reprojected.geometry.intersects(path_boundary)] umbra_subset
= plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(=ax, facecolor='#cccccc', edgecolor='#969696', alpha=0.5)
path_reprojected.plot(ax=ax, facecolor='none', edgecolor='#636363', alpha=0.5)
umbra_subset.plot(ax
=path_reprojected.crs, source=cx.providers.OpenTopoMap)
cx.add_basemap(ax, crs
ax.set_axis_off()'2017 Total Solar Eclipse Path', size = 18)
plt.title(
plt.show()
= plt.subplots(1, 1)
fig, ax 15,7)
fig.set_size_inches(
plt.tight_layout()
def animate(i):
ax.clear()# Get the point from the points list at index i
= umbra_subset.iloc[i:i+1]
umbra_filtered =ax, facecolor='#cccccc', edgecolor='#969696', alpha=0.5)
path_reprojected.plot(ax=path_reprojected.crs, source=cx.providers.OpenTopoMap)
cx.add_basemap(ax, crs=ax, facecolor='#252525', edgecolor='#636363', alpha=0.5)
umbra_filtered.plot(ax
ax.set_axis_off()= dict(boxstyle='round', facecolor='wheat', alpha=0.5)
props = umbra_filtered.iloc[0].UTCTime
time = 'Time: {} UTC'.format(time)
text 0.05, 0.20, text, transform=ax.transAxes, fontsize=16,
ax.text(='top', bbox=props)
verticalalignment'2017 Total Solar Eclipse Path', size = 18)
ax.set_title(
= FuncAnimation(fig, animate, frames=len(umbra_subset),
ani =500, repeat=True, cache_frame_data=True)
interval plt.close()
import matplotlib as mpl
'animation.embed_limit'] = 2**128
mpl.rcParams[from IPython.display import HTML
#HTML(ani.to_jshtml())
= 'output'
output_folder = os.path.join(output_folder, 'solar_eclipse.gif')
output_path
#ani.save(output_path, writer=PillowWriter(fps=5))
Open the notebook named 07_visualizing_rasters.ipynb
.
RasterIO supports visualizing raster data using Matplotlib.
In this section, we will learn how to visualize a DEM raster and annotate it with some information.
import glob
import os
import rasterio
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
We have 4 different SRTM tiles in the directory. We get a list of them using the glob
module.
= 'data'
data_pkg_path = os.path.join(data_pkg_path, 'srtm', '*.hgt')
srtm_path = glob.glob(srtm_path)
all_files all_files
['data/srtm/N28E086.hgt',
'data/srtm/N28E087.hgt',
'data/srtm/N27E087.hgt',
'data/srtm/N27E086.hgt']
Let’s open the first tile and read it using rasterio.
= all_files[0]
file1 = rasterio.open(file1)
dataset = dataset.read(1)
band = dataset.transform
transform dataset.close()
The band
variable is a Numpy Array. Matplotlib can render this as an image using the imshow()
method.
= plt.subplots(1, 1)
fig, ax 7,7)
fig.set_size_inches(
='Greys_r')
ax.imshow(band, cmap plt.show()
Notice that the X and Y axis displays the column/row numbers, not coordinates. To display the image with the correct georeference information, rasterio providers a plotting API that correctly transforms the image. Instead of matplotlib’s imshow()
, we use rasterio’s show()
method, which takes an additonal argument for the transform
.
from rasterio.plot import show
= plt.subplots(1, 1)
fig, ax 7,7)
fig.set_size_inches(
='Greys_r', ax=ax, transform=transform)
show(band, cmap plt.show()
So far, we have only created a single Axes within a Figure. But matplotlib allows you to create layouts that can contain multiple plots in a single figure. Let’s now visualize all 4 tiles together in a single figure. We first read all tiles and store the opened rasters in a list.
= []
datasets for file in all_files:
= os.path.join(srtm_path, file)
path = rasterio.open dataset
Create 1 row of 4 subplots using the subplots()
method.
Reference: - matplotlib.pyplot.subplots - Arranging multiple Axes in a Figure
= plt.subplots(1, 4)
fig, axes 15,3)
fig.set_size_inches( plt.tight_layout()
The axes
variable contains a list of 4 axes objects. We show 1 tile in each of the axes.
= plt.subplots(1, 4)
fig, axes 15,3)
fig.set_size_inches(
plt.tight_layout()
for index, file in enumerate(python-dataviz-output/all_files):
with rasterio.open(file) as dataset:
= dataset.read(1)
band = dataset.transform
transform = axes[index]
ax =ax, cmap='Greys_r', transform=transform)
show(band, ax= all_files[index]
filename
ax.set_title(os.path.basename(filename))
plt.show()
Since each tile represents a different region, a better visualization would be to merge all of them into a single raster.
from rasterio import merge
= []
dataset_list for file in all_files:
open(file))
dataset_list.append(rasterio.
= merge.merge(dataset_list) merged_data, merged_transform
Similarly, we can visualize the merged raster.
= plt.subplots(1, 1)
fig, ax 12, 12)
fig.set_size_inches(=ax, cmap='viridis', transform=merged_transform)
show(merged_data, ax'merged')
ax.set_title( plt.show()
The DEM is of the region surrounding Mt.Everest. Let’s try to find the coordinates of Mt. Everest by queriing this merged raster for the highest value. The merged_data
variable contains the numpy array. But it has an extra empty dimension. We use the squeeze()
method to remove the empty extra dimention and get a 2D array.
= merged_data.squeeze() merged_array
Next we obtain the coordinates of the highest pixel value in the array.
= np.where(merged_array == np.max(merged_array))
rows, cols = rows[0]
row = cols[0]
col = rasterio.transform.xy(merged_transform, row, col)
lon, lat print(lat, lon)
27.988888888888887 86.92555555555556
We can use the annotate()
method to add a label on the plot with a text.
= plt.subplots(1, 1)
fig, ax 12, 12)
fig.set_size_inches(=ax, cmap='viridis', transform=merged_transform)
show(merged_data, ax'^r', markersize=11)
ax.plot(lon, lat, "Mt. Everest",
ax.annotate(=(lon, lat), xycoords='data',
xy=(20, 20), textcoords='offset points',
xytext=dict(arrowstyle="->", color='black')
arrowprops
)
= 'output'
output_folder = os.path.join(output_folder, 'mt_everest.png')
output_path =300)
plt.savefig(output_path, dpi
plt.show()
Create a layout using the subplots()
method with 2 rows and 2 columns. Plot a different color marker at location (1,1) in each plot.
Hint: You can access the axes in a particular row/col using the index notation. axes[0,0]
will return the axes in the first row and first column.
Open the notebook named 08_mapping_gridded_datasets.ipynb
.
Many climate and meteorological datasets come as gridded rasters in data formats such as NetCDF and GRIB. XArray provides Plotting Functions based on Matplotlib.
In this section, we will take the Gridded Monthly Temperature Anomaly Data from 1880-2020 from GISTEMP and visualize the temperature anomaly for the year 2021.
import os
from matplotlib import pyplot as plt
import cartopy
import cartopy.crs as ccrs
import xarray as xr
= 'data'
data_pkg_path = os.path.join(data_pkg_path, 'gistemp','gistemp1200_GHCNv4_ERSSTv5.nc')
file_path
= xr.open_dataset(file_path) ds
The NetCDF file contains a grid of values for each month from 1880-2020. The time
dimension has a length of 1704.
ds
<xarray.Dataset> Dimensions: (lat: 90, lon: 180, time: 1704, nv: 2) Coordinates: * lat (lat) float32 -89.0 -87.0 -85.0 -83.0 ... 83.0 85.0 87.0 89.0 * lon (lon) float32 -179.0 -177.0 -175.0 -173.0 ... 175.0 177.0 179.0 * time (time) datetime64[ns] 1880-01-15 1880-02-15 ... 2021-12-15 Dimensions without coordinates: nv Data variables: time_bnds (time, nv) datetime64[ns] 1880-01-01 1880-02-01 ... 2022-01-01 tempanomaly (time, lat, lon) float32 ... Attributes: title: GISTEMP Surface Temperature Analysis institution: NASA Goddard Institute for Space Studies source: http://data.giss.nasa.gov/gistemp/ Conventions: CF-1.6 history: Created 2022-01-11 09:09:58 by SBBX_to_nc 2.0 - ILAND=1200,...
We can aggregate the data to yearly time-steps using the resample()
method, reducing the time
dimension.
= ds.resample(time='Y').mean()
yearly yearly
<xarray.Dataset> Dimensions: (time: 142, lat: 90, lon: 180) Coordinates: * time (time) datetime64[ns] 1880-12-31 1881-12-31 ... 2021-12-31 * lat (lat) float32 -89.0 -87.0 -85.0 -83.0 ... 83.0 85.0 87.0 89.0 * lon (lon) float32 -179.0 -177.0 -175.0 -173.0 ... 175.0 177.0 179.0 Data variables: tempanomaly (time, lat, lon) float32 nan nan nan nan ... 3.729 3.729 3.729
We extract the tempanomaly
variable and use the indexing method isel()
to extract the latest time slice.
= yearly['tempanomaly']
anomaly = anomaly.isel(time=-1) anomaly2021
We can now plot this data using the imshow()
method from xarray.
from xarray.plot import imshow
imshow(anomaly2021)
<matplotlib.image.AxesImage at 0x17a743130>
To create more informative map visualization, we need to reproject this grid to another projection. CartoPy supports a wide range of projections and can plot them using matplotlib. CartoPy creates a GeoAxes object and replaces the default Axes
with it. This allows you to plot the data on a specified projection.
Reference: CartoPy List of Projections
= plt.axes(projection=ccrs.Orthographic(0, 40))
ax
ax.coastlines()= plt.gcf()
fig 5,5)
fig.set_size_inches( plt.show()
We can create a GeoAxes with a custom Orthographic projection and plot the temperature anomaly data on it. The transform
argument specifies the CRS of the original dataset.
= plt.axes(projection=ccrs.Orthographic(0, 30))
ax
ax.coastlines()=ax,
anomaly2021.plot.imshow(ax=-4, vmax=4, cmap='coolwarm',
vmin=ccrs.PlateCarree())
transform
= plt.gcf()
fig 5,5)
fig.set_size_inches(
plt.tight_layout() plt.show()
We can further customize the map by adjusting the colorbar.
Reference: matplotlib.pyplot.colorbar
= {
cbar_kwargs 'orientation':'horizontal',
'location': 'bottom',
'fraction': 0.025,
'pad': 0.05,
'extend':'neither'
}
= plt.axes(projection=ccrs.Orthographic(0, 30))
ax
ax.coastlines()
anomaly2021.plot.imshow(=ax,
ax=-4, vmax=4, cmap='coolwarm',
vmin=ccrs.PlateCarree(),
transform=False,
add_labels=cbar_kwargs)
cbar_kwargs
= plt.gcf()
fig 10,10)
fig.set_size_inches('Temprature Anomaly in 2021 (°C)', fontsize = 14)
plt.title(
= 'output'
output_folder = os.path.join(output_folder, 'anomaly.jpg')
output_path =300)
plt.savefig(output_path, dpi plt.show()
Display the map in an Equal Earth projection.
This course material is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to use the material for any non-commercial purpose. Kindly give appropriate credit to the original author.
If you would like to utilize these materials as part of a commercial offering, you can purchase a Trainer License for a small fee.
Please contact us for pricing and terms.
© 2022 Ujaval Gandhi www.spatialthoughts.com
This course is offered as an instructor-led online class. Visit Spatial Thoughts to know details of upcoming sessions.