This is an advanced-level course that teaches you how to use open-source Python libraries to process earth observation dataset using cloud and parallel-computing technologies. We have designed this course to help you build the required skills in a structured way and offer flexibility in where your data is stored and how it is processed. The course is divided into 3 sections.
In this section, we will take a single Sentinel-2 L1C scene downloaded from the Copernicus Browser and learn how to read it using XArray, visualize it and compute spectral indices.
The following blocks of code will install the required packages and download the datasets to your Colab environment.
%%capture
if 'google.colab' in str(get_ipython()):
!pip install rioxarray
!pip install jupyter-server-proxy
import os
import matplotlib.pyplot as plt
import xarray as xr
import rioxarray as rxr
import zipfile
import geopandas as gpd
data_folder = 'data'
output_folder = 'output'
if not os.path.exists(data_folder):
os.mkdir(data_folder)
if not os.path.exists(output_folder):
os.mkdir(output_folder)
def download(url):
filename = os.path.join(data_folder, os.path.basename(url))
if not os.path.exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
s2_scene = 'S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE.zip'
data_url = 'https://storage.googleapis.com/spatialthoughts-public-data/'
download(data_url + 's2/' + s2_scene)
aoi = 'bangalore.geojson'
download(data_url + aoi)
We first unzip the zip archive and create a XArray Dataset from the individual band images.
s2_filepath = os.path.join(data_folder, s2_scene)
with zipfile.ZipFile(s2_filepath) as zf:
zf.extractall(data_folder)
Sentinel-2 images come as individual JPEG2000 rasters for each band.
The image files are located in the
GRANULE/{SCENE_ID}/IMG_DATA/
subfolder. We find the files
and read them using rioxarray
.
import glob
s2_folder = s2_filepath[:-4]
band_files = {}
for filepath in glob.glob(
os.path.join(s2_folder, 'GRANULE', '*', 'IMG_DATA', '*B*.jp2')):
filename = os.path.basename(filepath)
# Extract the part of the filename containing band name such as 'B01'
band_name = os.path.splitext(filename)[0].split('_')[-1]
band_files[band_name] = filepath
band_files
{'B05': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B05.jp2',
'B8A': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B8A.jp2',
'B07': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B07.jp2',
'B01': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B01.jp2',
'B12': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B12.jp2',
'B10': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B10.jp2',
'B08': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B08.jp2',
'B06': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B06.jp2',
'B04': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B04.jp2',
'B09': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B09.jp2',
'B02': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B02.jp2',
'B11': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B11.jp2',
'B03': 'data/S2A_MSIL1C_20230212T050931_N0509_R019_T43PGQ_20230212T065641.SAFE/GRANULE/L1C_T43PGQ_A039913_20230212T051514/IMG_DATA/T43PGQ_20230212T050931_B03.jp2'}
Different Sentinel-2 bands have different spatial resolutions. To put them in the same array, their dimensions must match. So we combine bands of similar resolutions and resample others to match them.
b4 = rxr.open_rasterio(python-remote-sensing-output/band_files['B04'], chunks=True)
b3 = rxr.open_rasterio(python-remote-sensing-output/band_files['B03'], chunks=True)
b2 = rxr.open_rasterio(python-remote-sensing-output/band_files['B02'], chunks=True)
b8 = rxr.open_rasterio(python-remote-sensing-output/band_files['B08'], chunks=True)
stack1 = xr.concat([b4, b3, b2, b8], dim='band').assign_coords(
band=['B04', 'B03', 'B02', 'B08'])
b5 = rxr.open_rasterio(python-remote-sensing-output/band_files['B05'], chunks=True)
b6 = rxr.open_rasterio(python-remote-sensing-output/band_files['B07'], chunks=True)
b7= rxr.open_rasterio(python-remote-sensing-output/band_files['B07'], chunks=True)
b8a = rxr.open_rasterio(python-remote-sensing-output/band_files['B8A'], chunks=True)
b11 = rxr.open_rasterio(python-remote-sensing-output/band_files['B11'], chunks=True)
b12 = rxr.open_rasterio(python-remote-sensing-output/band_files['B12'], chunks=True)
stack2 = xr.concat([b5, b6, b7, b8a, b11, b12], dim='band').assign_coords(
band=['B05', 'B06', 'B07', 'B8A', 'B11', 'B12'])
Now we reproject the bands to match the resolution of the 10m bands
using reproject_match()
Both band stacks now have the same resolution and hence the same X
and Y dimensions. We can now combine them across the band
dimension. The Sentinel-2 scenes come with NoData value of 0. So we set
the correct NoData value before further processing.
We will clip the scene to the geometry of the GeoJSON file. We must ensure that the projection of the GeoDataFrame and the XArray DataArray match.
aoi_path = os.path.join(data_folder, aoi)
aoi_gdf = gpd.read_file(aoi_path)
geometry = aoi_gdf.to_crs(scene.rio.crs).geometry
clipped = scene.rio.clip(geometry)
clipped
Sentinel-2 pixel values need to be converted to reflectances using the following formula:
ρ = (L1C_DN + RADIO_ADD_OFFSET) / QUANTIFICATION_VALUE
QUANTIFICATION_VALUE
for all S2 scenes is
10000 and starting from January 22,2023, all new scenes
have a RADIO_ADD_OFFSET
of -1000
We can visualize a 3-band composite image.
Let’s visualize the results.
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(5,5)
ndvi.plot.imshow(
ax=ax,
cmap='Greens',
robust=True,
cbar_kwargs=cbar_kwargs)
ax.set_title('NDVI')
ax.set_axis_off()
plt.show()
green = scaled.sel(band='B03')
swir1 = scaled.sel(band='B11')
mndwi = (green - swir1)/(green + swir1)
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(5,5)
mndwi.plot.imshow(
ax=ax,
cmap='Blues',
robust=True,
cbar_kwargs=cbar_kwargs)
ax.set_title('MNDWI')
ax.set_axis_off()
plt.show()
/usr/local/lib/python3.10/dist-packages/distributed/client.py:3160: UserWarning: Sending large graph of size 328.52 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
warnings.warn(
Calculate the Normalized Difference Built-Up Index (NDBI) for the image and visualize the built-up area using a ‘red’ palette
Hint: NDBI = (SWIR1 – NIR) / (SWIR1 + NIR)
In this section, we will take a single Landsat 8 scene downloaded from the USGS EarthExplorer and learn how to read it using XArray, visualize it and mask clouds.
The following blocks of code will install the required packages and download the datasets to your Colab environment.
%%capture
if 'google.colab' in str(get_ipython()):
!pip install rioxarray
!pip install jupyter-server-proxy
import os
import matplotlib.pyplot as plt
import xarray as xr
import rioxarray as rxr
import zipfile
import numpy as np
data_folder = 'data'
output_folder = 'output'
if not os.path.exists(data_folder):
os.mkdir(data_folder)
if not os.path.exists(output_folder):
os.mkdir(output_folder)
def download(url):
filename = os.path.join(data_folder, os.path.basename(url))
if not os.path.exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
landsat_scene = 'LC08_L2SP_144052_20190209_20200829_02_T1.zip'
data_url = 'https://storage.googleapis.com/spatialthoughts-public-data/'
download(data_url + 'landsat/' + landsat_scene)
We first unzip the zip archive and create a XArray Dataset from the individual band images.
landsat_filepath = os.path.join(data_folder, landsat_scene)
with zipfile.ZipFile(landsat_filepath) as zf:
zf.extractall(data_folder)
Sentinel-2 images come as individual GeoTIFF rasters for each band.
The image band files cotaining data from the OLI sensor are named such
as *_SR_B1.TIF
, *_SR_B2.TIF
etc. We find the
files and read them using rioxarray
.
import glob
band_files = {}
for filepath in glob.glob(
os.path.join(data_folder, '*SR_B*.TIF')):
filename = os.path.basename(filepath)
# Extract the part of the filename containing band name such as 'B01'
band_name = os.path.splitext(filename)[0].split('_')[-1]
band_files[band_name] = filepath
band_files
We read the bands using rioxarray
and combine them into
a single DataArray.
bands = []
for band_name, band_path in band_files.items():
ds = rxr.open_rasterio(band_path, chunks=True, masked=True)
ds.name = band_name
bands.append(ds)
scene = xr.concat(bands, dim='band').assign_coords(
band=list(python-remote-sensing-output/band_files.keys()))
scene
Landsat Collection 2 surface reflectance has a scale factor of
0.0000275
and an additional offset of -0.2
per
pixel. We apply the scale and offset to get the reflectance values
We can visualize a 3-band composite image.
Most satellite providers store pixel quality information as bitmasks. This allows the providers to store a lot more information in an efficient way. Bitmasks can be hard to understand. Please read our post on Working with QA Bands and Bitmasks in Google Earth Engine to understand the concepts. Here we apply the same concept using a Python implementation of bitmasking adapted from Google Earth Engine.
Read the QA band which comes with a filename ending with
*QA_PIXEL.TIF
.
for qa_filepath in glob.glob(
os.path.join(data_folder, '*QA_PIXEL.TIF')):
break
pixel_qa = rxr.open_rasterio(qa_filepath, chunks=True, masked=True)
pixel_qa.name = 'pixel_qa'
pixel_qa
The 16-bit pixel value is determined by setting each of the bits to 0 or 1.
Since we are interested in clouds, the first 5 bits are relevant
Bit | Description |
---|---|
0 | Fill |
1 | Dilated Cloud |
2 | Cirrus |
3 | Cloud |
4 | Cloud Shadow |
We create a mask using the np.bitwise_and()
function
against the binary number 11111
. The result will be 0 only
if all the bits are set to 0 indicating clear conditions.
mask = xr.full_like(pixel_qa, int('11111', 2))
qa_mask = np.bitwise_and(pixel_qa, mask) == (0)
qa_mask
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(5,5)
scene_masked.sel(band=['B4', 'B3', 'B2']).plot.imshow(
ax=ax,
robust=True)
ax.set_title('RGB Visualization')
ax.set_axis_off()
plt.show()
Let’s write the resulting masked as a multi-band GeoTIFF file using rioxarray. We convert the DataArray to a DataSet with each band being a separate variable. This results in a much nicer GeoTIFF that is compatible with desktop software.
In this section, we will work with MODIS Vegetation Indices Version 6.1 data which are generated every 16 days at 250 meter (m) spatial resolution. This dataset provides Normalised Differnece Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) processed from the MODIS data. The source data was downloaded and processed via AρρEEARS which allows for pre-processing of the data into user-specified AOIs.
Here we will working with GeoTIFF files in EPSG:4326 projection and clipped to the state boundary of Karnataka, India. We will learn how to load all the files into a single XArray Dataset, apply a cloud-mask and extract a time-series of NDVI and EVI values.
The following blocks of code will install the required packages and download the datasets to your Colab environment.
%%capture
if 'google.colab' in str(get_ipython()):
!apt install libspatialindex-dev
!pip install fiona shapely pyproj rtree
!pip install geopandas
!pip install rioxarray
import datetime
import glob
import os
import re
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
import rioxarray as rxr
import zipfile
data_folder = 'data'
output_folder = 'output'
if not os.path.exists(data_folder):
os.mkdir(data_folder)
if not os.path.exists(output_folder):
os.mkdir(output_folder)
def download(url):
filename = os.path.join(data_folder, os.path.basename(url))
if not os.path.exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
filename = 'modis_vegetation_indices_2020.zip'
url = 'https://storage.googleapis.com/spatialthoughts-public-data/' + filename
download(url)
Downloaded data/modis_vegetation_indices_2020.zip
First we unzip and extract the images to a folder.
zipfile_path = os.path.join(data_folder, filename)
with zipfile.ZipFile(zipfile_path) as zf:
zf.extractall(data_folder)
The timestamp for the image is part of the filename. We write a function to parse the filename and extract the timestamps.
def path_to_datetimeindex(filepath):
filename = os.path.basename(filepath)
pattern = r'doy(\d+)'
match = re.search(pattern, filepath)
if match:
doy_value = match.group(1)
timestamp = datetime.datetime.strptime(doy_value, '%Y%j')
return timestamp
else:
print('Could not extract DOY from filename', filename)
timestamps = []
filepaths = []
files = os.path.join(data_folder, 'modis_vegetation_indices_2020', '*.tif')
for filepath in glob.glob(files):
timestamp = path_to_datetimeindex(filepath)
filepaths.append(filepath)
timestamps.append(timestamp)
unique_timestamps = sorted(set(timestamps))
Iterate through each file and read it using XArray. There are 3 files
for each time-step. One file each for NDV
, EVI
and DetailedQA
information. We extract the first 2-bits of
information from the DetailedQA
band into a new band named
SummaryQA
. These 4 bands are combined into a single array
for each time-step and then combine all the arrays into a single array
with a time
dimension.
scenes = []
for timestamp in unique_timestamps:
ndvi_filepattern = r'NDVI_doy{}'.format(timestamp.strftime('%Y%j'))
evi_filepattern = r'EVI_doy{}'.format(timestamp.strftime('%Y%j'))
qa_filepattern = r'VI_Quality_doy{}'.format(timestamp.strftime('%Y%j'))
ndvi_filepath = [filepath for filepath in filepaths if re.search(ndvi_filepattern, filepath)][0]
evi_filepath = [filepath for filepath in filepaths if re.search(evi_filepattern, filepath)][0]
qa_filepath = [filepath for filepath in filepaths if re.search(qa_filepattern, filepath)][0]
ndvi_band = rxr.open_rasterio(ndvi_filepath, chunks={'x':512, 'y':512})
ndvi_band.name = 'NDVI'
evi_band = rxr.open_rasterio(evi_filepath, chunks={'x':512, 'y':512})
evi_band.name = 'EVI'
qa_band = rxr.open_rasterio(qa_filepath, chunks={'x':512, 'y':512})
qa_band.name = 'DetailedQA'
# First 2 bits are MODLAND QA Bits with summary information
# Extract the value of the 2-bits using bitwise AND operation
summary_qa = qa_band & 0b11
qa_band.name = 'SummaryQA'
# Pixel values need to be scaled with the scale_factor
scale_factor = 0.0001
scaled_bands = [ndvi_band * scale_factor, evi_band * scale_factor, qa_band, summary_qa]
scene = xr.concat(scaled_bands, dim='band')
scenes.append(scene)
As described in the MODIS VI User Guide MOD13Q1/A1 QA band contains pixel values representing overall pixel quality.
Value | Summary QA | Description |
---|---|---|
-1 | Fill/No Data | Not Processed |
0 | Good Data | Use with confidence |
1 | Marginal data | Useful, but look at other QA information |
2 | Snow/Ice | Target covered with snow/ice |
3 | Cloudy | Target not visible, covered with cloud |
We select all pixels with values 0 or 1 and mask all pixels with snow or cloud.
We extract the time-series at a specific X,Y coordinate.
This time-series has cloudy pixels which show up as anomalous NDVI and EVI values.
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)
time_series.plot.line(ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
plt.show()
Let’s plot the cloud-masked time-series.
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)
time_series_masked.plot.line(ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
plt.show()
This looks much better, but we have gaps in the time-series due to masked cloudy-pixels. Replace them with linearly interpolated values.
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)
time_series_masked_interpolated.plot.line(ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
plt.show()
We can set the series colors to our choice of colors using Matplotlib Cycler.
from cycler import cycler
import matplotlib as mpl
colorlist = ['#006837', '#78c679']
custom_cycler = cycler(color=colorlist)
fig, ax = plt.subplots(1, 1)
ax.set_prop_cycle(custom_cycler)
fig.set_size_inches(15, 7)
time_series_masked_interpolated.plot.line(ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
plt.show()
We can use to_pandas()
method to convert the DataArray
into a Pandas DataFrame. The masked pixels are NaN values that we can
fill with a NoData value of -9999.
Save the DataFrame as a CSV file.
Create another CSV file interpolated_time_series.csv
containing the linearly interpolated NDVI and EVI values.
Google Earth Engine (GEE) is a cloud-based platform that has a large public data catalog and computing infrastructure. The XEE extension makes it possible to obtain pre-processed data cube directly from GEE as a XArray Dataset. In this section, we will learn how to process the GEE data using XArray and Dask on local compute infrastructure using the time-series processing capabilities of XArray.
We will obtain a datacube of cloud-masked Sentinel-2 images for a year over a chosen region and apply temporal aggregation to obtain mean monthly NDVI images.
Note: You must have a Google Earth Engine account to complete this section. If you do not have one, follow our guide to sign up.
The following blocks of code will install the required packages and download the datasets to your Colab environment.
%%capture
if 'google.colab' in str(get_ipython()):
!pip install --upgrade xee
!pip install rioxarray
import ee
import xarray
import rioxarray as rxr
import matplotlib.pyplot as plt
import pandas as pd
import os
import datetime
import numpy as np
We initialize Google Earth Engine API.You must have a Google Cloud
Project associated with your GEE account. Replace the
cloud_project
with your own project from Google Cloud Console. Here
we are using the High
Volume Endpoint which is recommended when working with XEE.
We start with the Sentinel-2 collection, apply a cloud mask and compute NDVI using the Google Earth Engine API.
s2 = ee.ImageCollection('COPERNICUS/S2_HARMONIZED')
geometry = ee.Geometry.Polygon([[
[82.60642647743225, 27.16350437805251],
[82.60984897613525, 27.1618529901377],
[82.61088967323303, 27.163695288375266],
[82.60757446289062, 27.16517483230927]
]])
filtered = s2 \
.filter(ee.Filter.date('2019-01-01', '2020-01-01')) \
.filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 30)) \
.filter(ee.Filter.bounds(geometry))
# Write a function for Cloud masking
def maskS2clouds(image):
qa = image.select('QA60')
cloudBitMask = 1 << 10
cirrusBitMask = 1 << 11
mask = qa.bitwiseAnd(cloudBitMask).eq(0).And(
qa.bitwiseAnd(cirrusBitMask).eq(0))
return image.updateMask(mask).multiply(0.0001) \
.select('B.*') \
.copyProperties(image, ['system:time_start'])
filtered = filtered.map(maskS2clouds)
# Write a function that computes NDVI for an image and adds it as a band
def addNDVI(image):
ndvi = image.normalizedDifference(['B8', 'B4']).rename('ndvi')
return image.addBands(ndvi)
# Map the function over the collection
withNdvi = filtered.map(addNDVI)
Now we have an ImageCollection that we want to get it as a XArray Dataset. We define the region of interest and extract the ImageCollection using the ‘ee’ engine.
scale
in XEE behaves differently than in GEE API. The
value of the scale
is in the units of the CRS. So to get
the data at 10 meters resolution, we must also specify a CRS that has
the unit of meters.
ds = xarray.open_dataset(
withNdvi.select('ndvi'),
engine='ee',
crs='EPSG:3857',
scale=10,
geometry=geometry,
)
Since we want to work with coordinates in latitude and longitude, we
reproject the raster to EPSG:4326
. This requires renaming
and reordering the dimentions to those expected by rioxarray.
ds_reprojected = ds\
.rename({'Y': 'y', 'X': 'x'}) \
.transpose('time', 'y', 'x') \
.rio.reproject('EPSG:4326')
ds_reprojected
Select the ndvi
variable. Split the datacube into
‘chunks’ to allow parallel processing using Dask.
We have a irregular time-series with a lot of noise. Let’s create a
monthly NDVI time-series by computing the monthly average NDVI. We can
use XArray’s resample()
method to aggregate the time-series
by Months and coompute the mean. We specify time as
MS
to indicate aggregating by month start and
label as left
to indicate we want the start of the
month as our series labels.
Reference Pandas Offset Aliases
time_series_aggregated = original_time_series.resample(time='MS', label='left').mean()
time_series_aggregated
Plot and Extract the Time-Series at a Single Location
original_ts = original_time_series.interp(x=82.607376, y=27.164335).compute()
aggregated_ts = time_series_aggregated.interp(x=82.607376, y=27.164335).compute()
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)
original_ts.plot.line(
ax=ax, x='time',
marker='^', color='#66c2a4', linestyle='--', linewidth=1, markersize=2)
aggregated_ts.plot.line(
ax=ax, x='time',
marker='o', color='#238b45', linestyle='-', linewidth=1, markersize=4)
ax.set_title('Original vs. Monthly NDVI Time-Series', size = 18)
plt.show()
Save the processed monthly NDVI images using rioxarray
as GeoTIFF files.
for time in time_series_aggregated.time.values:
image = time_series_aggregated.sel(time=time)
date = np.datetime_as_string(time, unit='M')
output_file = f'{date}.tif'
output_path = os.path.join(output_folder, output_file)
image.rio.to_raster(output_path, driver='COG')
print(f'Created file at {output_path}')
Aggregate the time-series to 15-day interval. Apply gap-filling to fill missing values.
Hint: Refer to Reference Pandas Offset Aliases for appropriate string value for the time parameter.
Zonal Statistics is used to summarises the values of a raster dataset within the zones of a vector dataset. Here we select all Admin1 units of a country and calculate a sum of nighttime light pixel intensities over multiple years. This is a large computation that is enabled by cloud-hosted NightTime Lights data in the Cloud-Optimized GeoTIFF (COG) format and parallel computing on a local dask cluster.
The following blocks of code will install the required packages and download the datasets to your Colab environment.
%%capture
if 'google.colab' in str(get_ipython()):
!apt install libspatialindex-dev
!pip install fiona shapely pyproj rtree
!pip install geopandas
!pip install rioxarray
!pip install regionmask
import os
import glob
import pandas as pd
import geopandas as gpd
import numpy as np
import xarray as xr
import rioxarray as rxr
import matplotlib.pyplot as plt
from datetime import datetime
import dask
import regionmask
import warnings
import rasterio
warnings.filterwarnings("ignore", category=rasterio.RasterioDeprecationWarning)
from dask.distributed import Client, progress
client = Client() # set up local cluster on the machine
client
dask.config.set({"array.slicing.split_large_chunks": False})
dask.config.set({'array.chunk-size': '32MiB'})
data_folder = 'data'
output_folder = 'output'
if not os.path.exists(data_folder):
os.mkdir(data_folder)
if not os.path.exists(output_folder):
os.mkdir(output_folder)
def download(url):
filename = os.path.join(data_folder, os.path.basename(url))
if not os.path.exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
admin1_zipfile = 'ne_10m_admin_1_states_provinces.zip'
admin1_url = 'https://naciscdn.org/naturalearth/10m/cultural/'
download(admin1_url + admin1_zipfile)
First we will read the GDL shapefile and filter to a country. The
‘adm1_code’ column contains a unique id for all the counties present in
the state, but it is of object
type. We need to convert it
to int
type to be used in xarray.
country_code = 'LK'
admin1_file_path = os.path.join(data_folder, admin1_zipfile)
admin1_df = gpd.read_file(admin1_file_path)
zones = admin1_df[admin1_df['iso_a2'] == country_code][['adm1_code', 'name', 'iso_a2', 'geometry']].copy()
zones['id'] = zones.reset_index().index + 1
zones
Next we read the NTL files and create an XArray object.
These files were download from Harvard Dataverse and converted to Cloud-Optimized GeoTIFFs using GDAL.
Example command
gdalwarp -of COG 2021_HasMask/LongNTL_2021.tif 2021.tif \
-te -180 -90 180 90 -dstnodata 0 \
-co COMPRESS=DEFLATE -co PREDICTOR=2 -co NUM_THREADS=ALL_CPUS
The resulting files are now hosted on a Google Cloud Storage bucket.
ntl_folder = 'https://storage.googleapis.com/spatialthoughts-public-data/ntl/npp_viirs_ntl'
da_list = []
for year in range(start_year, end_year + 1):
cog_url = f'{ntl_folder}/{year}.tif'
da = rxr.open_rasterio(
cog_url,
chunks=True).rio.clip_box(*bbox)
dt = pd.to_datetime(year, format='%Y')
da = da.assign_coords(time = dt)
da = da.expand_dims(dim="time")
da_list.append(da)
Now we will extract the sum of the raster pixel values for every admin1 region in the selected countrey.
First, we need to convert the GeoDataFrame to a XArray Dataset. We
will be using the regionmask
module for that. It takes
geodataframe, it’s unique value as integer and converts the geodataframe
into a xarray dataset
having dimension and coordinates same
as of given input xarray dataset
# Create mask of multiple regions from shapefile
mask = regionmask.mask_3D_geopandas(
zones,
ntl_datacube.x,
ntl_datacube.y,
drop=True,
numbers="id",
overlap=True
)
Finally, we save the result to disk.
Change the code to calculate the Zonal Statistics for all admin1 units in your chosen country.
This course material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Cloud-basd Remote Sensing with Python Course by Ujaval Gandhi www.spatialthoughts.com
This course is offered as an instructor-led online class. Visit Spatial Thoughts to know details of upcoming sessions.
© 2023 Spatial Thoughts www.spatialthoughts.com
If you want to report any issues with this page, please comment below.