This workshop introduces participants to the modern approach to working with large datasets in QGIS. Modern data formats - such as Cloud-Optimized GeoTIFFs (COG), Cloud-Optimized Point Clouds (COPC), and FlatGeoBuf (FGB) allow datasets to be streamed from cloud storage without having to download entire files. Spatial Temporal Asset Catalog (STAC) provides a standardized way to query cloud-hosted datasets. Combined with QGIS, these technologies allow users to visualize and analyze large datasets which was not possible before.
This workshop requires QGIS LTR version 3.34. Please review QGIS-LTR Installation Guide for step-by-step instructions.
We have created a data package containing checkpoint projects that will allow you to load the results of each section. You can download the data package from qgis_cloud_native_geospatial.zip. Once downloaded, unzip the contents to a folder on your computer.
OpenStreetMap
layer is loaded, go to your
region of interest. Now we will load a 8GB cloud-optimized geotiff file
stored on Google Cloud Storage and stream the pixels over this region.
Click the Open Data Source Manager button.Protocol: HTTP(S), cloud etc.
as the Source Type.
Enter the following URL as the URI. Click Add.https://storage.googleapis.com/spatialthoughts-public-data/ntl/viirs/viirs_ntl_2021_global.tif
viirs_ntl_2021_global
will be added to the
Layers panel. This is a global layer showing nighttime lights
intensity values recorded by the VIIRS instrument. Note that the file
loaded instantly and as you zoom in further, higher resolution pixels
are fetched dynamically from the file. While this file is located on a
remote location, you can use it just as if it was a regular GeoTIFF
file. Let’s visualize the layer using a color-ramp. Click the Open
Layer Styling Panel button on the Layers panel.Singleband pseudocolor
. Select a color ramp of your choice.
The layer rendering will be updated.Protocol: HTTP(S), cloud etc.
as the Source Type.
Enter the following URL as the URI. Click Add.https://github.com/spatialthoughts/cloud-native-geospatial/releases/download/hydrorivers/hydrorivers.fgb
hydrorivers -- rivers
will be added to the
Layers panel. This is a large line layer containing over 2.5
million features. QGIS will request and load only the features within
the current canvas extent. FlatGeoBuf format does not contain simplified
versions of features that can be loaded at lower zoom levels. So if you
zoom out to a large region, all the features within that region will be
requested. To prevent fetching unnecessarily large amounts of data - we
can enable scale dependent visibility. Right-click the
hydrorivers -- rivers
layer and select
Properties.1:1000000
. Click
OK.Your output should match the contents of the
cloud_native_geospatial_checkpoint1.qgz
file in your data
package.
There are many publicly accessible STAC catalogs available - with more providers making their data available through STAC. To explore all the available catalogs, you can visit the STAC Index. Click on Catalogs and explore the listings.
OpenStreetMap
layer is loaded,
go to your region of interest. Click the Open Data Source
Manager button.Protocol: HTTP(S), cloud etc.
as the Source Type.
Enter the URL copied in the previous step as the URI. Click
Add.https://s3.openlandmap.org/arco/pop.count_ghs.jrc_m_100m_s_20210101_20211231_go_epsg.4326_v20230620.tif
pop.count_ghs.jrc_m_100m_s_20210101_20211231_go_epsg.4326_v20230620
will be added to the Layers panel. This image represents the
population count in each 100m x 100m grid cell. The original data comes
with decimal values, but to optimize the storage space and transfer of
data, the image is saved with the data type UInt32 (Integers).
To preserve the full fidelity of the data, the original pixel values are
multiplied by 100. Hence if the the pixel value is 2068
-
it represents a population count of 20.68
. Keep this in
mind and we will appropriately scale our results with this factor.Protocol: HTTP(S), cloud etc.
as the Source Type.
Enter the following URL as the URI. Click Add.https://storage.googleapis.com/spatialthoughts-public-data/geoboundaries/admin2.fgb
admin2 -- adm2_polygons
will be added to
the Layers panel. This is a very large vector polygon layer and
the polygons for Admin2 regions within the current canvas extents will
be fetched and loaded. We will pick a region of interest. To make the
selection easier, let’s add some labels to the polygons. Click the
Open Layer Styling Panel button on the Layers
panel.Select the Labels tab. Select Single Labels
from the drop-down menu and select adm2_name
as the
Value. The name of the administrative region will be rendered
on the map.admin2 -- adm2_polygons
as the Input layer. Make
sure to check the Selected features only box. Select the
pop.count_ghs.jrc_m_100m_s_20210101_20211231_go_epsg.4326_v20230620
layer as the Raster layer. Enter population_
as
the Output column prefix. For the Statistics to
calculate, click the … button and select only
Sum
. Save the output as a file
zonal_stats.gpkg
and click Run.zonal_stats
will
be added to the Layers panel. This layer contains the result of
the Zonal Statistics that was computed directed from the cloud-hosted
dataset by fetching only required pixels and features - saving us time,
bandwidth and storage.zonal_stats
as the Input layer and enter
population
as the Field name. Enter the following
expression to scale the population value. Enter the output file name as
zonal_stats_final.gpkg
and click Run."population_sum" / 100
zonal_stats_final
is loaded, right-click the layer and
select Open Attribute Table. The field
population contains the total population within the
selected administrative region.Your output should match the contents of the
cloud_native_geospatial_checkpoint2.qgz
file in your data
package.
STAC API Browser
. Once you
find the plugin, click Install.Microsoft Planetary Computer STAC API
as the
Connections. Click Fetch collections to get all the
collections available through the API. Scroll down and select the
ESA WorldCover
collection.Land Cover Classes
asset. Check the
box Select to add as a layer and click Add select assets as
layers.Land Cover Classes
will be added to the
Layers panel and the landcover pixels for the region will be
streamed in. Let’s rename the layer so we can easily identify it later.
Right-click and select Rename Layer.2021
and press
Enter.Land Cover Classes
asset for the year 2020.2020
.We will now analyze the change in water class between the year 2020 and 2021 to locate all areas where surface water was lost.
2020
layer. Next click the Expression button."2020@1" = 80
water_2020
as the Output layer name and click
Run.water_2020
will be added to the
Layers panel. This is a boolean layer with pixel values 1 for
water and 0 for all other classes.water_2021
as the Output layer name.water_2020
and water_2021
layers. Next click
the Expression button.("water_2020@1" = 1) AND ("water_2021@1" != 1)
waterlost
as the Output layer name and click
Run.waterlost
will be added to the
Layers panel. Everything we have done so far is a dynamic
computation on the remote data layers. Let’s perform the actual
computation and save the results as a local raster. Right-click the
waterlost
layer and go to Export → Save
As….waterlost_export.tif
as the output file name. Expand the
Extent section and click the Map Canvas Extent button
to load the current extent. It is always a good idea to apply a lossless
compression to raster files. Check the Create Options box and
select Low Compression
as the Profile.0
as
From and To. This will only keep the pixels with value
1 as data values in the output. Click OK.waterlost_export
will be added to the
Layers panel.waterlost_export
is a local raster with pixels
showing regions that experienced loss in surface water. Let’s convert
this layer to polygons. From the Processing Toolbox, search and locate
the GDAL → Raster Conversion → Polygonize (raster to
vector) algorithm.waterlost_export
as the Input layer. Keep all
other parameters to their default values. Click … next to
Vectorized and browse to a folder on your computer. Enter the
filename as waterlost_polygons.gpkg
. Click
Run.waterlost_polygons
will be added to QGIS. Using a
cloud-native workflow, we were able to do a landcover change analysis
without downloading any data or incurring any cloud computation
cost.Your output should match the contents of the
cloud_native_geospatial_checkpoint3.qgz
file in your data
package.
In this section, we will learn how to query a STAC catalog of Sentinel-2 L2A data, load and create a RGB composite and compute a spectral index using a cloud-native approach.
Earth Search
from the Connections
dropdown. Click Edit to view the connection details.https://earth-search.aws.element84.com/v1
and click
OK.Sentinel-2 Level-2A
collection. If you do not see this
dataset listed, please go to the previous step and verify the URL is
correct.Blue (band 2)
,
Green (band 3)
, Red (band 4)
and
SWIR 1 (band 11)
. Click Add select assets as layers
(4).Blue (band 2)
, Green (band 3)
,
Red (band 4)
as Input layers and check Place
each input file into a separate band. Click the ...
button next to Virtual and save the output file as
rgb.vrt
. Next click Run.rgb
will be added to the Layers
panel. This layer contains references to the 3 different images. Note
that the order of the bands in alphabetical, so the mapping in the
virtual raster is as follows: Band 1 → Blue (band 2), Band
2 → Green (band 3) and Band 3 → Red (band 4). Open the
Layer Styling Panel and place Band 3
,
Band 2
and Band 1
as Red,
Green and Blue bands. Change the Min and
Max values to 0
and 3000
respectively. You will see the image now appear in natural colors...
button for Input layers.Green (band 3)
and the
SWIR 1 (band 11)
layers and click OK.("Green (band 3) - 10m@1" - "SWIR 1 (band 11) - 10m@1")
/ ("Green (band 3) - 10m@1" + "SWIR 1 (band 11) - 10m@1")
MNDWI
and click
Run.MNDWI
will be added to the Layers
panel. Click the open the Layer Styling Panel button. Change
the renderer to Singleband psuedocolor
. Set the
Min and Max values to 0
and
0.8
respectively. Choose the Blues
as the
Color ramp. You will see a visualization where all detected
water pixels are highlighted in the blue color.MNDWI
layer as the Input layers and
enter the expression shown below. Set the Output layer name to
Water
and click Run."MNDWI@1" > 0
Water
will be added to the
Layers panel. Click the open the Layer Styling Panel
button. Set the Min and Max values to 0
and 1
respectively. You will see all the extracted water
pixels in white. Remember this image is still a virtual image being
computed on-the-fly by streaming in the required data directly from the
cloud-hosted COGs. You can Zoom and Pan the map and the water pixels
will be extracted dynamically. We can save the results as a local
GeoTIFF. Right-click on the Water
layer and select
Export → Save As…...
button next to File name and save
the file as extracted_water.tif
. Expand the Extent
section and click the Map Canvas Extent button. Set both both
Horizontal and Vertical values for Resolution
to be 10
meters. Check the Create options box and
select Low Compression
as the Profile. Click
OK to start the export process. At this stage, QGIS will fetch
the required pixels from the cloud dataset at the required resolution,
perform the computation to calculate MNDWI and extract the water pixels.
Depending on how big is your region, this process may take a few
minutes.extracted_water
will be
added to the Layers panel. You were able visualize a satellite
image and perform a complex calculation to extract water from the image
without first having to download the entire scene. Saving time,
bandwidth and computation resources.Your output should match the contents of the
cloud_native_geospatial_checkpoint4.qgz
file in your data
package.
The easiest way to create data in a cloud-optimized format is to use the GDAL/OGR Command-Line Tools. Please check our course material for Mastering GDAL Tools course for step-by-step instructions for installation and usage.
You can use the gdal_translate
command to take any
raster data and create a Cloud-Optimized GeoTIFF by specifying the
-of COG
option.
gdal_translate -of COG viirs_ntl_2021_global.tif viirs_ntl_2021_global_cog.tif
Similarly, ogr2ogr
can convert any supported vector data
format to a cloud-otpmized data format. The command below converts a
GeoPackage to a FGB format using the -f FlatGeobuf
option.
ogr2ogr -f FlatGeobuf hydrorivers.fgb hydrorivers.gpkg
You can create the output dataset by specifying a subset of columns
using the -select
option.
ogr2ogr -f FlatGeobuf admin2.fgb admin2.gpkg -select "adm2_name,adm1_name,adm0_name,iso_2,iso_3"
The main advantage of cloud-native data formats is that they can be streamed directly from files, without needing specialized servers. This is achieved by making a HTTP-range request from the file. This means the file needs to be accessible by HTTP as static file objects. Most consumer cloud storage servies such as Google Drive, Microsoft OneDrive, Dropbox etc. do NOT allow direct access to file objects and cannot be used to serve files via HTTP. Below are some recommended options for storing files for cloud-native applications.
Cloud Object Stores
GitHub
Any Simple HTTP Server
Use any simple HTTP server that can serve static files.
This workshop material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Cloud Native Geospatial Workflows with QGIS by Ujaval Gandhi www.spatialthoughts.com
© 2024 Spatial Thoughts www.spatialthoughts.com
If you want to report any issues with this page, please comment below.