This class focuses on techniques for the automation of GIS workflows. You will learn techniques that will help you be more productive, create beautiful visualizations and solve complex spatial analysis problems. This class is ideal for participants who already use QGIS and want to take their skills to the next level.
Below are the topics covered in this class
Before we do the exercises, it will help you to learn a bit more about open-source and how the QGIS project operates.
This course requires QGIS LTR version 3.34.
Please review QGIS-LTR Installation Guide for step-by-step instructions.
The examples in this class use a variety of datasets. All the
required layers, project files etc. are supplied to you in the
advanced_qgis.zip
file along with your class materials.
Unzip this file to the Downloads
directory.
Download advanced_qgis.zip.
Note: Certification and Support are only available for participants in our paid instructor-led classes.
The course is accompanied by a set of videos covering the all the modules. These videos are recorded from our live instructor-led classes and are edited to make them easier to consume for self-study. We have 2 versions of the videos:
We have created a YouTube Playlist with separate videos for each section and exercise to enable effective online-learning. Access the YouTube Playlist ↗
We are also making combined full-length video for each module available on Vimeo. These videos can be downloaded for offline learning. Access the Vimeo Playlist ↗
QGIS 2.0 introduced a new concept called Processing Framework. Previously known as Sextante, the Processing Framework provides an environment within QGIS to run native and third-party algorithms for processing data. It is now the recommended way to run any type of data processing and analysis within QGIS - including tasks such as selecting features, altering attributes, saving layers etc. - that can be accomplished by other means. But leveraging the processing framework allows you to be more productive, fast, and less error-prone.
The Processing Framework consists of the following distinct elements that work together.
We will learn about each of these through hands-on exercises in the following sections.
This exercise aims is to show how a multi-step spatial analysis problem can be solved using a purely processing-based workflow. This exercise also shows the richness of available algorithms in QGIS that can do sophisticated operations that previously needed plugins or were more complex.
We will work with roads extracted from OpenStreetMap for the state of Karnataka in India. The admin boundary for the state and the districts come from DataMeet.
We will learn about some useful processing algorithms for basic data processing operations such as Extract, Transform and Load (ETL). Our goal for this section is to read a road layer and a boundary layer from a GeoPackage and calculate what is the total length of the national highways in a state. We will also see how to append the resulting layer to same GeoPackage.
karnataka.gpkg
.
Drag and drop the karnataka
,
karnataka_districts
and karnataka_major_roads
layers to the canvas.karnataka_major_roads
contain all major
roads, including national highways, state highways, major arterial roads
etc. Select the layer and use the keyboard shortcut F6
to open the attribute table.ref
column has information
about road designation. As we are interested in only national highways,
we can use the information in this column to extract relevant road
segments.ref
field starts with the letters
NH
.We are using Regular Expression (or RegEx) to match the field value to a specific pattern. Regular expressions are quite powerful and can be used for many complex data filtering operations. Here’s a good tutorial that explains the basics of regular expressions.
regexp_match("ref", '^NH')
Matching Features
in the
Layers Panel. Next, we want to calculate the length of each segment. You
can use the built-in algorithm Add geometry
attributeslength
will
be added to the Layers panel. The distances in this field are in meters.
Let’s convert them to kilometers. You may reach out to the trusty QGIS
field calculator to add a new field. That’s a perfectly valid way - but
as mentioned earlier, there is a processing way to do things
which is the preferred way. Search and open the Field
Calculator processing algorithm instead and enter the following
expression."length"/1000
length_km
will be added to
the Layers panel. Now we are ready to find out the answer. We just need
to sum up the values in the length_km
field. Use the
Basic Statistics for Fields algorithm.Calculated
as the Input layer and
length_km
as the Field to calculate statistics on.
Click Run.What do you think of the results? The resulting number may not be perfect because the OpenStreetMap database may have missing roads or are classified differently. But it is close to the number provided in the official statistics.
Calculated
and is a temporary
memory layer. Let’s save it to the disk so we can use it later. The
layer contains many fields which are not relevant to us, so let’s delete
some columns before saving them. The classic way to do this is
to toggle editing and use the Delete Column button from the
Attribute Table. If you wanted to rename/reorder certain fields, that
needed a plugin. But now, we have a really easy processing algorithm
called Refactor Fields that can add, delete, rename,
re-order and change the field types all at once. Delete fields that are
not required and save the result as the layer
national_highways
in the source
karnataka.gpkg
.national_highways
will be added to the Layers
panel.We have now finished the first part of this exercise. Your output
should match the contents of the
Processing_Tools_Checkpoint1.qgz
file in the
solutions
folder.
There are many features in the national_highways
layer
where the "name"
attribute is NULL. Find
an appopriate algorithm from the Processing Toolbox to create a new
layer with the features having NULL values removed.
We achieved the goal of the exercise, but we can explore the results a bit better if we can break down the results into a smaller administrative unit. Let’s try to calculate the length of national highways for each district in the state. This will require us to perform a spatial join and analyze the results using group statistics.
karnataka_districts
layer from the
karnataka.gpkg
file. This layer contains the geometry of
each district along with its name in the DISTRICT
field.national_highways
layer and By comparing
to as karnataka_districts
. This will join each feature
in the roads layer to the intersecting feature from the districts layer.
Select only the DISTRICT field to be added to the
output. For the Join type, you have 3 options. Here we want to
do a one-to-one join where the for each road feature, we want to select
one district feature. Select the Take attributes of the feature
with largest overlap only option.In our dataset, the are some road geometries that cross district boundaries - causing part of the length being counted as part of another district. You can instead use the Sun line lengths algorithm from the Processing Toolbox which gives you accurate line lenghts contained in each polygon.
Joined layer
now has the intersecting
district name in the DISTRICT field. We can now sum the
road lengths and group them for each district. You may recall that in
earlier versions of QGIS, you needed a plugin called Group Stats to do
this. But now we can do this via the built-in Statistic by
Categories algorithm. Select the Joined layer
as
the Input vector layer, length_km as the
Field to calculate statistics on. Select the
DISTRICT field as the Field(s) with
Categories.karnataka.gpkg
as a layer named
national_highways_by_districts
. Click Run.length_km
column for each district. The values in
the Sum column is the total length of national highways
in the district.Note that we did data processing, spatial analysis and statistical
analysis - all using just processing algorithms in a fast, reproducible
and intuitive workflow. We have now finish the first part of this
exercise. Your output should match the contents of the
Processing_Tools_Checkpoint2.qgz
file in the
solutions
folder.
Apply the following formatting changes on the
national_highways_by_districts
layer and save it to
karnataka.gpkg
as a new layer
national_highways_by_districts_formatted
.
round()
function.Hint: You can apply an expression on a field in the Refactor Fields algorithm by entering it in the Source Expression column.
So far we have run the algorithm on 1 layer at a time. But each processing algorithm can also be run in a Batch mode on multiple inputs. This provides an easy way to process large amounts of data and automate repetitive tasks.
The batch processing interface can be invoked by right-clicking any processing algorithm and choosing Execute as Batch Process.
We will take multiple country-level data layers and use the batch processing operation to clip them to a state polygon in a single operation.
state boundary
layer, and
click Zoom to Layer.places
layer. Go to the the Processing
menu and select toolbox. Now in Processing
Toolbox, search for clip, under Vector
overlay → Clip right-click and select Execute as Batch
Process...state_boundary
and
places
then click OK. The
places
layer is on the top and it gets selected by
default.We won’t select it manually to avoid repetition.state_boundary
layer.clipped_
. This is just the prefix of the name that you
will auto-complete in the next step. Select GPKG files
(.gpkg) in Save as type. Click
Save.clipped.gpkg
. Click Runclipped.gpkg
geopackage layer will be created
with all the clipped layers in it.Your output should match the contents of the
Batch_Processing_Checkpoint1.qgz
file in the
solutions
folder.
Reprojected all the clipped layers to a UTM projection. You can
reproject all the layers to the CRS WGS84 / UTM zone 43N
EPSG:32643. Save the reprojected layers to the same GeoPackage
clipped.gpkg
GIS Workflows typically involve many steps - with each step generating intermediate output that is used by the next step. If you change the input data or want to tweak a parameter, you will need to run through the entire process again manually. Fortunately, Processing Framework provides a model designer that can help you define your workflow and run it with a single invocation. You can also run these workflows as a batch over a large number of inputs.
National Geospatial-Intelligence Agency’s Maritime Safety Information portal provides a shapefile of all incidents of maritime piracy in the form of Anti-shipping Activity Messages. We can create a density map by aggregating the incident points over a global hexagonal grid.
The steps needed to create a hex-bin layer suitable for visualization is as follows:
We will now learn how to build a model that runs the above processing steps in a single workflow.
piracy_hexbin
as the Name of
the model. Note that there is a History panel in the bottom left, which
keeps track of all the work done in Model Builder. Click the Save
button.piracy_hexbin.model3
. Also, check the
folder in which it is being saved. Processing → models, from
this location QGIS will read the model by default.+ Base Layer
input and the
Reproject layer
algorithm. This connection indicates the
flow of our processing pipeline. The next step is to create a hexagonal
grid. Now search for the Create grid algorithm and drag
it to the canvas.ne_10m_land
and the
Input Points layer is ASAM_events
. The Grid Size
needs to be specified in the units of the selected CRS. Enter
100000
as the Grid Size. Grid size is in the unit of the
current CRS, which is meters. Click Run to start the processing
pipeline. Once the process finishes, click Close.Aggregated
loaded as the
result of the model. As you explore, you will notice the layer contains
an attribute called NUMPOINTS containing the number of piracy
incidents points contained within that grid feature.Your output should match the contents of the
Modeler_Checkpoint1.qgz
file in the solutions
folder.
Add a step to extract the grid with the highest piracy incidents.
Hint: The expression maximum("NUMPOINTS")
will give you
the highest value contained in the "NUMPOINTS"
field. Use
it to build your complete expression.
When you ran your model, you may have noticed a warning message No spatial index exists for the input layer, performance will be severely degraded. This is because certain spatial queries make use of a spatial index and QGIS warns you when having a spatial index can speed up your operations. PostGIS documentation has a good overview of spatial indexes and why they are important.
You can compare a spatial index to a book index. When you want to search for a particular term, rather than scanning each page sequentially, you can speed up your search by looking up the index and directly going to the pages where the word appears. Spatial indexes work in similarly. You spent the effort once to create the index and all subsequent operations can make use of it. When you create a spatial index, each feature’s bounding box is used to establish its relationship with other features. This is stored alongside the dataset and can be used by algorithms. When trying to determine spatial relationships, the algorithms speed-up the look-up using the following two-pass method:
For large datasets, this approach helps reduce the processing time significantly.
QGIS has built-in tools to create and use spatial indexes. Let’s see how we can create a spatial index for a layer and use it in our model.
piracy_hexbin
model from
Processing Toolbox, select ne_10m_land as
Base Layer, 200000 as Grid size and
ASAM_events as Inputs Points. Click Run.piracy_hexbin
model and select Edit Model…. Search Create spatial
index in Algorithms tab and drag it to the model
canvas.Create spatial index
algorithm is connected to the Create grid
algorithm. But
the workflow should be, Create grid → Create spatial index →
Extract from location.Extract by location
layer and select
Edit. In Extract features from select “Indexed
Layer” from algorithm “Create spatial index”.Create Grid
and
Extract by location
layer. Click Save model.piracy_hexbin
model.Your output should match the contents of the
Modeler_Checkpoint2.qgz
file in the solutions
folder.
Can you change the model so that instead of entering the grid size in meters, the user can enter the size in kilometers?
Hint: The Create Grid algorithm expects the size in meters, so you will have to convert the input to meters. Instead of using Model Input, use Pre-calculated Value to enter an expression.
GeoPackage is an open and flexible data format that makes data management simple. We saw how you can package multiple layers in a GeoPackage. We will now learn how to embed QGIS projects inside of a goepackage.
QGIS projects contain information of layer configuration, symbology, label positions and even models. We can make the reproducing your workflow even easier by embedding the QGIS Project in the same geopackage containing the source layers.
martime_piracy.gpkg
geopackage. Click Project → Save To → GeoPackage….martime_piracy
and click
OK.martime_piracy.gpkg
the
project is saved. You now have just 1 GeoPackage file that contains all
your input layers, styles, project and models. Sharing this 1 file will
enable anyone to reproduce your output using the embedded model and
layers.The following assignment is designed to help you practice the skills learnt so far in the course and explore automation using the model designer.
Your task is to create a model called Calculate Highway Statistics that will automate the workflow outlined in the previous section 1. Using Processing Toolbox.
Your model is expected to take following 3 inputs
karnataka_major_roads
.karnataka_districts
The model should take the input and calculate the total length of
highways in each admin polygon. The result should be a formatted table
containing 2 columns: district
and
total_length
. Test the model on the layers provided in the
karnataka.gpkg
file in the data package.
Tips and Best Practices:
ref
column values starting with
NH. If the selected type is State
Highway, you must extract the features with ref
column values starting with SH. You may use either
if
or the CASE
conditionals to build your
expression.Starter Model: The hardest part of the assignment is
to take the Enum input and write an expression to extract the
appropriate highway type. We encourage you to try to figure this out
yourself. If you struggle with this, you can download the assignment_hint.model3
file which has the first step configured. You will need to add the
remaining steps to complete the model. You can open this downloaded file
from the Processing Toolbox Menu → Models → Open Existing
Model.
Time is an important component of many spatial datasets. Along with location information, time providers another dimension for analysis and visualization of data. If you are working with a dataset that contains timestamps or have observations recorded at multiple time-steps, you can easily visualize it using the Temporal Controller in QGIS 3.14 or above.
ne_10_land
for map extent.Export your animation and create an animated GIF showing the changes in piracy hotspots over time.
GPS tracks have become ubiquitous in modern life. With GPS built-into most phones, many of us capture the tracks while running or biking outdoors. Cab companies use GPS tracks collected during the trip to determine fares. Delivery and logistics companies store and analyze millions of GPS tracks from their assets to derive location intelligence.
We will now learn how to collect a GPS track and create an animation from it. We will produce an animation that looks like shown below.
We encourage you to collect your own GPS track for this exercise. You can collect a GPS track from a smartphone and download it to your computer. The default format for storing GPS tracks is GPS Exchange Format (GPX). It is a XML-based text format that allows storing points, tracks and routes in a single file. You can use any of the options below to capture your track.
If you do not have your own data, you can use the
sample_gps_track.gpx
file from your data package.
We will use a plugin called QuickMapServices to select and load a basemap.
Get contributed pack
to
download 3rd-party basemaps.You will see a warning against using contributed services. Some of these services may have restrictions on their usage and/or attribution requirement that you need to follow. Please review them before using them in your project.
sample_gps_track.gpx
file. Drag and
drop it to the canvas.track_points
and tracks
layers. Click Add
Layers.track_points
layer by
un-checking the box next to it. Select the tracks
layer and
click Open the Layer Styling Panel. You can change the line
Color to Blue and Width to
0.2.track_points
layer and
drag it above the tracks
layer. This changes the rendering
order and the points will be drawn on top of the line layer. In the
Layer Styling Panel, select Simple marker symbol.
Change the point Size to 1
. Choose a lighter
shade of Blue as the Fill color and a
Transparent Stroke
as Stroke color.if(@id % 10 = 0, $geometry, NULL)
track_points
layer and choose Duplicate
Layer.track_points copy
will be added to the
Layers panel. Drag it on top of the stack in the
Layers panel. In the Layer Styling Panel for the
duplicated track_points copy
layer, choose bright neon as
the Color from the color picker and increase the size to
3. Check the Draw Effects option and click the
Effects button next to it.2.0
for both Spread and Blur
radius. Click Go Back.track points copy
layer and select
Properties.Single Field with Date/Time
as
the Configuration. Set time as the
Field. Click OK.track_points
layer
and select Properties.Your output should match the contents of the
GPS_Animation_Checkpoint1.qgz
file in the
solutions
folder.
In the previous exercise, we used the Title Label Decorations to display the time of the animation. There is another approach that offers more flexibility in label placement and styling. Let’s add a label to display the timestamp.
label
layer and click Open the Layer Styling Panel button. Switch to
the Labels tab and enter the following expression. GPS
timestamps are in UTC, so we uses the to_interval()
function to adjust the timestamp to the local timezone. Note the use of
the | character which will allow us to split the label
in multiple lines later on.format_date( @map_start_time + to_interval('5 hours 30 mins'), 'dd MMM, yyyy| hh:mm:ss')
label
layer and
click Properties. Switch to the Temporal tab. Check
the Dynamic Temporal Control button and select Redraw
Layer Only. This option merely refreshes the layer on every
frame of the animation.Your output should match the contents of the
GPS_Animation_Checkpoint2.qgz
file in the
solutions
folder.
Recent versions of QGIS include native support for 3D data. Using this feature, you can easily view, explore and animate 3D elevation data. Note that your computer must have a supported graphics card for this feature to work.
We will work with a 5m Digital Elevation Model (DEM) of Denali peak in Alaska and create a 3D visualization of the dataset.
denali_hillshade
and under Layer Rendering choose
Multiply as Blending mode. The selected layer
will be blended with the bottom layer. This is a great way to visualize
your elevation dataset.denali_dem
layer as the Elevation. Click OK.Your output should match the contents of the
3DAnimation_Checkpoint1.qgz
file in the
solutions
folder.
As we did in the previous section, you can use a service such as ezgif.com to create a GIF/Video from these frames.
QGIS expression engine has a powerful function called ‘summary aggregates’ that allows evaluating a feature’s geometry and attributes with those of another layer. Expressions can be used for static calculations as well as on-the-fly computations, such as labels, virtual fields, symbology etc. This enables some powerful use cases.
The summary aggregate function operates on all the values from a different layer, returning a single summary value. The syntax of the aggregate function is as follows
aggregate(
layer:='layer name or id',
aggregate:='aggegate type',
expression:='expression to aggregate',
filter:='optional filter expression,
concatenator:='optional string to use to join values',
order_by:='optional expression to order the features'
)
We will work with a land parcels data layer provided by the City of San Francisco. The goal of this exercise is to demonstrate the use of aggregate expression for on-the-fly computation when digitizing new features.
boundary
layer and click the Open Field
Calculator button.count
with the following
expression. The expression is reading the features from the
parcels
layer and giving an aggregate count of the
features. You will notice that the the result will be displayed at the
bottom of the window.aggregate(
layer:= 'parcels',
aggregate:='count',
expression:=fid
)
polygons
layer and
right-click it. Select Properties.count
and choose Text Edit as the
Widget Type. At the Default Value field at the
bottom, enter the following expression. Note that additional filter
value. Here the $geometry
refers to the geometry of the
parcels
layer and geometry(@parent)
refers to
the geometry of feature from the polygons
layer. Click
OK.aggregate(
layer:= 'parcels',
aggregate:='count',
expression:=fid,
filter:=intersects($geometry, geometry(@parent))
)
count
field.parcels
field. Enter the following expression as the Default
Value.aggregate(
layer:= 'parcels',
aggregate:='concatenate',
concatenator:=',',
expression:=to_string(fid),
filter:=intersects($geometry, geometry(@parent))
)
Your output should match the contents of
the Expressions_Checkpoint1.qgz
file in the
solutions
folder.
Add a new field in the parcels
layer called
neighbors with the count of neighboring polygons. Use
that field to find the parcel having the highest number of
neighbors.
Hint: You can use the aggregate()
function on the same
layer to compare each feature against all other features in the layer.
The @layer
variable contains the name of the current layer,
so you can use the below expression
aggregate(
layer:=@layer,
aggregate:='count',
expression:="fid",
filter:=touches($geometry, geometry(@parent)))
A key feature introduced in QGIS 3.14 was the ability for models to have a conditional branch. This allows models to check for a condition and execute a different part of the model based on the output of the condition. This is equivalent to If/Else statements in a programming language.
Let us develop a Conditional branch in the
piracy_hexbin
model covered in the previous section. We
will add a new check-box input Use Spatial Index - that allows
the user to specify whether to use a spatial index in the model. We will
add a conditional branch that takes the following actions:
You can download the
piracy_hexbin_conditional_branch.model3
file which has the complete model with the conditional branch developed as per instructions below. You can open this downloaded file from the Processing Toolbox Menu → Models → Open Existing Model.
piracy_hexbin
model and select
Edit Model….
6. All the model output and intermediate layers will be displayed in
Bold characters, double click on use_spatial_index. It
will add the variable @use_spatial_index
as the expression.
Click OK.
NOT @use_spatial_index
Click OK.Make sure to hit the Enter key after typing the Branch Name. The input may not be saved if you forget to do so.
Create spatial index
algorithm
in model canvas and click … next to Dependencies.@use_spatial_index
. We can use the
if
condition to return a layer based on user input. Use the
expression below. Remember to pick the variables from the list and
double-click them to enter them in your expression. If you had named the
algorithms or input differently, the variable names will be different
for you. Click OK.if( @use_spatial_index ,
@Extract_by_location__Spatial_Index__OUTPUT ,
@Extract_by_location__No_Spatial_Index__OUTPUT)
Extract by location (Spatial Index)
and
Extract by location (No Spatial Index)
and click
OK.To take your processing experience to the next-level, you can use the built-in Locator Bar. At the bottom-left of QGIS main window, there is a universal search bar that can do keyword-search across layers, settings, processing algorithms and more. You can open the locator bar using the keyboard shortcut Ctrl+K.
I find that rather than clicking-around the processing toolbox, you can just use the locator bar to search and open the algorithms. Type Ctrl+K, followed by a (to restrict search to algorithms), followed by a space and a few characters. Use the arrow keys to select and press Enter to open the algorithm.
Processing algorithms are designed to take inputs and produce outputs. The default behavior is to create a new layer after each operation. This is useful for many workflows, especially in an enterprise setting, where you may not have the ability to edit the source data. If your algorithms are altering the source data, that also means that the workflows cannot be reproduced easily. So you would want a setup where the algorithms read from source data and create modified outputs.
An exception to this workflow is when you are doing data editing. When your workflow involves creating new features or editing them - creating a new layer for every edit is undesirable. A recent QGIS crowd-funding campaign added the ability for processing algorithms to modify the features in-place and this functionality is available out-of-the-box in QGIS now.
This course material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Advanced QGIS Course by Ujaval Gandhi www.spatialthoughts.com
This course is offered as an instructor-led online class. Visit Spatial Thoughts to know details of upcoming sessions.
© 2021 Spatial Thoughts www.spatialthoughts.com
If you want to report any issues with this page, please comment below.