This class focuses on techniques for the automation of GIS workflows. You will learn techniques that will help you be more productive, create beautiful visualizations and solve complex spatial analysis problems. This class is ideal for participants who already use QGIS and want to take their skills to the next level.
Below are the topics covered in this class
Before we do the exercises, it will help you to learn a bit more about open-source and how the QGIS project operates.
This course requires QGIS LTR version 3.22.
Please review QGIS-LTR Installation Guide for step-by-step instructions.
The examples in this class use a variety of datasets. All the required layers, project files etc. are supplied to you in the advanced_qgis.zip
file along with your class materials. Unzip this file to the Downloads
directory.
Not enrolled in our instructor-led class but want to work through the material on your own? Get free access to the data package
QGIS 2.0 introduced a new concept called Processing Framework. Previously known as Sextante, the Processing Framework provides an environment within QGIS to run native and third-party algorithms for processing data. It is now the recommended way to run any type of data processing and analysis within QGIS - including tasks such as selecting features, altering attributes, saving layers etc. - that can be accomplished by other means. But leveraging the processing framework allows you to be more productive, fast, and less error-prone.
The Processing Framework consists of the following distinct elements that work together.
We will learn about each of these through hands-on exercises in the following sections.
This exercise aims is to show how a multi-step spatial analysis problem can be solved using a purely processing-based workflow. This exercise also shows the richness of available algorithms in QGIS that can do sophisticated operations that previously needed plugins or were more complex.
We will work with roads extracted from OpenStreetMap for the state of Karnataka in India. The admin boundary for the state and the districts come from DataMeet.
We will learn about some useful processing algorithms for basic data processing operations such as Extract, Transform and Load (ETL). Our goal for this section is to read a road layer and a boundary layer from a GeoPackage and calculate what is the total length of the national highways in a state. We will also see how to append the resulting layer to same GeoPackage.
karnataka.gpkg
. Drag and drop the karnataka
, karnataka_districts
and karnataka_major_roads
layers to the canvas.karnataka_major_roads
contain all major roads, including national highways, state highways, major arterial roads etc. Select the layer and use the keyboard shortcut F6 to open the attribute table.ref
column has information about road designation. As we are interested in only national highways, we can use the information in this column to extract relevant road segments.ref
field starts with the letters NH
.We are using Regular Expression (or RegEx) to match the field value to a specific pattern. Regular expressions are quite powerful and can be used for many complex data filtering operations. Here’s a good tutorial that explains the basics of regular expressions.
regexp_match("ref", '^NH')
Matching Features
in the Layers Panel. Next, we want to calculate the length of each segment. You can use the built-in algorithm Add geometry attributeslength
will be added to the Layers panel. The distances in this field are in meters. Let’s convert them to kilometers. You may reach out to the trusty QGIS field calculator to add a new field. That’s a perfectly valid way - but as mentioned earlier, there is a processing way to do things which is the preferred way. Search and open the Field Calculator processing algorithm instead and enter the following expression."length"/1000
length_km
will be added to the Layers panel. Now we are ready to find out the answer. We just need to sum up the values in the length_km
field. Use the Basic Statistics for Fields algorithm.Calculated
as the Input layer and length_km
as the Field to calculate statistics on. Click Run.What do you think of the results? The resulting number may not be perfect because the OpenStreetMap database may have missing roads or are classified differently. But it is close to the number provided in the official statistics.
Calculated
and is a temporary memory layer. Let’s save it to the disk so we can use it later. The layer contains many fields which are not relevant to us, so let’s delete some columns before saving them. The classic way to do this is to toggle editing and use the Delete Column button from the Attribute Table. If you wanted to rename/reorder certain fields, that needed a plugin. But now, we have a really easy processing algorithm called Refactor Fields that can add, delete, rename, re-order and change the field types all at once. Delete fields that are not required and save the result as the layer national_highways
in the source karnataka.gpkg
.national_highways
will be added to the Layers panel.We have now finish the first part of this exercise. Your output should match the contents of the Processing_Tools_Checkpoint1.qgz
file in the solutions
folder.
There are many features in the national_highways
layer where the "name"
attribute is NULL. Find an appopriate algorithm from the Processing Toolbox to create a new layer with the features having NULL values removed.
We achieved the goal of the exercise, but we can explore the results a bit better if we can break down the results into a smaller administrative unit. Let’s try to calculate the length of national highways for each district in the state. This will require us to perform a spatial join and analyze the results using group statistics.
karnataka_districts
layer from the karnataka.gpkg
file. This layer contains the geometry of each district along with its name in the DISTRICT field.national_highways
as the input layer and do a one-to-one join with the karnataka_districts
layer. Select only the DISTRICT field to be added to the output.Joined layer
now has the intersecting district name in the DISTRICT field. We can now sum the road lengths and group them for each district. You may recall that in earlier versions of QGIS, you needed a plugin called Group Stats to do this. But now we can do this via the built-in Statistic by Categories algorithm. Select the Joined layer
as the Input vector layer, length_km as the Field to calculate statistics on. Select the DISTRICT field as the Field(s) with Categories.karnataka.gpkg
as a layer named national_highways_by_districts
. Click Run.length_km
column for each district. The values in the Sum column is the total length of national highways in the district.Note that we did data processing, spatial analysis and statistical analysis - all using just processing algorithms in a fast, reproducible and intuitive workflow. We have now finish the first part of this exercise. Your output should match the contents of the Processing_Tools_Checkpoint2.qgz
file in the solutions
folder.
Apply the following formatting changes on the national_highways_by_districts
layer and save it to karnataka.gpkg
as a new layer national_highways_by_districts_formatted
.
round()
function.Hint: You can apply an expression on a field in the Refactor Fields algorithm by entering it in the Source Expression column.
So far we have run the algorithm on 1 layer at a time. But each processing algorithm can also be run in a Batch mode on multiple inputs. This provides an easy way to process large amounts of data and automate repetitive tasks.
The batch processing interface can be invoked by right-clicking any processing algorithm and choosing Execute as Batch Process.
We will take multiple country-level data layers and use the batch processing operation to clip them to a state polygon in a single operation.
state boundary
layer, and click Zoom to Layer.state_boundary
and click OK.state_boundary
layer.clipped_
. This is just the prefix of the name that you will auto-complete in the next step. Select GPKG files (.gpkg) in Save as type. Click Save.clipped.gpkg
. Click Runclipped.gpkg
geopackage layer will be created with all the clipped layers in it.Your output should match the contents of the Batch_Processing_Checkpoint1.qgz
file in the solutions
folder.
Reprojected all the clipped layers to a UTM projection. You can reproject all the layers to the CRS WGS84 / UTM zone 43N EPSG:32643. Save the reprojected layers to the same GeoPackage clipped.gpkg
GIS Workflows typically involve many steps - with each step generating intermediate output that is used by the next step. If you change the input data or want to tweak a parameter, you will need to run through the entire process again manually. Fortunately, Processing Framework provides a graphical modeler that can help you define your workflow and run it with a single invocation. You can also run these workflows as a batch over a large number of inputs.
National Geospatial-Intelligence Agency’s Maritime Safety Information portal provides a shapefile of all incidents of maritime piracy in the form of Anti-shipping Activity Messages. We can create a density map by aggregating the incident points over a global hexagonal grid.
The steps needed to create a hex-bin layer suitable for visualization is as follows:
We will now learn how to build a model that runs the above processing steps in a single workflow.
piracy_hexbin
as the Name of the model and projects
as the Groups. Note that there is a History panel in the bottom left, which keeps track of all the work done in Model Builder. Click the Save button.piracy_hexbin.model3
. Also, check the folder in which it is being saved. Processing → models, from this location QGIS will read the model by default.+ Base Layer
input and the Reproject layer
algorithm. This connection indicates the flow of our processing pipeline. The next step is to create a hexagonal grid. Now search for the Create grid algorithm and drag it to the canvas.ne_10m_land
and the Input Points layer is ASAM_events
. The Grid Size needs to be specified in the units of the selected CRS. Enter 100000
as the Grid Size. Grid size is in the unit of the current CRS, which is meters. Click Run to start the processing pipeline. Once the process finishes, click Close.Aggregated
loaded as the result of the model. As you explore, you will notice the layer contains an attribute called NUMPOINTS containing the number of piracy incidents points contained within that grid feature.Your output should match the contents of the Modeler_Checkpoint1.qgz
file in the solutions
folder.
Add a step to extract the grid with the highest piracy incidents.
Hint: The expression maximum("NUMPOINTS")
will give you the highest value contained in the "NUMPOINTS"
field. Use it to build your complete expression.
When you ran your model, you may have noticed a warning message No spatial index exists for the input layer, performance will be severely degraded. This is because certain spatial queries make use of a spatial index and QGIS warns you when having a spatial index can speed up your operations. PostGIS documentation has a good overview of spatial indexes and why they are important.
You can compare a spatial index to a book index. When you want to search for a particular term, rather than scanning each page sequentially, you can speed up your search by looking up the index and directly going to the pages where the word appears. Spatial indexes work in similarly. You spent the effort once to create the index and all subsequent operations can make use of it. When you create a spatial index, each feature’s bounding box is used to establish its relationship with other features. This is stored alongside the dataset and can be used by algorithms. When trying to determine spatial relationships, the algorithms speed-up the look-up using the following two-pass method:
For large datasets, this approach helps reduce the processing time significantly.
QGIS has built-in tools to create and use spatial indexes. Let’s see how we can create a spatial index for a layer and use it in our model.
piracy_hexbin
model from Processing Toolbox, select ne_10m_land as Base Layer, 200000 as Grid size and ASAM_events as Inputs Points. Click Run.piracy_hexbin
model and select Edit Model…. Search Create spatial index in Algorithms tab and drag it to the model canvas.Create spatial index
algorithm is connected to the Create grid
algorithm. But the workflow should be, Create grid → Create spatial index → Extract from location.Extract by location
layer and select Edit. In Extract features from select “Indexed Layer” from algorithm “Create spatial index”.Create Grid
and Extract by location
layer. Click Save model.piracy_hexbin
model.Your output should match the contents of the Modeler_Checkpoint2.qgz
file in the solutions
folder.
Can you change the model so that instead of entering the grid size in meters, the user can enter the size in kilometers?
Hint: The Create Grid algorithm expects the size in meters, so you will have to convert the input to meters. Instead of using Model Input, use Pre-calculated Value to enter an expression.
GeoPackage is an open and flexible data format that makes data management simple. We saw how you can package multiple layers in a GeoPackage. We will now learn how to embed QGIS projects inside of a goepackage.
QGIS projects contain information of layer configuration, symbology, label positions and even models. We can make the reproducing your workflow even easier by embedding the QGIS Project in the same geopackage containing the source layers.
martime_piracy.gpkg
geopackage. Click Project → Save To → GeoPackage….martime_piracy
and click OK.martime_piracy.gpkg
the project is saved. You now have just 1 GeoPackage file that contains all your input layers, styles, project and models. Sharing this 1 file will enable anyone to reproduce your output using the embedded model and layers.Your can load the Modeler_Checkpoint3.qgz
project contained in the maritime_piracy.gpkg
in the solutions
folder. (check the martime_piracy.gpkg
)
Time is an important component of many spatial datasets. Along with location information, time providers another dimension for analysis and visualization of data. If you are working with a dataset that contains timestamps or have observations recorded at multiple time-steps, you can easily visualize it using the Temporal Controller in QGIS 3.14 or above.
ne_10_land
for map extent.Your output should match the contents of the 2DAnimation_Checkpoint1.qgz
file in the solutions
folder.
Recent versions of QGIS include native support for 3D data. Using this feature, you can easily view, explore and animate 3D elevation data. Note that your computer must have a supported graphics card for this feature to work.
We will work with a 5m Digital Elevation Model (DEM) of Denali peak in Alaska and create a 3D visualization of the dataset.
denali_hillshade
and under Layer Rendering choose Multiply as Blending mode. The selected layer will be blended with the bottom layer. This is a great way to visualize your elevation dataset.denali_dem
layer as the Elevation. Click OK.Your output should match the contents of the 3DAnimation_Checkpoint1.qgz
file in the solutions
folder.
As we did in the previous section, you can use a service such as ezgif.com to create a GIF/Video from these frames.
QGIS expression engine has a powerful function called ‘summary aggregates’ that allows evaluating a feature’s geometry and attributes with those of another layer. Expressions can be used for static calculations as well as on-the-fly computations, such as labels, virtual fields, symbology etc. This enables some powerful use cases.
The summary aggregate function operates on all the values from a different layer, returning a single summary value. The syntax of the aggregate function is as follows
aggregate(
layer:='layer name or id',
aggregate:='aggegate type',
expression:='expression to aggregate',
filter:='optional filter expression,
concatenator:='optional string to use to join values',
order_by:='optional expression to order the features'
)
We will work with a land parcels data layer provided by the City of San Francisco. The goal of this exercise is to demonstrate the use of aggregate expression for on-the-fly computation when digitizing new features.
boundary
layer and click the Open Field Calculator button.count
with the following expression. The expression is reading the features from the parcels
layer and giving an aggregate count of the features. You will notice that the the result will be displayed at the bottom of the window.aggregate(
layer:= 'parcels',
aggregate:='count',
expression:=fid
)
polygons
layer and right-click it. Select Properties.count
and choose Text Edit as the Widget Type. At the Default Value field at the bottom, enter the following expression. Note that additional filter value. Here the $geometry
refers to the geometry of the parcels
layer and geometry(@parent)
refers to the geometry of feature from the polygons
layer. Click OK.aggregate(
layer:= 'parcels',
aggregate:='count',
expression:=fid,
filter:=intersects($geometry, geometry(@parent))
)
count
field.parcels
field. Enter the following expression as the Default Value.aggregate(
layer:= 'parcels',
aggregate:='concatenate',
concatenator:=',',
expression:=to_string(fid),
filter:=intersects($geometry, geometry(@parent))
)
Your output should match the contents of the
Expressions_Checkpoint1.qgz
file in the solutions
folder.
Add a new field in the parcels
layer called neighbors with the count of neighboring polygons.
Hint: You can use the aggregate()
function on the same layer to compare each feature against all other features in the layer. The @layer
variable contains the name of the current layer, so you can use the below expression
aggregate(
layer:=@layer,
aggregate:='count',
expression:="fid",
filter:=touches($geometry, geometry(@parent)))
A key feature introduced in QGIS 3.14 was the ability for models to have a conditional branch. This allows models to check for a condition and execute a different part of the model based on the output of the condition. This is equivalent to If/Else statements in a programming language.
Let us develop a Conditional branch in the piracy_hexbin
model covered in the previous section. We will add a new check-box input Use Spatial Index - that allows the user to specify whether to use a spatial index in the model. We will add a conditional branch that takes the following actions:
piracy_hexbin
model and select Edit Model…. 6. All the model output and intermediate layers will be displayed in Bold characters, double click in usespatialindex. It will add the variable
@usespatialindex
as the expression. Click OK.
NOT @usespatialindex
Click OK.Make sure to hit the Enter key after typing the Branch Name. The input may not be saved if you forget to do so.
Create spatial index
algorithm in model canvas and click … next to Dependencies.@usespatialindex
. We can use the if
condition to return a layer based on user input. Use the expression below. Remember to pick the variables from the list and double-click them to enter them in your expression. If you had named the algorithms or input differently, the variable names will be different for you. Click OK.if( @usespatialindex ,
@Extract_by_location__Spatial_Index__OUTPUT ,
@Extract_by_location__No_Spatial_Index__OUTPUT)
Extract by location (Spatial Index)
and Extract by location (No Spatial Index)
and click OK.To take your processing experience to the next-level, you can use the built-in Locator Bar. At the bottom-left of QGIS main window, there is a universal search bar that can do keyword-search across layers, settings, processing algorithms and more. You can open the locator bar using the keyboard shortcut Ctrl+K.
I find that rather than clicking-around the processing toolbox, you can just use the locator bar to search and open the algorithms. Type Ctrl+K, followed by a (to restrict search to algorithms), followed by a space and a few characters. Use the arrow keys to select and press Enter to open the algorithm.
Processing algorithms are designed to take inputs and produce outputs. The default behavior is to create a new layer after each operation. This is useful for many workflows, especially in an enterprise setting, where you may not have the ability to edit the source data. If your algorithms are altering the source data, that also means that the workflows cannot be reproduced easily. So you would want a setup where the algorithms read from source data and create modified outputs.
An exception to this workflow is when you are doing data editing. When your workflow involves creating new features or editing them - creating a new layer for every edit is undesirable. A recent QGIS crowd-funding campaign added the ability for processing algorithms to modify the features in-place and this functionality is available out-of-the-box in QGIS now.
This course material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Advanced QGIS Course by Ujaval Gandhi www.spatialthoughts.com
This course is offered as an instructor-led online class. Visit Spatial Thoughts to know details of upcoming sessions.
© 2021 Spatial Thoughts www.spatialthoughts.com