Note: This workshop is no longer being maintained. You can check out our Advanced QGIS course which covers the same topics in much more depth.
This workshop focuses on techniques for automation of GIS workflows and covers the Processing Toolbox in detail.
The code examples in this workshop use a variety of datasets. All the required layers, project files etc. are supplied to you in the zip file automating_gis_workflows.zip. Unzip this file to the Downloads directory.
Download automating_gis_workflows.zip
QGIS 2.0 introduced a new concept called the Processing Framework. Previously known as Sextante, the Processing Framework provides an environment within QGIS to run native and third-party algorithms for processing data. It is now the recommended way to run any type of data processing and analysis within QGIS, including tasks such as selecting features, altering attributes and saving layers that can also be accomplished by other means. Leveraging the Processing Framework makes you more productive and faster, and makes your work less error-prone.
The Processing Framework consists of the following distinct elements that work together.
The Processing Toolbox is available from the top-level menu Processing → Toolbox. There are hundreds of algorithms available out of the box. They are organized by Providers. The tools created by QGIS developers are available under the Native QGIS provider. The Processing Framework provides an easy way to integrate tools written by other software and libraries such as GDAL, GRASS and SAGA. QGIS Plugins can also add new functionality via processing algorithms in the toolbox.
The aim of this exercise is to show how a multi-step spatial analysis problem can be solved using a purely processing-based workflow. This exercise also shows the richness of available algorithms in QGIS that are able to do sophisticated operations that previously needed plugins or were more complex.
We will work with shapefiles provided by the US Census Bureau. The tl_2019_us_primaryroads.zip file comes from the TIGER/Line roads database and contains all the major roads in the US, including Interstate highways. The tl_2019_us_state.zip file comes from the Cartographic Boundary Files and contains the state boundaries.
Locate the tl_2019_us_primaryroads.zip and tl_2019_us_state.zip files. Drag and drop the tl_2019_us_primaryroads.shp and tl_2019_us_state.shp layers to the canvas.
The tl_2019_us_primaryroads layer contains all major roads, including interstate highways, state highways, US highways etc. Select the layer and use the keyboard shortcut F6 to open the attribute table. You will notice that the RTTYP column has information about road designation. As we are interested in only interstate highways, we can use the information in this column to extract the relevant road segments.
Note: The roads layer has 2 line segments per road, representing the route in both directions.
Open the Extract by attribute algorithm from the Processing Toolbox. Select tl_2019_us_primaryroads as the Input layer. Select RTTYP as the Selection attribute and enter I as the Value. This will extract all features where the RTTYP value is I (Interstate). Click Run.
A new layer Extracted (attribute) will be added to the Layers panel. Next, we want to calculate the length of each segment. You can use the built-in algorithm Add geometry attributes.
Select Extracted (attribute) as the Input layer, select Ellipsoidal as Calculate using and click Run.
A new layer Added geom info, containing a length field, will be added to the Layers panel. The distances in this field are in meters.
The state name is available in the tl_2019_us_state layer, but not in the roads layer. To add the name of the state to the roads layer, we need to perform a spatial join. This is done using the Join attributes by location algorithm.
Select Added geom info as the Input layer and tl_2019_us_state as the Join layer. For the Fields to add, we need only the NAME of the state.
Select Take attribute of the first located feature only as the join type and click Run to perform a one-to-one join.
Note: If our input road layer had segments that crossed state lines, we would need an extra step to split them so that the road lengths counted per state are accurate. You can do this by converting the states layer to lines using the Polygons to lines algorithm, followed by the Split with lines algorithm to split the road segments at the boundaries.
The new Joined layer now has the state name for every road segment.
Open the Statistics by categories algorithm. Select Joined layer as the Input vector layer. We want to calculate lengths, but group them by states. So select length as the Field to calculate statistics on and NAME as the Field(s) with categories. Click Run.
The output table contains statistics on the length column for each state. The values in the sum column are the total length of interstate highways in each state.
Open the Refactor fields algorithm and keep only the NAME and sum fields.
The sum field is in meters. We can convert it to miles right here. Click the button next to sum in the Source expression column and enter the following expression, which converts meters to miles and rounds the value to the nearest integer. Click OK.
round(sum*0.000621371)
Rename NAME to State and sum to Length_Miles. We are ready to apply the changes. So far we have created temporary output layers, but this is the final result of the analysis, so we can save it to disk. Click the ... button at the bottom and select Save to GeoPackage.
Enter roads.gpkg as the file name. When prompted for a layer name, enter road_length_by_state. Click Run.
A new layer road_length_by_state will be added to the Layers panel. The table will have the fields renamed and recalculated as we specified.
Note: The Refactor fields algorithm has a bug in the Mac version that makes all fields strings regardless of the user's choice. This is a known issue and will be fixed in a future version. Until then, if you are on a Mac, a workaround is to use the Field calculator algorithm on the result to add fields with the correct types, using conversion expressions such as to_int() and to_real().
Note that we did data processing, spatial analysis and statistical analysis, all using just processing algorithms in a fast, reproducible and intuitive workflow.
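To make the logic of the workflow easier to follow, here is a minimal pure-Python sketch of the same chain: filter by RTTYP, group by state NAME, sum the length field and convert meters to miles. It does not use QGIS at all. The sample records and the function name road_length_by_state are assumptions made up for illustration, and the lengths are not real TIGER/Line values.

```python
from collections import defaultdict

# Same conversion factor as the expression entered in Refactor fields.
METERS_TO_MILES = 0.000621371

# Hypothetical records standing in for the joined road segments.
# The lengths (in meters) are invented for illustration only.
segments = [
    {"RTTYP": "I", "NAME": "Texas", "length": 160934.0},
    {"RTTYP": "I", "NAME": "Texas", "length": 80467.0},
    {"RTTYP": "U", "NAME": "Texas", "length": 50000.0},
    {"RTTYP": "I", "NAME": "Ohio", "length": 32187.0},
]


def road_length_by_state(records, road_type="I"):
    """Sum segment lengths per state for one road type, in whole miles."""
    totals = defaultdict(float)
    for rec in records:
        # Mirrors Extract by attribute: keep only the requested road type.
        if rec["RTTYP"] == road_type:
            # Mirrors Statistics by categories: sum of length grouped by NAME.
            totals[rec["NAME"]] += rec["length"]
    # Mirrors Refactor fields: round(sum * 0.000621371).
    return {state: round(total * METERS_TO_MILES) for state, total in totals.items()}


result = road_length_by_state(segments)
print(result)
```

Each processing algorithm in the exercise plays the role of one line in this function, which is why the same steps can be chained and re-run without manual work.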
The Processing Toolbox contains hundreds of algorithms, with new ones being added regularly. Many plugins also add processing algorithms that support new functionality. While you may think that processing algorithms are meant only for analysis, there are plenty of algorithms that offer functionality beyond geoprocessing. Here are a few algorithms that I find very useful for automating my workflows and that you may not be aware of.
Download file: Downloads a file from a URL or FTP site. Allows you to automate tasks where you need to download new data regularly and process it.
Import geotagged photos: Extracts latitude, longitude and azimuth information from a directory of photos and creates a point layer.
Add autoincremental field: Simple but very useful algorithm to add a unique integer field. Many databases (even GeoPackage) require that each layer has a unique integer field. This algorithm helps create this field if your source layer doesn't have one. CAD or CSV layers frequently have this problem.
Create spatial index: Spatial indexes speed up your geoprocessing operations a lot. This algorithm can create spatial indexes on many types of layers, including those loaded from a database.
ORS Tools Algorithms: OpenRouteService (ORS) provides a QGIS plugin which installs a slew of network analysis algorithms in the Processing Toolbox. They allow rich network analysis functionality using OpenStreetMap data, and they work without having to download any data.
A fun and interesting algorithm is called Topological coloring. This algorithm helps with cartography by allowing you to assign a color_id to the features of your polygon layer such that no adjacent polygons have the same color.
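The idea behind such a coloring can be illustrated with a small greedy sketch in plain Python. This is not the QGIS implementation, whose internals are not described here. It is only a simple heuristic that satisfies the same constraint. The adjacency dictionary and the function name greedy_coloring are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each polygon the smallest color id not used by its neighbours."""
    color_id = {}
    # Visit polygons with the most neighbours first (a common greedy heuristic).
    order = sorted(adjacency, key=lambda p: len(adjacency[p]), reverse=True)
    for polygon in order:
        used = {color_id[n] for n in adjacency[polygon] if n in color_id}
        color = 1
        while color in used:
            color += 1
        color_id[polygon] = color
    return color_id


# Hypothetical adjacency list: which polygons share a boundary.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

colors = greedy_coloring(adjacency)
print(colors)
```

The resulting ids can then drive a categorized style, one color per id, which is how a color_id field is typically used in cartography.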
As we saw in the previous example, GIS workflows typically involve many steps, with each step generating intermediate output that is used by the next step. If you change the input data or want to tweak a parameter, you will need to run through the entire process again manually. Fortunately, the Processing Framework provides a graphical modeler that can help you define your workflow and run it with a single invocation. You can also run these workflows as a batch over a large number of inputs.
We will take the workflow from the previous section and build a model that can precisely reproduce all the intermediate steps and give us the result. The model will allow us to specify the input layers and parameters and perform all intermediate steps without any user input. This will greatly speed up our analysis and ensure we do not make manual errors when running the same calculations again.
Open the graphical modeler. Enter the model name as calculate_road_lengths and click the Save button. Save the model as a calculate_road_lengths.model3 file at the default location.
Locate the Vector Layer input and drag it to the canvas.
Enter Roads as the Parameter name and set Line as the Geometry Type. Click OK.
Add another Vector Layer input. Enter States as the Parameter name and set Polygon as the Geometry Type. Click OK.
Next, locate the Extract by attribute algorithm. Drag it to the canvas.
Select Roads as the Input layer, RTTYP as the Selection attribute and I as the Value. Click OK.
Next, add the Add geometry attributes algorithm.
Select 'Extracted (attribute)' from algorithm 'Extract by Attribute' as the Input layer. Select Ellipsoidal as the method for Calculate using. Click OK.
Next, add the Join attributes by location algorithm.
Select 'Added geom info' from algorithm 'Add geometry attributes' as the Input layer. The Join layer will be the States layer. In the Fields to add, enter NAME. Click OK.
The modeler now shows a connection from the States input to the Join attributes by location box. Next, add the Statistics by categories algorithm.
Select 'Joined layer' from algorithm 'Join attributes by location' as the Input vector layer. Enter length as the Field to calculate statistics on and NAME as the Field(s) with categories. Click OK.
Next, add the Refactor fields algorithm.
Select 'Statistics by category' from algorithm 'Statistics by categories' as the Input layer. The Field mapping table is empty. We need to add rows for the fields that we want in the output. Click Add new field twice to add 2 rows. Configure the rows to match our input in Step 19 (the Refactor fields step) from the previous exercise. As this is the final result, we should specify the name of the output layer. Enter road_length_by_states as Refactored. Click OK.
The model is now complete. Run it and select tl_2019_us_primaryroads as Roads and tl_2019_us_state as States. Click Run.
Next, we will add a new input to the model. Enter Road Category as the Parameter name.
Note: If you closed the modeler window, you can open it again by going to Processing → Toolbox → Models. Locate the calculate_road_lengths model, right-click and select Edit.
Instead of hard-coding I for the road category, we will make it user configurable. Right-click the Extract by attribute box and select Edit.
For the Value, change the input type to Model Input.
Road Category will be auto-selected for the value. This update to our model will allow the user to enter any value as the Road Category, which will be used by this algorithm.
Run the model again. Enter U as the Road Category and select the other vector layer inputs. Click Run. The result will be a table with lengths of US Highways for each state.
So far we have run the algorithm on 1 layer at a time. But each processing algorithm can also be run in Batch mode on multiple inputs. This provides an easy way to process large amounts of data and automate repetitive tasks.
The batch processing interface can be invoked by right-clicking any processing algorithm and choosing Execute as Batch Process.
We will take multiple country-level data layers and use the batch processing operation to clip them to a state polygon in a single operation.
Open the batch_processing project from the data package. The project contains 5 layers in total. The tl_2019_us_state layer represents individual states. The other layers, tl_2019_us_places, tl_2019_us_mil, tl_2019_us_rails and tl_2019_us_primaryroads, are line and polygon layers which need to be clipped.
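Conceptually, batch mode builds one row of parameters per input layer, with the shared values repeated and the output name derived from the input. The sketch below illustrates that idea in plain Python. It is not QGIS code, and the dictionary keys and the helper name build_batch_rows are assumptions chosen for illustration. The layer names, the Selected features overlay and the clipped_ prefix come from this exercise.

```python
# Layers from the batch_processing project that need to be clipped.
input_layers = [
    "tl_2019_us_places",
    "tl_2019_us_mil",
    "tl_2019_us_rails",
    "tl_2019_us_primaryroads",
]


def build_batch_rows(inputs, overlay, prefix="clipped_"):
    """Return one parameter row per input layer, like the batch dialog table."""
    return [
        {
            "INPUT": name,               # varies per row
            "OVERLAY": overlay,          # same overlay for every row
            "OUTPUT": f"{prefix}{name}",  # autofilled from the input layer name
        }
        for name in inputs
    ]


rows = build_batch_rows(input_layers, "Selected features")
for row in rows:
    print(row)
```

Filling the output with the prefix plus the input layer name is what the Fill with parameter values autofill mode does for you in the batch dialog.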
Select the tl_2019_us_state layer and use the Select Features tool to select a state by clicking it.
Open the Extract selected features algorithm. Select tl_2019_us_state as the Input layer and click Run. This will create a new layer called Selected features containing the selected state polygon.
Open the Clip algorithm in batch mode by right-clicking it and choosing Execute as Batch Process. For the Input layer, select the tl_2019_us_places, tl_2019_us_mil, tl_2019_us_rails and tl_2019_us_primaryroads layers and click OK. 3 new rows will be automatically added to accommodate all 4 inputs.
Select the Selected features layer as the Overlay layer.
For the output file name, enter clipped_. When prompted, choose Fill with parameter values as the autofill mode, and Input layer as the Parameter to use.
Save the clipped layers to a single GeoPackage named clipped.gpkg.
We saw how we can save multiple layers to a single GeoPackage file. This makes data sharing easy between computers and users. Your spatial analysis project contains a lot more information than the dataset. There are layer styles, settings, variables, processing models etc. that are saved in different places. Modern versions of QGIS have the ability to store all relevant information in a single GeoPackage. This means you can document your entire project in a single GeoPackage.
We will take the source layers and result of the spatial analysis exercise of finding length of interstate roads and package them up in a GeoPackage.
Load the tl_2019_us_primaryroads, tl_2019_us_state and road_length_by_state layers in QGIS. Run the Package Layers processing algorithm.
Save the output as results.gpkg. Click Run.
Load the layers from results.gpkg back in QGIS. Now that all loaded layers are from a single GeoPackage, we can save the project. Go to Project → Save To → GeoPackage.
Select the results.gpkg file. Click OK. Enter results as the Project name. Click OK.
The project is now saved in the results.gpkg file. Click the Refresh button in the Browser panel and expand the results.gpkg file. You will see the results project.
Next, locate the calculate_road_lengths model under Models. Right-click and select Edit Model.
GeoPackage is a flexible and versatile format that is adopted across QGIS. You saw how you can package all relevant information about your project inside a single file. Following these practices of saving your models along with source data ensures no information is lost. It also ensures you have well-documented workflows that are reproducible by you in the future or by anyone with whom you have shared your data.
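Because a GeoPackage is an SQLite database, you can check what a package contains outside QGIS. The sketch below reads the gpkg_contents table defined by the GeoPackage standard using Python's built-in sqlite3 module. To stay self-contained it runs against an in-memory stand-in with a simplified gpkg_contents table (a real file has more columns). The function name list_gpkg_contents and the sample rows are assumptions for illustration.

```python
import sqlite3


def list_gpkg_contents(conn):
    """Return (table_name, data_type) pairs registered in gpkg_contents."""
    cursor = conn.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return cursor.fetchall()


# In-memory stand-in for a GeoPackage, with a simplified gpkg_contents table.
# For a real file you would use sqlite3.connect("results.gpkg") instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents (table_name TEXT PRIMARY KEY, data_type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [
        ("tl_2019_us_primaryroads", "features"),
        ("tl_2019_us_state", "features"),
        ("road_length_by_state", "attributes"),
    ],
)

contents = list_gpkg_contents(conn)
for table_name, data_type in contents:
    print(table_name, data_type)
```

Spatial layers are registered with the data type features, while non-spatial tables such as the statistics result are registered as attributes.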
This workshop material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Automating GIS Workflows with QGIS by Ujaval Gandhi www.spatialthoughts.com