This workshop focuses on techniques for automation of GIS workflows and covers the Processing Toolbox in detail. Below are the topics covered in this workshop
The code examples in this workshop use a variety of datasets. All the
required layers, project files etc. are supplied to you in the zip file
automating_gis_workflows.zip. Unzip this file to the
QGIS 2.0 introduced a new concept called the Processing Framework. Previously known as Sextante, the Processing Framework provides an environment within QGIS to run native and third-party algorithms for processing data. It is now the recommended way to run any type of data processing and analysis within QGIS, including tasks such as selecting features, altering attributes and saving layers that can also be accomplished by other means. Leveraging the Processing Framework makes you more productive, faster and less error-prone.
The Processing Framework consists of the following distinct elements that work together.
The Processing Toolbox is available from the top-level menu Processing → Toolbox. There are hundreds of algorithms available out of the box. They are organized by Providers. The tools created by QGIS developers are available under the Native QGIS provider. The Processing Framework also provides an easy way to integrate tools written by other software and libraries such as GDAL, GRASS and SAGA. QGIS Plugins can also add new functionality via processing algorithms in the toolbox.
The aim of this exercise is to show how a multi-step spatial analysis problem can be solved using a purely processing-based workflow. This exercise also shows the richness of available algorithms in QGIS that are able to do sophisticated operations that previously needed plugins or were more complex.
We will work with shapefiles provided by the US Census Bureau. The
tl_2019_us_primaryroads.zip file comes from the TIGER/Line roads
database and contains all the major roads in the US, including
Interstate highways. The
tl_2019_us_state.zip file comes
from the Cartographic Boundary Files and contains the state
tl_2019_us_state.zip files. Drag and drop the
tl_2019_us_state.shp layers to the canvas.
tl_2019_us_primaryroads contains all major
roads, including interstate highways, state highways, US highways etc.
Select the layer and use the keyboard shortcut F6 to
open the attribute table. You will notice that the RTTYP
column has information about road designation. As we are interested in
only interstate highways, we can use the information in this column to
extract relevant road segments.
Note: The roads layer has 2 line segments per road, representing the route in both directions.
tl_2019_us_primaryroads as the Input
RTTYP as the Selection
I as the Value. This will
extract all features where the RTTYP value is I (Interstate). Click
Extracted (attribute) in the
Layers Panel. Next, we want to calculate the length of each segment. You
can use the built-in algorithm Add geometry
Extracted (attribute) as the
Input layer, select
Ellipsoidal as Calculate
using and click Run.
be added to the Layers panel. The distances in this field are in
layer, but not in the roads layer. To add the name of the state to the
roads layer, we need to perform a spatial join. This is done using the
Join attributes by location algorithm.
Added geom info as the Input layer
tl_2019_us_state as the Join layer. For the
Fields to add, we need only the
NAME of the
Take attribute of the first located feature only and click
Run to perform a one-to-one join.
If our input roads layer had segments that crossed state lines, we would need an extra step to split them so that the road lengths counted per state are accurate. You can do this by converting the states layer to lines using the Polygons to lines algorithm, followed by the Split with lines algorithm to split the road segments at the state boundaries.
Joined layer now has the state name for
every road segment.
Joined layer as the Input vector layer.
We want to calculate lengths, but group them by states. So select
length as the Field to calculate statistics on and
NAME as the Field(s) with categories. Click
length column for each state. The values in the
sum column are the total length of national highways in
sum field is in meters. We
can convert it to miles right here. Click the button next to
sum in the Source expression column and enter the
following expression that converts meters to miles and rounds the value
to the nearest integer. Click OK.
Length_Miles. We are ready to apply the
changes. So far we have created temporary output layers, but this is the
final result of the analysis, so we can save it to the disk. Click the
... button at the bottom and select
Save to GeoPackage.
roads.gpkg. When prompted
for a layer name, enter
road_length_by_state will be added to the
Layers panel. The table will have the fields renamed and
re-calculated as we had specified.
Note: The Refactor fields algorithm has a bug in the Mac version that turns all fields into strings regardless of the user’s choice. This is a known issue and will be fixed in a future version. Until then, if you are on a Mac, a workaround is to use the Field Calculator algorithm to take the result and add fields with the correct types, using conversion expressions such as to_int() and to_real().
Note that we did data processing, spatial analysis and statistical analysis, all using just processing algorithms in a fast, reproducible and intuitive workflow.
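To make the logic of the workflow concrete, here is a minimal plain-Python sketch of what the chain of algorithms computes: keep only the interstate segments, sum their lengths per state, and convert meters to miles. It is not PyQGIS and does not call any Processing algorithm. The sample records and the function name `road_length_by_state` are hypothetical stand-ins for the joined roads layer; only the field names `RTTYP`, `NAME` and `length` come from the exercise.

```python
from collections import defaultdict

METERS_PER_MILE = 1609.344  # international mile

# Hypothetical sample records standing in for the joined roads layer.
# Each record carries the road type, the state name and a length in meters.
segments = [
    {"RTTYP": "I", "NAME": "Texas", "length": 160934.4},
    {"RTTYP": "I", "NAME": "Texas", "length": 80467.2},
    {"RTTYP": "U", "NAME": "Texas", "length": 5000.0},
    {"RTTYP": "I", "NAME": "Ohio", "length": 32186.88},
]


def road_length_by_state(records, road_type="I"):
    """Sum segment lengths per state for one road type, in whole miles."""
    totals = defaultdict(float)
    for rec in records:
        # Equivalent of Extract by attribute: keep matching road types only.
        if rec["RTTYP"] == road_type:
            # Equivalent of Statistics by categories: sum length per NAME.
            totals[rec["NAME"]] += rec["length"]
    # Equivalent of the Refactor fields expression: meters to rounded miles.
    return {
        state: round(meters / METERS_PER_MILE)
        for state, meters in sorted(totals.items())
    }


result = road_length_by_state(segments)
print(result)
```

Passing a different `road_type` (for example `"U"`) reuses the same steps for another road category, which is the idea behind turning the workflow into a model.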
The Processing Toolbox contains hundreds of algorithms, and new ones are added regularly. Many plugins also add processing algorithms that support new functionality. While you may think that processing algorithms are meant only for analysis, plenty of them offer functionality that goes beyond geoprocessing. Here are a few algorithms that I find very useful for automating my workflows and that you may not be aware of.
Download file: Download file from a URL or FTP site.
Allows you to automate tasks where you need to download new data
regularly and process it.
Import geotagged photos: Extracts latitude, longitude
and azimuth information from a directory of photos and creates a point
Add autoincremental field: Simple but very useful
algorithm to add a unique integer field. Many databases (even
GeoPackage) require that each layer has a unique integer field. This
algorithm helps create this field if your source layer doesn’t have
one. CAD or CSV layers frequently have this problem.
Create spatial index: Spatial indexes speed up your
geoprocessing operations considerably. This algorithm can create spatial indexes
on many types of layers, including those loaded from a database.
ORS Tools Algorithms: OpenRouteService (ORS) provides a
QGIS plugin which installs a slew of network analysis algorithms to the
Processing Toolbox. They provide rich network analysis functionality using
OpenStreetMap data and work without having to download any
A fun and interesting algorithm is called
Topological coloring. This algorithm helps with cartography by letting you assign a color_id to the polygons in a layer such that no two adjacent polygons have the same color.
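To illustrate the idea behind assigning a color_id so that neighbours never match, here is a minimal greedy graph-coloring sketch in plain Python. This is a common textbook approach and is only an assumption about the general technique; it is not the implementation used by the QGIS algorithm. The adjacency dictionary and polygon names are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each polygon the smallest color id not used by its neighbours."""
    colors = {}
    # Visit the most-connected polygons first; this tends to need fewer colors.
    order = sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True)
    for node in order:
        # Colors already taken by neighbours that have been colored so far.
        used = {colors[nb] for nb in adjacency[node] if nb in colors}
        color_id = 1
        while color_id in used:
            color_id += 1
        colors[node] = color_id
    return colors


# Hypothetical adjacency: which polygons share a border with which.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

color_ids = greedy_coloring(adjacency)
for polygon in sorted(color_ids):
    print(polygon, color_ids[polygon])
```

A greedy pass like this does not guarantee the minimum number of colors, but it always produces a valid assignment in which adjacent polygons differ.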
As we saw in the previous example, GIS workflows typically involve many steps, with each step generating intermediate output that is used by the next step. If you change the input data or want to tweak a parameter, you need to run through the entire process again manually. Fortunately, the Processing Framework provides a graphical modeler that can help you define your workflow and run it with a single invocation. You can also run these workflows as a batch over a large number of inputs.
We will take the workflow from the previous section and build a model that can precisely reproduce all the intermediate steps and give us the result. The model will allow us to specify the input layers and parameters and perform all intermediate steps without any user input. This will greatly speed up our analysis and ensure we do not make manual errors when running the same calculations again.
and click the Save button. Save the model as
calculate_road_lengths.model3 file at the default
Vector Layer input and
drag it to the canvas.
Roads as the Parameter name and set
Line as the Geometry Type. Click OK.
Vector Layer input.
States as the Parameter name and set
Polygon as the Geometry Type. Click
Extract by attribute algorithm. Drag it to the canvas.
Roads as the Input layer,
the Selection attribute and
I as the
Value. Click OK.
Add geometry attributes algorithm.
'Extracted (attribute)' from algorithm 'Extract by Attribute'
as the Input layer. Select
Ellipsoidal as the
method for Calculate using. Click OK.
Join attributes by location algorithm.
'Added geom info' from algorithm 'Add geometry attributes'
as the Input layer. The Join layer will be the
States layer. In the Fields to add, enter
NAME. Click OK.
States input to the
Join attributes by location box. Next, add the
Statistics by categories algorithm.
'Joined layer' from algorithm 'Join attributes by location' as
the Input vector layer. Enter
length as the
Field to calculate statistics on and
NAME as the
Field(s) with categories. Click OK.
'Statistics by category' from algorithm 'Statistics by categories'
as the Input layer. The Field mapping table is empty.
We need to add rows for the fields that we want in the output. Click
Add new field twice to add 2 rows. Configure the rows to match
our input in Step 19 from the previous exercise. As this is the final
result, we should specify the name of the output layer. Enter
road_length_by_states as Refactored. Click
tl_2019_us_primaryroads as Roads and
tl_2019_us_state as States. Click
Road Category as the Parameter name.
If you closed the modeler window, you can open it again by going to Processing → Toolbox → Models. Locate the
calculate_road_lengths model, right-click and select Edit.
I for the road
category, we will make it user configurable. Right-click the
Extract by attribute box and select Edit.
Road Category will be auto-selected for
the value. This update to our model will allow the user to enter any
value as the Road Category which will be used by this
and select the other vector layer inputs. Click Run. The result
will be a table with lengths of US Highways for each state.
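Conceptually, a model is an ordered list of steps in which each step reads named inputs (model parameters or outputs of earlier steps) and produces a named output. The plain-Python sketch below illustrates that wiring and shows why exposing the road category as a model input lets one model serve several road types. It is only an analogy under stated assumptions: `run_model`, the step functions and the sample records are hypothetical and are not part of the QGIS modeler API.

```python
def run_model(steps, inputs):
    """Run steps in order; each output is stored under the step's name."""
    results = dict(inputs)
    for name, func, arg_names in steps:
        # Look up every argument among the inputs and earlier outputs.
        results[name] = func(*(results[arg] for arg in arg_names))
    return results


def extract_by_attribute(roads, category):
    """Keep only the road records whose RTTYP equals the given category."""
    return [road for road in roads if road["RTTYP"] == category]


def total_length(roads):
    """Sum the length field over all records."""
    return sum(road["length"] for road in roads)


# Each step: (output name, function, names of the values it consumes).
steps = [
    ("extracted", extract_by_attribute, ["roads", "road_category"]),
    ("total", total_length, ["extracted"]),
]

# Hypothetical sample records standing in for the Roads input.
roads = [
    {"RTTYP": "I", "length": 1200.0},
    {"RTTYP": "U", "length": 800.0},
    {"RTTYP": "U", "length": 500.0},
]

interstates = run_model(steps, {"roads": roads, "road_category": "I"})
us_highways = run_model(steps, {"roads": roads, "road_category": "U"})
print(interstates["total"], us_highways["total"])
```

The same list of steps runs unchanged for both categories; only the value supplied for the `road_category` input differs, just as with the Road Category parameter in the model.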
So far we have run each algorithm on one layer at a time. But each processing algorithm can also be run in Batch mode on multiple inputs. This provides an easy way to process large amounts of data and automate repetitive tasks.
The batch processing interface can be invoked by right-clicking any processing algorithm and choosing Execute as Batch Process.
We will take multiple country-level data layers and use the batch processing operation to clip them to a state polygon in a single operation.
Open the batch_processing project from the data
package. The project contains 5 layers in total. The
tl_2019_us_state represents individual states. The other
are line and polygon layers which need to be clipped.
tl_2019_us_state layer and use the
Select Features tool to select a state by clicking
tl_2019_us_state as the Input
layer and click Run. This will create a new layer with the
selected state polygon called
tl_2019_us_primaryroads layers and click OK. 3 new
rows will be automatically added to accommodate all 4 inputs.
Selected features layer as
the Overlay layer.
clipped_. When prompted, choose Fill
with parameter values as the autofill mode, and Input
layer as the Parameter to use.
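The autofill step derives one output name per row from the prefix and the value of the chosen parameter. A rough plain-Python sketch of that naming rule is shown below. The exact names QGIS generates may differ; the `.gpkg` extension, the function name `batch_output_names` and the second layer name are assumptions made for illustration.

```python
def batch_output_names(input_layers, prefix="clipped_", extension=".gpkg"):
    """Build one output file name per input layer from a common prefix."""
    return {layer: f"{prefix}{layer}{extension}" for layer in input_layers}


# tl_2019_us_primaryroads comes from the exercise; the other name is a
# hypothetical placeholder for one of the remaining layers in the project.
inputs = ["tl_2019_us_primaryroads", "example_polygon_layer"]

outputs = batch_output_names(inputs)
for layer, filename in outputs.items():
    print(layer, "->", filename)
```

Deriving the output name from the input keeps each result traceable to its source layer, which matters once a batch grows beyond a handful of rows.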
We saw how we can save multiple layers to a single GeoPackage file. This makes it easy to share data between computers and users. But your spatial analysis project contains a lot more information than the datasets: there are layer styles, settings, variables, processing models etc. that are saved in different places. Modern versions of QGIS can store all of this information in a single GeoPackage, which means you can document your entire project in one file.
We will take the source layers and the result of the earlier spatial analysis exercise, which calculated the length of interstate roads, and package them up in a GeoPackage.
layers in QGIS. Run the Package Layers processing
results.gpkg back in QGIS. Now that all loaded layers are
from a single geopackage, we can save the project. Go to Project
→ Save To → GeoPackage.
results.gpkg file. Click OK. Enter
results as the Project name. Click
results.gpkg file. Click the
Refresh button in the Browser panel and expand the
results.gpkg file. You will see the
model under Models. Right-click and select Edit
GeoPackage is a flexible and versatile format that is adopted across QGIS. You saw how you can package all relevant information about your project inside a single file. Following these practices of saving your models along with source data ensures no information is lost. It also ensures you have well documented workflows that are reproducible by you in the future or by anyone with whom you have shared your data.
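Because a GeoPackage is an SQLite database, you can check what a packaged file contains with nothing more than Python's standard library. The sketch below reads the `gpkg_contents` table, which the GeoPackage specification uses to register the layers and tables in the file. The helper name `list_gpkg_contents` is hypothetical, and the demo builds a tiny stand-in database with only the two columns being queried; a real GeoPackage has more columns and tables.

```python
import os
import sqlite3
import tempfile


def list_gpkg_contents(path):
    """Return (table_name, data_type) rows from the gpkg_contents table."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return rows


# Demo only: create a minimal stand-in file instead of a real GeoPackage.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.gpkg")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?)",
    [("tl_2019_us_state", "features"), ("road_length_by_state", "attributes")],
)
conn.commit()
conn.close()

contents = list_gpkg_contents(demo_path)
for table_name, data_type in contents:
    print(table_name, data_type)
```

To inspect your own file, call `list_gpkg_contents` with the path to results.gpkg instead of the demo path.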
This workshop material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Automating GIS Workflows with QGIS by Ujaval Gandhi www.spatialthoughts.com