This workshop focuses on techniques for automating GIS workflows and covers the Processing Toolbox in detail. Below are the topics covered in this workshop.
The code examples in this workshop use a variety of datasets. All the
required layers, project files etc. are supplied to you in the zip file
automating_gis_workflows.zip. Unzip this file to the
QGIS 2.0 introduced a new concept called the Processing Framework. Previously known as Sextante, the Processing Framework provides an environment within QGIS to run native and third-party algorithms for processing data. It is now the recommended way to run any type of data processing and analysis within QGIS, including tasks such as selecting features, altering attributes, and saving layers that can be accomplished by other means. Leveraging the Processing Framework makes you more productive and faster, and makes your work less error-prone.
The Processing Framework consists of the following distinct elements that work together.
The Processing Toolbox is available from the top-level menu Processing → Toolbox. There are hundreds of algorithms available out of the box. They are organized by Providers. The tools created by QGIS developers are available under the Native QGIS provider. The Processing Framework provides an easy way to integrate tools from other software and libraries such as GDAL, GRASS and SAGA. QGIS Plugins can also add new functionality via processing algorithms in the toolbox.
The aim of this exercise is to show how a multi-step spatial analysis problem can be solved using a purely processing-based workflow. This exercise also shows the richness of available algorithms in QGIS that are able to do sophisticated operations that previously needed plugins or were more complex.
We will work with shapefiles provided by the US Census Bureau. The
tl_2019_us_primaryroads.zip file comes from the TIGER/Line roads
database and contains all the major roads in the US - including
Interstate highways. The
tl_2019_us_state.zip file comes
from the Cartographic Boundary Files and contains the state
tl_2019_us_state.zip files. Drag and drop the
tl_2019_us_state.shp layers to the canvas.
tl_2019_us_primaryroads contains all major roads, including interstate highways, state highways, US highways etc. Select the layer and use the keyboard shortcut F6 to open the attribute table. You will notice that the
RTTYP column has information about road designation. As we are interested only in interstate highways, we can use the information in this column to extract the relevant road segments.
Note: The roads layer has 2 line segments per road, representing the route in both directions.
tl_2019_us_primaryroads as the Input layer. Select
RTTYP as the Selection attribute and
I as the Value. This will extract all features where the RTTYP value is I (Interstate). Click Run.
Extracted (attribute) in the Layers Panel. Next, we want to calculate the length of each segment. You can use the built-in algorithm Add geometry attributes
Extracted (attribute) as the Input layer, select
Ellipsoidal as Calculate using and click Run.
length will be added to the Layers panel. The distances in this field are in meters.
tl_2019_us_state layer, but not in the roads layer. To add the name of the state to the roads layer, we need to perform a spatial join. This is done using the Join attributes by location algorithm.
Added geom info as the Input layer and
tl_2019_us_state as the Join layer. For the Fields to add, we need only the
NAME of the state.
Take attribute of the first located feature only and click Run to perform a one-to-one join.
If our input road layer had segments that crossed state lines, we would need an extra step to split them at the boundaries, so that the road lengths counted per state are accurate. You can do this by converting the states layer to lines using the Polygons to lines algorithm, followed by the Split with lines algorithm to split the road segments at the state boundaries.
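The optional splitting step can also be scripted. The following is a minimal sketch, not part of the workshop steps: the helper name `split_roads_at_state_lines` is hypothetical, the algorithm ids `native:polygonstolines` and `native:splitwithlines` are the ids as we understand them for QGIS 3.x, and `run` is assumed to be `processing.run` when the code is pasted into the QGIS Python Console.

```python
def split_roads_at_state_lines(roads, states, run):
    """Split road segments wherever they cross a state boundary.

    `run` is expected to behave like `processing.run`: it takes an
    algorithm id and a parameter dict and returns a dict of outputs.
    """
    # Step 1: turn the state polygons into boundary lines.
    boundaries = run(
        "native:polygonstolines",
        {"INPUT": states, "OUTPUT": "TEMPORARY_OUTPUT"},
    )["OUTPUT"]
    # Step 2: cut the roads at every intersection with those lines.
    return run(
        "native:splitwithlines",
        {"INPUT": roads, "LINES": boundaries, "OUTPUT": "TEMPORARY_OUTPUT"},
    )["OUTPUT"]
```

In the QGIS Python Console this could be called as `split_roads_at_state_lines(roads_layer, states_layer, processing.run)` after `import processing`. Passing `run` as an argument keeps the function importable outside QGIS.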
Joined layer now has the state name for every road segment.
Joined layer as the Input vector layer. We want to calculate lengths, but group them by states. So select
length as the Field to calculate statistics on and
NAME as the Field(s) with categories. Click Run.
length column for each state. The values in the sum column are the total length of interstate highways in each state.
sum field is in meters. We can convert it to miles right here. Click the button next to
sum in the Source expression column and enter the following expression that converts meters to miles and rounds the value to the nearest integer. Click OK.
Length_Miles. We are ready to apply the changes. So far we have created temporary output layers, but this is the final result of the analysis, so we can save it to the disk. Click the
... button at the bottom and select
Save to GeoPackage.
roads.gpkg. When prompted for a layer name, enter
road_length_by_state. Click Run.
road_length_by_state will be added to the Layers panel. The table will have the fields renamed and re-calculated as we had specified.
Note: The Refactor fields algorithm has a bug in the Mac version that turns all fields into strings regardless of the user's choice. This is a known issue and will be fixed in a future version. Until then, if you are on a Mac, a workaround is to use the Field Calculator algorithm on the result to add fields with the correct types, using conversion expressions such as to_int(), to_double() etc.
Note that we did data processing, spatial analysis and statistical analysis - all using just processing algorithms in a fast, reproducible and intuitive workflow.
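The same chain of algorithms can be expressed as a script. The sketch below is illustrative only and rests on several assumptions: the algorithm ids and parameter keys are the ones we believe QGIS 3.x uses (some ids, such as `native:refactorfields`, differ between versions), the function name `road_length_by_state` is hypothetical, and the conversion expression is our own guess at a meters-to-miles formula rather than the exact expression used in the exercise. `run` is assumed to be `processing.run`.

```python
METERS_PER_MILE = 1609.344  # international mile

def road_length_by_state(roads, states, run, category="I",
                         output="TEMPORARY_OUTPUT"):
    """Chain the five algorithms used in the exercise."""
    tmp = "TEMPORARY_OUTPUT"
    # 1. Keep only segments whose RTTYP equals the category (OPERATOR 0 is '=').
    extracted = run("native:extractbyattribute", {
        "INPUT": roads, "FIELD": "RTTYP", "OPERATOR": 0,
        "VALUE": category, "OUTPUT": tmp})["OUTPUT"]
    # 2. Add a 'length' column (CALC_METHOD 2 is assumed to mean ellipsoidal).
    with_length = run("qgis:exportaddgeometrycolumns", {
        "INPUT": extracted, "CALC_METHOD": 2, "OUTPUT": tmp})["OUTPUT"]
    # 3. One-to-one spatial join that copies the state NAME onto each segment
    #    (PREDICATE 0 is 'intersects', METHOD 1 takes the first match only).
    joined = run("native:joinattributesbylocation", {
        "INPUT": with_length, "JOIN": states, "PREDICATE": [0],
        "JOIN_FIELDS": ["NAME"], "METHOD": 1,
        "DISCARD_NONMATCHING": False, "PREFIX": "", "OUTPUT": tmp})["OUTPUT"]
    # 4. Summarize 'length' per state.
    stats = run("qgis:statisticsbycategories", {
        "INPUT": joined, "VALUES_FIELD_NAME": "length",
        "CATEGORIES_FIELD_NAME": ["NAME"], "OUTPUT": tmp})["OUTPUT"]
    # 5. Keep the state name and convert the summed meters to whole miles.
    #    Type codes follow QVariant: 10 is a string, 2 is an integer.
    mapping = [
        {"name": "NAME", "type": 10, "length": 0, "precision": 0,
         "expression": '"NAME"'},
        {"name": "Length_Miles", "type": 2, "length": 0, "precision": 0,
         "expression": f'round("sum" / {METERS_PER_MILE})'},
    ]
    return run("native:refactorfields", {
        "INPUT": stats, "FIELDS_MAPPING": mapping, "OUTPUT": output})["OUTPUT"]
```

In the QGIS Python Console, `road_length_by_state(roads_layer, states_layer, processing.run)` would run all five steps without intermediate clicks. Every step writes to a temporary output except the last, whose destination can be changed through `output`.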
The Processing Toolbox contains hundreds of algorithms, with new ones being added regularly. Many plugins also add processing algorithms that support new functionality. While you may think that processing algorithms are meant only for analysis, there are plenty of algorithms that offer functionality beyond geoprocessing. Here are a few algorithms that I find very useful for automating my workflows and that you may not be aware of.
Download file: Download file from a URL or FTP site. Allows you to automate tasks where you need to download new data regularly and process it.
Import geotagged photos: Extracts latitude, longitude and azimuth information from a directory of photos and creates a point layer.
Add autoincremental field: Simple but very useful algorithm to add a unique integer field. Many databases (even GeoPackage) require that each layer has a unique integer field. This algorithm helps create this field if your source layer doesn't have one. CAD or CSV layers frequently have this problem.
Create spatial index: Spatial indexes speed up your geoprocessing operation a lot. This algorithm can create spatial indices on many types of layers, including those loaded from a database.
ORS Tools Algorithms: OpenRouteService (ORS) provides a QGIS plugin which installs a slew of network analysis algorithms in the Processing Toolbox. They offer rich network analysis functionality using OpenStreetMap data and work without having to download any data.
A fun and interesting algorithm is called
Topological coloring. This algorithm helps with cartography by letting you assign a color_id to the features of a polygon layer such that no adjacent polygons have the same color.
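To make the idea behind topological coloring concrete, here is a small greedy coloring sketch in plain Python. It illustrates the general technique only and is not necessarily how the QGIS algorithm is implemented. The `neighbors` table is a simplified, hypothetical adjacency list, not data from the workshop.

```python
def greedy_coloring(adjacency):
    """Give each polygon the smallest color id not used by its neighbors."""
    colors = {}
    # Visit the most-connected polygons first; ties are broken by name
    # so the result is deterministic.
    for node in sorted(adjacency, key=lambda n: (-len(adjacency[n]), n)):
        taken = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 1
        while color in taken:
            color += 1
        colors[node] = color
    return colors

# Hypothetical, simplified adjacency: which polygons share a border.
neighbors = {
    "Utah": {"Colorado", "Arizona", "Nevada"},
    "Colorado": {"Utah", "New Mexico"},
    "Arizona": {"Utah", "New Mexico", "Nevada"},
    "New Mexico": {"Colorado", "Arizona"},
    "Nevada": {"Utah", "Arizona"},
}
color_id = greedy_coloring(neighbors)
```

A greedy pass like this never assigns the same id to two adjacent polygons, although it does not guarantee the smallest possible number of colors.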
As we saw in the previous example, GIS workflows typically involve many steps, with each step generating intermediate output that is used by the next step. If you change the input data or want to tweak a parameter, you will need to run through the entire process again manually. Fortunately, the Processing Framework provides a graphical modeler that can help you define your workflow and run it with a single invocation. You can also run these workflows as a batch over a large number of inputs.
We will take the workflow from the previous section and build a model that can precisely reproduce all the intermediate steps and give us the result. The model will allow us to specify the input layers and parameters and perform all intermediate steps without any user input. This will greatly speed up our analysis and ensure we do not make manual errors when running the same calculations again.
calculate_road_lengths and click the Save button. Save the model as
calculate_road_lengths.model3 file at the default location.
Vector Layer input and drag it to the canvas.
Roads as the Parameter name and set
Line as the Geometry Type. Click OK.
States as the Parameter name and set
Polygon as the Geometry Type. Click OK.
Extract by attribute algorithm. Drag it to the canvas.
Roads as the Input layer,
RTTYP as the Selection attribute and
I as the Value. Click OK.
Add geometry attributes algorithm.
'Extracted (attribute)' from algorithm 'Extract by Attribute' as the Input layer. Select
Ellipsoidal as the method for Calculate using. Click OK.
Join attributes by location algorithm.
'Added geom info' from algorithm 'Add geometry attributes' as the Input layer. The Join layer will be the
States layer. In the Fields to add, enter
NAME. Click OK.
States input to the
Join attributes by location box. Next, add the
Statistics by category algorithm.
'Joined layer' from algorithm 'Join attributes by location' as the Input vector layer. Enter
length as the Field to calculate statistics on and
NAME as the Field(s) with categories. Click OK.
'Statistics by category' from algorithm 'Statistics by categories' as the Input layer. The Field mapping table is empty. We need to add rows for the fields that we want in the output. Click Add new field twice to add 2 rows. Configure the rows to match our input in Step 19 from the previous exercise. As this is the final result, we should specify the name of the output layer. Enter
road_length_by_states as Refactored. Click OK.
tl_2019_us_primaryroads as Roads and
tl_2019_us_state as States. Click Run.
Road Category as the Parameter name.
If you closed the modeler window, you can open it again by going to Processing → Toolbox → Models. Locate the
calculate_road_lengths model, right-click and select Edit.
I for the road category, we will make it user configurable. Right-click the
Extract by attribute box and select Edit.
Road Category will be auto-selected for the value. This update to our model will allow the user to enter any value as the Road Category which will be used by this algorithm.
U and select the other vector layer inputs. Click Run. The result will be a table with lengths of US Highways for each state.
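A saved model can also be invoked from Python like any other algorithm. The sketch below only builds the parameter dictionaries and makes several assumptions that you should verify on your own installation: that the model is registered as `model:calculate_road_lengths`, and that its input keys are the lowercased parameter names (`roads`, `states`, `roadcategory`). The output key is deliberately left as an argument because it depends on how the model was built. `processing.algorithmHelp("model:calculate_road_lengths")` in the QGIS Python Console lists the actual keys.

```python
MODEL_ID = "model:calculate_road_lengths"  # assumed id of the saved model

def model_params(roads, states, road_category, output_key,
                 output="TEMPORARY_OUTPUT"):
    """Build the parameter dict for one run of the model.

    The input keys used here are assumptions; check them with
    processing.algorithmHelp(MODEL_ID) before relying on them.
    """
    return {
        "roads": roads,
        "states": states,
        "roadcategory": road_category,
        output_key: output,
    }

def params_per_category(roads, states, categories, output_key):
    """One parameter dict per road category, e.g. 'I' and 'U'."""
    return {
        category: model_params(roads, states, category, output_key)
        for category in categories
    }
```

Inside QGIS, each dictionary could then be passed to `processing.run(MODEL_ID, params)`, which is a convenient way to compare Interstate (`I`) and US Highway (`U`) totals without reopening the model dialog.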
So far we have run the algorithm on 1 layer at a time. But each processing algorithm can also be run in a Batch mode on multiple inputs. This provides an easy way to process large amounts of data and automate repetitive tasks.
The batch processing interface can be invoked by right-clicking any processing algorithm and choosing Execute as Batch Process.
We will take multiple country-level data layers and use the batch processing operation to clip them to a state polygon in a single operation.
Open the batch_processing project from the data
package. The project contains 5 layers in total. The
tl_2019_us_state layer represents individual states. The others
are line and polygon layers which need to be clipped.
tl_2019_us_state layer and use the Select Features tool to select a state by clicking it.
tl_2019_us_state as the Input layer and click Run. This will create a new layer with the selected state polygon called
tl_2019_us_primaryroads layers and click OK. 3 new rows will be automatically added to accommodate all 4 inputs.
Selected features layer as the Overlay layer.
clipped_. When prompted, choose Fill with parameter values as the autofill mode, and Input layer as the Parameter to use.
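The batch dialog effectively builds one parameter set per input layer. A rough Python equivalent is sketched below, assuming the Clip algorithm id is `native:clip` with `INPUT`, `OVERLAY` and `OUTPUT` keys. The helper name `batch_clip_jobs`, the output folder and the `.gpkg` extension are illustrative choices, not taken from the exercise.

```python
def batch_clip_jobs(inputs, overlay, out_dir):
    """Build one Clip parameter dict per input layer.

    `inputs` maps a layer name to a layer object or file path.
    Each output file is named with the 'clipped_' prefix followed by the
    input layer name, mirroring the autofill used in the batch dialog.
    """
    jobs = []
    for name, source in inputs.items():
        jobs.append({
            "INPUT": source,
            "OVERLAY": overlay,
            # The folder and extension are assumptions for illustration.
            "OUTPUT": f"{out_dir}/clipped_{name}.gpkg",
        })
    return jobs
```

In the QGIS Python Console, the jobs could be executed with a loop such as `for params in jobs: processing.run("native:clip", params)`.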
We saw how we can save multiple layers to a single GeoPackage file. This makes it easy to share data between computers and users. But your spatial analysis project contains a lot more information than the datasets: layer styles, settings, variables, processing models etc. are saved in different places. Modern versions of QGIS can store all of this information in a single GeoPackage, so you can document your entire project in one file.
We will take the source layers and the result of the earlier spatial analysis exercise, which calculated the length of interstate roads per state, and package them up in a GeoPackage.
road_lengths_by_state layers in QGIS. Run the Package Layers processing algorithm.
results.gpkg. Click Run.
results.gpkg back in QGIS. Now that all loaded layers are from a single GeoPackage, we can save the project. Go to Project → Save To → GeoPackage.
results.gpkg file. Click OK. Enter
results as the Project name. Click OK.
results.gpkg file. Click the Refresh button in the Browser panel and expand the
results.gpkg file. You will see the
calculate_road_lengths model under Models. Right-click and select Edit Model.
GeoPackage is a flexible and versatile format that is adopted across QGIS. You saw how you can package all relevant information about your project inside a single file. Following these practices of saving your models along with source data ensures no information is lost. It also ensures you have well documented workflows that are reproducible by you in the future or by anyone with whom you have shared your data.
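Packaging the layers can be scripted as well. This is a minimal sketch assuming the Package Layers algorithm id is `native:package` and that it accepts the `LAYERS`, `OUTPUT`, `OVERWRITE` and `SAVE_STYLES` keys. The helper name `package_params` is hypothetical.

```python
def package_params(layers, gpkg_path, overwrite=True, save_styles=True):
    """Build the parameter dict for the Package Layers algorithm."""
    return {
        "LAYERS": list(layers),      # layer objects, ids, or file paths
        "OUTPUT": gpkg_path,         # for example 'results.gpkg'
        "OVERWRITE": overwrite,      # replace an existing GeoPackage
        "SAVE_STYLES": save_styles,  # store layer styles alongside the data
    }
```

In the QGIS Python Console, `processing.run("native:package", package_params(layers, "results.gpkg"))` would write the layers into one file. Saving the project and the model into the same GeoPackage is still done through the menus described above.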
This workshop material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Automating GIS Workflows with QGIS by Ujaval Gandhi www.spatialthoughts.com