This course is also offered as an online class. Visit www.spatialthoughts.com/events to know details of upcoming sessions. You may also sign up for my mailing list to know when new sessions are scheduled.
You may also purchase a self-study version of this course with the data package, PDF course material and email support. View and purchase on Gumroad.
This class focuses on techniques for automation of GIS workflows. You will learn techniques that will help you be more productive, create beautiful visualizations and solve complex spatial analysis problems. This class is ideal for participants who already use QGIS and want to take their skills to the next level.
Below are the topics covered in this class
The code examples in this class use a variety of datasets. All the required layers, project files etc. are supplied to you in the advanced_qgis.zip file with your purchase. Unzip this file to the
QGIS 2.0 introduced a new concept called Processing Framework. Previously known as Sextante, the Processing Framework provides an environment within QGIS to run native and third-party algorithms for processing data. It is now the recommended way to run any type of data processing and analysis within QGIS, including tasks such as selecting features, altering attributes and saving layers that can also be accomplished by other means. Leveraging the Processing Framework lets you work faster, be more productive and make fewer errors.
The Processing Framework consists of the following distinct elements that work together.
We will learn about each of these through hands-on exercises in the following sections.
Processing Toolbox is available from the top-level menu Processing → Toolbox. There are hundreds of algorithms available out of the box, organized by Providers. The tools created by QGIS developers are available under the Native QGIS provider. The Processing Framework provides an easy way to integrate tools written by other software and libraries such as GDAL, GRASS and SAGA. QGIS plugins can also add new functionality via processing algorithms in the toolbox.
You can control what providers are available in settings. The Processing Options menu also provides a way to fine-tune the configuration of the framework.
I strongly recommend changing the default settings and enabling the option Prefer output filename for layer names. This option ensures that when you use batch processing, the resulting layers have unique names.
The aim of this exercise is to show how a multi-step spatial analysis problem can be solved using a purely processing-based workflow. This exercise also shows the richness of available algorithms in QGIS that are able to do sophisticated operations that previously needed plugins or were more complex.
We will work with roads extracted from OpenStreetMap for the state of Karnataka in India. The admin boundary for the state and the districts come from DataMeet.
karnataka.gpkg. Drag and drop the karnataka_major_roads layers to the canvas.
karnataka_major_roads contain all major roads, including national highways, state highways, major arterial roads etc. Select the layer and use the keyboard shortcut F6 to open the attribute table.
ref column has information about road designation. As we are interested in only national highways, we can use the information in this column to extract relevant road segments.
ref field starts with the letters
We are using Regular Expression (or RegEx) to match the field value to a specific pattern. Regular expressions are quite powerful and can be used for many complex data filtering operations. Here’s a good tutorial that explains the basics of regular expressions.
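To make the idea of pattern matching concrete, here is a minimal sketch in plain Python rather than the QGIS expression engine. The prefix `NH` and the sample values are placeholders chosen for illustration and are not taken from the dataset. The `^` anchor restricts the match to the start of the value.

```python
import re

# Hypothetical sample of values from a road designation column.
ref_values = ["NH48", "SH17", "NH 75", None, "MDR12", "NH275;SH3"]

# '^NH' matches only values that begin with the (assumed) prefix 'NH'.
pattern = re.compile(r"^NH")


def starts_with_prefix(value):
    """Return True when value is a string that begins with the prefix."""
    return value is not None and pattern.search(value) is not None


# Keep only the values that match the pattern, skipping empty (None) values.
matching = [value for value in ref_values if starts_with_prefix(value)]
print(matching)
```

Note that a value such as `SH17` is rejected even though it contains letters and digits in the same shape, because the anchor requires the prefix at the very beginning.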
Matching Features in the Layers Panel. Next, we want to calculate the length of each segment. You can use the built-in algorithm Add geometry attributes
length will be added to the Layers panel. The distances in this field are in meters. Let’s convert them to kilometers. You may reach for the trusty QGIS field calculator to add a new field. That’s a perfectly valid way, but as mentioned earlier, the processing way is preferred. Search for and open the Field Calculator processing algorithm instead and enter the following expression.
length_km will be added to the Layers panel. Now we are ready to find out the answer. We just need the sum of the values in the length_km field. Use the Basic Statistics for Fields algorithm.
What do you think of the results? The resulting number may not be perfect because the OpenStreetMap database may have missing roads or roads that are classified differently. But it is close to the number provided in the official statistics.
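The arithmetic behind the last two steps is simple: divide each length in meters by 1000 and add up the results. The sketch below shows this in plain Python with made-up segment lengths. It is not the output of the QGIS algorithms.

```python
# Hypothetical segment lengths in meters (illustrative values only).
lengths_m = [12500.0, 830.5, 40210.0]

# Convert each length from meters to kilometers.
lengths_km = [length / 1000 for length in lengths_m]

# The total is the sum of the per-segment values.
total_km = sum(lengths_km)
print(f"Total length: {total_km:.2f} km")
```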
Calculated and is a temporary memory layer. Let’s save it to the disk so we can use it later. The layer contains many fields which are not relevant to us, so let’s delete some columns before saving. The classic way to do this is to toggle editing and use the Delete Column button from the Attribute Table. If you wanted to rename or reorder certain fields, that needed a plugin. But now, we have a really easy processing algorithm called Refactor Fields that can add, delete, rename, reorder and change the field types all at once. Delete fields that are not required and save the result as the layer national_highways in the source
national_highways will be added to the Layers panel. We achieved the goal of the exercise, but we can explore the results a bit better if we can break down the results by a smaller administrative unit. Let’s try to calculate the length of national highways for each district in the state.
karnataka_districts layer, but not in the national_highways layer. To add the name of the district to the roads layer, we need to perform a spatial join. This is done using the Join attributes by Location algorithm.
national_highways as the input layer and do a one-to-one join with the karnataka_districts layer. Select only the DISTRICT field to be added to the output.
Joined layer now has the intersecting district name in the DISTRICT field. We can now sum the road lengths and group them for each district. You may recall that in earlier versions of QGIS, you needed a plugin called Group Stats to do this. But now we can do this via the built-in Statistics by Categories algorithm.
The output of the algorithm is a table containing various statistics on the length_km column for each district. The values in the Sum column are the total length of national highways in each district.
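Conceptually, grouping by district and summing a field is a group-by operation. Here is a minimal sketch in plain Python, assuming each joined road segment carries a district name and a length in kilometers. The district names and values are hypothetical, and the sketch computes only the sum, not the other statistics in the output table.

```python
from collections import defaultdict

# Hypothetical joined records: (district name, segment length in km).
segments = [
    ("District A", 12.5),
    ("District B", 7.25),
    ("District A", 3.0),
    ("District B", 0.75),
    ("District C", 20.0),
]


def sum_by_category(rows):
    """Return a dict mapping each category to the sum of its values."""
    totals = defaultdict(float)
    for category, value in rows:
        totals[category] += value
    return dict(totals)


totals = sum_by_category(segments)
for district, total in sorted(totals.items()):
    print(f"{district}: {total:.2f} km")
```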
Note that we did data processing, spatial analysis and statistical analysis - all using just processing algorithms in a fast, reproducible and intuitive workflow.
To take your processing experience to the next level, you can use the built-in Locator Bar. At the bottom-left of the QGIS main window, there is a universal search bar that can do keyword-search across layers, settings, processing algorithms and more. You can open the locator bar using the keyboard shortcut Ctrl+K.
I find that rather than clicking around the processing toolbox, you can just use the locator bar to search for and open the algorithms. Press Ctrl+K, then type the letter a (the prefix that restricts the search to algorithms), followed by a space and a few characters of the algorithm name. Use the arrow keys to select and press Enter to open the algorithm.
Processing algorithms are designed to take inputs and produce outputs. The default behavior is to create a new layer after each operation. This is useful for many workflows, especially in an enterprise setting, where you may not have the ability to edit the source data. If your algorithms alter the source data, the workflows cannot be reproduced easily. So you would want a setup where the algorithms read from the source data and create modified outputs.
An exception to this workflow is when you are doing data editing. When your workflow involves creating new features or editing them - creating a new layer for every edit is undesirable. A recent QGIS crowd-funding campaign added the ability for processing algorithms to modify the features in-place and this functionality is available out-of-the-box in QGIS now.
So far we have run each algorithm on one layer at a time. But each processing algorithm can also be run in a Batch mode on multiple inputs. This provides an easy way to process large amounts of data and automate repetitive tasks.
The batch processing interface can be invoked by right-clicking any processing algorithm and choosing Execute as Batch Process.
We will take multiple country-level data layers and use the batch processing operation to clip them to a state polygon in a single operation.
India-States layer and use the Select Features tool to select a state by clicking it.
Selected features layer as the Overlay layer.
clipped_. When prompted, choose Fill with parameter values as the autofill mode, and Input layer as the Parameter to use.
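The autofill step derives each output name from its input layer. The sketch below illustrates that naming idea in plain Python. It assumes the prefix `clipped_` is simply placed in front of the input file's base name. The input paths and the `.gpkg` extension are hypothetical, and the exact names QGIS generates may differ.

```python
from pathlib import Path


def batch_output_names(input_paths, prefix="clipped_", extension=".gpkg"):
    """Map each input path to an output file name built from prefix + base name."""
    return {
        path: f"{prefix}{Path(path).stem}{extension}"
        for path in input_paths
    }


# Hypothetical country-level input layers.
inputs = ["data/roads.shp", "data/railways.shp", "data/rivers.shp"]

names = batch_output_names(inputs)
for source, output in names.items():
    print(f"{source} -> {output}")
```

Because every output name is tied to its input, the outputs of a batch run do not overwrite each other.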
GIS Workflows typically involve many steps - with each step generating intermediate output that is used by the next step. If you change the input data or want to tweak a parameter, you will need to run through the entire process again manually. Fortunately, Processing Framework provides a graphical modeler that can help you define your workflow and run it with a single invocation. You can also run these workflows as a batch over a large number of inputs.
National Geospatial-Intelligence Agency’s Maritime Safety Information portal provides a shapefile of all incidents of maritime piracy in the form of Anti-shipping Activity Messages. We can create a density map by aggregating the incident points over a global hexagonal grid.
The steps needed to create a hex-bin layer suitable for visualization are as follows
We will now learn how to build a model that runs the above processing steps in a single workflow.
piracy_hexbin as the Name of the model and projects as the Groups. Click the Save button.
ne_10m_land and the Input Points layer is ASAM_events. The Grid Size needs to be specified in the units of the selected CRS. Enter 100000 (100 km) as the Grid Size. Click Run to start the processing pipeline. Once the process finishes, click Close.
darker( @symbol_color , 130)
Can you change the model so that instead of entering the grid size in meters, the user can enter the size in kilometers?
Hint: The Create Grid algorithm expects the size in meters, so you will have to convert the input to meters.
When you ran your model, you may have noticed a warning message No spatial index exists for input layer, performance will be severely degraded. This is because certain spatial queries make use of a spatial index and QGIS warns you when having a spatial index can speed up your operations. PostGIS documentation has a good overview of spatial indexes and why they are important.
You can compare a spatial index to a book index. When you want to search for a particular term, rather than scanning each page sequentially, you can speed up your search by looking up the index and directly going to the pages where the word appears. Spatial indexes work in a similar way. You spend the effort once to create the index and all subsequent operations can make use of it. When you create a spatial index, each feature’s bounding box is used to establish its relationship with other features. This is stored alongside the dataset and can be used by algorithms. When trying to determine spatial relationships, the algorithms speed up the look-up using the following two-pass method:
For large datasets, this approach helps reduce the processing time significantly. QGIS has built-in tools to create and use spatial indexes. Let’s see how we can create a spatial index for a layer and use it in our model.
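The two-pass idea can be sketched in plain Python. This example assumes the usual pattern: a cheap bounding-box test to shortlist candidates, followed by an exact geometry test on the shortlist only. It scans the points linearly instead of using a real index structure such as an R-tree, and the triangle and points are made up for illustration.

```python
def bbox(polygon):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_contains(box, point):
    """Cheap test: is the point inside the bounding box?"""
    xmin, ymin, xmax, ymax = box
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


def point_in_polygon(point, polygon):
    """Exact test using ray casting (even-odd rule)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical polygon (a right triangle) and query points.
triangle = [(0, 0), (10, 0), (0, 10)]
points = [(1, 1), (9, 9), (20, 20), (4, 5)]

# Pass 1: shortlist points that fall inside the bounding box.
box = bbox(triangle)
candidates = [p for p in points if bbox_contains(box, p)]

# Pass 2: run the exact (more expensive) test only on the shortlist.
matches = [p for p in candidates if point_in_polygon(p, triangle)]

print("candidates:", candidates)
print("matches:", matches)
```

The point (9, 9) shows why the second pass is needed: it lies inside the bounding box but outside the triangle.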
piracy_hexbin model and select Edit Model….
Time is an important component of many spatial datasets. Along with location information, time provides another dimension for analysis and visualization of data. If you are working with a dataset that contains timestamps or has observations recorded at multiple time-steps, you can easily visualize it using the TimeManager plugin in QGIS.
TimeManager allows you to view and export ‘slices’ of data between certain time intervals that can be combined into animations.
Go to Plugins → Manage and Install Plugins…. Search for and install the TimeManager plugin.
We will continue to work with the maritime piracy dataset. First we will create a heatmap visualization and then animate the heatmap to show how the piracy hot-spots have changed over the past two decades.
ASAM_events layer and click the Open the layer Styling Panel button in the Layers panel. Click the Single symbol drop-down.
dateofocc - representing the date on which the incident took place. This is the field that will be used by the plugin to determine the points that are rendered for each time period. Select ASAM_events as the Layer and dateofocc as the Start time. The End time should be set to Same as start. Click OK. Back in the Time manager settings window, click OK.
You will notice that for each frame of the animation, a date is displayed at the bottom-right. Instead of the full date and time, let’s change it to display the year that the map represents. Also change the placement of the label to the top-left corner. The output should look something like below.
Recent versions of QGIS include native support for 3D data. Using this feature, you can easily view, explore and animate 3D elevation data. Note that your computer must have a supported graphics card for this feature to work.
We will work with a 5m Digital Elevation Model (DEM) of Denali peak in Alaska and create an animation showing a 3D visualization of the dataset.
denali_dem layer as the Elevation. Click OK.
QGIS expression engine has a powerful function called ‘summary aggregates’ that allows evaluating a feature’s geometry and attributes with those of another layer. Expressions can be used for static calculations as well as on-the-fly computations, such as labels, virtual fields, symbology etc. This enables some powerful use cases.
The summary aggregate function operates on all the values from a different layer, returning a single summary value. The syntax of the aggregate function is as follows
aggregate( layer:='layer name or id', aggregate:='aggregate type', expression:='expression to aggregate', filter:='optional filter expression', concatenator:='optional string to use to join values', order_by:='optional expression to order the features' )
We will work with a land parcels data layer provided by the City of San Francisco. The goal of this exercise is to demonstrate the use of aggregate expression for on-the-fly computation when digitizing new features.
boundary layer and click the Open Field Calculator button.
count with the following expression. The expression is reading the features from the parcels layer and giving an aggregate count of the features. You will notice that the result will be displayed at the bottom of the window.
aggregate( layer:= 'parcels', aggregate:='count', expression:=fid )
polygons layer and right-click it. Select Properties.
count and choose Text Edit as the Widget Type. At the Default Value field at the bottom, enter the following expression. Note the additional filter value. Here the $geometry refers to the geometry of the
geometry(@parent) refers to the geometry of the feature from the polygons layer. Click OK.
aggregate( layer:= 'parcels', aggregate:='count', expression:=fid, filter:=intersects($geometry, geometry(@parent)) )
parcels field. Enter the following expression as the Default Value.
aggregate( layer:= 'parcels', aggregate:='concatenate', concatenator:=',', expression:=to_string(fid), filter:=intersects($geometry, geometry(@parent)) )
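To see what these expressions compute, here is a rough plain-Python analogue. It is not the QGIS implementation: the `aggregate` helper below is a hypothetical stand-in, features are simplified to axis-aligned boxes, and the sample parcels are made up. It shows how the filter narrows the features before the count or concatenation is applied.

```python
def intersects(a, b):
    """Return True when two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def aggregate(features, aggregate_type, expression, filter_fn=None, concatenator=""):
    """Hypothetical stand-in for a summary aggregate over a list of features."""
    selected = [f for f in features if filter_fn is None or filter_fn(f)]
    values = [expression(f) for f in selected]
    if aggregate_type == "count":
        return len(values)
    if aggregate_type == "concatenate":
        return concatenator.join(str(v) for v in values)
    raise ValueError(f"Unsupported aggregate type: {aggregate_type}")


# Hypothetical parcels, each with an id and a bounding box as its geometry.
parcels = [
    {"fid": 1, "box": (0, 0, 2, 2)},
    {"fid": 2, "box": (3, 0, 5, 2)},
    {"fid": 3, "box": (10, 10, 12, 12)},
]

# Geometry of the newly digitized feature (plays the role of the parent).
parent_box = (1, 1, 4, 4)

count = aggregate(
    parcels, "count",
    expression=lambda f: f["fid"],
    filter_fn=lambda f: intersects(f["box"], parent_box),
)
ids = aggregate(
    parcels, "concatenate",
    expression=lambda f: f["fid"],
    filter_fn=lambda f: intersects(f["box"], parent_box),
    concatenator=",",
)
print(count, ids)
```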
This course material is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are free to use the material for any non-commercial purpose. Kindly give appropriate credit to the original author.
If you would like to use this material for commercial use or for teaching a course, you can purchase a license. Click to view and purchase on Gumroad.
© 2020 Ujaval Gandhi www.spatialthoughts.com