This is an introductory course that covers QGIS from the very basics. You will learn to use QGIS for mapping, spatial data processing, and spatial analysis. This class is ideal for participants who have a basic knowledge of GIS and want to learn how to use QGIS to carry out everyday GIS tasks.
This course requires QGIS LTR version 3.22.x.
Please review the QGIS-LTR Installation Guide for step-by-step instructions.
The exercises and challenges in this course use a variety of datasets. All the required datasets are supplied in the introduction_to_qgis.zip file. Unzip this file to a directory - preferably to the <home folder>/Downloads/introduction_to_qgis/ folder.
The data package also comes with a solutions folder that contains model solutions for each section.
Not enrolled in our instructor-led class but want to work through the material on your own? Get free access to the data package
We will be using several toolbars in this course. To ensure you have the required tools for the exercises, go to the View menu, select Toolbars and ensure that the following toolbars are checked.
We will be using the following plugins during the course. From the Plugins menu, choose Manage and Install Plugins…. Under the All tab, search for the plugin name and click on the Install Plugin button to install it.
This section is designed to help you get familiar with the basic workflow of importing data layers, applying symbology, adding labels, and designing layouts for maps. We will take a text file containing historical records of earthquakes and turn it into an informative visualization like the one below.
ne_10m_land.shp file and click Open. In the Data Source Manager window, click Add.
ne_10m_land will be added to the Layers panel and displayed on the Canvas. This layer contains polygons representing the land areas of the world. Click on the Open Data Source Manager button again.
gem_active_faults_harmonized.gpkg file and click Open followed by Add.
gem_active_faults_harmonized will be added to the Layers panel and displayed on the Canvas. This is a global layer containing lines representing all the active faults. We will now import another layer of earthquake points. Click on the Open Data Source Manager button again.
significant_earthquakes_2000_2020.tsv file. This is a text file in the Tab-Separated Values (TSV) format. In the File Format section, select Custom delimiters.
Note: Windows users may need to change the File Type to All in the Choose a Delimited Text File to Open dialog to see the TSV file.
significant_earthquakes_2000_2020 will be added to the Layers panel and displayed on the Canvas. This layer contains over 1000 records of significant earthquakes recorded between 2000 and 2020. Right-click on the significant_earthquakes_2000_2020 layer and select Open Attribute Table. Examine all the attributes and their values.
Note: If the selection toolbar is not enabled, right-click on the toolbar panel and check Selection Toolbar.
significant_earthquakes_2000_2020 layer and select Open Attribute Table. You will see that there are 6 selected features in the layer. If you want to examine their attributes, there is a handy shortcut. Click the Move selection to top button.
significant_earthquakes_2000_2020 layer and go to Export → Save Selected Features As...
large_earthquakes.gpkg. Click Save. Click OK.
large_earthquakes will be added to the Layers panel.
We have now finished the first part of this exercise. Your output should match the contents of the Earthquakes_Checkpoint1.qgz file in the solutions folder.
Do you know about Null Island? The ne_10m_land layer contains a polygon for this feature. Locate this polygon on the map.
Hint: Open the attribute table, find and select the feature for Null Island. Then use the Zoom map to the selected rows button.
The symbology of a layer is its visual appearance on the map. We will now learn different techniques for styling each layer to convey the information visually.
ne_10m_land layer and click Open the Layer Styling Panel.
gem_active_faults_harmonized layer in the Layers panel. Click the Simple Line symbol to reveal more styling options. Change the Color to a shade of brown. Set the Stroke width to 0.1.
significant_earthquakes_2000_2020 layer. Click the Simple Marker symbol. Change the Size option to 0.7 Millimeters. Select red as the Fill color and white as the Stroke color. Change the Stroke width to 0.1.
large_earthquakes layer, which features the main information we want to convey through this map. We will use a Proportional Circle style and have the size of the circle represent the total fatalities caused by the respective earthquake. Click Simple Marker to see more styling options. Click the Data defined override button next to Size and choose Assistant.
The default Scale method used by QGIS is Flannery. This method applies a non-linear scaling to compensate for human perception of areas.
We have now finished the second part of this exercise. Your output should match the contents of the Earthquakes_Checkpoint2.qgz file in the solutions folder.
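The non-linear scaling mentioned above can be illustrated outside QGIS. The following is a minimal Python sketch, not the QGIS implementation: it assumes the commonly cited Flannery exponent of 0.57, and the function name, value range, and size range are hypothetical.

```python
def scaled_size(value, min_value, max_value, min_size, max_size, exponent=0.57):
    """Map a data value to a symbol size using a power-law curve.

    exponent=0.57 is the commonly cited Flannery compensation (assumption).
    exponent=0.5 makes the circle area grow in proportion to the value,
    and exponent=1.0 scales the size linearly.
    """
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    # Clamp so values outside the range do not produce out-of-range sizes.
    clamped = min(max(value, min_value), max_value)
    fraction = (clamped - min_value) / (max_value - min_value)
    return min_size + (max_size - min_size) * fraction ** exponent


# Hypothetical fatality counts mapped to sizes between 1 and 10 mm.
for deaths in (100, 10_000, 200_000):
    print(deaths, round(scaled_size(deaths, 0, 200_000, 1, 10), 2))
```

Changing the exponent argument is a quick way to compare how the same values would look under linear, area-proportional, and Flannery-style scaling.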
QGIS has many rich cartography features. One of my favorites is called Live Layer Effects. This allows you to add effects such as Outer Glow, Drop Shadow, etc., to each symbol. This takes your symbology to the next level and helps highlight certain features. Select the large_earthquakes layer and open the Layer Styling Panel. Expand the Layer Rendering section and enable Draw effects. Click the Customize effects button and add a drop shadow effect to the layer.
Labels are a useful way to convey additional information about features. Labels are associated with each feature and can be configured to show information from the attributes. We will now add labels to each of the large earthquake points to show the name of the location as well as the number of deaths caused by that earthquake.
large_earthquakes layer and open the Layer Styling Panel. Switch to the Labels tab.
Note: Changing the Project CRS does not change the CRS of the layers, but reprojects them on-the-fly to the chosen CRS for display.
"Location Name" || ';' || 'Deaths:' || "Total Deaths"
We have now finished the third part of this exercise. Your output should match the contents of the Earthquakes_Checkpoint3.qgz file in the solutions folder.
The numbers displayed in the labels can be hard to read since they are not formatted. We can make them readable by adding a thousands separator, so that a number such as 227899 is displayed as 227,899 and a number like 5749 as 5,749. Update the expression for the labels so the numbers are formatted. To achieve this, you can use the format_number() function in the QGIS expression editor.
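To see what the formatted numbers should look like, here is a small Python analogue of thousands-separator formatting. It is only an illustration of the expected result, not the QGIS expression itself, and the function name is hypothetical.

```python
def with_thousands_separator(number):
    """Format an integer with commas between groups of three digits."""
    return f"{number:,}"


# The two example values from the text above.
for deaths in (227899, 5749):
    print(with_thousands_separator(deaths))
```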
QGIS comes with a rich set of tools to create map layouts that allow you to add elements such as labels, images, legend, scale bar, north arrow, etc., to your map. It also allows you to export the map layout as an image or a PDF. We will now take the visualization created in the QGIS map canvas and create a print layout.
1. Go to Project → New Print Layout….
ne_10m_land layer and click Remove selected item(s) from legend.
gem_active_faults_harmonized layer and click the Edit selected item properties button. Change the name of the layer to Faults. Similarly, change the large_earthquakes layer name to Deaths.
Total Deaths layer and click Remove selected item(s) from legend. The legend now contains only the items that are easy to interpret and add context to the symbols on the map.
We have come to the end of this exercise. Your output should match the contents of the Earthquakes_Checkpoint4.qgz file in the solutions folder.
Export your layout as a PDF.
Hint: The PDF Export Options dialog has an option, Simplify geometries to reduce output file size, at the bottom. While useful, this can have unexpected effects on the output. Un-check it while doing the export.
Print Layout Exported as a PDF
In this section, we will learn basic data processing and visualization techniques. We will use geographic boundaries and population count data for the City of New York and create a population density map. This requires doing a table join and using a graduated symbology to create a choropleth map.
nynta2010.shp file and click Open. In the Data Source Manager window, click Add.
EPSG:2263 NAD83 / New York Long Island (ftUS) projection whereas the default projection in QGIS is EPSG:4326 - WGS84. This dialog presents several transformations to convert coordinates between these projections. Choose the first option and click OK.
nynta2010 will be added to the Layers panel and will be displayed on the Canvas. This layer contains polygons representing the Neighborhood Tabulation Areas (NTAs) for New York City. Right-click on the nynta2010 layer and select Open Attribute Table.
NTACode field contains a unique identifier for each polygon. Notice that we do not have any population or demographic attributes in this layer.
New_York_City_Population_By_Neighborhood_Tabulation_Areas.csv file and select it. Since this CSV file is just tabular data, select the No geometry (attribute only table) option and click Add.
New_York_City_Population_By_Neighborhood_Tabulation_Areas is added to the Layers panel, right-click on it and select Open Attribute Table.
nynta2010 layer. We can use this column to join this table with the vector layer.
New_York_City_Population_By_Neighborhood_Tabulation_Areas layer and select Filter.
"Year" = 2010
nynta2010 as the Input layer and NTACode as the Table field. Select New_York_City_Population_By_Neighborhood_Tabulation_Areas as the Input layer 2 and NTA Code as the Table field 2. Click the … button next to Layer 2 fields to copy.
Population field and click OK.
nynta_with_population. Make sure the file type is selected as GPKG files (*.gpkg). Click Save.
nynta_with_population will be added to the Layers panel. Right-click the layer and select Open Attribute Table. You will see that we now have an additional column Population in the attribute table. The table also has a column Shape_Area containing the area of each polygon in Sq.Ft.
Density as the Field Name. We will now build the expression to calculate population density. From the function groups next to the Expression panel, expand the Fields and Values section. Double-click the Population field to add it to the expression editor. Note that fields are referred to using double-quotes (") in QGIS.
Shape_Area field to enter it. You may also type the expression instead of picking the values from the dialog. The final expression should look like below.
"Population" / "Shape_area"
5280 * 5280 * ("Population" / "Shape_area")
nynta_population_density.gpkg and click Run.
nynta_population_density will be added to the Layers panel. Open the attribute table and verify that you have a new column named Density.
nynta_population_density. Right-click and select Remove Layer….
NYC_Population_Density and click Save. QGIS will save the project file in the QGZ format.
We have now finished the first part of this exercise. Your output should match the contents of the NYC_Population_Density_Checkpoint1.qgz file in the solutions folder.
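The filter, join, and density steps of this exercise can be sketched in plain Python to make the logic explicit. This is an illustration under stated assumptions, not what QGIS runs internally: the sample records and function names are hypothetical, and only the field names and the 5280 * 5280 factor (1 mile = 5280 ft, so 1 sq mile = 5280 * 5280 sq ft) come from the text above.

```python
SQ_FT_PER_SQ_MILE = 5280 * 5280  # 1 mile = 5280 ft


def join_population(areas, population_rows, year):
    """Attach the population for `year` to each area by matching NTA codes."""
    # Keep only the rows for the requested year, keyed by NTA code.
    lookup = {
        row["NTA Code"]: row["Population"]
        for row in population_rows
        if row["Year"] == year
    }
    joined = []
    for area in areas:
        record = dict(area)
        # In this sketch, areas without a matching row get None.
        record["Population"] = lookup.get(area["NTACode"])
        joined.append(record)
    return joined


def density_per_sq_mile(population, area_sq_ft):
    """Convert people per square foot to people per square mile."""
    return SQ_FT_PER_SQ_MILE * (population / area_sq_ft)


# Hypothetical sample records, for illustration only.
areas = [
    {"NTACode": "XX01", "Shape_Area": 27878400.0},
    {"NTACode": "XX02", "Shape_Area": 55756800.0},
]
population_rows = [
    {"NTA Code": "XX01", "Year": 2000, "Population": 900},
    {"NTA Code": "XX01", "Year": 2010, "Population": 1000},
    {"NTA Code": "XX02", "Year": 2010, "Population": 4000},
]

for record in join_population(areas, population_rows, 2010):
    density = density_per_sq_mile(record["Population"], record["Shape_Area"])
    print(record["NTACode"], round(density, 1))
```

Filtering by year before the join matters because the table may hold more than one row per area code. Without the filter, the match between the two layers would no longer be one-to-one.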
Round the population density values to the nearest integer and store them in another column named Density_Round.
round() that can round a fraction to the chosen number of decimal places.
Graduated renderer.
Density as the Value.
YlOrBr (Yellow-Orange-Brown) ramp.
6 and click Classify. You will see each polygon colored according to the population density attribute.
20000.
> 100000.
We have now finished the second part of this exercise. Your output should match the contents of the NYC_Population_Density_Checkpoint2.qgz file in the solutions folder.
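A graduated style assigns each feature to a class by comparing its value against a list of class breaks. The sketch below shows that lookup in Python. The break values between 20000 and 100000 are hypothetical, and treating each upper bound as inclusive is an assumption made for this example.

```python
from bisect import bisect_left


def class_index(value, upper_bounds):
    """Return the index of the first class whose upper bound is >= value.

    Values above the last bound fall into a final open-ended class,
    whose index equals len(upper_bounds).
    """
    return bisect_left(upper_bounds, value)


# Five upper bounds define six classes; the last class is "> 100000".
breaks = [20000, 40000, 60000, 80000, 100000]

for density in (15000, 20000, 75000, 150000):
    print(density, "-> class", class_index(density, breaks))
```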
Create a new layer containing all the neighborhood tabulation areas having a population density > 100000.
Hint: You can use the Extract by Attribute algorithm from the Processing Toolbox.
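Conceptually, an attribute-based extraction keeps only the features whose field value passes a comparison. A minimal Python sketch of that idea follows. The function name and sample records are hypothetical, and this is not the Processing algorithm itself.

```python
def extract_greater_than(features, field, threshold):
    """Return the features whose `field` is set and greater than `threshold`."""
    return [
        feature
        for feature in features
        if feature.get(field) is not None and feature[field] > threshold
    ]


# Hypothetical records, for illustration only.
features = [
    {"NTACode": "XX01", "Density": 120000.0},
    {"NTACode": "XX02", "Density": 45000.0},
    {"NTACode": "XX03", "Density": None},
]

dense = extract_greater_than(features, "Density", 100000)
print([feature["NTACode"] for feature in dense])
```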
Georeferencing is the process of assigning real-world coordinates to each pixel of the raster. This is an important step in preparing your data for further analysis. Many projects, particularly machine learning projects, need continuous historical records to build a model. Many of the older datasets may come in the form of scanned maps or aerial photos that need to be georeferenced. Similarly, some organizations may only share a PDF or a static map image of the dataset, which will need to be converted into a GIS-ready format using the georeferencing process.
The georeferencing process involves collecting GCPs (Ground Control Points) or Tie-Points. These GCPs are easily identifiable features in the image or map whose real-world coordinates are obtained from a field survey using a GPS device or identified from already georeferenced sources within a GIS.
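To show how GCPs turn pixel positions into real-world coordinates, here is a minimal Python sketch of the simplest case: a first-order (affine) transformation solved exactly from three GCPs using Cramer's rule. It is an illustration only. A real georeferencing tool typically uses more GCPs with a least-squares fit and may use higher-order transformations, and the function names and coordinate values here are hypothetical.

```python
def affine_from_gcps(gcps):
    """Solve x = a*col + b*row + c and y = d*col + e*row + f.

    `gcps` is a list of exactly three ((col, row), (x, y)) pairs.
    Returns the coefficients (a, b, c, d, e, f).
    """
    if len(gcps) != 3:
        raise ValueError("this sketch needs exactly three GCPs")
    (c1, r1), (c2, r2), (c3, r3) = [pixel for pixel, _ in gcps]
    det = c1 * (r2 - r3) - r1 * (c2 - c3) + (c2 * r3 - c3 * r2)
    if det == 0:
        raise ValueError("GCPs are collinear, so the transform is undefined")

    def solve(v1, v2, v3):
        # Cramer's rule for the 3x3 system built from the pixel positions.
        a = (v1 * (r2 - r3) - r1 * (v2 - v3) + (v2 * r3 - v3 * r2)) / det
        b = (c1 * (v2 - v3) - v1 * (c2 - c3) + (c2 * v3 - c3 * v2)) / det
        c = (c1 * (r2 * v3 - r3 * v2) - r1 * (c2 * v3 - c3 * v2)
             + v1 * (c2 * r3 - c3 * r2)) / det
        return a, b, c

    xs = [world[0] for _, world in gcps]
    ys = [world[1] for _, world in gcps]
    return solve(*xs) + solve(*ys)


def pixel_to_world(coefficients, col, row):
    """Apply the affine coefficients to one pixel position."""
    a, b, c, d, e, f = coefficients
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical GCPs: pixel (col, row) paired with map (x, y).
gcps = [((0, 0), (100.0, 500.0)),
        ((10, 0), (120.0, 500.0)),
        ((0, 10), (100.0, 480.0))]
coefficients = affine_from_gcps(gcps)
print(pixel_to_world(coefficients, 5, 5))
```

Three well-spread GCPs are the minimum for this first-order case. Collecting more points than the minimum lets a tool average out small placement errors.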
In this exercise, you’re going to georeference an old scanned map of Bangalore, India created in 1924. This map is possibly hand-drawn and has no coordinate markings, so we will use a tiled basemap layer to locate the features and obtain the GCPs.
Get contributed pack to download 3rd-party basemaps.
You will see a warning against using contributed services. Some of these services may have restrictions on their usage and/or attribution requirements that you need to follow. Please review them before using them in your project.
We have now finished the first part of this exercise. Your output should match the contents of the Georeferencing_Checkpoint1.qgz file in the solutions folder.
Load the Stamen Watercolor basemap by Stamen. This is an award-winning basemap that renders OpenStreetMap data in a hand-painted watercolor style.
In this section, you will learn how to load a scanned image, collect GCPs (ground control points) and warp the image to create a GeoTIFF file.
Bangalore_1924.png file in your data package and click Open.
Banglore_1924_modified.tif. Choose LZW in Compression. Finally, check the Save GCP points and Load in QGIS when done boxes. Click OK to save this setting and close the settings dialog.
Banglore_1924_modified will be added and overlaid on the basemap.
We have now completed all steps and you should have a georeferenced image. Your output should match the contents of the Georeferencing_Checkpoint2.qgz file in the solutions folder. We have also provided the saved GCPs in the solutions folder. To load them, you can go to File → Load GCP Points… and select the bangalore_gcp.points file in the solutions/ folder of the data package.
In this exercise, we used the Polynomial 2 technique. For datasets that require more aggressive transformation, you can use the Thin Plate Spline algorithm. This method is also known as Rubber Sheeting. Change the transformation setting to use Thin Plate Spline and run the georeferencer again. Compare the output with the previous result.
Many GIS tasks require editing existing data layers or creating new datasets. Often a large amount of GIS time is spent digitizing raster data to create vector layers that you use in your analysis. Many machine learning projects also require creating a labeled dataset that needs to be made by digitizing features from satellite imagery or historical maps. QGIS has powerful on-screen digitizing and editing capabilities that we will explore in this tutorial.
In this exercise, you will create a vector layer of historic lakes in the city of Bangalore, India. This city has experienced urbanization at a rapid pace. Due to this, many water bodies have been lost. We will use the georeferenced scanned map from the previous exercise to digitize polygons for all the water bodies in 1924 and label their current status. Finally, we will create a vector layer and assign attributes to them, indicating whether they are healthy, lost, or partially lost.
We will first create a new layer and configure the attribute form to capture the data about the features.
Digitizing.qgz project and click Open.
OSM Standard basemap and the Banglore_1924_modified georeferenced scanned map. To digitize the waterbodies, let’s create a new vector layer. Click Layer → Create Layer → New GeoPackage Layer… from the menu bar.
banglore_lakes and click OK. Now the Table name will be auto-populated as banglore_lakes. Choose MultiPolygon as Geometry type. Leave the CRS as the default EPSG:4326 projection.
name, in Type choose Text Data, and in Maximum length, enter 50. Now click the Add to Fields List button.
status field with Type as integer. Click OK to close the dialog.
banglore_lakes layer will be added to the Layers panel.
banglore_lakes layer and click Open Attribute Table.
banglore_lakes attribute table, there are three fields. fid is an integer field which is required by the GeoPackage format and is autogenerated. The name and status must be entered while digitizing the waterbodies. Close the attribute table.
banglore_lakes layer and click Properties to open the properties dialog.
Value Map. Value Map lets you create a drop-down menu for picking from a predefined set of values.
Value | Description |
---|---|
1 | Healthy |
2 | Partially Lost |
3 | Lost |
Save the project. Your results should match the contents of the Digitizing_Checkpoint1.qgz file in the solutions folder.
The fid column contains an auto-incremented unique id for each feature. The GeoPackage format requires this integer field to maintain data integrity. Manually overriding this id to a different value can cause data corruption. Edit the attribute form for the fid field so that it is not user-editable.
Banglore_1924_modified layer and check if the lake exists in the current basemap. This will be helpful in entering the attribute of the lake you will digitize.
Banglore_1924_modified on. Select the banglore_lakes layer and start digitizing the waterbody. Click the Toggle editing button followed by the Add Polygon Feature button. Starting from an edge, keep adding vertices using left-click. Once the polygon is fully digitized, right-click to complete it.
OSM Standard basemap, enter the name and select the status of the waterbody. Click OK.
banglore_lakes layer and click Open Attribute Table.
24 digitized features. Note that the fid column is auto-populated with a unique value for each record. Close the attribute table.
Your results should now match the contents of the Digitizing_Checkpoint2.qgz file in the solutions folder.
Style the layer based on the status column. This column has categorical values that can be used to assign a different color to each waterbody.
Hint: Use the Categorized renderer.
Geoprocessing refers to the set of operations used to transform the input data to create a new dataset. In this section, you will learn about some essential vector and raster geoprocessing tools to solve a complex spatial analysis problem.
In this analysis, we will learn how to download vector data from OpenStreetMap and use it to determine the number of people who live within 1 km of a metro station.
bangalore.json file and click Open. In the Data Source Manager window, click Add.
railway=station. Enter railway as the Key, station as the Value and Bangalore as the geographic filter In. Expand the Advanced section and check only the Node and Points boxes. Click Run Query.
railway_station_Bangalore loaded in the canvas. This includes all railway stations - including the ones operated by Indian Railways and stations under construction. We can apply a filter to select only the operational metro stations. Right-click the layer and select Filter.
Note the use of the IS NOT operator instead of !=. The reason is that these columns contain NULL records. NULL is not a value, so it cannot be equal or not equal to another value. The IS NOT operator will match all records that do not match the value, including NULL records.
"operator" = 'BMRCL' AND
"disused" IS NOT 'yes' AND
"disused:railway" IS NOT 'station'
railway_station_Bangalore layer in the Layers panel, indicating that it is a temporary scratch layer. If we do not save it, it will go away when we close QGIS. Right-click on it and select Make Permanent.
railway_station_Bangalore.gpkg. Click OK. We have now saved the layer to the disk. Save your project as Geoprocessing.qgz.
We have now completed the data download and pre-processing steps. Your output should match the contents of the Geoprocessing_Checkpoint1.qgz file in the solutions folder.
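The difference between != and IS NOT on NULL values, as used in the station filter earlier in this section, can be mimicked in Python by letting None stand in for NULL. This is a sketch of the comparison semantics only. The station records and function names are hypothetical.

```python
def not_equal(a, b):
    """SQL-style != : the result is unknown (None) when either side is NULL."""
    if a is None or b is None:
        return None
    return a != b


def is_not(a, b):
    """NULL-safe comparison: always returns True or False."""
    return a != b  # in Python, None != 'yes' evaluates to True


# Hypothetical station records; None stands in for NULL.
stations = [
    {"name": "A", "operator": "BMRCL", "disused": None},
    {"name": "B", "operator": "BMRCL", "disused": "yes"},
    {"name": "C", "operator": "BMRCL", "disused": "no"},
]

# A filter keeps a record only when the condition is true, so the
# unknown result for station A drops it from the first list.
kept_with_ne = [s["name"] for s in stations if not_equal(s["disused"], "yes")]
kept_with_is_not = [s["name"] for s in stations if is_not(s["disused"], "yes")]
print(kept_with_ne, kept_with_is_not)
```

Station A has no disused value at all, which is the common case for operational stations. That is why the NULL-safe comparison is the one that keeps it.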
You will notice that the attribute table for the railway_station_Bangalore layer has many columns. Open the attribute table and delete all the columns except the fid and osm_id columns.
Hint: Use the Delete field tool from the attribute table.
railway_station_Bangalore as the Input layer. Select EPSG:32643 - WGS 84 UTM Zone 43N as the Target CRS. Name the Reprojected layer as metro_stations_reprojected.gpkg.
metro_stations_reprojected is created, search for the Vector geometry → Buffer algorithm and double-click to launch the algorithm.
metro_stations_reprojected as the Input layer. Enter 1 kilometers as the Distance. Check the Dissolve result option and name the Buffered output layer as metro_stations_buffer.gpkg. Click Run.
metro_stations_buffer as the Input layer and EPSG:4326 - WGS 84 as the Target CRS. Name the output as metro_station_buffer_reprojected.gpkg. Click Run.
metro_station_buffer_reprojected will be added to the canvas. We will now remove the intermediate layers from the project. Hold the Shift key and select the metro_stations_buffer, metro_stations_reprojected and railway_station_Bangalore layers. Right-click and select Remove Layer….
Your output should match the contents of the Geoprocessing_Checkpoint2.qgz file in the solutions folder.
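The layer is reprojected before buffering because EPSG:4326 measures coordinates in degrees, while the buffer distance is given in kilometers, so a projected CRS with metric units is needed. The Python sketch below shows how the EPSG code for a WGS 84 UTM zone can be derived from a location. It uses the standard 6-degree zone rule and ignores the special zones around Norway and Svalbard. The function name and the approximate Bangalore coordinates are assumptions for illustration.

```python
import math


def utm_epsg(longitude, latitude):
    """Return the EPSG code of the WGS 84 UTM zone containing a point."""
    zone = int(math.floor((longitude + 180) / 6)) + 1
    zone = min(zone, 60)  # longitude 180 is assigned to zone 60
    # 326xx codes cover the northern hemisphere, 327xx the southern.
    base = 32600 if latitude >= 0 else 32700
    return base + zone


# Approximate coordinates of Bangalore (assumed for illustration).
print(utm_epsg(77.59, 12.97))
```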
Your data package contains a dataset called bangalore_pubs.gpkg with the location of all pubs within the city. Select all the pubs from the layer within 1 km of a metro station.
Hint: Use the Select by Location tool from the Processing Toolbox.
We will now use a population grid and overlay the buffered polygon to calculate the number of people who live within the buffer zone.
bangalore_ppp_2020_constrained.tif in the data package. Click Add.
bangalore_ppp_2020_constrained layer and use the Identify tool to explore the pixel values. The resolution of the raster is 100m x 100m per pixel. The raster layer has only one band, and the pixel value is the estimated number of people within that 100m x 100m pixel. Click the Pan Map button (Hand icon) to exit the Identify mode.
metro_station_buffer_reprojected as the Input layer and bangalore_ppp_2020_constrained as the Raster layer. Enter population_ as the Output column prefix. Click the … button next to Statistics to calculate and choose only Sum. Finally, next to Zonal Statistics output, click the ... and save the layer as metro_station_buffer_pop.gpkg. Click Run.
metro_station_buffer_pop will be added to the canvas. Right-click the layer and select Open Attribute Table. You will see a new population_sum field containing the population within the buffer polygon.
Your output should match the contents of the Geoprocessing_Checkpoint3.qgz file in the solutions folder.
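The idea behind a zonal Sum statistic can be sketched in Python with a tiny grid. This is a simplified illustration, not the QGIS algorithm: it assumes the zone is already available as a true/false mask aligned with the raster, it ignores pixels that are only partly covered, and all values and names are hypothetical.

```python
def zonal_sum(raster, zone_mask):
    """Sum the raster values of all pixels that fall inside the zone.

    `raster` and `zone_mask` are lists of rows with the same shape.
    Pixels with no data are stored as None and are skipped.
    """
    total = 0.0
    for raster_row, mask_row in zip(raster, zone_mask):
        for value, inside in zip(raster_row, mask_row):
            if inside and value is not None:
                total += value
    return total


def percent_of(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Hypothetical 3 x 3 population grid (people per pixel).
population = [
    [10.0, 20.0, None],
    [5.0, 15.0, 30.0],
    [0.0, 25.0, 35.0],
]
# True marks pixels inside the hypothetical buffer zone.
buffer_mask = [
    [True, True, False],
    [False, True, True],
    [False, False, False],
]
whole_grid = [[True] * 3 for _ in range(3)]

inside = zonal_sum(population, buffer_mask)
total = zonal_sum(population, whole_grid)
print(inside, total, round(percent_of(inside, total), 1))
```

Running the same sum over a mask that covers the whole grid gives the overall total, which is the denominator needed to express the buffer population as a share of the whole.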
Repeat the Zonal Statistics operation on the bangalore layer to calculate the city’s total population. Determine what percentage of the city population lives within 1 km of a metro station.
This course material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Introduction to QGIS Course by Ujaval Gandhi www.spatialthoughts.com
This course is offered as an instructor-led online class. Visit Spatial Thoughts for details of upcoming sessions.
© 2022 Spatial Thoughts www.spatialthoughts.com