Introduction

This is an introductory course that covers QGIS from the very basics. You will learn to use QGIS for mapping, spatial data processing, and spatial analysis. This class is ideal for participants who have a basic knowledge of GIS and want to learn how to use QGIS to carry out everyday GIS tasks.

View Presentation

View the Presentation ↗

Software

This course requires QGIS LTR version 3.34.x.

Please review the QGIS-LTR Installation Guide for step-by-step instructions.

Get the Data Package

The exercises and challenges in this course use a variety of datasets. All the required datasets are supplied in the introduction_to_qgis.zip file. Unzip this file to a directory - preferably to the <home folder>/Downloads/introduction_to_qgis/ folder.

Download introduction_to_qgis.zip.

Configuration and Setup

Enable Required Toolbars

We will be using several toolbars in this course. To ensure you have the required tools for the exercises, go to the View menu, select Toolbars, and ensure that the following toolbars are checked.

  • Attributes Toolbar
  • Data Source Manager Toolbar
  • Digitizing Toolbar
  • Label Toolbar
  • Map Navigation Toolbar
  • Project Toolbar
  • Selection Toolbar
  • Snapping Toolbar

Install Required Plugins

We will be using the following plugins during the course. From the Plugins menu, choose Manage and Install Plugins…. Under the All tab, search for the plugin name and click the Install Plugin button to install it.

  • QuickMapServices
  • QuickOSM

1. Creating Maps

This section is designed to help you get familiar with the basic workflow of importing data layers, applying symbology, adding labels, and designing layouts for maps. We will take a text file containing historical records of earthquakes and turn it into an informative visualization like the one below.

1.1 Importing Vector Data

  1. Open QGIS. The first step is to import the source datasets. Click on the Open Data Source Manager button.

  1. Select the Vector tab. Click the button next to Vector Dataset(s) and browse to the data directory.

  1. Select the ne_10m_land.shp file and click Open. In the Data Source Manager window, click Add.

  1. A new layer, ne_10m_land will be added to the Layers panel and displayed on the Canvas. This layer contains polygons representing the land areas of the world. Click on the Open Data Source Manager button again.

  1. Click the button next to Vector Dataset(s) and browse to the data directory. Select the gem_active_faults_harmonized.gpkg file and click Open followed by Add.

  1. A new layer, gem_active_faults_harmonized will be added to the Layers panel and displayed on the Canvas. This is a global layer containing lines representing all the active faults. We will now import another layer of earthquake points. Click on the Open Data Source Manager button again.

  1. Select the Delimited Text tab. Click the button next to File name and browse to the data directory. Select the significant_earthquakes_2000_2020.tsv file. This is a text file in the Tab-Separated Values (TSV) format. In the File Format section, select Custom delimiters.

Note: Windows users may need to set the File Type to All in the Choose a Delimited Text File to Open dialog to see the TSV file.

  1. Check the Tab checkbox. In the Geometry Definition section, ensure Longitude is selected as the X Field and Latitude is selected as the Y Field. Choose EPSG:4326 as the Geometry CRS. Leave the other options at their default values and click Add.

  1. A new layer, significant_earthquakes_2000_2020 will be added to the Layers panel and displayed on the Canvas. This layer contains over 1000 records of significant earthquakes recorded between 2000 and 2020. Right-click on the significant_earthquakes_2000_2020 layer and select Open Attribute Table. Examine all the attributes and their values.

  1. We will now learn about some of the tools to query and select records. From the Selection Toolbar, click the Select Features by Value… button.

Note: If the selection toolbar is not enabled, right-click on the toolbar panel and check Selection Toolbar.

  1. In the Select Features dialog, enter 2020 as the Year and click the Select Features button. You will see that all earthquakes that occurred during 2020 are highlighted in yellow. You may also click the Flash Features button to see the selected records blink.

  1. Let’s refine the query a little more. Enter 7 as the Mag parameter and set the criteria as Greater than (>). Click Select Features. You will now see only those points where the earthquake occurred in 2020 and had a magnitude greater than 7. Close the window.

  1. Right-click on the significant_earthquakes_2000_2020 layer and select Open Attribute Table. You will see that there are 6 selected features in the layer. If you want to examine their attributes, there is a handy shortcut. Click the Move selection to top button.

  1. All the selected rows will be displayed on the top of the attribute table - making it easy to examine the selected features. Click the Deselect all features from the layer button.

  1. For our map, we need another layer containing the 10 largest earthquakes, so we can style it differently from the other earthquakes. For our visualization, we will define the largest earthquakes as the ones that resulted in the highest number of deaths. Locate the Total Deaths attribute and click twice on the column header. This will sort the features in descending order of the values in this column.

  1. Hold the Shift key and select the first 10 rows. This selection will be the 10 earthquakes with the highest fatalities.

  1. We will save the selected 10 features as a new layer. Right-click the significant_earthquakes_2000_2020 layer and go to Export → Save Selected Features As...

  1. Select GeoPackage as the Format. Click the button next to File name and browse to the data directory. Name the layer as large_earthquakes.gpkg. Click Save. Click OK.

  1. A new layer, large_earthquakes will be added to the Layers panel.

  1. Our data preparation is now complete. Let’s save our work. Go to Project → Save. Browse to the data directory and enter the name as Earthquakes. Click Save.

  1. The project will be saved as a single file in the QGZ format.

We have now finished the first part of this exercise. Your output should match the contents of the Earthquakes_Checkpoint1.qgz file in the solutions folder.
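The import, selection, and sorting steps above can also be pictured outside the QGIS interface. The following is a minimal Python sketch, not part of the QGIS workflow itself, that parses tab-separated text, selects rows by Year and Mag, and picks the rows with the highest Total Deaths. The column names come from this exercise; the sample rows are made up for illustration, and empty cells are treated as 0, which is an assumption of this sketch.

```python
import csv
import io

# Hypothetical sample rows for illustration only; the real TSV file
# has many more columns and over 1000 records.
SAMPLE_TSV = (
    "Year\tMag\tLocation Name\tLatitude\tLongitude\tTotal Deaths\n"
    "2020\t7.7\tSample A\t19.4\t-78.8\t\n"
    "2020\t6.7\tSample B\t38.4\t39.1\t41\n"
    "2010\t7.0\tSample C\t18.5\t-72.5\t3000\n"
    "2004\t9.1\tSample D\t3.3\t95.9\t2000\n"
)


def read_rows(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def to_number(value):
    """Convert a cell to float, treating empty cells as 0 (an assumption)."""
    return float(value) if value.strip() else 0.0


def select_by_value(rows, year, min_mag):
    """Keep rows from the given year with magnitude greater than min_mag."""
    return [
        r for r in rows
        if int(r["Year"]) == year and to_number(r["Mag"]) > min_mag
    ]


def top_by_deaths(rows, n):
    """Sort by Total Deaths in descending order and keep the first n rows."""
    ordered = sorted(rows, key=lambda r: to_number(r["Total Deaths"]), reverse=True)
    return ordered[:n]


rows = read_rows(SAMPLE_TSV)
selected = select_by_value(rows, 2020, 7)
largest = top_by_deaths(rows, 2)
print([r["Location Name"] for r in selected])
print([r["Location Name"] for r in largest])
```

In QGIS the same logic is applied through the Select Features by Value dialog and the attribute table sort; the sketch only shows what those tools compute.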

1.1.1 Challenge

Do you know about Null Island? The ne_10m_land layer contains a polygon for this feature. Locate this polygon on the map.

Hint: Open the attribute table, then find and select the feature for Null Island. Then use the Zoom map to the selected rows button.

1.2 Symbology

The symbology of a layer is its visual appearance on the map. We will now learn different techniques for styling each layer to convey the information visually.

  1. Select the ne_10m_land layer and click Open the Layer Styling Panel.

  1. We will style this layer with a simple grey color. Click Simple Fill to reveal more options. Click the dropdown next to Fill color.

  1. Use the color picker to select a light shade of grey color. The Layer Styling Panel is interactive, so you can immediately preview your styling changes in the map canvas.

  1. Similarly, change the Stroke color to white.

  1. Next, we will style the faults layer. Select gem_active_faults_harmonized layer in the Layers panel. Click the Simple Line symbol to reveal more styling options. Change the Color to a shade of brown. Set the Stroke width to 0.1.

  1. Now we will change the style of the earthquake points. Select the significant_earthquakes_2000_2020 layer. Click the Simple Marker symbol. Change the Size option to 0.7 Millimeters. Select red as the Fill color and white as the Stroke color. Change the Stroke width to 0.1.

  1. We have now styled the three background layers. We will style the large_earthquakes layer, which features the main information we want to convey through this map. We will use a Proportional Circle style and have the size of the circle represent the total fatalities caused by the respective earthquake. Click Simple Marker to see more styling options. Click the Data defined override button next to Size and choose Assistant.

  1. In the Input section, select Total Deaths as the Source. Set the range of Values from 5000 to 500000. In the Output section, set the range of Size from 3 to 10. This will use the attribute value in the Total Deaths field to set the size of the circles. Click the Go Back button at the top of the panel to return to the previous menu.

The default Scale method used by QGIS is Flannery. This method applies a non-linear scaling to compensate for human perception of areas. Learn more. ↗

  1. Click the Fill color and adjust the transparency. This will help show the information under the large circles.

  1. Set the Stroke color to white.

  1. The circles represent the number of deaths caused by each earthquake. But the reader of our map will not know what these sizes represent. It would help interpret the map better if we had a legend. Let’s set a legend for this layer. Click the Marker symbol. At the bottom of the panel, select Advanced → Data-defined Size Legend.

  1. Choose the Collapsed legend option. Check the Manual size classes option. Click the + button to add the class definition manually.

  1. Set three different size classes for 5000, 50000 and 500000. You will see a legend in the Layers panel showing the circle sizes and corresponding fatality values.

  1. Click the Save Project button to save your work.

We have now finished the second part of this exercise. Your output should match the contents of the Earthquakes_Checkpoint2.qgz file in the solutions folder.
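To see what the size assistant is doing with the 5000 to 500000 value range and the 3 to 10 size range, here is a small Python sketch comparing linear scaling with Flannery-style scaling. It assumes that the Flannery method raises the normalized value to the power 0.57; that exponent and the function names are assumptions of this sketch, not something stated in this course.

```python
def linear_size(value, v_min, v_max, s_min, s_max):
    """Map a value linearly from [v_min, v_max] to [s_min, s_max]."""
    clamped = min(max(value, v_min), v_max)
    ratio = (clamped - v_min) / (v_max - v_min)
    return s_min + ratio * (s_max - s_min)


def flannery_size(value, v_min, v_max, s_min, s_max, exponent=0.57):
    """Non-linear scaling; the 0.57 exponent is an assumed Flannery constant."""
    clamped = min(max(value, v_min), v_max)
    ratio = ((clamped - v_min) / (v_max - v_min)) ** exponent
    return s_min + ratio * (s_max - s_min)


# Values and sizes from the exercise: Total Deaths 5000-500000, Size 3-10.
for deaths in (5000, 50000, 500000):
    print(
        deaths,
        round(linear_size(deaths, 5000, 500000, 3, 10), 2),
        round(flannery_size(deaths, 5000, 500000, 3, 10), 2),
    )
```

Both methods agree at the ends of the range. In between, the non-linear version draws larger circles than the linear one, which is the compensation for how readers perceive circle areas.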

1.2.1 Challenge

QGIS has many rich cartography features. One of my favorites is called Live Layer Effects. This allows you to add effects such as Outer Glow, Drop Shadow, etc., to each symbol. This takes your symbology to the next level and helps highlight certain features. Select the large_earthquakes layer and open the Layer Styling Panel. Expand the Layer Rendering section and enable Draw effects. Click the Customize effects button and add a drop shadow effect to the layer.

1.3 Labelling

Labels are a useful way to convey additional information about features. Labels are associated with each feature and can be configured to show information from the attributes. We will now add labels to each of the large earthquake points to show the name of the location as well as the number of deaths caused by that earthquake.

  1. Before we proceed further, let’s change the projection of our map to a more appropriate one. The preferred and modern choice for global maps is the Equal Earth projection. It is much more visually appealing and also preserves relative areas of continents. Go to Project → Properties….

  1. Switch to the CRS tab. Search 8857 and select the WGS84 / Equal Earth Greenwich CRS. Click OK.

  1. Back in the QGIS Window, you will notice that the shape of the map looks different now. The bottom-right corner also displays the current project CRS, EPSG:8857. Select the large_earthquakes layer and open the Layer Styling Panel. Switch to the Labels tab.

Note: Changing the Project CRS does not change the CRS of the layers, but reprojects them on-the-fly to the chosen CRS for display.

  1. Select Single Labels. We will combine the values from multiple attributes for the label using an expression. Click the Expression button next to Value.

  1. Locate the Location Name attribute under Fields and Values group. Double-click to add it to the expression. You can check the Preview at the bottom to see the result of the expression.

  1. We will create a longer label text by combining multiple attributes. You can use the || operator in the QGIS expression to concatenate strings. Create the expression as shown below. Click OK.
"Location Name" ||  ';' || 'Deaths:' || "Total Deaths"

  1. The labels will be rendered next to the points on the canvas. Select the Formatting tab. At the bottom, enter ; as the value for Wrap on character and 20 characters as the value for Wrap lines to. This will break the labels into multiple lines and make them readable.

  1. Next, we will change the appearance of the label text. Switch to the Text tab. Change the Size to 8 and set the Color to white.

  1. A useful labeling technique is to add a background to the labels to improve legibility. Switch to the Background tab and enable Draw background. Set the Color to black. Also, set the Size X and Size Y of Buffer to be 1 point. At the bottom, set the Radius X,Y to 5.

  1. To attach the labels to each symbol, you can use a leader line. Switch to the Callouts tab and check Draw callouts.

  1. As we have only a few labels, we can adjust their placement manually to fit the layout better. Close the Layer Styling Panel. Right-click anywhere on the Toolbar and select the Label Toolbar to activate it.

  1. Once the Label Toolbar is activated, you will see new tools. Select the Move a Label, Diagram, or Callout tool.

  1. Click on any of the labels. The first time you do this, you will be prompted with an Auxiliary Storage: Choose Primary Key dialog. This key is used to store the position of each label. You can choose any field containing unique values. In our case, the default value fid is fine. Click OK.

  1. Click on the label again to start moving it. Drag your mouse to the new label location and click again to move the label.

Tip: If a label disappears after moving it, it means that it has been placed at a position that cannot be displayed on the map without colliding with other labels. You can see the unplaced labels by clicking the Toggle Display of Unplaced Labels button on the Label Toolbar. Once the label is shown, you can move it to another spot where it can be displayed.

  1. Similarly, move other labels to appropriate locations. Once you are satisfied, save your work.

We have now finished the third part of this exercise. Your output should match the contents of the Earthquakes_Checkpoint3.qgz file in the solutions folder.
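The label expression and the wrapping settings used above can be mimicked in a few lines of Python. This is only an illustrative sketch: string concatenation stands in for the || operator, and splitting on ; followed by a 20-character wrap stands in for the Wrap on character and Wrap lines to settings. The place name and number are made up, and the exact line-breaking rules in QGIS may differ from Python's textwrap module.

```python
import textwrap


def build_label(location_name, total_deaths):
    # Stands in for: "Location Name" || ';' || 'Deaths:' || "Total Deaths"
    return location_name + ";" + "Deaths:" + str(total_deaths)


def wrap_label(label, wrap_char=";", max_width=20):
    """Break the label at wrap_char, then wrap each part to max_width."""
    lines = []
    for part in label.split(wrap_char):
        # textwrap.wrap returns [] for an empty string, so keep a blank line.
        lines.extend(textwrap.wrap(part, width=max_width) or [""])
    return lines


# Hypothetical attribute values for illustration.
label = build_label("Sample Place", 1234)
lines = wrap_label(label)
print("\n".join(lines))
```

The ; character acts purely as a marker for the line break, which is why it does not appear in the rendered label.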

1.3.1 Challenge

The numbers displayed in the labels can be hard to read since they are not formatted. We can make them readable by adding a thousand-separator. So a number such as 227899 is displayed as 227,899 and a number like 5749 as 5,749. Update the expression for the labels, so the numbers are formatted. To achieve this, you can use the format_number() function in the QGIS expression editor.

Concept: Joins and Data Normalization

View Presentation

View the Presentation ↗

2. Visualizing Spatial Data

In this section, we will learn basic data processing and visualization techniques. We will use geographic boundaries and population count data for the City of New York and create a population density map. This requires doing a table join and using a graduated symbology to create a choropleth map.

2.1 Table Join

  1. Open QGIS. The first step is to import the source datasets. Click on the Open Data Source Manager button.

  1. Select the Vector tab. Click the button next to Vector Dataset(s) and browse to the data directory. Select the nynta2010.shp file and click Open. In the Data Source Manager window, click Add.

  1. You may be prompted to Select Transformation for nynta2010. The source shapefile is in the EPSG:2263 NAD83 / New York Long Island (ftUS) projection whereas the default projection in QGIS is EPSG:4326 - WGS84. This dialog presents several transformations for converting coordinates between these projections. Choose the first option and click OK.

  1. A new layer nynta2010 will be added to the Layers panel and will be displayed on the Canvas. This layer contains polygons representing the Neighborhood Tabulation Areas (NTAs) for New York City. Right-click on the nynta2010 layer and select Open Attribute Table.

  1. Examine the attributes of the layer. The NTACode field contains a unique identifier for each polygon. Notice that we do not have any population or demographic attributes in this layer.

  1. The population and other demographic datasets are typically distributed as tables. These tables usually share a unique identifier with the shapefile, which can be used to join the relevant fields to the vector layer. Let’s import a table representing New York City Population By Neighborhood Tabulation Areas. Click on the Open Data Source Manager button.

  1. Switch to the Delimited Text tab. Browse to the New_York_City_Population_By_Neighborhood_Tabulation_Areas.csv file and select it. Since this CSV file is just tabular data, select No geometry (attribute only table) option and click Add.

  1. Once the new tabular layer New_York_City_Population_By_Neighborhood_Tabulation_Areas is added to the Layers panel, right-click on it and select Open Attribute Table.

  1. This table has a Population column with the population for each of the tabulation areas. We also have the NTA Code column containing the same codes as our nynta2010 layer. We can use this column to join this table with the vector layer.

  1. If you double-click the NTA Code column to sort the table by the code, you will notice that each neighborhood has 2 records of population. For this exercise, we want to use the population for the year 2010. We will apply a filter to select only the population records for 2010.

  1. In the main QGIS window, right-click the New_York_City_Population_By_Neighborhood_Tabulation_Areas layer and select Filter.

  1. In the Query Builder dialog, enter the filter expression as below. You can also double-click the column name to insert them in the expression. Click OK.
"Year" = 2010

  1. Now we will do the table join. Open Processing → Toolbox from the main menu at the top.

  1. Search and locate the algorithm Vector general → Join attributes by field value and double-click to launch it.

  1. In the Join Attributes by Field Value dialog, select nynta2010 as the Input layer and NTACode as the Table field. Select New_York_City_Population_By_Neighborhood_Tabulation_Areas as the Input layer 2 and NTA Code as the Table field 2. Click the button next to Layer 2 fields to copy.

  1. We want to copy only the population data, so select the Population field and click OK.

  1. Next, we need to configure the output. Click the button next to Joined layer and select Save to File….

  1. Browse to the data directory and name the output as nynta_with_population. Make sure the file type is selected as GPKG files (*.gpkg). Click Save.

  1. Once the configuration is complete, click the Run button.

  1. Upon completion of processing, a new layer nynta_with_population will be added to the Layers panel. Right-click the layer and select Open Attribute Table. You will see that we now have an additional column Population in the attribute table. The table also has a column Shape_Area containing the area of each polygon in Sq.Ft.

  1. Our goal is to map the population density. We can use the population count and area columns and calculate a new column for population density. From the Processing Toolbox, search and locate the algorithm Vector table → Field Calculator and double-click to launch it.

  1. In the Field Calculator dialog, enter Density as the Field Name. We will now build the expression to calculate population density. From the function groups next to the Expression panel, expand the Fields and Values section. Double-click the Population field to add it to the expression editor. Note that fields are referred to using double quotes (") in QGIS.

  1. Select the / button to enter the division operator and then double-click the Shape_Area field to enter it. You may also type the expression instead of picking the values from the dialog. The final expression should look like below.
"Population" / "Shape_Area"

  1. This will give us the population density in persons per square foot. A more appropriate unit for population density is persons per square mile. Let’s convert the value using the conversion factor of 1 mile = 5280 ft, so 1 square mile = 5280 * 5280 square feet. Change the expression as shown below. Once done, select Save to File.
5280 * 5280 * ("Population" / "Shape_Area")

  1. Name the output as nynta_population_density.gpkg and click Run.

  1. Once the processing finishes, a new layer nynta_population_density will be added to the Layers panel. Open the attribute table and verify that you have a new column named Density.

  1. We don’t need the other layers in our project. Hold the Shift key and select all layers except nynta_population_density. Right-click and select Remove Layer….

  1. Let’s save our work so we can retrieve it later. Go to Project → Save.

  1. Save the project as NYC_Population_Density and click Save. QGIS will save the project file in the QGZ format.

We have now finished the first part of this exercise. Your output should match the contents of the NYC_Population_Density_Checkpoint1.qgz file in the solutions folder.
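For readers who find it easier to follow the logic as code, the filter, join, and density steps of this exercise can be sketched in plain Python. The field names (NTACode, NTA Code, Year, Population, Shape_Area) come from the exercise; the codes and numbers are made up, and dropping unmatched polygons is a simplification of this sketch rather than a description of how the QGIS algorithm behaves.

```python
SQFT_PER_SQMILE = 5280 * 5280  # 1 mile = 5280 ft, so 27,878,400 sq ft per sq mile

# Hypothetical polygons (area in square feet) and population records.
areas = [
    {"NTACode": "XX01", "Shape_Area": 27_878_400.0},
    {"NTACode": "XX02", "Shape_Area": 55_756_800.0},
]
population = [
    {"NTA Code": "XX01", "Year": 2000, "Population": 900},
    {"NTA Code": "XX01", "Year": 2010, "Population": 1000},
    {"NTA Code": "XX02", "Year": 2000, "Population": 4500},
    {"NTA Code": "XX02", "Year": 2010, "Population": 5000},
]


def join_population(areas, population, year):
    """Filter the table to one year, then join it to the polygons by code."""
    lookup = {
        row["NTA Code"]: row["Population"]
        for row in population
        if row["Year"] == year  # same idea as the filter "Year" = 2010
    }
    joined = []
    for area in areas:
        code = area["NTACode"]
        if code in lookup:  # simplification: unmatched polygons are skipped
            joined.append({**area, "Population": lookup[code]})
    return joined


def add_density(rows):
    """Add a Density field in persons per square mile."""
    return [
        {**r, "Density": SQFT_PER_SQMILE * (r["Population"] / r["Shape_Area"])}
        for r in rows
    ]


result = add_density(join_population(areas, population, 2010))
for row in result:
    print(row["NTACode"], round(row["Density"], 1))
```

Applying the year filter before the join matters: without it, each code would match two population records and the join would be ambiguous.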

2.1.1 Challenge

Round the population density values to the nearest integer and store them in another column named Density_Round.

  • Hint1: Use the Field Calculator algorithm from the processing toolbox.
  • Hint2: The QGIS expression engine has a function named round() that can round a fraction to the chosen number of decimal places.

2.2 Creating a Choropleth Map

  1. Continuing the exercise, we will now visualize the spatial distribution of population density in the form of a choropleth map. From the Layers panel, click the Open the Layer Styling panel button.

  1. Select the Graduated renderer.

  1. As we want to map the population density, choose Density as the Value.

  1. Next we select a color ramp. Click the drop-down button next to Color Ramp, select All Color Ramps and pick the YlOrBr (Yellow-Orange-Brown) ramp.

  1. Change the Classes value to 6 and click Classify. You will see each polygon colored according to the population density attribute.

  1. The default mode of classification is Quantile, which divides the input data such that all 6 classes have an approximately equal number of features. There are other modes of classification as well. You can learn more about Data Classification Modes in the QGIS Documentation. We can also define custom data ranges for each class. Click on the Values column for the first row in the classification table. Change the Upper value to 20000.

  1. Similarly, change the other class ranges so they become easy to interpret. The upper value of the last row is the maximum value in the dataset. Instead of displaying the maximum value, we can update the label. Click the Legend column for the last row.

  1. Change the label to > 100000.

  1. Now you have an informative visualization of population density in New York City with an easy-to-interpret legend. Click the Save button to save your work. All your visualization settings are saved along with the project, so the next time you load this project in QGIS, you will see the same visualization.

We have now finished the second part of this exercise. Your output should match the contents of the NYC_Population_Density_Checkpoint2.qgz file in the solutions folder.
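The idea behind quantile classification, where each class receives roughly the same number of features, can be illustrated with Python's standard library. This is a sketch under stated assumptions: the density values are made up, and the way statistics.quantiles places the break values is not necessarily identical to how QGIS computes them.

```python
import bisect
import statistics
from collections import Counter


def quantile_breaks(values, classes):
    """Return the classes - 1 break values that split values into equal-count groups."""
    return statistics.quantiles(values, n=classes, method="inclusive")


def class_index(value, breaks):
    """Return the 0-based class for a value; values at a break fall in the lower class."""
    return bisect.bisect_left(breaks, value)


# Hypothetical density values for 12 polygons.
densities = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
breaks = quantile_breaks(densities, 6)
counts = Counter(class_index(d, breaks) for d in densities)
print([round(b, 2) for b in breaks])
print(sorted(counts.items()))
```

Because the breaks follow the data rather than round numbers, they are often awkward to read in a legend, which is why the steps above replace them with custom ranges.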

2.2.1 Challenge

Create a new layer containing all the neighborhood tabulation areas having a population density > 100000.

Hint: You can use the Extract by Attribute algorithm from the Processing Toolbox.

Assignment

The following assignment is designed to help you practice the skills learnt so far in the course and explore the Print Layout.

Your task is to design a map of Population Density of New York City in QGIS Print Layout. You can take the choropleth map created in the previous section 2.2 Creating a Choropleth Map and design a map in the Print Layout. In addition to the rendered vector layer, the map must have at least the following elements:

You may also optionally add other elements such as a north arrow, logos, etc. Feel free to use your creativity to customize the style and map layout. Below is an example map for inspiration.

Example Map Design for Assignment


3. Georeferencing

Georeferencing is the process of assigning real-world coordinates to each pixel of the raster. This is an important step in preparing your data for further analysis. Many projects, particularly machine learning projects, need continuous historic records to build a model. Many of the older datasets may come in the form of scanned maps or aerial photos that need to be georeferenced. Similarly, some organizations may only share a PDF or a static map image of the dataset, which will need to be converted into a GIS-ready format using the georeferencing process.

The georeferencing process involves collecting GCPs (Ground Control Points) or Tie-Points. These GCPs are easily identifiable features in the image or map whose real-world coordinates are obtained from a field survey using a GPS device or identified from already georeferenced sources within a GIS.

In this exercise, you’re going to georeference an old scanned map of Bangalore, India created in 1924. This map is possibly hand-drawn and has no coordinate markings, so we will use a tiled basemap layer to locate the features and obtain the GCPs.

3.1 Using Basemaps

  1. Open QGIS. We will use a plugin called QuickMapServices to load a basemap. From the Plugins menu choose Manage and Install Plugins….

  1. The Plugins dialog contains all the available plugins in QGIS. Under the All tab, search for quickmapservices. It provides different basemaps that can be used depending on your purpose. Click Install Plugin to add this plugin to QGIS.

  1. Once installed, check the box next to the QuickMapServices label to enable it. Click Close.

  1. Now you will see a new Web menu added to the menu-bar. Go to Web → QuickMapServices menu. You will see some map providers and available basemaps. We can enable a few more providers to have many more options. Click on the Web → QuickMapServices → Settings.

  1. In the Settings dialog, switch to the More services tab. Click on the Get contributed pack to download 3rd-party basemaps.

You will see a warning against using contributed services. Some of these services may have restrictions on their usage and/or attribution requirement that you need to follow. Please review them before using them in your project.

  1. Once the new services are added, you will see many more options in the Web → QuickMapServices menu.

  1. For our current task, we will use a basemap based on OpenStreetMap data. Since we need to locate the features in the scanned maps let’s add the OSM Standard. Click on the Web → QuickMapServices → OSM → OSM Standard.

  1. The basemap will now be loaded in the canvas. This map is georeferenced and projected in the EPSG:3857 CRS. This information can be viewed at the bottom-right of the QGIS window, where the project CRS will be updated.

We have now finished the first part of this exercise. Your output should match the contents of the Georeferencing_Checkpoint1.qgz file in the solutions folder.
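The basemap uses EPSG:3857 (Web Mercator), whose coordinates are in meters rather than degrees. As a rough illustration of how a longitude/latitude pair in EPSG:4326 relates to EPSG:3857 coordinates, the sketch below applies the standard spherical Mercator formulas. The function name is hypothetical and the Bangalore coordinates are approximate; QGIS performs this conversion for you, so this is for understanding only.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS84 semi-major axis)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate location of Bangalore, used here only as an example input.
x, y = lonlat_to_web_mercator(77.59, 12.97)
print(round(x), round(y))
```

This is why the coordinates captured from the map canvas during georeferencing are large numbers in meters instead of familiar latitude and longitude values.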

3.1.1 Challenge

Load the Stamen Watercolor basemap by Stamen. This is an award-winning basemap that renders OpenStreetMap data in a hand-painted watercolor style.

3.2 Using the Georeferencer

In this section, you will learn how to load a scanned image, collect GCPs (ground control points) and warp the image to create a GeoTIFF file.

  1. Click on the Layer → Georeferencer from the menu-bar to open the georeferencing tool.

  1. A new Georeferencer window will open. This tool primarily contains two sections, the top is for viewing images, and the bottom is for tabular data.

  1. Click on the Open Raster.. button from the ribbon to load the scanned map. Browse to the Bangalore_1924.png file in your data package and click Open.

  1. The image will be loaded in the Georeferencer window. The ribbon at the top has buttons for essential operations like Zoom/Pan that you can use to navigate around the image. There are also buttons for adding and editing GCPs (Ground Control Points) that we will use next.

  1. Before we start collecting GCPs, you may also dock the Georeferencer window to the main QGIS window. This makes it easier to find the tie-points. Close the GCP table window. Click and drag the georeferencer title bar to the bottom of the canvas.

  1. Locate a feature that is visible in both the scanned map and the basemap. Click on the Add Point button.

  1. The Enter Map Coordinates dialog will appear. Click on the From Map Canvas button.

  1. Click on the QGIS main canvas at the visible feature on the basemap. This will fill the coordinates of that point in the CRS of the basemap. Click OK to close the dialog.

  1. Similarly, go ahead and find more GCPs. The best features to look for are rail and road intersections, building corners, city squares, or any other corners and edges. The minimum number of points you need depends on the transformation type you use. For this exercise, we will be using the Polynomial 2 transformation, which requires a minimum of 6 GCPs. Learn more about Transformation Algorithms in the QGIS Documentation.

  1. Let’s view the points in a tabular format. If you had docked the window, click on the pop-out button in Georeferencing window to detach this tool from the main QGIS. Go to View → Panels → GCP table.

  1. The GCP table dialog will display the details of each point. Click on the Transformation Settings… button.

  1. In the Transformation Settings dialog, choose Polynomial 2 as Transformation type. Choose Nearest Neighbour as Resampling method and EPSG:3857 as Target SRS. Click on the button next to Output raster and save the file as Banglore_1924_modified.tif. Choose LZW in Compression. Finally, check the Save GCP points and Load in QGIS when done boxes. Click OK to save these settings and close the dialog.

  1. Note that the Residual (pixels) column will now display the error for each GCP based on the chosen transformation algorithm. A lower error means the transformation can more accurately assign the chosen coordinates to the pixel.

  1. If you have a GCP with a high residual, you can use Move point to adjust the point’s position or Delete point to remove it. Before georeferencing, a minimum of six GCPs should be available.

  1. Repeat this process till you are satisfied with the total mean error and the residuals.

  1. Click on the Start Georeferencing button to georeference the scanned map.

  1. In the main QGIS canvas, the georeferenced layer Banglore_1924_modified will be added and overlaid on the basemap.

We have now completed all steps and you should have a georeferenced image. Your output should match the contents of the Georeferencing_Checkpoint2.qgz file in the solutions folder. We have also provided the saved GCPs in the solutions folder. To load them, you can go to File → Load GCP Points.. and select the bangalore_gcp.points file in the solutions/ folder of the data package.
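The minimum of 6 GCPs for the Polynomial 2 transformation follows from counting coefficients. A two-variable polynomial of order n has (n + 1)(n + 2) / 2 terms, and each GCP supplies one equation for the x polynomial and one for the y polynomial, so at least that many points are needed. The short sketch below works this out; the function name is hypothetical.

```python
def min_gcps_for_polynomial(order):
    """Number of coefficients per axis for a 2D polynomial of the given order."""
    if order < 1:
        raise ValueError("order must be 1 or greater")
    return (order + 1) * (order + 2) // 2


for order in (1, 2, 3):
    print(f"Polynomial {order}: at least {min_gcps_for_polynomial(order)} GCPs")
```

Collecting more points than the minimum is still worthwhile, because only then do the residuals tell you something about how well the points agree with each other.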

3.2.1 Challenge

In this exercise we used the Polynomial 2 technique. For datasets that require more aggressive transformation, you can use the Thin Plate Spline algorithm. This method is also known as Rubber Sheeting. Change the transformation setting to use Thin Plate Spline and run the georeferencer again. Compare the output with the previous result.

4. Data Editing

Many GIS tasks require editing existing data layers or creating new datasets. Often a large amount of GIS time is spent digitizing raster data to create vector layers that you use in your analysis. Many machine learning projects also require creating a labeled dataset that needs to be made by digitizing features from satellite imagery or historical maps. QGIS has powerful on-screen digitizing and editing capabilities that we will explore in this tutorial.

In this exercise, you will create a vector layer of historic lakes in the city of Bangalore, India. This city has experienced urbanization at a rapid pace. Due to this, many water bodies have been lost. We will use the georeferenced scanned map from the previous exercise to digitize polygons for all the water bodies in 1924 and label their current status. Finally, we will create a vector layer and assign attributes to them, indicating whether they are healthy, lost, or partially lost.

4.1 Attribute Forms

We will first create a new layer and configure the attribute form to capture the data about the features.

  1. Go to Project → Open and browse to the data package. Select the Digitizing.qgz project and click Open.

  1. This project contains the OSM Standard basemap and the Banglore_1924_modified georeferenced scanned map. To digitize the waterbodies, let’s create a new vector layer. Go to Layer → Create Layer → New GeoPackage Layer… from the menubar.

  1. In the New GeoPackage Layer dialog, click the button next to Database and browse to the project location. Enter the file name as banglore_lakes and click OK. Now the Table name will be auto-populated as banglore_lakes. Choose MultiPolygon as Geometry type. Leave the CRS as the default EPSG:4326.

  1. In the new layer, let’s add some basic fields. First, let’s add the name field. Under the New Field section, in Name enter name, in Type choose Text Data, and in Maximum length, enter 50. Now click the Add to Fields List button.

  1. The Fields List section will get updated. Similarly, add a status field with Type as Integer. Click OK to close the dialog.

  1. Now the banglore_lakes layer will be added to the Layers panel.

  1. Let’s inspect the attribute table of the new layer. Right-click on the banglore_lakes layer and click Open Attribute Table.

  1. In the banglore_lakes attribute table, there are three fields. fid is an integer field which is required by the GeoPackage format and is autogenerated. The name and status must be entered while digitizing the waterbodies. Close the attribute table.

  1. Again right-click on banglore_lakes layer and click Properties to open the properties dialog.

  1. In the Layer Properties dialog, choose Attribute Form. Under Fields, select status. Choose the Widget Type as Value Map. A Value Map creates a drop-down menu that lets you pick from a predefined set of values.

  1. The status field will be set as a drop-down with these three values. Enter the value and description as follows:
Value Description
1 Healthy
2 Partially Lost
3 Lost

  1. We want a value to be entered in this field for every feature, so let’s make it a mandatory field. Under Constraints check Not null and Enforce not null constraint. Click OK to save the changes and close the properties dialog.

Save the project. Your results should match the contents of the Digitizing_Checkpoint1.qgz file in the solutions folder.
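A GeoPackage is stored as a SQLite database, so the layer structure we just defined can be loosely pictured as a table. The sketch below is only an analogy: it creates a plain in-memory SQLite table (not a valid GeoPackage, and without a geometry column), the lake names are hypothetical, and QGIS enforces the form constraint in the layer properties rather than in the database.

```python
import sqlite3

# Value map from the steps above: stored integer -> label shown in the drop-down
STATUS_LABELS = {1: "Healthy", 2: "Partially Lost", 3: "Lost"}

conn = sqlite3.connect(":memory:")  # plain SQLite, not a real GeoPackage
conn.execute(
    """
    CREATE TABLE banglore_lakes (
        fid INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated automatically
        name TEXT,                              -- may be left empty
        status INTEGER NOT NULL                 -- mandatory, like the form constraint
    )
    """
)

# Hypothetical features; fid is never supplied by the user
conn.execute("INSERT INTO banglore_lakes (name, status) VALUES (?, ?)", ("Example Lake", 3))
conn.execute("INSERT INTO banglore_lakes (name, status) VALUES (?, ?)", (None, 1))

rows = conn.execute("SELECT fid, name, status FROM banglore_lakes ORDER BY fid").fetchall()
for fid, name, status in rows:
    print(fid, name, STATUS_LABELS[status])

# A feature without a status is rejected
try:
    conn.execute("INSERT INTO banglore_lakes (name) VALUES (?)", ("No Status Lake",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Storing the integer code and showing the description keeps the data compact while the form stays readable.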

4.1.1 Challenge

The fid column contains an auto-incrementing unique id for each feature. The GeoPackage format requires this integer field to maintain data integrity. Manually overriding this id to a different value can cause data corruption. Edit the attribute form for the fid field so that it is not user-editable.

4.2 Digitizing Polygons

  1. Before we start digitizing, let’s enable the snapping toolbar. This toolbar will help select the nearby vertices and avoid invalid geometries. Right-click on the toolbar and check the Snapping Toolbar to enable it.

  1. Now the snapping toolbar will be added to the main QGIS window.

  1. Enable snapping by clicking the Enable Snapping button. Zoom to any part of the map containing a waterbody. Toggle the visibility of the Banglore_1924_modified layer and check if the lake exists in the current basemap. This will be helpful in entering the attributes of the lake you will digitize.

  1. Turn the Banglore_1924_modified layer on. Select the banglore_lakes layer and start digitizing the waterbody. Click the Toggle Editing button followed by the Add Polygon Feature button. Starting from an edge, keep on adding vertices using left-click. Once the polygon is fully digitized, right-click to complete it.

  1. Now you will be prompted by the Feature Attributes dialog to enter the attribute values for the feature you just created. By referring to the OSM Standard basemap, enter the name and select the status of the waterbody. Click OK.

  1. The canvas now shows the fully digitized polygon.

  1. Zoom to another waterbody and digitize with the same process. If the waterbody name is unavailable in both basemap and scanned map, leave the name field empty and select the status of the waterbody. Click OK.

  1. Digitize all the available waterbodies. Once completed, click on the Save Layer Edits button and turn off the editing mode by pressing the Toggle Editing button.

  1. Let’s inspect the attribute table of the digitized layer. Right-click on the banglore_lakes layer and click Open Attribute Table.

  1. The Attribute Table contains 24 digitized features. You can note the fid column is auto-populated with a unique value for each record. Close the attribute table.

  1. Now let’s save the project, click Project → Save.

Your results should now match the contents of the Digitizing_Checkpoint2.qgz file in the solutions folder.

4.2.1 Challenge

Style the layer based on the status column. This column has categorical values that can be used to assign a different color to each waterbody.

Hint: Use the Categorized renderer.

Concept: Introduction to OpenStreetMap

View Presentation

View the Presentation ↗

5. Geoprocessing

Geoprocessing refers to the set of operations used to transform the input data to create a new dataset. In this section you will learn about some essential vector and raster geoprocessing tools to solve a complex spatial analysis problem.

In this analysis, we will learn how to download vector data from OpenStreetMap and use it to determine the number of people who live within 1 km of a metro station.

5.1 Download OpenStreetMap Data

  1. Open QGIS. We will first load a layer with the boundary for the city of Bengaluru, India. This file comes in the GeoJSON format. Click on the Open Data Source Manager button. Select the Vector tab. Click the button next to Vector Dataset(s) and browse the data directory. Locate the bangalore.json file and click Open. In the Data Source Manager window, click Add.

  1. Now, we will query the OpenStreetMap database to get a vector layer of railway stations in the city. We will use the QuickOSM plugin for this task. From the Plugins menu, choose Manage and Install Plugins…. Under the All tab, search for quickosm. Click on the Install Plugin button to add this plugin to QGIS.

  1. Once installed, go to Vector → QuickOSM → QuickOSM.

  1. OpenStreetMap uses a tagging system to record properties of physical features. The tags and the key/value pairs are described in the OpenStreetMap wiki. The railway stations are tagged with railway=station. Enter railway as the Key, station as the Value and Bangalore as the geographic filter In. Expand the Advanced section and check only the Node and Points boxes. Click Run Query.

  1. Once the query finishes, you will see a new layer, railway_station_Bangalore, loaded in the canvas. The city has railway stations operated by two different agencies: one that operates the intercity trains and another that runs the metro service. We can apply a filter to select only the metro stations. Right-click the layer and select Filter.

Note: As the metro network for the city is growing rapidly, your query may return a few more stations than are shown in the screenshot.

  1. In the Query Builder, enter the following expression to select the stations operated by the Bangalore Metro Rail Corporation Limited - the agency that operates the metro network in the city. Click OK.
"operator" = 'Bangalore Metro Rail Corporation Limited'

  1. Now, the map will update to show all the operational metro stations in the city.

  1. You will notice a memory icon next to the railway_station_Bangalore layer in the Layers panel, indicating that it is a temporary scratch layer. If we do not save it, it will go away when we close QGIS. Right-click on it and select Make Permanent.

  1. Click next to File name and save the file as railway_station_Bangalore.gpkg. Click OK. We have now saved the layer to the disk. Save your project as Geoprocessing.qgz.

We have now completed the data download and pre-processing steps. Your output should match the contents of the Geoprocessing_Checkpoint1.qgz file in the solutions folder.
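The two selections made in this section, first by tag and then by operator, can be sketched in plain Python. The features below are made-up placeholders, not real query results, and the helper name has_tag is hypothetical.

```python
METRO_OPERATOR = "Bangalore Metro Rail Corporation Limited"

# Hypothetical OSM-like features: each has an id and a dictionary of tags
features = [
    {"osm_id": 1, "tags": {"railway": "station", "name": "Station A", "operator": METRO_OPERATOR}},
    {"osm_id": 2, "tags": {"railway": "station", "name": "Station B", "operator": "Hypothetical Intercity Operator"}},
    {"osm_id": 3, "tags": {"railway": "halt", "name": "Halt C"}},
    {"osm_id": 4, "tags": {"railway": "station", "name": "Station D"}},  # no operator tag
]

def has_tag(feature, key, value):
    """True if the feature carries the tag key=value."""
    return feature["tags"].get(key) == value

# Step 1: the key/value query (railway=station)
stations = [f for f in features if has_tag(f, "railway", "station")]

# Step 2: the layer filter ("operator" = '...')
metro = [f for f in stations if has_tag(f, "operator", METRO_OPERATOR)]

print([f["osm_id"] for f in stations])  # [1, 2, 4]
print([f["osm_id"] for f in metro])     # [1]
```

Note that a station with no operator tag is dropped by the second step, so stations with incomplete tagging in OpenStreetMap will be missing from the filtered layer.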

5.1.1 Challenge

You will notice that the attribute table for the railway_station_Bangalore layer has many columns. Open the attribute table and delete all the columns except the fid and osm_id columns. Hint: Use the Delete field tool from the attribute table.

5.2 Reproject and Buffer

  1. The map now shows only the operational metro stations in the city. Next, we need to apply a Buffer to these stations to find areas within 1km. But our data comes in the EPSG:4326 WGS84 Geographic Projection, which has degrees as units. To do geoprocessing operations on this layer in projected units such as kilometers, we must first reproject the layer to a suitable projected coordinate reference system (CRS). Go to Processing → Toolbox. Search for and locate the Vector general → Reproject layer algorithm. Double-click to launch it.

  1. Choose railway_station_Bangalore as the Input layer. Select EPSG:32643 - WGS 84 UTM Zone 43N as the Target CRS. Name the Reprojected layer as metro_stations_reprojected.gpkg. Click Run.

  1. Once the reprojected layer metro_stations_reprojected is created, search for the Vector geometry → Buffer algorithm and double-click to launch the algorithm.

  1. Select metro_stations_reprojected as the Input layer. Enter 1 kilometers as the Distance. Check the Dissolve result option and name the Buffered output layer as metro_stations_buffer.gpkg. Click Run.

  1. The layer now has a polygon representing areas within 1km of a metro station. Now that we are done with the geoprocessing operation, let’s convert the result back to the original projection so we can use it with other layers. Search for Vector general → Reproject layer algorithm and launch it.

  1. Select metro_stations_buffer as the Input layer and EPSG:4326 - WGS 84 as the Target CRS. Name the output as metro_station_buffer_reprojected.gpkg. Click Run.

  1. A new layer metro_station_buffer_reprojected will be added to the canvas. We will now remove the intermediate layers from the project. Hold the Shift key and select the metro_stations_buffer, metro_stations_reprojected and railway_station_Bangalore layers. Right-click and select Remove Layer….

  1. We now have a layer representing the area within 1 km of a metro station within the city of Bengaluru.

Your output should match the contents of the Geoprocessing_Checkpoint2.qgz file in the solutions folder.
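To see why EPSG:32643 is a suitable choice here, the sketch below derives the UTM zone from a longitude and shows how long one degree of longitude is at this latitude. The coordinates are approximate values for Bengaluru assumed for illustration, the function names are hypothetical, and the zone formula ignores the few special-case zones around Norway and Svalbard.

```python
import math

def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat), standard 6-degree zones only."""
    zone = int(math.floor((lon + 180) / 6)) + 1
    base = 32600 if lat >= 0 else 32700  # northern vs southern hemisphere codes
    return base + zone

def km_per_degree_lon(lat):
    """Approximate east-west length of one degree of longitude on a spherical Earth."""
    return 111.32 * math.cos(math.radians(lat))

# Approximate location of Bengaluru (assumed for illustration)
lon, lat = 77.59, 12.97

print(utm_epsg(lon, lat))                # 32643
print(round(km_per_degree_lon(lat), 1))  # length of 1 degree of longitude, in km
```

Because the length of a degree changes with latitude, a buffer distance given in degrees would not describe a consistent distance on the ground, which is why we buffer in a projected CRS with meters as units.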

5.2.1 Challenge

Your data package contains a dataset called bangalore_pubs.gpkg with the location of all pubs within the city. Select all the pubs from the layer within 1km of a metro station. Hint: Use the Select by Location tool from the Processing Toolbox.
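The idea behind this selection can also be sketched without a buffer polygon, by measuring the great-circle distance from each point to the nearest station. This is an alternative illustration, not what the Select by Location tool does internally. The coordinates are made up and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_distance(points, stations, max_km=1.0):
    """Keep the points that lie within max_km of at least one station."""
    return [p for p in points
            if any(haversine_km(p[0], p[1], s[0], s[1]) <= max_km for s in stations)]

# Hypothetical (lon, lat) coordinates
stations = [(77.60, 12.97)]
pubs = [(77.605, 12.97), (77.65, 12.97)]

selected = within_distance(pubs, stations)
print(selected)  # only the first pub is within 1 km
```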

5.3 Calculate Zonal Statistics

We will now use a population grid and overlay the buffered polygon to calculate the number of people who live within the buffer zone.

  1. Click on the Open Data Source Manager button. Select the Raster tab. Click the button next to Raster Dataset(s) and browse the data directory. Locate the bangalore_ppp_2020_constrained.tif in the data package. Click Add.

  1. Select the bangalore_ppp_2020_constrained layer and use the Identify tool to explore the pixel values. The resolution of the raster is 100m x 100m per pixel. The raster layer has only one band, and the pixel value is the estimated number of people within that 100m x 100m (1 hectare) area. Click the Pan Map button (Hand icon) to exit the Identify mode.

  1. We can determine the total population by summing up the values from the pixels that fall within a polygon. This operation is known as Zonal Statistics. Search for and locate the Raster analysis → Zonal statistics algorithm. This algorithm adds a new attribute to each polygon with the total population contained within it. Double-click to launch it.

  1. In the Zonal Statistics dialog, select the metro_station_buffer_reprojected as the Input layer and bangalore_ppp_2020_constrained as the Raster layer. Enter population_ as the Output column prefix. Click the button next to Statistics to calculate and choose only Sum. Finally, next to Zonal Statistics output, click the ... and save the layer as metro_station_buffer_pop.gpkg. Click Run.

  1. Now, a new layer, metro_station_buffer_pop, will be added to the canvas. Right-click the layer and select Open Attribute Table. You will see a new population_sum field containing the population within the buffer polygon.

Your output should match the contents of the Geoprocessing_Checkpoint3.qgz file in the solutions folder.

5.3.1 Challenge

Repeat the Zonal Statistics operation on the bangalore layer to calculate the city’s total population. Determine what percentage of the city population lives within 1km of a metro station.
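The arithmetic behind a zonal sum and the percentage asked for here can be sketched on a tiny grid. The pixel values and masks below are made up, None stands in for a no-data pixel, and the real algorithm also has to decide how to treat pixels that a polygon only partly covers.

```python
def zonal_sum(raster, mask):
    """Sum the raster values of all pixels flagged as inside the zone."""
    total = 0.0
    for raster_row, mask_row in zip(raster, mask):
        for value, inside in zip(raster_row, mask_row):
            if inside and value is not None:  # skip no-data pixels
                total += value
    return total

# Hypothetical population-per-pixel grid (None = no data)
raster = [
    [10.0, 12.5, None],
    [8.0, 20.0, 5.5],
    [None, 3.0, 1.0],
]

# Pixels inside the buffer zone, and a mask covering the whole city
buffer_mask = [
    [False, True, True],
    [False, True, True],
    [False, False, False],
]
city_mask = [[True] * 3 for _ in range(3)]

buffer_pop = zonal_sum(raster, buffer_mask)
city_pop = zonal_sum(raster, city_mask)
percent = buffer_pop / city_pop * 100

print(buffer_pop, city_pop)  # 38.0 60.0
print(round(percent, 1))     # 63.3
```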

Data Credits

License

This course material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:

Introduction to QGIS Course by Ujaval Gandhi www.spatialthoughts.com


This course is offered as an instructor-led online class. Visit Spatial Thoughts to know details of upcoming sessions.


© 2022 Spatial Thoughts www.spatialthoughts.com

