Note: This course is no longer being maintained. You can check out our new Introduction to QGIS course instead.
This class is a broad introduction to working with location datasets. We will cover a wide range of use-cases and applications that give you hands-on experience with techniques for visualizing mapping data and deriving insights from them. This class assumes no prior knowledge of GIS/Remote Sensing and is suitable for practitioners of all disciplines. We will use the open-source program QGIS for all the exercises.
This course requires QGIS LTR. Please review QGIS-LTR Installation Guide for step-by-step instructions.
QGIS offers an easy way for developers to extend the core functionality of the software using plugins. Plugins can be installed in QGIS via Plugins → Manage and Install Plugins... To install a plugin, switch to the All tab and search for the plugin. Once you find it, select it and click Install Plugin.
For this class, we will be using the following plugins. Go ahead and install them.
The exercises in this class use a variety of datasets. All the required layers, project files etc. are supplied to you in the file spatial_data_viz.zip. Unzip this file to the Downloads directory.
Download spatial_data_viz.zip.
This class needs about 1 hour of pre-work. Before starting the exercises, it is important to understand how spatial data is modeled and to learn about coordinate reference systems.
Please watch the following video to get a good understanding of GIS concepts. The video can be streamed using the link below.
After you watch the video, please complete the following quiz to test your understanding. The quiz is open to everyone.
“Everything is related to everything else, but near things are more related than distant things.” - Waldo Tobler’s First Law of Geography
When modeling and analyzing our world, location is a critical factor. A non-spatial model cannot accurately reflect the processes and interactions happening in our world. Take this example of predicting housing prices, where a spatial prediction model performed much better than a purely non-spatial one.
Today the availability of location data - both for individuals and businesses - has exploded. Spatial data adds another dimension to data, and reveals patterns that are otherwise not obvious.
Individuals, with GPS sensors on their smartphones, have the ability to tag their data with location. Photos taken with smartphones have the location embedded in them. If they opt in, people can store and access their location history on an ongoing basis.
Most businesses have location data in one form or another: customer addresses, IP-locations of website visitors, sales territories, supply chain routes and so on. Other businesses, such as taxi aggregators, food delivery and logistics companies, generate huge amounts of location data that can be mined for intelligence.
IoT (Internet-Of-Things) devices are continuously collecting location data alongside sensor data.
Governments are also increasingly collecting and sharing location-based data. Data relating to urban infrastructure, census, LIDAR and aerial imagery etc. is being collected at a massive scale. Many governments have implemented open data sharing policies, making this data available to individuals and businesses.
The spatial data model consists of 2 parts: geometry + properties.
Geometry (Shape) is defined with coordinates and a coordinate reference system.
Properties (Attributes) are defined with data and data types.
Consider the following representation of a city as a point.
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [ 77.58270263671875, 12.963074139604124]
},
"properties": {
"id": 1,
"name": "Bengaluru"
}
}
This representation is in GeoJSON format. The point geometry is defined with X (Longitude) and Y (Latitude) coordinates. The point is assigned 2 properties: id with a value of 1, and name with the value of Bengaluru. The GeoJSON format supports only 1 Coordinate Reference System (WGS84), so we do not need to specify it explicitly.
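To make the geometry/properties split concrete, here is a minimal sketch that reads the feature above with Python's standard json module. It only assumes the feature is available as a string; no GIS library is needed. Note that GeoJSON lists the X (longitude) value before the Y (latitude) value.

```python
import json

# The Bengaluru feature from the GeoJSON example above.
feature_text = """
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [77.58270263671875, 12.963074139604124]
  },
  "properties": {
    "id": 1,
    "name": "Bengaluru"
  }
}
"""

feature = json.loads(feature_text)

# Geometry part: positions are stored as [longitude, latitude].
lon, lat = feature["geometry"]["coordinates"]

# Properties part: plain attribute values.
name = feature["properties"]["name"]

print(f"{name}: longitude={lon}, latitude={lat}")
```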
We saw a basic way to represent spatial data. But a variety of data formats exist to represent such data for different applications. In most cases, spatial data formats are extensions of existing data formats.
Type | Non-Spatial Data | Spatial Data |
---|---|---|
Text | csv, json, xml | csv, geojson, gml, kml |
Binary/Compressed | pdf, xls, zip | shapefile, geopdf, geopackage |
Images | tiff, jpg, png | geotiff, jpeg2000 |
Databases | SQLite, PostgreSQL, Oracle | Spatialite, PostGIS, Oracle Spatial |
Spatial data can be broadly categorized into 2 types: Vector and Raster. For serving on the web, this data is usually cut into smaller chunks (tiles), which can be considered a 3rd type.
Type | Sub Types | Examples |
---|---|---|
Vector | Point | Sensor Observations, Places |
Vector | Line | GPS Tracks, Roads, Rivers, Contours |
Vector | Polygons | Administrative Boundaries, Buildings |
Vector | Point Cloud | LIDAR surveys |
Raster | Photos | Aerial and Drone Photos |
Raster | Grids | Satellite Imagery, Elevation Data |
Raster | Mesh | Climate and Scientific Data |
Tiles | Raster Tile Layers | Web Maps |
Tiles | Vector Tile Layers | Web Maps |
If there is one thing that makes spatial data ‘special’, it is the Coordinate Reference System (CRS), also called the Spatial Reference System (SRS).
A Map Projection transforms the earth from its spherical shape (3D) to a planar shape (2D).
A Coordinate Reference System (CRS) then defines how the 2D map relates to real places on the earth.
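As an illustration of what a projection computes, here is a minimal sketch of the spherical Web Mercator projection (EPSG:3857), the projection used by OpenStreetMap web tiles. It is chosen only because it is familiar; the standard spherical formulas are implemented without any GIS library. The scale factor helper shows the kind of distortion discussed below: Mercator stretches lengths more and more away from the equator.

```python
import math

# Sphere radius used by Web Mercator (EPSG:3857), in meters.
EARTH_RADIUS_M = 6378137.0

def to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude in degrees to planar Web Mercator X/Y in meters."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """How much Mercator enlarges lengths at a given latitude (1 at the equator)."""
    return 1 / math.cos(math.radians(lat_deg))

x, y = to_web_mercator(77.58270263671875, 12.963074139604124)  # Bengaluru
print(f"X={x:.1f} m, Y={y:.1f} m")
print(f"Scale factor at 60 degrees N: {mercator_scale_factor(60):.2f}")
```

A scale factor of 2 at 60° latitude means lengths there appear twice as long as at the equator, which is why high-latitude countries look oversized on web maps.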
The QGIS Documentation provides a comprehensive introduction to the topic.
There are hundreds of different map projections and CRSs, each with different properties and uses. The most important thing to remember is that every projection distorts the map in some way. This mashup of map projection distortions provides a useful visual reference to popular projections. For a more in-depth guide, you can refer to Jochen Albrecht’s guide to choosing a projection. A recent paper shows the impact of projection choice on area and volume calculations.
So what projection should you use for your project? This is a vast and complex topic, and often the answer is: it depends. The following guidelines will help you.
+proj=eqearth +datum=WGS84 +wktext
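The proj string above names the Equal Earth projection (+proj=eqearth), an equal-area projection designed for world maps. As a rough illustration of what it computes, here is a sketch of its forward formulas on a sphere, using the published coefficients (Šavrič, Patterson and Jenny, 2018). Because this sketch assumes a sphere with the WGS84 equatorial radius, its output will differ slightly from PROJ, which works on the ellipsoid.

```python
import math

# Polynomial coefficients of the Equal Earth projection.
A1, A2, A3, A4 = 1.340264, -0.081106, 0.000893, 0.003796
M = math.sqrt(3) / 2

def equal_earth(lon_deg, lat_deg, radius=6378137.0):
    """Spherical forward Equal Earth projection; returns X/Y in meters."""
    lam = math.radians(lon_deg)
    # Parametric latitude: sin(theta) = (sqrt(3)/2) * sin(phi).
    theta = math.asin(M * math.sin(math.radians(lat_deg)))
    t2 = theta * theta
    t6 = t2 ** 3
    # x = 2*sqrt(3)*lam*cos(theta) / (3*(9*A4*t^8 + 7*A3*t^6 + 3*A2*t^2 + A1))
    x = (2 * math.sqrt(3) * lam * math.cos(theta)) / (
        3 * (9 * A4 * t6 * t2 + 7 * A3 * t6 + 3 * A2 * t2 + A1)
    )
    # y = A4*t^9 + A3*t^7 + A2*t^3 + A1*t
    y = theta * (A1 + A2 * t2 + t6 * (A3 + A4 * t2))
    return radius * x, radius * y

x, y = equal_earth(77.58270263671875, 12.963074139604124)  # Bengaluru
print(f"X={x:.0f} m, Y={y:.0f} m")
```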
The simplest representation of spatial data is a table. A place can be represented using a pair of coordinates, Latitude and Longitude, along with other attribute information about the place. Many spatial data sources come in this form: Excel sheets, CSV files, database tables etc.
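Turning such a table into points only requires knowing which columns hold the coordinates. Here is a minimal sketch using Python's csv module: the latitude/longitude/value column names mirror those described for openaq.csv later in this class, but the location column and all the values are made up for illustration.

```python
import csv
import io

# Hypothetical rows; only the latitude/longitude/value column names come from the text.
SAMPLE_CSV = """location,latitude,longitude,value
Station A,28.61,77.20,150.5
Station B,28.70,77.10,98.0
"""

def read_points(text):
    """Turn each CSV row into an (x, y, attributes) tuple with X=longitude, Y=latitude."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        x = float(row.pop("longitude"))
        y = float(row.pop("latitude"))
        # Whatever columns remain are the point's attributes.
        points.append((x, y, row))
    return points

for x, y, attrs in read_points(SAMPLE_CSV):
    print(x, y, attrs)
```

This is essentially what choosing the X Field and Y Field does when importing a delimited text layer in QGIS.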
Worsening air quality is a severe problem in many countries around the world. India, and Delhi in particular, suffers from acutely high pollution levels. One of the first steps to better understand the problem is to have continuous monitoring of air quality across cities. Many organizations have stepped up and set up sensors that collect air quality data and make it publicly available. OpenAQ is a platform that collects this data from all public sources and makes it available in an easy-to-use form.
If you are interested in air quality in India, Urban Emissions has a lot of relevant information and datasets.
We will take the sensor data for PM2.5 concentrations for 1 day and map it. The aim is to turn this tabular data into an informative spatial data visualization.
For this exercise, we are using daily average data for Delhi, India for February 15, 2020. This data was downloaded from OpenAQ Data Download.
openaq.csv file in a text editor and examine it. Each row contains data from 1 monitoring station. The latitude and longitude columns contain the coordinates of the station and the value column contains the daily average PM2.5 concentration.
openaq.csv file and open it. As we want to import this file as points, select Point coordinates. Choose longitude as X Field and latitude as Y Field. Choose EPSG 4326 - WGS 84 in Geometry CRS. Click Add.
Note: If you don’t see the OpenStreetMap monochrome layer, go to Web → QuickMapServices → Settings, switch to More services and click Get Contributed Pack. Click Save.
Graduated renderer and value as the Value column. Set the number of Classes to 6 and click Classify.
The default classification mode is Equal Count, which is fine for this exercise. You can learn more about Data Classification Modes in the QGIS Documentation.
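Equal Count (quantile) classification puts roughly the same number of features in each class. A minimal sketch of the idea, assuming a simple nearest-rank rule for picking breaks; QGIS's own implementation may choose slightly different break values for the same data, and tied values can make some classes larger. The readings are hypothetical, not taken from openaq.csv.

```python
def equal_count_breaks(values, num_classes):
    """Upper class bounds such that each class holds about the same number of values.

    Classes use the rule from this exercise: lower bound excluded, upper bound included.
    """
    if len(values) < num_classes:
        raise ValueError("need at least one value per class")
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, num_classes):
        # Position of the last value that belongs to class i.
        breaks.append(ordered[(i * n) // num_classes - 1])
    breaks.append(ordered[-1])  # the top class ends at the maximum
    return breaks

# Hypothetical daily PM2.5 averages in µg/m3.
readings = [45, 62, 88, 95, 110, 132, 150, 171, 190, 205, 240, 310]
print(equal_count_breaks(readings, 6))  # [62, 95, 132, 171, 205, 310]
```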
For the class ranges to have some meaning, we need to link them to a commonly used scale. India has adopted the National Air Quality Index with the following definitions.
Note: The range boundaries include the upper bound value but exclude the lower bound value. So if the range is 30-60, it includes all values >30 and <=60. See the discussion here for more info.
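The boundary rule above, where each range is (lower, upper], can be expressed with a binary search over the upper bounds. This is a sketch: the bounds and category names below are assumptions based on India's NAQI 24-hour PM2.5 scale, so verify them against the official definitions before relying on them.

```python
from bisect import bisect_left

# Assumed upper bounds (µg/m3) for 24-hour PM2.5; verify against the official NAQI table.
UPPER_BOUNDS = [30, 60, 90, 120, 250]
CATEGORIES = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]

def aqi_category(pm25):
    """Classify a reading into ranges of the form (lower, upper]."""
    # bisect_left finds the first bound >= pm25, so a value equal to a bound
    # stays in the lower class: 60 -> Satisfactory, 60.1 -> Moderate.
    return CATEGORIES[bisect_left(UPPER_BOUNDS, pm25)]

for reading in (30, 30.5, 60, 60.1, 251):
    print(reading, aqi_category(reading))
```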
RdYlGn (Red-Yellow-Green) ramp.
Single labels and value as Value. Scroll down and check Formatted numbers and change the Decimal places to 0.
@symbol_color variable to add it to the expression. Click OK.
-5.
air_quality.qgz.
If you do not see the full extent of your map in the region, you can click the Set Map Extent to Match Main Canvas Extent button located in the toolbar under Item Properties for Map 1.
10.
Average PM2.5 Concentration (µg/m3) and date 15 February, 2020.
Data Source: Central Pollution Control Board, EPA AirNow DOS. Downloaded from OpenAQ.org.
Now we will add a legend so that users know how to interpret the various colors on the map. Go to Add Item → Add Legend.
OpenStreetMap monochrome layer and set the openaq group label to Hidden.
2 and check Split layers button.
Air Quality Index Category.
So let’s export our map to a PDF. Before exporting, switch to the Layout tab. We are using the basemap layer from OpenStreetMap. This layer is created using individual tiles that are zoom dependent. Setting a higher export resolution will fetch higher-resolution tiles with a different labeling scheme when exporting. You may experiment with this value to get the right level of detail in the basemap. For this particular exercise, an Export resolution of 100 dpi works well. Go to Layout → Export as PDF.
delhi_air_quality.pdf.
In Adobe Reader, you can enable the measuring tool by going to Tools → Measure.
Much of our transportation infrastructure, such as roads, bridges and railways, as well as natural features such as rivers and streams, can be modeled as lines. Other abstract concepts, such as contours and trajectories, are also modeled using linear features. Shapefiles, GeoJSON and GPX are commonly used file formats for storing line datasets.
GPS tracks have become ubiquitous in modern life. With GPS built into most phones, many of us capture tracks while running or biking outdoors. Cab companies use GPS tracks collected during a trip to determine fares. Delivery and logistics companies store and analyze millions of GPS tracks from their assets to derive location intelligence.
We will use a GPS track I collected using the open-source GPS Logger app on my Android phone while cycling to work. If you are on iOS, I recommend the open-source app Open GPX Tracker, which can record GPS tracks. The default format for storing GPS tracks is GPS Exchange Format (GPX). It is an XML-based text format that allows storing points, tracks and routes in a single file. We will use the data in the sample_gps_track.gpx file and create an animated GIF showing the trip.
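Because GPX is plain XML, track points can be read with Python's standard xml.etree.ElementTree module. This sketch assumes the GPX 1.1 namespace (files written as GPX 1.0 use http://www.topografix.com/GPX/1/0 instead), and the two-point track is hypothetical; the real sample_gps_track.gpx has many more points.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# GPX 1.1 elements live in this XML namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# A tiny hypothetical track.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="12.9716" lon="77.5946"><time>2020-02-18T03:30:00Z</time></trkpt>
    <trkpt lat="12.9721" lon="77.5952"><time>2020-02-18T03:30:05Z</time></trkpt>
  </trkseg></trk>
</gpx>"""

def read_track_points(gpx_text):
    """Return (lon, lat, time) tuples for every <trkpt> in the GPX document."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        time_text = pt.findtext("gpx:time", namespaces=GPX_NS)
        # GPX times are UTC; older Pythons reject a trailing "Z", so use "+00:00".
        when = datetime.fromisoformat(time_text.replace("Z", "+00:00"))
        points.append((float(pt.get("lon")), float(pt.get("lat")), when))
    return points

for lon, lat, when in read_track_points(SAMPLE_GPX):
    print(lon, lat, when.isoformat())
```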
sample_gps_track.gpx file and drag it to the canvas.
track_points and tracks layers. Click OK.
sample_gps_track points layer by un-checking the box next to it. Select the sample_gps_track tracks layer and click Open the Layer Styling Panel. You can change the line Color to Blue and Width to 0.5.
sample_gps_track points layer and select it. In the Layer Styling Panel, select Simple marker symbol. Change the point Size to 1. Choose a lighter shade of Blue as the Fill color and Transparent Stroke as the Stroke color.
gps_points layer and choose Duplicate Layer.
sample_gps_track points_copy layer, choose bright neon as the Color from the color picker and increase the size to 1.5. Check the Draw Effects option and click the Effects button next to it.
2.0 for both Spread and Blur radius.
sample_gps_track points_copy layer and select Properties.
Single Field with Date/Time as the Configuration. Set time as the Field. Click OK.
sample_gps_track points layer and select Properties.
1 and from the drop-down select seconds. Click the Temporal Settings button on the top-right corner.
10.
In the Title Label Decoration, click the Insert an Expression button. The current timestamp of the map is stored in the @map_start_time variable. We can use it with the format_date() function to create a readable timestamp. Note, however, that the GPS timestamps are in universal time (UTC), so we use the to_interval() function to shift them to the UTC+5:30 timezone for India. Enter the following expression:
format_date( @map_start_time + to_interval('5 hours 30 mins'), 'yyyy-MM-dd hh:mm')
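The expression shifts a UTC timestamp by 5 hours 30 minutes before formatting it. For comparison, here is the same conversion in Python using only the standard datetime module; the sample timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time is UTC+5:30 (no daylight saving).
IST = timezone(timedelta(hours=5, minutes=30))

def label_for(utc_time):
    """Shift a UTC timestamp to IST and format it like 'yyyy-MM-dd hh:mm'."""
    return utc_time.astimezone(IST).strftime("%Y-%m-%d %H:%M")

print(label_for(datetime(2020, 2, 18, 3, 30, tzinfo=timezone.utc)))  # prints 2020-02-18 09:00
```

Note that the offset can move the date forward: a point recorded at 20:00 UTC is labeled 01:30 on the following day.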
24. Set the Background bar color to White. Click OK.
5.
Click Save and QGIS will write an image for each time step to the chosen directory.
Regions are modeled as polygons. Polygons are most commonly used to model administrative areas, buildings, land parcels etc. Polygon geometry is represented as a series of coordinates. Since the shapes can be complex, polygons have more verbose geometry descriptions and seldom come in CSV files. GeoJSON and shapefile are the most commonly used file formats for storing polygon datasets.
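Since a polygon is just a closed ring of coordinates, many of its properties can be computed directly from that series. As a small illustration, here is a sketch of the shoelace formula for planar area. It assumes the coordinates are in a projected CRS with units such as meters; applying it to raw longitude/latitude degrees gives a meaningless number.

```python
def ring_area(coords):
    """Planar area of a polygon ring using the shoelace formula.

    coords is a list of (x, y) pairs; the ring may or may not repeat the first point.
    """
    if coords[0] == coords[-1]:
        coords = coords[:-1]  # drop the closing point
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# A 10 m x 10 m square, written as a closed ring like in GeoJSON.
print(ring_area([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]))  # 100.0
```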
Census data is one of the major sources of secondary data available in a country. Many types of spatial analysis require detailed demographic information that is available from census data.
Census data is usually published as tables by aggregating the raw numbers to an administrative region - typically a census block. To map these tables, one needs to know the geometry of these regions - which are supplied separately as boundary files. Both of these can be joined to create a polygon layer that can be visualized and mapped. See this tutorial on how this process is carried out in QGIS.
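The join itself matches each table row to a boundary through a shared region code. A minimal sketch in plain Python, assuming GeoJSON-like features; the code column, region names and values are hypothetical, while TOT_P is the population column used later in this exercise.

```python
def join_attributes(features, table, key):
    """Attach table rows to boundary features that share the same key value.

    features: GeoJSON-like dicts, each with a "properties" dict.
    table: list of dicts, one per region (e.g. rows of a census CSV).
    Returns new features; unmatched features keep only their own properties.
    """
    rows_by_key = {row[key]: row for row in table}
    joined = []
    for feature in features:
        props = dict(feature["properties"])  # copy so the input is not modified
        props.update(rows_by_key.get(props[key], {}))
        joined.append({**feature, "properties": props})
    return joined

# Hypothetical boundaries and census rows sharing a "code" column.
boundaries = [
    {"type": "Feature", "geometry": None, "properties": {"code": "001", "name": "Region A"}},
    {"type": "Feature", "geometry": None, "properties": {"code": "002", "name": "Region B"}},
]
census = [{"code": "001", "TOT_P": 5000}]

for f in join_attributes(boundaries, census, "code"):
    print(f["properties"])
```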
We will use India Village-Level Geospatial Socio-Economic Data Set published by NASA Socioeconomic Data and Applications Center (SEDAC). This dataset combines the village/town level boundaries with Primary Census Abstract (PCA) and Village Directory (VD) data series of the Indian census. It is distributed as shapefiles.
For this exercise, we will be using the shapefile for the state of Karnataka and map the literacy rate in the Gulbarga district.
india-village-census-2001-KA.shp file in the Browser panel and drag it to the QGIS canvas.
india-village-census-2001-KA will be added to the Layers panel. Use the Identify tool to click on any polygon and explore the attributes. The definitions of each column are contained in the documentation supplied with the data. As we are looking to map literacy levels, the attributes with the _LIT suffix are useful for our purpose. The P_LIT column refers to Person Literates and TOT_P refers to Total Population; we will use these to calculate and map the literacy rate.
gulbarga_district.shp layer that has been extracted from the Districts shapefile supplied by DataMeet. The column DT_CEN_CD contains the district id for this particular district. We can use this to filter the polygon layer.
gulbarga_district layer below the india-village-census-2001-KA layer. Right-click the india-village-census-2001-KA layer and select Filter.
DISTRICT = 4 to select all villages and towns from our chosen district. Click OK.
india-village-census-2001-KA layer indicating that a filter is applied to the layer. The map canvas will update to show only the polygons belonging to the district.
Click Open the Layer Styling Panel. Select Graduated renderer. In the Value column, click the Expression button.
100*("P_LIT"/"TOT_P")
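The expression computes the literacy rate as a percentage of the total population. The same calculation in plain Python, as a sketch; the guard for a zero or missing population is an addition here, since the ratio is undefined for polygons with no recorded population.

```python
def literacy_rate(p_lit, tot_p):
    """Percentage of literate persons: 100 * P_LIT / TOT_P.

    Returns None when TOT_P is 0 or missing, since the ratio is undefined there.
    """
    if not tot_p:
        return None
    return 100 * p_lit / tot_p

print(literacy_rate(1200, 2000))  # 60.0
```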
There are many ways to categorize your data into classes. This article gives a good overview of the pros and cons of each mode.
gulbarga_district layer. Change the Symbol layer type to Line pattern fill. Change the spacing to your liking.
-45.00 degrees.
You will see the gaps now rendered with a cross-pattern fill.
Photos collected from airborne sensors, such as kites, hot-air balloons, planes, helicopters and more recently UAVs, are a useful source of information for mapping. They often act as a basemap, providing context for other spatial data. They are also used to extract feature information that is modeled as vector data.
The most common format for imagery is GeoTIFF. A GeoTIFF file contains additional metadata that allows us to convert a pixel location (row/column) to a real-world location (latitude/longitude or projected X/Y). A regular photo can be converted to a spatially-aware raster through a process known as georeferencing.
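Tools such as GDAL expose this georeferencing metadata as a six-number affine "geotransform". The sketch below applies that transform by hand, without GDAL; the geotransform values are hypothetical and are not read from the Kathmandu image.

```python
def pixel_to_world(geotransform, col, row):
    """Map a pixel (column, row) to real-world X/Y with a GDAL-style affine geotransform.

    geotransform = (x_origin, pixel_width, row_rotation,
                    y_origin, col_rotation, pixel_height)
    (x_origin, y_origin) is the top-left corner of the top-left pixel;
    pixel_height is negative for north-up images.
    """
    x0, pw, rx, y0, ry, ph = geotransform
    x = x0 + col * pw + row * rx
    y = y0 + col * ry + row * ph
    return x, y

# Hypothetical north-up geotransform with 5 cm pixels.
gt = (340000.0, 0.05, 0.0, 3065000.0, 0.0, -0.05)
print(pixel_to_world(gt, 0, 0))      # top-left corner of the image
print(pixel_to_world(gt, 100, 200))  # 5 m east and 10 m south of it
```

Adding 0.5 to both col and row gives the coordinates of the pixel center instead of its top-left corner.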
OpenAerialMap is an open service to share and download overhead imagery. We will use an image of Kathmandu University Grounds shared by WeRobotics.
kathmandu_drone_imagery.tif file and drag it to QGIS. Use the Zoom/Pan tools to explore the imagery.
At the bottom right, notice that the CRS is EPSG:32645, which refers to UTM Zone 45N. Select the Identify tool and click anywhere on the image. You will see that the image contains 3 bands, one each for Red, Green and Blue. The coordinates are projected coordinates, not geographic coordinates. These are referred to as X (Easting) and Y (Northing).
There are hundreds of Earth Observation Satellites in space continuously capturing images of the earth. Many space agencies around the world make this data available freely. These datasets are immensely valuable to scientists, researchers, governments and businesses.
Satellite images are different from regular photos because they contain information across many bands of wavelengths, not just Red, Green and Blue. This rich information allows machine learning models to easily distinguish different objects. For example, astroturf and a real lawn may both look green, but they reflect infrared light very differently. So one can easily differentiate between them using the additional information contained in a multi-spectral image.
Sentinel-2 is a European Space Agency (ESA) mission with 2 satellites. The resolution of each pixel in the image is 10 meters. This is lower than drone or aerial imagery resolution, but still good enough for city- and region-level analysis. More importantly, the data is captured in 12 different bands, making it very useful for scientific applications. The mission captures every location on the earth every 5 days, allowing for continuous monitoring of the whole earth. ESA also makes all the data from this mission freely available.
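One common way to turn this infrared difference into a single number is the normalized difference between the Near Infrared and Red bands, widely known as NDVI. It is not part of the exercises here; the sketch only illustrates the idea, and the reflectance values are hypothetical, chosen to mimic live grass versus artificial turf.

```python
def normalized_difference(nir, red):
    """(NIR - Red) / (NIR + Red), ranging from -1 to +1.

    Healthy vegetation reflects much more near-infrared than red light,
    so it scores close to +1.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total

# Hypothetical reflectances (0-1): both surfaces look green, but only live
# grass reflects strongly in the near infrared.
lawn = normalized_difference(nir=0.45, red=0.05)
turf = normalized_difference(nir=0.10, red=0.08)
print(f"lawn={lawn:.2f} turf={turf:.2f}")
```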
We will load a Sentinel-2 image for Bangalore, India captured on 18 February, 2020. The data package directory imagery/sentinel-2 contains 4 files in JPEG2000 format. The 4 files are for bands Red (B4), Green (B3), Blue (B2) and Near Infrared (B8).
Table and explore how the same object has different reflectance in different bands.
Virtual will be added to the Layers panel. This layer contains references to the 4 different images. Note that the order of the bands is alphabetical, so the mapping in the virtual raster is as follows:
Virtual Raster Band | Image |
---|---|
Band 1 | B02 (Blue) |
Band 2 | B03 (Green) |
Band 3 | B04 (Red) |
Band 4 | B08 (NIR) |
We will first visualize an RGB Color Composite. This is also referred to as a Natural Color Composite since it is how the image would be perceived by the human eye. Place Band 3, Band 2 and Band 1 as Red, Green and Blue bands. Change the Min/Max Value Settings to Cumulative count cut. You will see the image now appears in natural colors.
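Cumulative count cut ignores the extreme tails of each band's values and stretches the rest to the full display range, which is why the composite suddenly looks natural. A rough sketch of the idea: it assumes a 2%-98% cut (believed to be the QGIS default) and uses a simple nearest-rank percentile on the raw values, whereas QGIS derives the limits from the band histogram, so the exact limits may differ.

```python
def cumulative_count_cut(values, lower_pct=2.0, upper_pct=98.0):
    """Return (min, max) stretch limits that ignore the extreme tails of the data."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[round(last * lower_pct / 100)]
    hi = ordered[round(last * upper_pct / 100)]
    return lo, hi

def stretch(value, lo, hi):
    """Linearly map value from [lo, hi] to a 0-255 display value, clipping outliers."""
    if hi == lo:
        return 0
    scaled = (value - lo) / (hi - lo) * 255
    return int(min(255, max(0, round(scaled))))

# Hypothetical band values 0..100; the darkest and brightest 2% are clipped.
band = list(range(101))
lo, hi = cumulative_count_cut(band)
print(lo, hi, [stretch(v, lo, hi) for v in (0, 2, 50, 98, 100)])
```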
Modern mapping technology includes doing aerial surveys using a LiDAR sensor. LiDAR stands for “Light Detection and Ranging”. This sensor uses light pulses to determine the distance to objects on the ground. For each light pulse that is sent out, the system computes the X, Y and Z coordinates of the object. This data representation is not new for spatial data, but since a survey of even a small area can result in millions of such points, standard tools for viewing and processing points do not work. Such a Point Cloud is typically stored in the LAS or LAZ format.
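The distance measurement rests on timing each pulse: light travels to the object and back, so the range is half the round-trip path. A minimal sketch of this time-of-flight relation, assuming the vacuum speed of light and ignoring atmospheric effects and the GPS/IMU processing that turns ranges into X, Y and Z coordinates.

```python
# Speed of light in a vacuum, meters per second.
SPEED_OF_LIGHT = 299_792_458.0

def pulse_range(round_trip_seconds):
    """Distance to the target: the pulse travels out and back, so halve the path."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# A return arriving 6.67 microseconds after the pulse left is roughly 1 km away.
print(round(pulse_range(6.67e-6)))
```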
UK’s Department of Environment Food & Rural Affairs (DEFRA) provides country-wide LiDAR data and products via the Defra Data Services Platform under an open license. We will use a point cloud dataset for Oxford University available as a LAZ file.
SP5008_P_10967_20161130_20161130.laz file. Click Open. Note that this small region is rendered with over 3.5M source points.
Heightmap Grayscale option from the drop-down selector.
Raster data is well suited for modeling continuous phenomena, such as elevation. Each pixel of the raster is assigned the height as its value. This is a simple but effective way to model the terrain. Such rasters are known as Digital Elevation Models (DEMs).
Digital Elevation Models fall into 2 broad categories
UK’s Department of Environment Food & Rural Affairs (DEFRA) provides country-wide elevation data products via the Defra Data Services Platform under an open license. We will use DSM and DTM datasets for Oxford University, available as ASC files dsm_F0195499_20161130_20161130_mm_units.asc and dtm_F0195499_20161130_20161130_mm_units.asc.
The .asc file is in the text-based ASCII Raster File Format. It is a simple data format that contains a header with information about the raster, followed by pixel values as rows/columns. If you open any of the files in a text editor, it will appear as below:
ncols 2000
nrows 2000
xllcorner 450000
yllcorner 208000
cellsize 1
NODATA_value -9999
-9999 -9999 -9999 -9999 -9999 -9999 ...
...
...
The header contains the X and Y coordinates of the lower-left (ll) corner of the image. Knowing this one coordinate pair, the cell size, the size of the image and the Coordinate Reference System (CRS) allows us to georeference the entire image. The information about the CRS is contained in the metadata and is specified as EPSG:27700 British National Grid. We now have enough information to view these rasters.
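Georeferencing from the header is simple arithmetic: the extent is the lower-left corner plus the grid size times the cell size. A minimal sketch that parses the six header lines shown above, with the first keyword written as ncols as the ESRI ASCII grid format specifies. It assumes the header uses xllcorner/yllcorner; some files use xllcenter/yllcenter, which this sketch does not handle.

```python
def read_ascii_header(text):
    """Parse the six header lines of an ESRI ASCII grid into a dict of floats."""
    header = {}
    for line in text.splitlines()[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    return header

def grid_extent(header):
    """(xmin, ymin, xmax, ymax) in CRS units, from the lower-left corner and grid size."""
    xmin, ymin = header["xllcorner"], header["yllcorner"]
    xmax = xmin + header["ncols"] * header["cellsize"]
    ymax = ymin + header["nrows"] * header["cellsize"]
    return xmin, ymin, xmax, ymax

SAMPLE_HEADER = """ncols 2000
nrows 2000
xllcorner 450000
yllcorner 208000
cellsize 1
NODATA_value -9999"""

header = read_ascii_header(SAMPLE_HEADER)
print(grid_extent(header))  # (450000.0, 208000.0, 452000.0, 210000.0) in EPSG:27700 meters
```

Since these particular files store elevation in millimeters, divide a pixel value by 1000 to get meters.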
EPSG:27700 and select it. Click OK.
dsm_F0195499_20161130_20161130_mm_units.asc file and drag it to the canvas. Select Identify and click anywhere on the image. You will see that Band 1 of the image contains the elevation of the pixel in millimeters.
Singleband pseudocolor renderer. Expand the Min / Max Value Settings section and select Cumulative count cut. Select a color ramp of your choice. Once the style is applied, you will be able to see the building outlines, trees, riverbed etc. clearly.
dtm_F0195499_20161130_20161130_mm_units.asc file and drag it to the canvas. To enable easy comparison between the 2 layers, we should visualize them with the same parameters. Fortunately, QGIS provides an easy way to copy/paste styles between layers. Right-click the dsm_F0195499_20161130_20161130_mm_units layer and go to Styles → Copy Style.
dtm_F0195499_20161130_20161130_mm_units layer, right-click and go to Styles → Paste Style.
This course material is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to re-use and adapt the material but are required to give appropriate credit to the original author as below:
Spatial Data Visualization and Analysis Course by Ujaval Gandhi www.spatialthoughts.com
© 2020 Ujaval Gandhi www.spatialthoughts.com