Calculating Building Density in R with OSM data

Introduction

OpenStreetMaps is a great source of spatial data. Most common programming languages have packages for downloading data from OSM.

In this tutorial I will show how to download building data using R’s osmdata package, perform density analysis and plot it using ggplot, and interactively using tmap.

This requires some knowledge of spatial data structures.

#loading packages
library(osmdata)
library(dplyr)
library(ggplot2)
library(sf)
library(tmap)

Getting the Data

The first step requires downloading data from the OSM api using osmdata. First, I will need to set a bounding box for OSM to search for data. Next, I will download administrative boundaries for making a choropleth map. Finally, I will download building data for the defined bounding box.

Setting a search area

I will be searching for data in the city of Izmir, Turkey. In order to define my search, I will use the getbb command. This command takes a location name as a string and returns a bounding box defined by OSM. getbb can also return a bounding box as a polygon, this is useful as I want to search for buildings as defined by administrative boundaries.

Because the function returns a matrix, you can use your own polygon to define the OSM opq lookup (below), but it is important to make sure the CRS of the data is EPSG:3857.

#retrieve bounding box for region of interest
iz_bbox <- getbb("konak", format_out = "polygon")

Retrieve boundary lines

In this project I am looking for neighborhood boundaries of districts in Izmir. Because administrative level boundaries are not standardized across countries, it may take some research to figure out what administrative levels are needed. Luckily, there his a helpful tool that lets you find OSM boundaries within an interactive map.

Screenshot of the OSM boundaries map tool.

The numbers next to territory names signify the administrative level that is saved within OSM. In my case, I am looking for boundaries at level 8. I can pass this information using the osmdata request feature, which is a wrapper for the OSM API.

#retrieve level 8 administrative boundaries 
iz_boundary <- opq(iz_bbox) %>%
  add_osm_feature(key = "admin_level", value = "8") %>%
  osmdata_sf()

iz_boundary
## Object of class 'osmdata' with:
##                  $bbox : 38.3929644,27.068361,38.4508181,27.1896253
##         $overpass_call : The call submitted to the overpass API
##                  $meta : metadata including timestamp and version numbers
##            $osm_points : 'sf' Simple Features Collection with 6188 points
##             $osm_lines : 'sf' Simple Features Collection with 550 linestrings
##          $osm_polygons : 'sf' Simple Features Collection with 1 polygons
##        $osm_multilines : NULL
##     $osm_multipolygons : 'sf' Simple Features Collection with 166 multipolygons

(Note: OSM has many more map features available and can be queried using the right key:value pairs.)

The function returns an osmdata object, which is a list of sf dataframes. Each dataframe is composed of a different geometry type, this means that returned objects can be large. In order to get the right data, you need to know which geometry you are after.

In this case, the shape I need are multipolygon geometry types. OSM querys can end up returning large objects. In order to keep some space free, I will drop the extra dataframes. I’ll also clean up some of the data.

#select only df multipolygons
iz_polys <- iz_boundary$osm_multipolygons

#remove digits from any distirct name 
iz_polys$name <- gsub('[[:digit:]]+', '', iz_polys$name)
#remove . from any district name
iz_polys$name <- gsub("[.]", '', iz_polys$name)
#trim whitespace
iz_polys$name  <- trimws(iz_polys$name, "both")
#factorize 
iz_polys$name <- as.factor(iz_polys$name)

#calculate polygon areas for later analysis and append to new column
iz_polys$poly_area <- st_area(iz_polys)

#remove original osmdata object
rm(iz_boundary)

Now I want to check to see if I have the right boundaries.

ggplot(iz_polys) +
  geom_sf()

Looks good, but it has also returned the boundaries for all the outlying districts, as they probably fall along the bounding polygon line. That is something I can deal with later.

Retrieve Buildings

Using the same process as above, query OSM for all buildings within the original bounding polygon: iz_bbox.

iz_buildings <- opq(iz_bbox) %>%
  add_osm_feature(key = "building") %>%
  osmdata_sf()

This object is 100x larger than my boundaries object, so I’ll get rid of the excess data again. This time I will keep the polygon sf object.

build_polys <- iz_buildings$osm_polygons
rm(iz_buildings)

#drop unecessary columns
build_polys <- build_polys %>%
  select(osm_id, geometry)

But first, I’ll take a quick look at what I have so far.

ggplot(iz_polys) +
  geom_sf() +
  geom_sf(data = build_polys)

The buildings returned don’t really follow the borders of any polygon. I am not sure why this happens, but it is something that can be fixed.

Clipping point data

In order to filter out all of the outlying buildigns, I need to calculate them as points, which can be done by finding their centroids. I’ll create a new dataframe with the building centroids. Before I do that, I want to calculate the area of all the buildings, so the data carries over with the newly created dataframe. I won’t need the building polygons from this point on.

#calculate surface area of buildings
build_polys$area <- sf::st_area(build_polys)
#calculate centroids
build_cents <- sf::st_centroid(build_polys)

I will also need to create a polygon geometry in order to filter the outlying buildings. I can do ths using the original bounding box iz_bbox. But, because this object is a matrix, I need to convert it to an sf object. This can be done using sf_headers::sf_polygon.

#create a shape object out of original bounding polygon
iz_bbox_geom <- 
  sfheaders::sf_polygon(as.data.frame(iz_bbox),
  x="V1",
  y="V2"
)
#make sure that the projection matches the other data
st_crs(iz_bbox_geom) <- 4326

#plot
ggplot(iz_polys) +
  geom_sf() +
  geom_sf(data=iz_bbox_geom, col="red")

Now I have a polygon that I can filter points with. First, I will have to create a dataframe that will allow me to filter. setting st_join join variable to st_within will give me a column indiciating whether a point falls within the polygon dataframe.

#filtering join with a points df, polygon df
clipped <- st_join(build_cents, iz_bbox_geom, join = st_within)

The column id is created, with a 1 indiciating that a point falls within the polygon. Let’s check the result.

clipped %>%
  filter(id == 1) %>%
ggplot() +
  geom_sf() +
  geom_sf(data=iz_bbox_geom, color = 'red', fill = NA)

It worked, so now I’ll filter the right points, and then create a new dataframe from the iz_polys dataframe, but with the filtered points.

clipped <- clipped %>%
  filter(id == 1)
joined <- st_join(clipped, iz_polys)

Using the more granular iz_polys dataframe, I can aggregate and sum the total area of all buildings within each district. I will create a final dataframe that contains the total building area as well as the total area of the district polygon.

#aggregating and summing total building area
density_calc <- aggregate(joined$area, list(joined$osm_id.y),
                     FUN = sum)
#rename columns
colnames(density_calc) <- c("osm_id", "area")

#create final df that contains district polygons and building area
bounds_blds_sf <- merge(iz_polys, density_calc) 

#calculate building density
bounds_blds_sf <- bounds_blds_sf %>%
  mutate(bounds_blds_sf, b_dens = area/poly_area * 100)

Now I’ll create a nice interactive map using tmap.

tmap_mode('view')
## tmap mode set to interactive viewing
tm_basemap("Stamen.TonerLite") +
tm_shape(bounds_blds_sf) +
  tm_polygons(col="b_dens",
              id="name",
              title= "Building Density as % of Land Area",
              alpha=.8) 

Conclusion

So here we have an areal density map of buildings as a percentage total land area. R is a great tool when you have lots of data to wrangle, or need to access data programatically. Other programs, such as QGIS, may be more user friendly, but R gives you much more control, and makes great looking maps too.

Notes: This post is based off of Matt Herman’s post about trees in NYC, and this great open-source GIS course using Python.