OpenStreetMaps is a great source of spatial data. Most common programming languages have packages for downloading data from OSM.
In this tutorial I will show how to download building data using R’s osmdata package, perform density analysis and plot it using ggplot, and interactively using tmap.
This requires some knowledge of spatial data structures.
#loading packages library(osmdata) library(dplyr) library(ggplot2) library(sf) library(tmap)
Getting the Data
The first step requires downloading data from the OSM api using osmdata. First, I will need to set a bounding box for OSM to search for data. Next, I will download administrative boundaries for making a choropleth map. Finally, I will download building data for the defined bounding box.
Setting a search area
I will be searching for data in the city of Izmir, Turkey. In order to define my search, I will use the
getbb command. This command takes a location name as a string and returns a bounding box defined by OSM.
getbb can also return a bounding box as a polygon, this is useful as I want to search for buildings as defined by administrative boundaries.
Because the function returns a matrix, you can use your own polygon to define the OSM
opq lookup (below), but it is important to make sure the CRS of the data is EPSG:3857.
#retrieve bounding box for region of interest iz_bbox <- getbb("konak", format_out = "polygon")
Retrieve boundary lines
In this project I am looking for neighborhood boundaries of districts in Izmir. Because administrative level boundaries are not standardized across countries, it may take some research to figure out what administrative levels are needed. Luckily, there his a helpful tool that lets you find OSM boundaries within an interactive map.
The numbers next to territory names signify the administrative level that is saved within OSM. In my case, I am looking for boundaries at level 8. I can pass this information using the
osmdata request feature, which is a wrapper for the OSM API.
#retrieve level 8 administrative boundaries iz_boundary <- opq(iz_bbox) %>% add_osm_feature(key = "admin_level", value = "8") %>% osmdata_sf() iz_boundary
## Object of class 'osmdata' with: ## $bbox : 38.3929644,27.068361,38.4508181,27.1896253 ## $overpass_call : The call submitted to the overpass API ## $meta : metadata including timestamp and version numbers ## $osm_points : 'sf' Simple Features Collection with 6188 points ## $osm_lines : 'sf' Simple Features Collection with 550 linestrings ## $osm_polygons : 'sf' Simple Features Collection with 1 polygons ## $osm_multilines : NULL ## $osm_multipolygons : 'sf' Simple Features Collection with 166 multipolygons
(Note: OSM has many more map features available and can be queried using the right key:value pairs.)
The function returns an osmdata object, which is a list of
sf dataframes. Each dataframe is composed of a different geometry type, this means that returned objects can be large. In order to get the right data, you need to know which geometry you are after.
In this case, the shape I need are multipolygon geometry types. OSM querys can end up returning large objects. In order to keep some space free, I will drop the extra dataframes. I’ll also clean up some of the data.
#select only df multipolygons iz_polys <- iz_boundary$osm_multipolygons #remove digits from any distirct name iz_polys$name <- gsub('[[:digit:]]+', '', iz_polys$name) #remove . from any district name iz_polys$name <- gsub("[.]", '', iz_polys$name) #trim whitespace iz_polys$name <- trimws(iz_polys$name, "both") #factorize iz_polys$name <- as.factor(iz_polys$name) #calculate polygon areas for later analysis and append to new column iz_polys$poly_area <- st_area(iz_polys) #remove original osmdata object rm(iz_boundary)
Now I want to check to see if I have the right boundaries.
ggplot(iz_polys) + geom_sf()
Looks good, but it has also returned the boundaries for all the outlying districts, as they probably fall along the bounding polygon line. That is something I can deal with later.
Using the same process as above, query OSM for all buildings within the original bounding polygon:
iz_buildings <- opq(iz_bbox) %>% add_osm_feature(key = "building") %>% osmdata_sf()
This object is 100x larger than my boundaries object, so I’ll get rid of the excess data again. This time I will keep the polygon sf object.
build_polys <- iz_buildings$osm_polygons rm(iz_buildings) #drop unecessary columns build_polys <- build_polys %>% select(osm_id, geometry)
But first, I’ll take a quick look at what I have so far.
ggplot(iz_polys) + geom_sf() + geom_sf(data = build_polys)
The buildings returned don’t really follow the borders of any polygon. I am not sure why this happens, but it is something that can be fixed.
Clipping point data
In order to filter out all of the outlying buildigns, I need to calculate them as points, which can be done by finding their centroids. I’ll create a new dataframe with the building centroids. Before I do that, I want to calculate the area of all the buildings, so the data carries over with the newly created dataframe. I won’t need the building polygons from this point on.
#calculate surface area of buildings build_polys$area <- sf::st_area(build_polys) #calculate centroids build_cents <- sf::st_centroid(build_polys)
I will also need to create a polygon geometry in order to filter the outlying buildings. I can do ths using the original bounding box
iz_bbox. But, because this object is a matrix, I need to convert it to an sf object. This can be done using
#create a shape object out of original bounding polygon iz_bbox_geom <- sfheaders::sf_polygon(as.data.frame(iz_bbox), x="V1", y="V2" ) #make sure that the projection matches the other data st_crs(iz_bbox_geom) <- 4326 #plot ggplot(iz_polys) + geom_sf() + geom_sf(data=iz_bbox_geom, col="red")
Now I have a polygon that I can filter points with. First, I will have to create a dataframe that will allow me to filter. setting
st_join join variable to
st_within will give me a column indiciating whether a point falls within the polygon dataframe.
#filtering join with a points df, polygon df clipped <- st_join(build_cents, iz_bbox_geom, join = st_within)
id is created, with a 1 indiciating that a point falls within the polygon. Let’s check the result.
clipped %>% filter(id == 1) %>% ggplot() + geom_sf() + geom_sf(data=iz_bbox_geom, color = 'red', fill = NA)
It worked, so now I’ll filter the right points, and then create a new dataframe from the
iz_polys dataframe, but with the filtered points.
clipped <- clipped %>% filter(id == 1) joined <- st_join(clipped, iz_polys)
Using the more granular
iz_polys dataframe, I can aggregate and sum the total area of all buildings within each district. I will create a final dataframe that contains the total building area as well as the total area of the district polygon.
#aggregating and summing total building area density_calc <- aggregate(joined$area, list(joined$osm_id.y), FUN = sum) #rename columns colnames(density_calc) <- c("osm_id", "area") #create final df that contains district polygons and building area bounds_blds_sf <- merge(iz_polys, density_calc) #calculate building density bounds_blds_sf <- bounds_blds_sf %>% mutate(bounds_blds_sf, b_dens = area/poly_area * 100)
Now I’ll create a nice interactive map using
## tmap mode set to interactive viewing
tm_basemap("Stamen.TonerLite") + tm_shape(bounds_blds_sf) + tm_polygons(col="b_dens", id="name", title= "Building Density as % of Land Area", alpha=.8)
So here we have an areal density map of buildings as a percentage total land area. R is a great tool when you have lots of data to wrangle, or need to access data programatically. Other programs, such as QGIS, may be more user friendly, but R gives you much more control, and makes great looking maps too.