Building a spatial grid with R10 Sep 2015
In this post, we are going to learn how to build a spatial grid with R. It can be sometimes very useful to use a grid composed of equal-area cells in order to aggregate the data as a precursor to data analysis and visualization. For example, we can build a grid to spatially aggregate tweets posted from the metropolitan area of Barcelona in 2012 to build a heat map and/or perform a hotspot spatial analysis as we did in this paper.
To do so, we can reuse the dataset that we have already used in a previous post. This dataset is composed of a shapefile MA containing the boundaries of the 38 municipalities of the metropolitan area of Barcelona and a csv file Tweets which contains the geographical coordinates of a sample of tweets posted from this metropolitan area in 2012.
First, we need to import the data. The Twitter dataset is imported into a dataframe tweets using the function read_delim (package readr) way faster than the read.table function and MA is imported into a SpatialPolygonsDataFrame with the readOGR function (package rgdal) (see this post for more details about this type of object). tweets is composed of two columns (longitude/latitude) that we convert into a SpatialPoints object. At this point, the coordinate system of the two spatial objects is longitude/latitude but since we want to build the grid we need to project them on a plane. It is of course possible to build a grid using longitude/latitude coordinate but if you want to preserve the shape and/or the area it is recommended to use a projection. To project our spatial objects in UTM (zone 31 for Barcelona) we need the spTransform function (package sp). Finally, we merge the polygons of MA into a SpatialPolygons U using the function gUnaryUnion (package rgeos).
To check that everything is ok, we plot the spatial object on a map using the function fortify (package ggplot2) to add the Polygons U and using the function points to add the tweets on an empty map.
Everything looks good, we can now build the grid using the function GridTopology. This function takes as input the centroid of the bottom left cell (c(minx,miny) in our case), the cells’ length and width (both equal to l=2000 meters) and the number of cells in each dimension c(Nbx,Nby) (based on the bounding box in our case). We can now “spatialize” our grid G using the SpatialGrid function. Note that since the bounding box is based on U a spatial object projected in UTM and l is in meter we can directly project our SpatialGrid in UTM (zone 31 for Barcelona). Finally, we convert G into a SpatialPixels (don’t ask me why!) and in a SpatialPolygons.
Again, we plot the results on a map to be sure that there is no problem with the projections.
To go a little bit further, instead of considering the full grid we can select only the cells intersecting the metropolitan area using the function gIntersects (package rgeos). This function takes as input two spatial objects and returns a matrix of boolean in which the element of the ith row and jth column tells us if the ith element of the first spatial object intersects the jth element of the second spatial object. In our case, since U contains only one polygon we obtain a vector of boolean iGU. We can now work with the new grid G[iGU] composed of cells intersecting the metropolitan area. But to be more rigorous and have a grid that exactly matches the metropolitan area’s boundaries we can use the function gIntersection (package rgeos) to compute the intersection between the new grid G[iGU] and U.
Here is the code,
and the result:
Once we have the grid we can export it in shp format in the desired projection using the function writeOGR (package rgdal). We can also use our grid to build a heat map and/or to perform a hotspot analysis. To this end, we use again the function gIntersects to count the number of tweets in each grid cell. The results is a matrix of boolean iGTweet that we can aggregate with the function apply into one vector spt giving us the number of tweets per cell.
It can be sometimes easier to work with qualitative data to build a heat map in order to cluster together cells having a similar number of tweets. In this case, I used the cumulative proportion of tweets to dispatch the cells into ten clusters.
Finally, we plot our heat map based on this new qualitative variable composed of 10 categories using the heat.colors palette. Note that it is also possible to visualize the data using Google Earth by exporting the heat map in kml format as described in this post.