# Building a spatial grid with R

10 Sep 2015In this post, we are going to learn how to build a spatial grid with R. It can be sometimes very useful to use a grid composed of equal-area cells in order to aggregate the data as a precursor to data analysis and visualization. For example, we can build a grid to spatially aggregate tweets posted from the metropolitan area of Barcelona in 2012 to build a heat map and/or perform a hotspot spatial analysis as we did in this paper.

To do so, we can reuse the dataset that we have already used in a previous post. This dataset is composed of a shapefile MA containing the boundaries of the 38 municipalities of the metropolitan area of Barcelona and a csv file Tweets which contains the geographical coordinates of a sample of tweets posted from this metropolitan area in 2012.

First, we need to import the data. The Twitter dataset is imported into a dataframe **tweets** using the function **read_delim** (package **readr**)
way faster than the **read.table** function and **MA** is imported into a **SpatialPolygonsDataFrame** with the **readOGR** function (package **rgdal**)
(see this post for more details about this type of object). **tweets** is composed of
two columns (longitude/latitude) that we convert into a **SpatialPoints** object. At this point, the coordinate system of the two spatial objects
is longitude/latitude but since we want to build the grid we need to project them on a plane. It is of course possible to build a grid using
longitude/latitude coordinate but if you want to preserve the shape and/or the area it is recommended to use a projection. To project our spatial
objects in UTM (zone 31 for Barcelona) we need the **spTransform** function (package **sp**). Finally, we merge the polygons of **MA** into a
**SpatialPolygons U** using the function **gUnaryUnion** (package **rgeos**).

To check that everything is ok, we plot the spatial object on a map using the function **fortify** (package **ggplot2**) to add the
**Polygons U** and using the function **points** to add the tweets on an empty map.

Everything looks good, we can now build the grid using the function **GridTopology**. This function takes as input the centroid of the bottom
left cell (**c(minx,miny)** in our case), the cells’ length and width (both equal to **l=2000** meters) and the number of cells in each dimension
**c(Nbx,Nby)** (based on the bounding box in our case). We can now “spatialize” our grid **G** using the **SpatialGrid** function. Note that since
the bounding box is based on **U** a spatial object projected in UTM and **l** is in meter we can directly project our **SpatialGrid**
in UTM (zone 31 for Barcelona). Finally, we convert **G** into a **SpatialPixels** (don’t ask me why!) and in a **SpatialPolygons**.

Again, we plot the results on a map to be sure that there is no problem with the projections.

To go a little bit further, instead of considering the full grid we can select only the cells intersecting the metropolitan area using the
function **gIntersects** (package **rgeos**). This function takes as input two spatial objects and returns a matrix of boolean in which the
element of the ** ith** row and

**column tells us if the**

*jth***element of the first spatial object intersects the jth element of the second spatial object. In our case, since**

*ith***U**contains only one polygon we obtain a vector of boolean

**iGU**. We can now work with the new grid

**G[iGU]**composed of cells intersecting the metropolitan area. But to be more rigorous and have a grid that exactly matches the metropolitan area’s boundaries we can use the function

**gIntersection**(package

**rgeos**) to compute the intersection between the new grid

**G[iGU]**and

**U**.

Here is the code,

and the result:

Once we have the grid we can export it in **shp** format in the desired projection using the function **writeOGR** (package **rgdal**).
We can also use our grid to build a heat map and/or to perform a hotspot analysis. To this end, we use again the function **gIntersects** to count
the number of tweets in each grid cell. The results is a matrix of boolean **iGTweet** that we can aggregate with the function **apply** into one vector
spt giving us the number of tweets per cell.

It can be sometimes easier to work with qualitative data to build a heat map in order to cluster together cells having a similar number of tweets. In this case, I used the cumulative proportion of tweets to dispatch the cells into ten clusters.

Finally, we plot our heat map based on this new qualitative variable composed of 10 categories using the heat.colors palette. Note that it is also
possible to visualize the data using Google Earth by exporting the heat map in **kml** format as described in
this post.