Cartography with R17 Jun 2015
In a recent post I presented you how to build and export spatial objects with R. Today, I’ll show you how to add points to a map and how to manipulate spatial objects with R. Let’s consider a data frame tweets containing geographical coordinate of tweets posted in Spain in 2012. Each row of the data frame contains the longitude and latitude coordinate of a tweet.
First we plot the tweets on a map using the function points to have a first glance at the data. The package maptools provides a simple world map wrld_simp that we can plot using the function plot and use the xlim and ylim arguments to adjust the extremes of the longitude/latitude ranges.
The result is quite nice, we can even identified the main Spanish roads on the map!
A lot of maps are available in .Rdata through packages or websites, I have a preference for the data accessible on the Global Administrative Areas website which is a high-resolution database of country administrative areas at different levels. You can use the code below to import the data in R.
Ok, let’s move on to something more serious. Now I’m going to select the tweets posted from a particular area of Spain, for example the Barcelona’s metropolitan area. It falls pretty well, I actually have a shapefile available here which contains the boundaries of the municipalities composing the metropolitan area of Barcelona. This file MA is composed of four files, MA.shp the feature geometry itself, MA.shx a positional index of the feature geometry, MA.dbf columnar attributes for each shape and MA.prj the coordinate system and projection information. To import a shapefile with R I usually use the function readOGR (package rgdal):
readOGR returns a SpatialPolygonsDataFrame composed of 38 elements and 1 field (i.e. number of columns in the dataframe associated with the spatial elements). A SpatialPolygonsDataFrame is a S4 objects from the Spatial class, in which we can identify different components, called slots. To get the slot names of an S4 objects we have to use the slotNames function and a @ to access a slot. In our case, five slots are associated with MA:
- data: a dataframe containing the attribute of the spatial object,
- polygons: the polygon(s) representing a municipality,
- plotOrder: order in which the polygons should be plotted,
- bbox: a bounding box that contains a matrix with the coordinate values,
- proj4string: a CRS class object, the coordinate system and projection information (longitude/latitude in this case).
Each polygons represents a municipality identified by an ID, an area and a Polygons. A Polygons is a list of Polygon-class objects that contains several components, the centroid labpt, the area and the coords of the polygons (2-column numeric matrix with coordinates; first point (row) should equal last coordinates (row)).
You are following, right? It seems a bit complicated but it is easier to understand when you build your own SpatialPolygonsDataFrame from scratch as described here.
In our case, the coordinate system is longitude/latitude knows as “Geographic Coordinate System” but depending of what we want to do with our data it can be useful sometimes to project them on a plane. To project our SpatialPolygonsDataFrame and transform the coordinate we need the spTransform function (package sp). We need to specify a new coordinate system, in this example we are going to use the “Universal Transverse Mercator System” but there are a lot of “map projections” which distort more or less the surface.
Now we can plot the results, it is possible to plot the map directly with the function plot but I prefer to plot an empty map first and then add the Polygons one by one using the function fortify (package ggplot2). This allows us, for example, to modify the colour of a polygon according to its area:
To select the tweets posted from the metropolitan area of Barcelona, we first need to transform the tweets into SpatialPoints in order to assign them a coordinate system which is, in this case, the longitude/latitude coordinates. Then we project the tweets on a plane using the same reference system than the one we used to project MA, the UTM system.
Now, we need to identify the tweets intersecting the Barcelona’s metropolitan area. To do that we can merge the polygons of MA into a SpatialPolygons U using the function gUnaryUnion (package rgeos).
U can then be used to identify the tweets intersecting MA with the help of the gIntersects function (package rgeos). The function gIntersects takes as input two spatial objects and returns a matrix (if byid=TRUE) of boolean where the element ij tells you if the element i of the first object intersects the element j of the second one. In our case we obtain a vector of length number of tweets.
Finally we plot the SpatialPolygons U and add the tweets intersecting the metropolitan area on it: