Archive

PostGIS

 

There are numerous excellent packages to process and analyze spatial data in R. However, when a project involves multiple and/or large spatial data sets, it often makes sense to make use of a spatial database for data management.

I use a PostGIS database (which is an extension to the PostgreSQL system) for storing and organizing spatial data for my projects. Unfortunately, transferring spatial vector data between R and PostGIS may be a bit of a hassle if you’re not working on a Linux machine.

If you are working with a Linux distribution, it appears that you can use the OGR “PostGIS” driver that comes with rgdal, see here. Unfortunately, the OGR “PostGIS” driver doesn’t seem to be prepacked in the rgdal installation for Windows or Mac (and likely never will be), so you’ll either have to manually install the driver (which I didn’t manage to do…), or rely on a different approach.

One alternative to using the OGR driver is to convert the spatial data into text and transfer the data using a non-spatial PostgreSQL driver for R. I’ve written two very simple functions, one to upload spatial vector data from R to a PostGIS database, and one to do the opposite, based on this method. Since they’ve served me well until now, I thought they may also be helpful for others.

The dbWriteSpatial function converts the spatial information into the WKT format using the writeWKT function from the rgeos package, transfers the data to the DB using the PostgreSQL driver from the RPostgreSQL package, and then re-creates the geometries on the PostGIS database.

require(rgeos)
require(RPostgreSQL)

dbWriteSpatial <- function(con, spatial.df, schemaname="public", tablename, replace=FALSE) {
 
  # con:          A PostgreSQL connection (from RPostgreSQL)
  # spatial.df:   A Spatial Data Frame object
  # schemaname:   Target schema name
  # tablename:    Target table name
  # replace:      Replace the target table if it already exists
 
  # Create well known text and add to spatial DF
  spatialwkt <- writeWKT(spatial.df, byid=TRUE)
  spatial.df$wkt <- spatialwkt
 
  # Add temporary unique ID to spatial DF
  spatial.df$spatial_id <- 1:nrow(spatial.df)
 
  # Set column names to lower case
  names(spatial.df) <- tolower(names(spatial.df))
 
  # Upload DF to DB
  data.df <- spatial.df@data
  rv <- dbWriteTable(con, c(schemaname, tablename), data.df, overwrite=replace, row.names=FALSE)
 
  # Create geometry column and clean up table
  schema.table <- paste(schemaname, ".", tablename, sep="")
  query1 <- paste("ALTER TABLE ", schema.table, " ADD COLUMN the_geom GEOMETRY;", sep="")
  query2 <- paste("UPDATE ", schema.table, " SET the_geom = ST_GEOMETRYFROMTEXT(t.wkt) FROM ", schema.table, " t  WHERE t.spatial_id = ", schema.table, ".spatial_id;", sep="")
  query3 <- paste("ALTER TABLE ", schema.table, " DROP COLUMN spatial_id;")
  query4 <- paste("ALTER TABLE ", schema.table, " DROP COLUMN wkt;")
  er <- dbSendQuery(con, statement=query1)
  er <- dbSendQuery(con, statement=query2)
  er <- dbSendQuery(con, statement=query3)
  er <- dbSendQuery(con, statement=query4)
 
  return(TRUE)
}

The dbReadSpatial function converts the vector data into the WKT format on the DB, transfers the data into a data frame using the driver from the RPostgreSQL package, saves the data frame in CSV format in the temporary directory of the current R session on the local machine, and then imports the CSV as a Spatial Data Frame object using the appropriate OGR driver in the rgdal package. I’ve implemented the detour via the CSV file because the OGR CSV driver is very fast in converting the WKT information into the appropriate sp object. The alternative, using the readWKT function from the rgeos package, is much slower in most cases, since the readWKT function only accepts one string at a time, and thus necessitates a tedious looping structure.

require(rgeos)
require(rgdal)
require(RPostgreSQL)

dbReadSpatial <- function(con, schemaname="public", tablename, geomcol="the_geom", idcol=NULL) {
 
  # con:          A PostgreSQL connection (from RPostgreSQL)
  # spatial.df:   A Spatial Data Frame object
  # schemaname:   Target schema name
  # tablename:    Target table name
  # geomcol:      Name of the geometry column in the target table (target table may not have more than one geometry column!)
  # idcol:        Name of the column with unique IDs to be used in the ID slot of the spatial objects (not relevant for point data)
 
  ## Build query and fetch the target table
  # Get column names
  q.res <- dbSendQuery(con, statement=paste("SELECT column_name FROM information_schema.columns WHERE table_name ='", tablename, "' AND table_schema ='", schemaname, "';", sep=""))
  schema.table = paste(schemaname, ".", tablename, sep="")
  q.df <- fetch(q.res, -1)
  # Some safe programming
  if (!(geomcol %in% q.df[,1])) {stop(paste("No", geomcol, "column in specified table."))}
  if (!is.null(idcol)) {
    if (!(idcol %in% q.df[,1])) {stop(paste("Specified idname '", idcol, "' not found.", sep=""))}
  }
  # Get table
  query <- paste("SELECT", paste(q.df[,1][q.df[,1] != geomcol], collapse=", "), paste(", ST_ASTEXT(", geomcol, ") AS the_geom FROM", sep=""), schema.table, ";")
  t.res <- dbSendQuery(con, statement=query)
  t.df <- fetch(t.res, -1)
 
  ## Get geometry ID column number
  if (!is.null(idcol)) {
    idcolnum <- which(names(t.df) == idcol)
  } else {
    t.df$id.new <- 1:nrow(t.df)
    idcolnum <- which(names(t.df) == "id.new")
  }
 
  ## Get geometry column number
  geomcolnum <- which(names(t.df) == geomcol)
 
  ## Build spatial data frame using OGR
  write.df <- t.df[,geomcolnum,drop=FALSE]
  names(write.df) <- "WKT"
  filename <- paste("vector_", as.character(format(Sys.time(), "%H_%M_%S")), sep="")
  filename.csv <- paste(filename, ".csv", sep="")
  write.csv(write.df, paste(gsub("[\\]", "/", tempdir()), "/", filename.csv, sep=""), row.names=TRUE)
  down.spdf <- readOGR(dsn=paste(gsub("[\\]", "/", tempdir()), "/", filename.csv, sep=""), layer=filename, verbose=FALSE)
  rv <- file.remove(paste(gsub("[\\]", "/", tempdir()), "/", filename.csv, sep=""))
  data.df <- data.frame(t.df[,-geomcolnum])
  names(data.df) <- names(t.df)[-geomcolnum]  

  # For Spatial Points Data Frame  
  if (grepl("POINT", t.df[1,geomcolnum])) {
    spatial.df <-  SpatialPointsDataFrame(down.spdf@coords, data.df, match.ID=FALSE)
  }
  # For Spatial Polygons/Lines Data Frame    
  if (grepl("POLYGON", t.df[1,geomcolnum]) | grepl("LINE", t.df[1,geomcolnum])) {
    spatial.df <- down.spdf
    spatial.df@data <- data.df
    spatial.df <- spChFIDs(spatial.df, paste(t.df[,idcolnum]))
  }
  return(spatial.df)
}

A session might look something like this:

# Get connection
con <- dbConnect(dbDriver("PostgreSQL"), dbname="postgis", port=5432, host="localhost", user="postgres", password="pw")

# Write myspatialdata.spdf to PostGIS DB
dbWriteSpatial(con, myspatialdata.spdf, schemaname="myschema", tablename="myspatialdata", replace=FALSE)

# Get myschema.myspatialdata from PostGIS DB
myspatialdata.spdf <- dbReadSpatial(con, schemaname="myschema", tablename="myspatialdata", geomcol="the_geom")

The functions do not handle projection information at this time, though I might add this later… Let me know if you detect any bugs or have any suggestions for improvement!

Advertisements