Originally when I planned this post, I was going to walk through QGIS and the nightmare experience I originally had developing this project. The post was to be replete with esoteric $GDAL$ incantations, annotated screenshots and obscure python flotsam, jetsam and errata however that was going to test a self-imposed rule for this website;
All examples must be fully reproducible, with the minimum of human intervation.
In keeping with this principle, I am pleased to say that this time around, the outcomes are higher quality, reproducible and entirely driven from $R$. A much better result.
To achieve this, I had to make heavy use of the $sf$ and $raster$ packages and the Simple Features reference guide.
Geological Data
From Part 1, Jaxon informed me that I would need to take into account the regional geological conditions for the soil component of quantifying terroir.
The data structure I want for my regression model is a wide format, with a single row for each observation (a pixel at a lat/lon coordinate) and columns for each attribute. I want to be able to quantify the impact of each variable, so I want the columns to have 1, for an attribute at that pixel, 0 if not and NA for missing data.
Extract each feature individually to a separate layer
Set the layer values to 1 for present
Rasterize that layer into a grid then pivot to a long format
Aggregate the results for each column into a wide dataframe
I downloaded the zip archive using $curl$ and extracted the files associated with the “GeologicUnitPolygons2_5M.shp” layer.
The $sf$ package really does make working with geospatial files in $R$ a breeze. You get full access to the $dplyr$ verbs, and polygons are stored in lists inside the $geometry$ column of a data frame.
Due to this, I could import my $SHP$ file with a SQL query, set the CRS to EPSG: 4283 and pipe the result to a transmute for combined subsetting and transformation in four distinct pipe operations.
I now had a $sf$ dataframe with the structure below. From here, the approach becomes apparent, for each target column, break out each level into a layer and rasterize.
The function “per_level”, below, uses tidyselect notation to accept a level of a factor, passed as a quoted string and turn it into a symbolic variable ($rlang::sym$). I also use $glue::glue$ extensively to avoid nested $paste0$ calls. I use the “fasterize” package to turn a $sf$ object into a $raster$ however the output is a S$ type object so it doesn’t play nice with the pipe operator (yet).
To subset these objects, use “@” instead of “$”.
I then have a nice, vector of values to turn into a dataframe, set as a data.table (in place in memory) and assign a meaningful name.
The per category function takes a column from “shp_geo” and maps “per_level” to each level of the column before column binding the results together into a wide dataframe, which is written to an intermediate folder.
Finally, I apply the “per_category” function to a subset of columns and column bind them to a lat/lon data.frame to provide context.
Results are then saved in the top level folder for geological data.
Pleasingly, all have the right orientation, shape and coverage. All three present different pieces of information, of note, is the Lithology, typically sedimentary, metamorphic and granitoid in our wine growing regions, as expected.