One common type of task with spatio-temporal data is to match nearby sites. For example, we may want to verify the location of an old list of stations with current stations, or we may want to match the data from different data sources. Some of these matches only concern the spatial dimension, while others require temporal agreement.
This vignette introduces how to spatially and spatio-temporally match sites with the cubble structure with two examples. The first example pairs traditional weather stations with nearby automated stations in New South Wales, Australia. This exercise only concerns the matching based on spherical distance between stations. The next example pairs the river level recorded by the river gauges with the precipitation recorded by the nearby weather station in Victoria, Australia.
Again we will start with
prcp_aus to look at
precipitation and focus on New South Wales stations since that is where
most automated stations are implemented. The figure below shows the
location of traditional and automated weather station on the map:
<- ozmaps::abs_ste %>% nsw_map filter(NAME == "New South Wales") <- prcp_aus %>% nsw # subset for New South Wales stations filter(between(stringr::str_sub(id, 7, 8), 46, 75)) %>% mutate(automated = stringr::str_detect(name, "aws")) %>% face_temporal(ts) %>% filter(lubridate::month(date) == 1, ::year(date) == 2020) %>% lubridateface_spatial() %>% filter(!any(is.na(ts$prcp))) ggplot() + geom_sf(data = nsw_map, color = "grey", linetype = "dotted") + geom_point(data = nsw, aes(x = long, y = lat, color = automated)) + theme_bw() + theme(legend.position = "bottom") + labs(x = "Longitude", y = "Latitude") + scale_color_brewer(palette = "Dark2") + ggtitle("New Sourth Wales") + coord_sf(xlim = c(141, 154))
In the map we can see some traditional and automated weather stations are close to each other. This can be useful information to cross validate recordings from different types of weather stations.
match_temporal(). For a
spatial-only matching, you can use
match_sites(temporal_matching = FALSE) or simply
Any matching requires two datasets in the cubble and we call them
minor. Major and minor dataset
differs from how distance is calculated. Spatial matching calculates the
spherical distance using the Vincenty formula and this distance is
calculated from each site in the
major dataset is
to every site in the
Once the distance is calculated, three arguments are available to refine the matching results:
spatial_n_keep: Number of match each major site receive
spatial_dist_max: maximum distance allowed for a pair of matching
spatial_single_match: Whether each minor site can only be matched to one major site
The order that these three arguments applied will slightly affect the
spatial_n_keep, default to
1, is first applied to keep
n site(s) for each major site,
spaital_dist_max, default to 10, is then applied to filter
out the pairs with distance larger than this maximum distance.
spatial_single_match is lastly applied to resolve the
scenario where site
a (minor) is the closest match for both
B (major) with distance 5km and
spatial_single_match = TRUE,
only be matched to the major site with the smaller distance, that is,
Let’s get back to the weather stations.
We first construct the major site
auto and minor
non_auto by filtering on whether stations are automated or
not. Here we would like to find each station in
auto is the major
non_auto is the minor in the
<- nsw %>% filter(automated) auto <- nsw %>% filter(!automated) non_auto <- match_sites(auto, non_auto, temporal_matching = FALSE) matched
The result from the pairing is also a cubble with two additional
dist as the distance between the pair and
group as the grouping index:
matched #> # cubble: id : nested form #> # bbox: [145.79, -36.11, 151.67, -30.52] #> # temporal: date [date], prcp [dbl] #> id lat long elev name wmo_id automated ts dist group #> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <lgl> <list> <dbl> <int> #> 1 ASN00050137 -33.1 147. 193. condobo… 95708 TRUE <tibble> 1.41 1 #> 2 ASN00050052 -33.1 147. 195 condobo… 94707 FALSE <tibble> 1.41 1 #> 3 ASN00056238 -30.5 152. 1079 armidal… 95773 TRUE <tibble> 5.16 2 #> 4 ASN00056037 -30.5 152. 987 armidal… 94773 FALSE <tibble> 5.16 2 #> 5 ASN00064017 -31.3 149. 643 coonaba… 95728 TRUE <tibble> 6.58 3 #> 6 ASN00064008 -31.3 149. 505 coonaba… 94728 FALSE <tibble> 6.58 3 #> 7 ASN00048237 -31.5 146. 218 cobar a… 94710 TRUE <tibble> 6.86 4 #> 8 ASN00048027 -31.5 146. 260 cobar mo 94711 FALSE <tibble> 6.86 4 #> 9 ASN00066194 -33.9 151. 3 canterb… 94766 TRUE <tibble> 7.14 5 #> 10 ASN00066037 -33.9 151. 6 sydney … 94767 FALSE <tibble> 7.14 5 #> 11 ASN00068192 -34.0 151. 73.9 camden … 94755 TRUE <tibble> 8.17 6 #> 12 ASN00068257 -34.1 151. 112 campbel… 94757 FALSE <tibble> 8.17 6 #> 13 ASN00072160 -36.1 147. 164. albury … 95896 TRUE <tibble> 8.33 7 #> 14 ASN00072023 -36.1 147. 184 hume re… 94901 FALSE <tibble> 8.33 7 #> 15 ASN00067113 -33.7 151. 24.7 penrith… 94763 TRUE <tibble> 8.77 8 #> 16 ASN00063077 -33.7 151. 320 springw… 95744 FALSE <tibble> 8.77 8 #> 17 ASN00063291 -33.4 150. 744. bathurs… 94729 TRUE <tibble> 9.30 9 #> 18 ASN00063005 -33.4 150. 713 bathurs… 94730 FALSE <tibble> 9.30 9
Then we can create visualisation to see where these pairs are in the map:
ggplot() + geom_sf(data = nsw_map) + geom_point(data = matched, aes(x = long, y = lat, color = automated)) + ::geom_label_repel( ggrepeldata = matched %>% filter(automated), aes(x = long, y = lat, label = group)) + scale_color_brewer(palette = "Dark2") + ggtitle("New South Wales") + theme_minimal() + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + coord_sf(xlim = c(141, 154))
or compare the series within the same pair (as an example here we only look at records in Jan 2020):
We can see that in general the peaks of precipitation recorded by traditional and automated stations from our pairs are matched, while the exact read may need to be further calibrated.
Bureau of Meteorology collects water data from river gauges and this includes variables: electrical conductivity, turbidity, water course discharge, water course level, and water temperature. In particular, water level will interactive with precipitation from the climate data since rainfall will raise the water level in the river. Here is the location of available weather station and water gauges in Victoria:
Here we provide more details on how temporal matching works in
cubble. Suppose two locations have been matched spatially
and temproal matching will be conducted on variable
a in the plot below: .
We first find the
n peaks in each series (3 peaks here).
A variable needs to be specified in
for construct an interval. Here we pick variable
construct an interval with a default length of 5. The peaks in variable
a are then tested against whether they fall into the any of
the intervals constructed from
A. In this illustration,
there are 2 matches for these two variable The available tuning
parameter in temporal matches are:
temporal_n_highest: the number of peak used - 3 in the example above
temporal_window: the length of the interval - 5 in the example above
temporal_min_match: the minimum number of matched peak for a valid matched pair. To return all the pairs of the match, set this parameter to 0.
In the river level and precipitation example,
river will be matched to
climate. This can be specified in
temporal_by, an analogue to the
by syntax in
join. The goal in this example is to see if precipitation
will be reflected by the water level in the river and this puts
prcp, as the independent. Given there is one
year worth of data, the number of peak (
to consider is slightly raised from a default 20 to 30 and
temporal_min_match is raised accordingly.
<- match_sites(river, vic, res temporal_by = c("Water_course_level" = "prcp"), temporal_independent = "prcp", temporal_n_highest = 30, temporal_min_match = 15)
The output from temporal matching is also a cubble with
n_match for the number of matched temporal peaks (on top of
group from spatial
res#> # cubble: id : nested form #> # bbox: [144.52, -37.73, 146.06, -36.55] #> # temporal: date [date], matched_var [dbl] #> id name lat long type dist group ts n_match #> <chr> <chr> <dbl> <dbl> <chr> <dbl> <int> <list> <int> #> 1 405234 SEVEN CREEKS @ D/S… -36.9 146. river 6.15 5 <tibble> 21 #> 2 404207 HOLLAND CREEK @ KE… -36.6 146. river 8.54 10 <tibble> 21 #> 3 ASN00082042 strathbogie -36.8 146. clim… 6.15 5 <tibble> 21 #> 4 ASN00082170 benalla airport -36.6 146. clim… 8.54 10 <tibble> 21 #> 5 230200 MARIBYRNONG RIVER … -37.7 145. river 6.17 6 <tibble> 19 #> 6 ASN00086038 essendon airport -37.7 145. clim… 6.17 6 <tibble> 19 #> 7 406213 CAMPASPE RIVER @ R… -37.0 145. river 1.84 1 <tibble> 18 #> 8 ASN00088051 redesdale -37.0 145. clim… 1.84 1 <tibble> 18
We can look at the matched pair on the map:
ggplot() + geom_sf(data = vic_map) + geom_point(data = res, aes(x = long, y = lat, color = type)) + ::geom_label_repel(data = res %>% filter(type == "river"), ggrepelaes(x = long, y = lat, label = group)) + scale_color_brewer(palette = "Dark2") + ::theme_bw() + ggplot2::theme(legend.position = "bottom") + ggplot2::labs(x = "Longitude", y = "Latitude") + ggplot2ggtitle("Victoria")
or to look at the series:
<- res %>% res_long face_temporal(ts) %>% unfold(group, type) %>% rename(prcp = matched_var) %>% mutate(prcp = (prcp - min(prcp, na.rm = TRUE))/ (max(prcp, na.rm = TRUE) - min(prcp, na.rm = TRUE))) %>% res_long ggplot(aes(x = date, y = prcp, group = type,color = type)) + geom_line() + facet_wrap(vars(group)) + scale_color_brewer(palette = "Dark2", guide = "none") + theme_bw() + labs(x= "date") + scale_x_date(date_labels = "%b") + labs(x = "Week", y = "Precipitation/ water level")
There are four pairs of matches - all locates in the middle Victoria and we can observe concurrent increase of precipitation and water level (precipitation and water level have been standardised between 0 and 1 to be displayed on the same scale).