Switching between space and time: Spatio-temporal analysis with cubble

H. Sherry Zhang

Monash University, Australia

ECSS Miniconference 2022

2022 Nov 17

Hi!

  • A third-year PhD student at Monash University, Melbourne, Australia

  • My research centers on exploring multivariate spatio-temporal data with data wrangling and visualisation tools.

  • Find me on

    • Twitter: huizezhangsh,
    • GitHub: huizezhang-sherry, and
    • https://huizezhangsh.netlify.app/

Storage of spatial and temporal data is often split into different tables

A long table?

Inefficient memory use and repeated information, especially when large geometry objects are combined with frequent temporal data (daily or weekly).

I still want to use all my data analysis tools built on tibble (data frame)
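
To make the repetition concrete, here is a minimal sketch with made-up toy data (`toy_stations` and `toy_ts` are hypothetical, not the weather data used later). Joining everything into one long table stores each station's coordinates once per date:

```r
library(dplyr)

# Toy data, invented for illustration: 2 stations, 3 days each
toy_stations <- tibble(
  id   = c("A", "B"),
  long = c(145.0, 151.2),
  lat  = c(-37.8, -33.9)
)
toy_ts <- tibble(
  id   = rep(c("A", "B"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  tmax = c(30.1, 28.4, 25.0, 27.3, 29.9, 31.2)
)

# One long table: spatial columns are copied onto every daily row
long_tbl <- left_join(toy_ts, toy_stations, by = "id")

# 6 rows in total, but only 2 distinct coordinate pairs
dup <- long_tbl %>%
  summarise(rows = n(), distinct_coords = n_distinct(long, lat))
dup
```

With two numeric columns the overhead is small; with large geometry objects and daily data it grows with every added date.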

Cubble: a spatio-temporal vector data structure

Cubble is a nested object built on tibble that allows easy pivoting between the spatial and temporal forms.
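
The idea of a nested object that pivots between two forms can be sketched with plain tidyr. This is only an analogy on made-up toy data, not how cubble is implemented: one row per station with the time series in a list-column, and `unnest()` / `nest()` to move between forms.

```r
library(dplyr)
library(tidyr)

# Toy data, invented for illustration
toy_stations <- tibble(
  id   = c("A", "B"),
  long = c(145.0, 151.2),
  lat  = c(-37.8, -33.9)
)
toy_ts <- tibble(
  id   = rep(c("A", "B"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  tmax = c(30.1, 28.4, 25.0, 27.3, 29.9, 31.2)
)

# "Nested form": one row per station, time series stored in list-column `ts`
nested <- toy_stations %>%
  left_join(nest(toy_ts, ts = c(date, tmax)), by = "id")

# "Long form": one row per station and date
long <- nested %>% unnest(ts)

# Back to the nested form
back <- long %>% nest(ts = c(date, tmax))
```

Unlike this sketch, the long form of a cubble does not carry the spatial columns on every row; they are kept aside and restored when switching back.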

Pipeline with cubble

spatial <- stations %>% 
  {{ Your spatial analysis }} 

##############################
# extra subsetting step, needed if the temporal analysis
# depends on the spatial results
sp_id <- spatial %>% pull(id)
ts_subset <- ts %>% filter(id %in% sp_id)
##############################

temporal <- ts_subset %>% 
  {{ Your temporal analysis }} 

##############################
# extra subsetting step, needed if the spatial analysis
# depends on the temporal results
ts_id <- temporal %>% pull(id)
sp_subset <- spatial %>% filter(id %in% ts_id)
##############################

sp_subset %>% 
  {{ Your spatial analysis }} 

With cubble, the same pipeline becomes:

cb_obj %>% 
  {{ Your spatial analysis }} %>% 
  face_temporal() %>% 
  {{ Your temporal analysis }} %>% 
  face_spatial() %>% 
  {{ Your spatial analysis }} 
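
A concrete instance of the two-table workflow, sketched with plain dplyr on made-up toy data (the data, the latitude cut-off and the temperature threshold are all hypothetical). It shows the id bookkeeping that has to be done by hand each time the analysis crosses from one table to the other:

```r
library(dplyr)

# Toy data, invented for illustration: 3 stations, 3 days each
toy_stations <- tibble(
  id   = c("A", "B", "C"),
  long = c(145.0, 151.2, 130.8),
  lat  = c(-37.8, -33.9, -12.4)
)
toy_ts <- tibble(
  id   = rep(c("A", "B", "C"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 3),
  tmax = c(30.1, 28.4, 25.0, 27.3, 29.9, 31.2, 33.0, 34.1, 32.5)
)

# Spatial step: keep stations south of 30°S
spatial <- toy_stations %>% filter(lat < -30)

# Manual subsetting: carry the spatial result over to the temporal table
sp_id <- spatial %>% pull(id)
ts_subset <- toy_ts %>% filter(id %in% sp_id)

# Temporal step: keep stations whose average tmax exceeds 28
temporal <- ts_subset %>%
  group_by(id) %>%
  summarise(avg_tmax = mean(tmax)) %>%
  filter(avg_tmax > 28)

# Manual subsetting again: carry the temporal result back to the spatial table
ts_id <- temporal %>% pull(id)
sp_subset <- spatial %>% filter(id %in% ts_id)
sp_subset
```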

Australian weather station data:

stations
# A tibble: 30 × 6
  id            lat  long  elev name                       wmo_id
  <chr>       <dbl> <dbl> <dbl> <chr>                       <dbl>
1 ASN00060139 -31.4  153.   4.2 port macquarie airport aws  94786
2 ASN00068228 -34.4  151.  10   bellambi aws                94749
3 ASN00017123 -28.1  140.  37.8 moomba airport              95481
4 ASN00081049 -36.4  145. 114   tatura inst sustainable ag  95836
5 ASN00018201 -32.5  138.  14   port augusta aero           95666
# … with 25 more rows

ts
# A tibble: 10,632 × 5
  id          date        prcp  tmax  tmin
  <chr>       <date>     <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01     0  36.7  26.9
2 ASN00003057 2020-01-02    41  34.2  24  
3 ASN00003057 2020-01-03     0  35    25.4
4 ASN00003057 2020-01-04    40  29.1  25.4
5 ASN00003057 2020-01-05  1640  27.3  24.3
# … with 10,627 more rows

Cast your data into a cubble

(weather <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
))
# cubble:   id [30]: nested form
# bbox:     [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
  id            lat  long  elev name              wmo_id ts                
  <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>            
1 ASN00003057 -16.5  123.     7 cygnet bay         94201 <tibble [316 × 4]>
2 ASN00005007 -22.2  114.     5 learmonth airport  94302 <tibble [363 × 4]>
3 ASN00005084 -21.5  115.     5 thevenard island   94303 <tibble [366 × 4]>
4 ASN00010515 -32.1  117.   199 beverley           95615 <tibble [354 × 4]>
5 ASN00012314 -27.8  121.   497 leinster aero      95448 <tibble [366 × 4]>
# … with 25 more rows

  • The spatial data (stations) can be an sf object, and the temporal data (ts) can be a tsibble object.

Switch between the two forms

Long form:

(weather_long <- weather %>% 
  face_temporal())
# cubble:  date, id [30]: long form
# bbox:    [114.09, -41.88, 152.87, -11.65]
# spatial: lat [dbl], long [dbl], elev [dbl],
#   name [chr], wmo_id [dbl]
  id          date        prcp  tmax  tmin
  <chr>       <date>     <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01     0  36.7  26.9
2 ASN00003057 2020-01-02    41  34.2  24  
3 ASN00003057 2020-01-03     0  35    25.4
4 ASN00003057 2020-01-04    40  29.1  25.4
5 ASN00003057 2020-01-05  1640  27.3  24.3
# … with 10,627 more rows

Back to the nested form:

(weather_back <- weather_long %>% 
   face_spatial())
# cubble:   id [30]: nested form
# bbox:     [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl],
#   tmin [dbl]
  id         lat  long  elev name  wmo_id ts      
  <chr>    <dbl> <dbl> <dbl> <chr>  <dbl> <list>  
1 ASN0000… -16.5  123.     7 cygn…  94201 <tibble>
2 ASN0000… -22.2  114.     5 lear…  94302 <tibble>
3 ASN0000… -21.5  115.     5 thev…  94303 <tibble>
4 ASN0001… -32.1  117.   199 beve…  95615 <tibble>
5 ASN0001… -27.8  121.   497 lein…  95448 <tibble>
# … with 25 more rows

identical(weather_back, weather)
[1] TRUE

Access variables in the other form

Reference temporal variables with $

weather %>% 
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE))
# cubble:   id [30]: nested form
# bbox:     [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
  id            lat  long  elev name              wmo_id ts                 avg_tmax
  <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>                <dbl>
1 ASN00003057 -16.5  123.     7 cygnet bay         94201 <tibble [316 × 4]>     32.4
2 ASN00005007 -22.2  114.     5 learmonth airport  94302 <tibble [363 × 4]>     33.2
3 ASN00005084 -21.5  115.     5 thevenard island   94303 <tibble [366 × 4]>     30.7
4 ASN00010515 -32.1  117.   199 beverley           95615 <tibble [354 × 4]>     26.4
5 ASN00012314 -27.8  121.   497 leinster aero      95448 <tibble [366 × 4]>     29.6
# … with 25 more rows
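
Why `ts$tmax` gives one value per station can be sketched with a plain rowwise tibble. This is an analogy on made-up toy data; I am assuming, not showing, that the nested cubble behaves the same way. In a rowwise data frame, a list-column name inside `mutate()` refers to the single element for that row, so `ts` is one station's time series:

```r
library(dplyr)
library(tidyr)

# Toy data, invented for illustration
toy_stations <- tibble(
  id   = c("A", "B"),
  long = c(145.0, 151.2),
  lat  = c(-37.8, -33.9)
)
toy_ts <- tibble(
  id   = rep(c("A", "B"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  tmax = c(30.1, 28.4, 25.0, 27.3, 29.9, 31.2)
)

# One row per station, time series in list-column `ts`
nested <- toy_stations %>%
  left_join(nest(toy_ts, ts = c(date, tmax)), by = "id")

# rowwise(): `ts` is unwrapped to the tibble of the current row,
# so mean(ts$tmax) is computed station by station
with_avg <- nested %>%
  rowwise() %>%
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE)) %>%
  ungroup()
with_avg
```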

Move spatial variables into the long form

weather_long %>% unfold(long, lat)
# cubble:  date, id [30]: long form
# bbox:    [114.09, -41.88, 152.87, -11.65]
# spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
  id          date        prcp  tmax  tmin  long   lat
  <chr>       <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01     0  36.7  26.9  123. -16.5
2 ASN00003057 2020-01-02    41  34.2  24    123. -16.5
3 ASN00003057 2020-01-03     0  35    25.4  123. -16.5
4 ASN00003057 2020-01-04    40  29.1  25.4  123. -16.5
5 ASN00003057 2020-01-05  1640  27.3  24.3  123. -16.5
# … with 10,627 more rows

Explore temporal pattern across space

Glyph map transformation

DATA %>%
  ggplot() +
  geom_glyph(
    aes(x_major = X_MAJOR, x_minor = X_MINOR,
        y_major = Y_MAJOR, y_minor = Y_MINOR)) +
  ...
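
A sketch of the transformation behind a glyph map, assuming the usual linear rescaling: the minor variable is rescaled to [-1, 1] and placed in a small box centred on the major coordinate. The helper names (`rescale11`, `glyph_pos`) and the toy values are hypothetical, and whether `geom_glyph()` computes positions exactly this way internally is an assumption.

```r
# Rescale a numeric vector linearly to [-1, 1]
rescale11 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
}

# Position of each point inside a glyph of the given size,
# centred on the major coordinate
glyph_pos <- function(major, minor, size) {
  major + rescale11(minor) * size / 2
}

# One station at (145, -37.8) with a made-up monthly tmax series
month <- 1:12
tmax  <- c(26, 26, 24, 20, 17, 14, 13, 15, 17, 20, 22, 24)

glyph_x <- glyph_pos(145,   month, size = 2)    # spans 144 to 146
glyph_y <- glyph_pos(-37.8, tmax,  size = 0.7)  # spans -38.15 to -37.45
```

Here the rescaling uses the range of the single series passed in; rescaling over all stations at once would instead keep glyphs comparable in magnitude across the map.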

Avg. max. temperature on the map

library(cubble)
library(dplyr)
library(ggplot2)

cb <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
)

set.seed(0927)
cb_glyph <- cb %>%
  slice_sample(n = 20) %>%
  face_temporal() %>%
  mutate(month = lubridate::month(date)) %>%
  group_by(month) %>% 
  summarise(tmax = mean(tmax, na.rm = TRUE)) %>%
  unfold(long, lat)

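# oz_simp is assumed to be an sf object holding a simplified
# outline of Australia, created elsewhere
ggplot() +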
  geom_sf(data = oz_simp, 
          fill = "grey95", 
          color = "white") +
  geom_glyph(
    data = cb_glyph,
    aes(x_major = long, x_minor = month,
        y_major = lat, y_minor = tmax),
    width = 2, height = 0.7) + 
  ggthemes::theme_map()
