I'm still trying to understand how R manages memory, but @nick, doesn't
select(Columns = destination.x,
       Rows = destination.y)
end up creating a new table in memory? Cross joins can get pretty big fast. Maybe R does something in this case to minimize memory usage in a pipeline like this?
This script skips the select() step and just changes the names attribute of transformed_dat at the end, so it doesn't make a new table. If you don't care about the first column name, just drop the part that changes it.
suppressPackageStartupMessages(library(tidyverse))
dat <- tribble(
  ~passenger_id, ~destination,
  1, "China",
  1, "Japan",
  2, "England",
  2, "US",
  3, "Canada",
  3, "China",
  3, "US",
  4, "Japan"
)
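# Join dat to itself on passenger_id to pair up each passenger's destinations,
# count every pair, then spread the counts into one column per destination.x.
transformed_dat <-
  left_join(dat, dat, by = "passenger_id") %>%
  group_by(destination.x, destination.y) %>%
  summarize(Count = n()) %>%
  spread(destination.x, Count, fill = 0)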
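# Rename the first column (destination.y) by editing the names attribute
# instead of calling select().
ns <- names(transformed_dat)
ns[[1]] <- "Rows"
names(transformed_dat) <- ns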
transformed_dat
#> # A tibble: 5 x 6
#>   Rows    Canada China England Japan    US
#> * <chr>    <dbl> <dbl>   <dbl> <dbl> <dbl>
#> 1 Canada       1     1       0     0     1
#> 2 China        1     2       0     1     1
#> 3 England      0     0       1     0     1
#> 4 Japan        0     1       0     2     0
#> 5 US           1     1       1     0     2
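One way to check the memory question directly, rather than guess, is to compare the memory address of a column before and after select(). This is only a rough sketch: it assumes the lobstr package is installed (library(tidyverse) does not attach it), and it uses a small made-up tibble with a hypothetical name, small_dat, rather than the joined table above. If the comparison comes back TRUE, select() reused the existing column vector instead of copying it.

```r
# Assumption: lobstr is installed; it is called with lobstr:: and not attached.
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data, only for illustration.
small_dat <- tibble(
  passenger_id = c(1, 1, 2),
  destination  = c("China", "Japan", "England")
)

# select() with a rename returns a new data frame object...
renamed <- select(small_dat, Rows = destination)

# ...but the column vectors can be compared by their memory addresses.
same_column <- lobstr::obj_addr(small_dat$destination) ==
  lobstr::obj_addr(renamed$Rows)
same_column
```

The same comparison could be run on transformed_dat before and after the names() assignment to see how the two approaches differ, if at all.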