> Gads2
> # A tibble: 3,487 x 4
> # Groups: Date, Time [1,741]
> Date Time Species Abundance
> <date> <time> <chr> <dbl>
> 1 1988-10-10 12:33 Gadus morhua 84
> 2 1988-10-10 12:33 Merlangius merlangus 240
> 3 1988-10-10 12:33 Trisopterus luscus 532
> 4 1988-10-10 12:33 Trisopterus minutus 540
> 5 1988-10-10 14:36 Gadus morhua 4
> 6 1988-10-10 14:36 Trisopterus minutus 140
> 7 1988-10-10 16:37 Gadus morhua 100
> 8 1988-10-10 16:37 Merlangius merlangus 16
> 9 1988-10-10 16:37 Pollachius pollachius 4
> 10 1988-10-10 16:37 Trisopterus luscus 808
> # … with 3,477 more rows
I want to group rows where date and time are equal (named Trawl 1, Trawl 2, etc.). But I have multiple species needed to be grouped within each row.
I need to combine each species and their abundance into a separate column in order to achieve this. So I tried the reshape2 package, based on what I read:
dcast(Gads2, Species ~ Abundance)
'Using Abundance as value column: use value.var to override.
Aggregation function missing: defaulting to length'
That did not work. I knew it was too simple a solution!
So I need help making each species into a new column, and then grouping by row where date and time are equal.
You could try using the argument names_repair = "universal" in the pivot_wider() function, which I believe will make your column names syntactic (i.e. it'll get rid of the spaces for you).
# Load libraries ----------------------------------------------------------
library("tidyverse")
# Create example data -----------------------------------------------------
set.seed(807762)
n = 1000
d <- tibble(
Date = sample(seq(from = 1988, to = 1990),
size = n,
replace = TRUE),
Species = sample(c("Gadus morhua",
"Merlangius merlangus",
"Trisopterus luscus",
"Trisopterus minutus",
"Pollachius pollachius"),
size = n, replace = TRUE),
Abundance = sample(seq(from = 1, to = 10000),
size = n,
replace = TRUE),
)
# Wrangle data ------------------------------------------------------------
d_grouped <- d %>%
mutate(obs_id = seq(from = 1, to = n)) %>%
pivot_wider(id_cols = c("obs_id", "Date"),
names_from = Species,
values_from = Abundance) %>%
mutate(trawl = group_indices(., Date))
Yielding
> d
# A tibble: 1,000 x 3
Date Species Abundance
<int> <chr> <int>
1 1988 Trisopterus luscus 9532
2 1988 Pollachius pollachius 424
3 1990 Gadus morhua 7242
4 1990 Merlangius merlangus 7351
5 1988 Merlangius merlangus 1537
6 1988 Gadus morhua 9917
7 1989 Merlangius merlangus 3151
8 1989 Merlangius merlangus 8302
9 1989 Pollachius pollachius 942
10 1989 Trisopterus luscus 2057
# … with 990 more rows
> d_grouped
# A tibble: 1,000 x 8
obs_id Date `Trisopterus luscus` `Pollachius pollachiu… `Gadus morhua` `Merlangius merlang… `Trisopterus minutu… trawl
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 1988 9532 NA NA NA NA 1
2 2 1988 NA 424 NA NA NA 1
3 3 1990 NA NA 7242 NA NA 3
4 4 1990 NA NA NA 7351 NA 3
5 5 1988 NA NA NA 1537 NA 1
6 6 1988 NA NA 9917 NA NA 1
7 7 1989 NA NA NA 3151 NA 2
8 8 1989 NA NA NA 8302 NA 2
9 9 1989 NA 942 NA NA NA 2
10 10 1989 2057 NA NA NA NA 2
# … with 990 more rows