Extracting Column Values into New Column

I have a data frame for fishing trawls, "Gads2"

> Gads2
> # A tibble: 3,487 x 4
> # Groups:   Date, Time [1,741]
>    Date       Time   Species               Abundance
>    <date>     <time> <chr>                     <dbl>
>  1 1988-10-10 12:33  Gadus morhua                 84
>  2 1988-10-10 12:33  Merlangius merlangus        240
>  3 1988-10-10 12:33  Trisopterus luscus          532
>  4 1988-10-10 12:33  Trisopterus minutus         540
>  5 1988-10-10 14:36  Gadus morhua                  4
>  6 1988-10-10 14:36  Trisopterus minutus         140
>  7 1988-10-10 16:37  Gadus morhua                100
>  8 1988-10-10 16:37  Merlangius merlangus         16
>  9 1988-10-10 16:37  Pollachius pollachius         4
> 10 1988-10-10 16:37  Trisopterus luscus          808
> # … with 3,477 more rows

I want to group rows where date and time are equal (named Trawl 1, Trawl 2, etc.). But I have multiple species needed to be grouped within each row.

I need to combine each species and their abundance into a separate column in order to achieve this. So I tried the reshape2 package, based on what I read:

dcast(Gads2, Species ~ Abundance)

'Using Abundance as value column: use value.var to override.
Aggregation function missing: defaulting to length'

That did not work. I knew it was too simple a solution!

So I need help making each species into a new column, and then grouping by row where date and time are equal.

Is this what you are trying to do?

library(tidyr)
DF <- read.csv("~/R/Play/Dummy.csv")
DF_wide <- pivot_wider(data = DF, names_from = "Species", values_from = "Abundance" )                 
DF_wide
#> # A tibble: 3 x 7
#>   Date  Time  `Gadus morhua` `Merlangius mer~ `Trisopterus lu~ `Trisopterus mi~
#>   <fct> <fct>          <int>            <int>            <int>            <int>
#> 1 1988~ 12:33             84              240              532              540
#> 2 1988~ 14:36              4               NA               NA              140
#> 3 1988~ 16:37            100               16              808               NA
#> # ... with 1 more variable: `Pollachius pollachius` <int>

Created on 2020-06-30 by the reprex package (v0.3.0)

Having spaces in column names will be inconvenient. You should consider replacing the space between the genus and the species with an underscore.

You could try using the argument names_repair = "universal" in the pivot_wider() function, which I believe will make your column names syntactic (i.e. it'll get rid of the spaces for you).

library(tidyr)
DF <- read.csv("~/R/Play/Dummy.csv")
DF_wide <- pivot_wider(data = DF, names_from = "Species", values_from = "Abundance", names_repair = "universal" )                 
DF_wide
2 Likes

Hi @katemontana,

Would this work?

# Load libraries ----------------------------------------------------------
library("tidyverse")


# Create example data -----------------------------------------------------
set.seed(807762)
n = 1000
d <- tibble(
  Date = sample(seq(from = 1988, to = 1990),
                size = n,
                replace = TRUE),
  Species = sample(c("Gadus morhua",
                     "Merlangius merlangus",
                     "Trisopterus luscus",
                     "Trisopterus minutus",
                     "Pollachius pollachius"),
                   size = n, replace = TRUE),
  Abundance = sample(seq(from = 1, to = 10000),
                     size = n,
                     replace = TRUE),
)


# Wrangle data ------------------------------------------------------------
d_grouped <- d %>%
  mutate(obs_id = seq(from = 1, to = n)) %>%
  pivot_wider(id_cols = c("obs_id", "Date"),
              names_from = Species,
              values_from = Abundance) %>% 
  mutate(trawl = group_indices(., Date))

Yielding

> d
# A tibble: 1,000 x 3
    Date Species               Abundance
   <int> <chr>                     <int>
 1  1988 Trisopterus luscus         9532
 2  1988 Pollachius pollachius       424
 3  1990 Gadus morhua               7242
 4  1990 Merlangius merlangus       7351
 5  1988 Merlangius merlangus       1537
 6  1988 Gadus morhua               9917
 7  1989 Merlangius merlangus       3151
 8  1989 Merlangius merlangus       8302
 9  1989 Pollachius pollachius       942
10  1989 Trisopterus luscus         2057
# … with 990 more rows
> d_grouped
# A tibble: 1,000 x 8
   obs_id  Date `Trisopterus luscus` `Pollachius pollachiu… `Gadus morhua` `Merlangius merlang… `Trisopterus minutu… trawl
    <int> <int>                <int>                  <int>          <int>                <int>                <int> <int>
 1      1  1988                 9532                     NA             NA                   NA                   NA     1
 2      2  1988                   NA                    424             NA                   NA                   NA     1
 3      3  1990                   NA                     NA           7242                   NA                   NA     3
 4      4  1990                   NA                     NA             NA                 7351                   NA     3
 5      5  1988                   NA                     NA             NA                 1537                   NA     1
 6      6  1988                   NA                     NA           9917                   NA                   NA     1
 7      7  1989                   NA                     NA             NA                 3151                   NA     2
 8      8  1989                   NA                     NA             NA                 8302                   NA     2
 9      9  1989                   NA                    942             NA                   NA                   NA     2
10     10  1989                 2057                     NA             NA                   NA                   NA     2
# … with 990 more rows

Hope it helps :slightly_smiling_face:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.