Extracting Column Values into New Column

katemontana · June 30, 2020, 3:39pm

I have a data frame for fishing trawls, "Gads2"

> Gads2
> # A tibble: 3,487 x 4
> # Groups:   Date, Time [1,741]
>    Date       Time   Species               Abundance
>    <date>     <time> <chr>                     <dbl>
>  1 1988-10-10 12:33  Gadus morhua                 84
>  2 1988-10-10 12:33  Merlangius merlangus        240
>  3 1988-10-10 12:33  Trisopterus luscus          532
>  4 1988-10-10 12:33  Trisopterus minutus         540
>  5 1988-10-10 14:36  Gadus morhua                  4
>  6 1988-10-10 14:36  Trisopterus minutus         140
>  7 1988-10-10 16:37  Gadus morhua                100
>  8 1988-10-10 16:37  Merlangius merlangus         16
>  9 1988-10-10 16:37  Pollachius pollachius         4
> 10 1988-10-10 16:37  Trisopterus luscus          808
> # … with 3,477 more rows

I want to group rows where date and time are equal (named Trawl 1, Trawl 2, etc.). But I have multiple species needed to be grouped within each row.

I need to combine each species and their abundance into a separate column in order to achieve this. So I tried the reshape2 package, based on what I read:

dcast(Gads2, Species ~ Abundance)

'Using Abundance as value column: use value.var to override.
Aggregation function missing: defaulting to length'

That did not work. I knew it was too simple a solution!

So I need help making each species into a new column, and then grouping by row where date and time are equal.

FJCC · June 30, 2020, 3:59pm

Is this what you are trying to do?

library(tidyr)
DF <- read.csv("~/R/Play/Dummy.csv")
DF_wide <- pivot_wider(data = DF, names_from = "Species", values_from = "Abundance" )                 
DF_wide
#> # A tibble: 3 x 7
#>   Date  Time  `Gadus morhua` `Merlangius mer~ `Trisopterus lu~ `Trisopterus mi~
#>   <fct> <fct>          <int>            <int>            <int>            <int>
#> 1 1988~ 12:33             84              240              532              540
#> 2 1988~ 14:36              4               NA               NA              140
#> 3 1988~ 16:37            100               16              808               NA
#> # ... with 1 more variable: `Pollachius pollachius` <int>

^{Created on 2020-06-30 by the reprex package (v0.3.0)}

Having spaces in column names will be inconvenient. You should consider replacing the space between the genus and the species with an underscore.

MyKo101 · June 30, 2020, 4:22pm

You could try using the argument names_repair = "universal" in the pivot_wider() function, which I believe will make your column names syntactic (i.e. it'll get rid of the spaces for you).

library(tidyr)
DF <- read.csv("~/R/Play/Dummy.csv")
DF_wide <- pivot_wider(data = DF, names_from = "Species", values_from = "Abundance", names_repair = "universal" )                 
DF_wide

Leon · June 30, 2020, 5:19pm

Hi @katemontana,

Would this work?

# Load libraries ----------------------------------------------------------
library("tidyverse")


# Create example data -----------------------------------------------------
set.seed(807762)
n = 1000
d <- tibble(
  Date = sample(seq(from = 1988, to = 1990),
                size = n,
                replace = TRUE),
  Species = sample(c("Gadus morhua",
                     "Merlangius merlangus",
                     "Trisopterus luscus",
                     "Trisopterus minutus",
                     "Pollachius pollachius"),
                   size = n, replace = TRUE),
  Abundance = sample(seq(from = 1, to = 10000),
                     size = n,
                     replace = TRUE),
)


# Wrangle data ------------------------------------------------------------
d_grouped <- d %>%
  mutate(obs_id = seq(from = 1, to = n)) %>%
  pivot_wider(id_cols = c("obs_id", "Date"),
              names_from = Species,
              values_from = Abundance) %>% 
  mutate(trawl = group_indices(., Date))

Yielding

> d
# A tibble: 1,000 x 3
    Date Species               Abundance
   <int> <chr>                     <int>
 1  1988 Trisopterus luscus         9532
 2  1988 Pollachius pollachius       424
 3  1990 Gadus morhua               7242
 4  1990 Merlangius merlangus       7351
 5  1988 Merlangius merlangus       1537
 6  1988 Gadus morhua               9917
 7  1989 Merlangius merlangus       3151
 8  1989 Merlangius merlangus       8302
 9  1989 Pollachius pollachius       942
10  1989 Trisopterus luscus         2057
# … with 990 more rows
> d_grouped
# A tibble: 1,000 x 8
   obs_id  Date `Trisopterus luscus` `Pollachius pollachiu… `Gadus morhua` `Merlangius merlang… `Trisopterus minutu… trawl
    <int> <int>                <int>                  <int>          <int>                <int>                <int> <int>
 1      1  1988                 9532                     NA             NA                   NA                   NA     1
 2      2  1988                   NA                    424             NA                   NA                   NA     1
 3      3  1990                   NA                     NA           7242                   NA                   NA     3
 4      4  1990                   NA                     NA             NA                 7351                   NA     3
 5      5  1988                   NA                     NA             NA                 1537                   NA     1
 6      6  1988                   NA                     NA           9917                   NA                   NA     1
 7      7  1989                   NA                     NA             NA                 3151                   NA     2
 8      8  1989                   NA                     NA             NA                 8302                   NA     2
 9      9  1989                   NA                    942             NA                   NA                   NA     2
10     10  1989                 2057                     NA             NA                   NA                   NA     2
# … with 990 more rows

Hope it helps

system · July 7, 2020, 5:19pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.