How to create a number id for each repeated character value in a column?

So I have this column in a data frame consisting of species names:

species
Dasyatis pastinaca
Amblyraja radiata
Raja montagui
Raja montagui
Dasyatis pastinaca
Himantura imbricata
Mobula thurstoni
Raja montagui
Mobula thurstoni
Dalatias licha

Many of them are repeated. For every set of repeated names, I'm trying to add a suffix that is a running id number within that species. This is the output I'm looking for in the column:

new_species
Dasyatis pastinaca_01
Amblyraja radiata_01
Raja montagui_01
Raja montagui_02
Dasyatis pastinaca_02
Himantura imbricata_01
Mobula thurstoni_01
Raja montagui_03
Mobula thurstoni_02
Dalatias licha_01

I've tried creating a new column (new_species) from the original column (species), followed by two for loops:

df<- df[order(df$species),]
df$new_species=df$species
for (i in unique(df$new_species)){
  for (j in df[df$species==i,]$new_species){
      paste0(j,seq(1,length(df[df$new_species==j,]$new_species)))
  }
}

Thanks in advance for any answers

I would use the dplyr package to group the data by species and add row numbers.

DF <- data.frame(species = c(
"Dasyatis pastinaca",
"Amblyraja radiata",
"Raja montagui",
"Raja montagui",
"Dasyatis pastinaca",
"Himantura imbricata",
"Mobula thurstoni",
"Raja montagui",
"Mobula thurstoni",
"Dalatias licha"))
library(dplyr)

DF <- DF %>%
  group_by(species) %>%
  mutate(Index = row_number(),  # running count within each species
         new_species = paste(species, formatC(Index, width = 2, flag = "0"), sep = "_"))
DF
#> # A tibble: 10 x 3
#> # Groups:   species [6]
#>    species             Index new_species           
#>    <fct>               <int> <chr>                 
#>  1 Dasyatis pastinaca      1 Dasyatis pastinaca_01 
#>  2 Amblyraja radiata       1 Amblyraja radiata_01  
#>  3 Raja montagui           1 Raja montagui_01      
#>  4 Raja montagui           2 Raja montagui_02      
#>  5 Dasyatis pastinaca      2 Dasyatis pastinaca_02 
#>  6 Himantura imbricata     1 Himantura imbricata_01
#>  7 Mobula thurstoni        1 Mobula thurstoni_01   
#>  8 Raja montagui           3 Raja montagui_03      
#>  9 Mobula thurstoni        2 Mobula thurstoni_02   
#> 10 Dalatias licha          1 Dalatias licha_01

Created on 2020-05-07 by the reprex package (v0.2.1)
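
For readers who would rather not load dplyr, the same group-wise counter can be sketched in base R. This is an alternative technique, not the approach used above: `ave()` applies `seq_along()` within each species, and `sprintf()` zero-pads the counter to two digits. The names `idx` and `expected` are only illustrative, and `stringsAsFactors = FALSE` is set explicitly so the sketch behaves the same on R versions before 4.0.

```r
# Same example data as in the question.
DF <- data.frame(species = c(
  "Dasyatis pastinaca", "Amblyraja radiata", "Raja montagui",
  "Raja montagui", "Dasyatis pastinaca", "Himantura imbricata",
  "Mobula thurstoni", "Raja montagui", "Mobula thurstoni",
  "Dalatias licha"), stringsAsFactors = FALSE)

# ave() splits the row positions by species and numbers them 1..n
# within each group, keeping the original row order.
idx <- ave(seq_along(DF$species), DF$species, FUN = seq_along)

# Append the counter as a zero-padded, two-digit suffix.
DF$new_species <- sprintf("%s_%02d", DF$species, idx)
DF
```

Because the rows are never reordered, the result lines up with the original data frame without sorting it first.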


I'd probably use unite() from the tidyr package rather than paste() after row_number():

library(dplyr)
library(tidyr)  # unite() comes from tidyr

DF %>% group_by(species) %>% 
  mutate(index = row_number()) %>%
  unite(new_species, species, index)  %>% # underscore by default
  arrange(new_species)
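
One thing to watch with unite(): it pastes the raw row number, so the suffix comes out as `_1` rather than `_01`. A minimal sketch of one way to get the padded form, assuming the index is formatted with `formatC()` before uniting. The three-row `DF` and the name `padded` are only illustrative, and the `ungroup()` call is a precaution rather than something the original snippet required.

```r
library(dplyr)
library(tidyr)

# Small illustrative data frame (a subset of the species in the question).
DF <- data.frame(species = c("Raja montagui", "Mobula thurstoni", "Raja montagui"),
                 stringsAsFactors = FALSE)

padded <- DF %>%
  group_by(species) %>%
  # Zero-pad the within-group counter to two characters before uniting.
  mutate(index = formatC(row_number(), width = 2, flag = "0")) %>%
  ungroup() %>%
  # unite() joins the two columns with "_" and drops the inputs.
  unite(new_species, species, index, sep = "_")

padded$new_species
```

Leaving out `arrange()` keeps the rows in their original order, which matches the layout asked for in the question.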
