I have a large dataset of fungal species in which one column on each row holds the full taxonomy (kingdom, phylum, class, order, family, genus, species). I would like to create a new column that contains only the species name, without the rest of the taxonomy string. The species name always appears after s__ in the taxonomy column, but the names vary in length, so I cannot extract it by fixed character positions. I have tried using mutate together with str_extract and substr. ITS_counts is the dataset, taxonomy is the column I'm working with, and s__ is the marker that precedes the species name on each row. The code I tried, and the error it produces, are:
mutate("species" = str_extract(ITS_counts$taxonomy, substr(start=".*s__", 1000, stop = NULL), group = NULL))
Error in substr(start = ".*s__", 1000, stop = NULL) :
invalid substring arguments
In addition: Warning message:
In substr(start = ".*s__", 1000, stop = NULL) : NAs introduced by coercion
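The error seems to come from substr itself: it expects a character vector plus integer start and stop positions, so passing the pattern ".*s__" as start forces a coercion to integer, which yields NA. A possible way around this is to drop substr and give str_extract a regular expression directly. The following is only a minimal sketch, assuming the ranks are separated by ";" and that s__ appears once per row. The example taxonomy strings are invented for illustration and are not taken from the real ITS_counts data.

```r
library(dplyr)
library(stringr)

# Toy stand-in for ITS_counts; these taxonomy strings are made up for illustration
ITS_counts <- tibble(
  taxonomy = c(
    "k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae;g__Fusarium;s__Fusarium_oxysporum",
    "k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Amanitaceae;g__Amanita;s__Amanita_muscaria",
    "k__Fungi;p__Ascomycota"  # no species rank present
  )
)

# (?<=s__) is a look-behind: match only text that follows "s__", without including it
# [^;]+ then takes everything up to the next ";" (assumed separator) or the end of the string
ITS_counts <- ITS_counts %>%
  mutate(species = str_extract(taxonomy, "(?<=s__)[^;]+"))

print(ITS_counts$species)
```

Rows with no s__ entry get NA in the new column. If the real data uses a different separator between ranks (for example "|" or ","), the character class [^;] would need to be changed to match.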