Match unique values within the same dataframe

Hi there,

I am brand new to this forum (and to R), so please let me know if you need any more information.

I have a dataset of about 45K observations of phytoplankton species in several lakes, covering 245 different genera. Each sample has the Genus and the Common name. What I am trying to do is create a list of the common names for each of the 245 unique genera.

Here is an example of the dataset:

gen.com
    Genera            Common_Name
1   Achnanthidium
2   Aphanizomenon     Blue-green Algae
3   Chlorella         Green Algae
4   Chroococcus
5   Chrysochromulina
6   Crucigenia        Green Algae
7   Cryptomonas       Dinoflagellates
8   Cyclotella        Centric Diatoms
9   Diatoma
10  Dictyosphaerium
11  Dinobryon         Yellow Algae
12  Kirchneriella     Green Algae
13  Mallomonas        Yellow Algae
14  Monoraphidium
15  Nitzschia         Pennate Diatoms
16  Ochromonas        Yellow Algae
17  Plagioselmis
18  Planktolyngbya
18 Planktolyngbya

I have tried extracting the unique genus values and then matching them against the common names, but all I get is an NA error.

gen.com <- phyto.new[c("Genera", "Common_Name")]
gen <- unique(gen.com$Genera)
match(gen,gen.com$Common_Name)
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[66] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[131] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[196] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Does anyone know what I'm doing wrong?

It would be better if you gave us some sample data we can work with, using dput() to make it easy to read in, and then showed us what you have tried, the output that is wrong, and the output you want.

Anyway, I don't understand your question.

Thanks so much for your quick reply. I have edited my original post - not sure how to use the dput() function, but hopefully my end goal is clearer. I put what I have tried and what I got, and what I ultimately want.

You've just shared a lot of repeated NA values... there's no information content.
If you have a data frame, let's say it's called 'mydataframe',
and let's say you don't want to share all observations, just the top 100 (at the head),
then you use dput like

dput(head(mydataframe,n=100))

Have a go at that.

A very simple and very effective way to supply some data is to use the dput() command.

dput(mydata)

and then simply copy the output and paste it here. If you have a very large data set then a sample should be fine. To supply us with 100 rows of your data set do

dput(head(mydata , 100))

where mydata is the name of your data.frame or tibble.
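
As a rough illustration of what this produces, here is a sketch with a small made-up data frame (the values in mydata are placeholders, not the real data). dput() prints R code that rebuilds the object, so anyone who copies the output can recreate the data by assigning it to a name. The round trip is simulated here with capture.output() and parse(), which is only an assumption about how a reader would paste it back in.

```r
# Hypothetical example data; substitute your own data frame
mydata <- data.frame(
  Genera      = c("Chlorella", "Cyclotella", "Diatoma"),
  Common_Name = c("Green Algae", "Centric Diatoms", NA)
)

# Prints a structure(list(...)) call that can be pasted into a forum post
dput(head(mydata, 2))

# A reader recreates the data by evaluating the pasted text; here the
# round trip is simulated by capturing the printed text and parsing it back
pasted  <- capture.output(dput(head(mydata, 2)))
rebuilt <- eval(parse(text = pasted))
```

In practice the reader would simply write something like mydata <- followed by the pasted structure(...) text.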

library(dplyr)

# Keep only the two columns of interest, then sort by common name and genus
select(phyto.new, Common_Name, Genera) %>%
  arrange(Common_Name, Genera)
        Common_Name           Genera
1  Blue-green Algae    Aphanizomenon
2   Centric Diatoms       Cyclotella
3   Dinoflagellates      Cryptomonas
4       Green Algae        Chlorella
5       Green Algae       Crucigenia
6       Green Algae    Kirchneriella
7   Pennate Diatoms        Nitzschia
8      Yellow Algae        Dinobryon
9      Yellow Algae       Mallomonas
10             <NA>    Achnanthidium
11             <NA>      Chroococcus
12             <NA> Chrysochromulina
13             <NA>          Diatoma
14             <NA>  Dictyosphaerium
15             <NA>    Monoraphidium
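
On the full dataset of roughly 45K observations each genus presumably appears many times, so a pipeline that only selects and sorts would return one row per observation. A minimal sketch of one way to collapse that to one row per genus/common name pair is to add a distinct() step. The data frame phyto.demo below is a made-up stand-in for phyto.new, and gen.lookup is just an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for phyto.new, with repeated genera as in the real data
phyto.demo <- data.frame(
  Genera      = c("Chlorella", "Chlorella", "Cyclotella", "Diatoma", "Diatoma"),
  Common_Name = c("Green Algae", "Green Algae", "Centric Diatoms", NA, NA)
)

gen.lookup <- phyto.demo %>%
  select(Common_Name, Genera) %>%   # keep the two columns of interest
  distinct() %>%                    # drop repeated genus/common name pairs
  arrange(Common_Name, Genera)      # sort; rows with a missing common name go last

print(gen.lookup)
```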

Is this the output you want?

Yes, that's it! Thank you soooooooo much for your help :heart_eyes: :heart_eyes: :grin:

OK, great. Just to clarify, no matching was involved here.
select() picks which columns of the data frame to show.
arrange() applies an ordering.
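
For comparison, the match() attempt in the original post returned only NA because it looked up genus names in the Common_Name column, where no genus name appears. A sketch of a matching-based version, using a small made-up data frame (gen.demo) in place of gen.com: match the unique genera against the Genera column to find each one's first row, then use those positions to index Common_Name.

```r
# Hypothetical stand-in for gen.com
gen.demo <- data.frame(
  Genera      = c("Chlorella", "Cyclotella", "Chlorella", "Diatoma"),
  Common_Name = c("Green Algae", "Centric Diatoms", "Green Algae", NA),
  stringsAsFactors = FALSE
)

gen <- unique(gen.demo$Genera)

# Position of the first row for each unique genus
# (matched against Genera, not Common_Name)
first.row <- match(gen, gen.demo$Genera)

# Pull the common name found at each of those positions
gen.lookup.base <- data.frame(
  Genera      = gen,
  Common_Name = gen.demo$Common_Name[first.row]
)

print(gen.lookup.base)
```

This takes the common name from the first row of each genus, so it assumes a genus always carries the same common name.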

Good luck with your R journey :slight_smile:
