I am new to R and having problems running analyses on data imported from a CSV file

Good day. I am new to R and am currently using RStudio. I managed to import my CSV data into RStudio, but when I try to use R to determine the alpha diversity, beta diversity and other analyses, I get the error "Error in "104D0"$shannon : $ operator is invalid for atomic vectors"

hist("104D0"$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in "104D0"$shannon : $ operator is invalid for atomic vectors
hist("104D0"$simpson, main="Simpson diversity", xlab="", breaks=10)
Error in "104D0"$simpson : $ operator is invalid for atomic vectors
hist("104D0"$chao, main="Chao richness", xlab="", breaks=15)
Error in "104D0"$chao : $ operator is invalid for atomic vectors
hist("104D0"$ace, main="ACE richness", xlab="", breaks=15)
Error in "104D0"$ace : $ operator is invalid for atomic vectors

I will thus be happy to get help on how to convert my csv data into an appropriate vector format in which the various statistical computations can be done.

Thanks for your help!

Hi @jod14139,

Welcome to RStudio Community! I suspect that you are getting these error messages because you use quotation marks (") instead of backticks (`). Try:

hist(`104D0`$shannon, main="Shannon diversity", xlab="", breaks=10)

This is because variable names should not normally begin with a number, as 104D0 does. Backticks tell R to treat it as an object name rather than as a character string.
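
A minimal sketch of the difference, assuming an object named 104D0 already exists in the workspace (the small data frame below is made up purely for illustration):

```r
# Hypothetical stand-in for the imported data, stored under the name 104D0.
`104D0` <- data.frame(shannon = c(1.2, 2.3, 1.9))

# In quotes, "104D0" is only a character string (an atomic vector),
# so `$` has nothing to extract and R raises an error.
quoted <- tryCatch("104D0"$shannon, error = function(e) conditionMessage(e))
print(quoted)

# In backticks, `104D0` refers to the object itself.
backticked <- `104D0`$shannon
print(backticked)
```

If no object with that name exists yet, the backtick version fails with "object not found" instead, which would mean the data was imported under a different name.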

Thank you sir, I just tried that and it returned this:

hist(104D0$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in hist(104D0$shannon, main = "Shannon diversity", xlab = "", :
object '104D0' not found
hist(104D0.csv$shannon, main="Shannon diversity", xlab="", breaks=10)

The 104D0 is my CSV file containing two columns: the first column is the species column and the 2nd column is the abundance column. So I want to calculate alpha diversity, Beta diversity, and principal component analyses. I do not know if the problem is with my data format.

I wish I could upload my 104D0 file for you to see but I am unable to do so because I am new here

Can you provide a reproducible example?

@jod14139,

You specifically forgot to add the backticks as specified in my solution. I am not sure what kind of keyboard you use, but the backtick is the following symbol in parentheses: (`). You must use it:

hist(`104D0`$shannon, main="Shannon diversity", xlab="", breaks=10)

The easiest solution, though, is to change the name of your variable. When you import your dataset, just store it under a different variable name. Assuming that your data is in a CSV file, you can import it into a variable called, for example, my_data.
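
Something along these lines should work. This is only a sketch: the temporary file and its two rows are placeholders so the example runs anywhere, and in practice you would pass the path to your own file.

```r
# Write a tiny made-up CSV to a temporary file (placeholder for your real file).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Species,Abundance", "Streptococcus,46927", "Atopobium,8951"), csv_path)

# Store the imported data under a name that starts with a letter.
my_data <- read.csv(csv_path)

str(my_data)        # check the column names and types after import
my_data$Abundance   # `$` now works without any backticks
```

Note that assigning the file name itself, as in my_data <- "104D0", only stores a character string; the data has to come from an import function such as read.csv().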

Thank you Sir, I did the following; please see below:

my_data1 = "104D0_1"
my_data1
[1] "104D0_1"
hist(my_data1$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in my_data1$shannon : $ operator is invalid for atomic vectors

mydata1 <- read.table("G:/CSV.files/104D0.csv", header=TRUE, sep=",", row.names=NULL)

mydata1
Species.Abundance
1 Streptococcus;46927
2 Mycobacterium;11077
3 Atopobium;8951
4 Granulicatella;4685
5 Actinomyces;4016
6 Catonella;1904
7 Homo;1000
8 Macaca;688
9 unclassified (derived from Bacteria);568
10 Eubacterium;530
11 Clostridium;259
12 Gemella;253
13 Lactobacillus;124
14 Sorghum;115
15 Rothia;112
16 Bifidobacterium;99
17 unclassified (derived from Clostridiales);85
18 Bacillus;81
19 Canis;81
20 Ruminococcus;80
21 Schistosoma;80
22 Enterococcus;77
23 Danio;72
24 Neisseria;67
25 Pan;64
26 Lactococcus;62
27 unclassified (derived from Siphoviridae);62
28 Staphylococcus;61
29 unclassified (derived from Clostridiales Family XI. Incertae Sedis);58
30 Ciona;56
31 Mobiluncus;54
32 Veillonella;54
33 Prevotella;45
34 Peptostreptococcus;44
35 Olsenella;41
36 Loa;39
37 Nonionella;38
38 Oribacterium;37
39 Oryza;34
40 Shuttleworthia;33
41 Fusobacterium;32
42 Rattus;32
43 Moniezia;31
44 Collinsella;27
45 unclassified (derived from Erysipelotrichaceae);27
46 Bacteroides;24
47 Xylosandrus;23
48 Slackia;22
49 Coprococcus;22
50 Pongo;22
hist(mydata1$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in hist.default(mydata1$shannon, main = "Shannon diversity", xlab = "", :
'x' must be numeric
I used the (`) symbol instead of the (") symbol and it did not work. However, when I re-imported the CSV file and named it mydata1, it no longer returned the same error; it now states that 'x' must be numeric. Could you please help me with this?

Thanks a lot for your help

Dear Williaml,

I redid it as shown below:

mydata1 <- read.table("G:/CSV.files/104D0.csv", header=TRUE, sep=",", row.names=NULL)

mydata1
Species.Abundance
1 Streptococcus;46927
2 Mycobacterium;11077
3 Atopobium;8951
4 Granulicatella;4685
5 Actinomyces;4016
6 Catonella;1904
7 Homo;1000
8 Macaca;688
9 unclassified (derived from Bacteria);568
10 Eubacterium;530
11 Clostridium;259
12 Gemella;253
13 Lactobacillus;124
14 Sorghum;115
15 Rothia;112
16 Bifidobacterium;99
17 unclassified (derived from Clostridiales);85
18 Bacillus;81
19 Canis;81
20 Ruminococcus;80
21 Schistosoma;80
22 Enterococcus;77
23 Danio;72
24 Neisseria;67
25 Pan;64
26 Lactococcus;62
27 unclassified (derived from Siphoviridae);62
28 Staphylococcus;61
29 unclassified (derived from Clostridiales Family XI. Incertae Sedis);58
30 Ciona;56
31 Mobiluncus;54
32 Veillonella;54
33 Prevotella;45
34 Peptostreptococcus;44
35 Olsenella;41
36 Loa;39
37 Nonionella;38
38 Oribacterium;37
39 Oryza;34
40 Shuttleworthia;33
41 Fusobacterium;32
42 Rattus;32
43 Moniezia;31
44 Collinsella;27
45 unclassified (derived from Erysipelotrichaceae);27
46 Bacteroides;24
47 Xylosandrus;23
48 Slackia;22
49 Coprococcus;22
50 Pongo;22
hist(mydata1$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in hist.default(mydata1$shannon, main = "Shannon diversity", xlab = "", :
'x' must be numeric

I am now getting the error above, which says that the "x" in 'xlab' should be numeric. I hope this clarifies my position.

Regards,

This is how you can share your data with us so that we can better help you. After importing your data with:

mydata1 <-read.table("G:/CSV.files/104D0.csv", header=TRUE, sep=",", row.names=NULL)

Run the following code:

dput(mydata1)

Then copy and share the console output with us.
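
As a rough illustration of what dput() gives you, here is a sketch on a small made-up data frame (example_data is a hypothetical stand-in for mydata1):

```r
# Small made-up data frame standing in for mydata1.
example_data <- data.frame(Species = c("Streptococcus", "Atopobium"),
                           Abundance = c(46927L, 8951L))

# dput() prints R code that rebuilds the object.
dput(example_data)

# Anyone can paste that output back into R to recreate the data.
# Here the round trip is done in one step: capture the text, then evaluate it.
dput_text <- paste(capture.output(dput(example_data)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
all.equal(rebuilt, example_data)
```

Remember to call it with parentheses and the object name, dput(mydata1). For a large dataset, dput(head(mydata1, 10)) keeps the output short.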

Alright. Thank you very much. Please see below:

dput(mydata1)
structure(list(Species.Abundance = c("Streptococcus;46927", "Mycobacterium;11077",
"Atopobium;8951", "Granulicatella;4685", "Actinomyces;4016",
"Catonella;1904", "Homo;1000", "Macaca;688", "unclassified (derived from Bacteria);568",
"Eubacterium;530", "Clostridium;259", "Gemella;253", "Lactobacillus;124",
"Sorghum;115", "Rothia;112", "Bifidobacterium;99", "unclassified (derived from Clostridiales);85",
"Bacillus;81", "Canis;81", "Ruminococcus;80", "Schistosoma;80",
"Enterococcus;77", "Danio;72", "Neisseria;67", "Pan;64", "Lactococcus;62",
"unclassified (derived from Siphoviridae);62", "Staphylococcus;61",
"unclassified (derived from Clostridiales Family XI. Incertae Sedis);58",
"Ciona;56", "Mobiluncus;54", "Veillonella;54", "Prevotella;45",
"Peptostreptococcus;44", "Olsenella;41", "Loa;39", "Nonionella;38",
"Oribacterium;37", "Oryza;34", "Shuttleworthia;33", "Fusobacterium;32",
"Rattus;32", "Moniezia;31", "Collinsella;27", "unclassified (derived from Erysipelotrichaceae);27",
"Bacteroides;24", "Xylosandrus;23", "Slackia;22", "Coprococcus;22",
"Pongo;22")), class = "data.frame", row.names = c(NA, -50L))

@jod14139 Your data is in a very odd format. Let's try this instead: can you upload your file to any cloud storage service (Google Drive, Dropbox, OneDrive, ...) and then share the link here? I think that will be easier.

There is a problem with the column names. You have only one column, called Species.Abundance, when it looks like you need two columns named Species and Abundance.

Have a look at the csv file and see if it is using a (.) rather than a (;) as a separator in the first line.

I manually edited the earlier data you supplied and changed
Species.Abundance to Species;Abundance

and read in the data using:

dat1  <-  read.csv("jod.csv", sep = ";")

It looks okay.
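
Here is a small sketch of what seems to be happening, using a temporary file with a few rows copied from the output above (the file itself is made up for the example). Note that read.csv() cleans up column names by default (check.names = TRUE), so a header line Species;Abundance read with sep = "," can show up as Species.Abundance even though the file really contains a semicolon.

```r
# Recreate the first lines of the file as semicolon-separated text.
raw_lines <- c("Species;Abundance",
               "Streptococcus;46927",
               "Mycobacterium;11077",
               "Atopobium;8951")
csv_path <- tempfile(fileext = ".csv")
writeLines(raw_lines, csv_path)

# With sep = "," each whole line lands in a single character column.
wrong <- read.csv(csv_path, sep = ",")
ncol(wrong)
names(wrong)

# With sep = ";" the two columns are split as intended.
dat1 <- read.csv(csv_path, sep = ";")
ncol(dat1)
names(dat1)
```

Checking ncol() and names() right after an import is a quick way to catch a wrong separator before running any analysis.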

Alright Sir,

Please use this link:

@jrkrideau,
You're right. I actually have the data in two columns, species and abundance. I then saved it from Excel into CSV format, so I am quite surprised that it came out that way after I imported it into R. However, when I check the original files, the header still shows as Species;Abundance.

What could be the cause of this, and how do I change it as you did? Should I use the command you just showed in your response, since the original files look OK, or should I revert the files to Excel format?

Hello Sir,

I just did as you advised and got the response below:

dat1 <- read.csv("104D0.csv", sep = ";")
104D0
Error: unexpected symbol in "104D0"
"104D0"
[1] "104D0"
dat1
Species
1 Streptococcus
2 Mycobacterium
3 Atopobium
4 Granulicatella
5 Actinomyces
6 Catonella
7 Homo
8 Macaca
9 unclassified (derived from Bacteria)
10 Eubacterium
11 Clostridium
12 Gemella
13 Lactobacillus
14 Sorghum
15 Rothia
16 Bifidobacterium
17 unclassified (derived from Clostridiales)
18 Bacillus
19 Canis
20 Ruminococcus
21 Schistosoma
22 Enterococcus
23 Danio
24 Neisseria
25 Pan
26 Lactococcus
27 unclassified (derived from Siphoviridae)
28 Staphylococcus
29 unclassified (derived from Clostridiales Family XI. Incertae Sedis)
30 Ciona
31 Mobiluncus
32 Veillonella
33 Prevotella
34 Peptostreptococcus
35 Olsenella
36 Loa
37 Nonionella
38 Oribacterium
39 Oryza
40 Shuttleworthia
41 Fusobacterium
42 Rattus
43 Moniezia
44 Collinsella
45 unclassified (derived from Erysipelotrichaceae)
46 Bacteroides
47 Xylosandrus
48 Slackia
49 Coprococcus
50 Pongo
Abundance
1 46927
2 11077
3 8951
4 4685
5 4016
6 1904
7 1000
8 688
9 568
10 530
11 259
12 253
13 124
14 115
15 112
16 99
17 85
18 81
19 81
20 80
21 80
22 77
23 72
24 67
25 64
26 62
27 62
28 61
29 58
30 56
31 54
32 54
33 45
34 44
35 41
36 39
37 38
38 37
39 34
40 33
41 32
42 32
43 31
44 27
45 27
46 24
47 23
48 22
49 22
50 22
hist(dat1$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in hist.default(dat1$shannon, main = "Shannon diversity", xlab = "", :
'x' must be numeric
dput
function (x, file = "", control = c("keepNA", "keepInteger",
"niceNames", "showAttributes"))
{
if (is.character(file))
if (nzchar(file)) {
file <- file(file, "wt")
on.exit(close(file))
}
else file <- stdout()
.Internal(dput(x, file, .deparseOpts(control)))
}
<bytecode: 0x000001c745e4efe8>
<environment: namespace:base>

The columns separated fine, but the alpha diversity (Shannon index) calculation is still not going through.

@jod14139 The solution for importing the data the right way is to use the read.csv2() function. It should work.

mydata <- read.csv2("104D7.csv")
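
For reference, read.csv2() is read.csv() with different defaults: sep = ";" and dec = ",". A short sketch on a made-up temporary file (the two rows are illustrative values only):

```r
# Made-up semicolon-separated file, used only so the example is runnable.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Species;Abundance", "Streptococcus;4893", "Atopobium;280"), csv_path)

# read.csv2() defaults to sep = ";" and dec = ",".
via_csv2 <- read.csv2(csv_path)

# The same import spelled out with read.csv().
via_csv <- read.csv(csv_path, sep = ";", dec = ",")

all.equal(via_csv2, via_csv)
sapply(via_csv2, class)   # confirm that Abundance was read as a number
```

Because a comma is treated as the decimal mark here, it is worth checking the column classes if your counts ever contain decimals.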

However, there are problems with your code for plotting histograms. What exactly are you trying to do?

Excellent @gueyenono!!
You're right!!! I tried your new code and I got this:

mydata3 <-read.csv2("104D7.csv")
"104D7.csv"
[1] "104D7.csv"
mydata3
Species Abundance
1 Streptococcus 4893
2 Atopobium 280
3 Granulicatella 211
4 Gemella 172
5 Actinomyces 169
6 Lactococcus 89
7 Homo 73
8 Eubacterium 51
9 Macaca 36
10 unclassified (derived from Clostridiales) 29
11 Bifidobacterium 20
12 Clostridium 11
13 Rothia 9
14 Mycobacterium 9
15 Lactobacillus 9
16 Catonella 9
17 Roseburia 9
18 Candida 9
19 Veillonella 8
20 Canis 7
21 Bacillus 6
22 Oribacterium 6
23 Solobacterium 6
24 Pseudomonas 6
25 unclassified (derived from Bacteria) 6
26 Rattus 6
27 Oryza 6
28 Staphylococcus 5
29 Enterococcus 5
30 Danio 5
31 Sorghum 5
32 Fusobacterium 4
33 Neisseria 4
34 Haemophilus 4
35 Candidatus Phytoplasma 4
36 Schistosoma 4
37 Brassica 4
38 Collinsella 3
39 Prevotella 3
40 Peptostreptococcus 3
41 Bulleidia 3
42 Xylosandrus 3
43 Aegla 3
44 Pan 3
45 Pongo 3
46 Ricinus 3
47 Corynebacterium 2
48 Propionibacterium 2
49 Cyanothece 2
50 Listeria 2
dput
function (x, file = "", control = c("keepNA", "keepInteger",
"niceNames", "showAttributes"))
{
if (is.character(file))
if (nzchar(file)) {
file <- file(file, "wt")
on.exit(close(file))
}
else file <- stdout()
.Internal(dput(x, file, .deparseOpts(control)))
}
<bytecode: 0x000001c745e4efe8>
<environment: namespace:base>

I am trying to calculate diversity, abundance, and richness of my species in my microbiome data. To do this, I need to find their alpha & beta diversities as well as principal component analyses.
I am following tutorials for doing this from this website: https://rstudio-pubs-static.s3.amazonaws.com/268156_d3ea37937f4f4469839ab6fa2c483842.html#alpha-diversity
It's the only website I have now.

I see. Now that the data has been imported properly, you can now run your analysis on it.
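
One thing to keep in mind: your table has only the columns Species and Abundance, so something like mydata$shannon is NULL, which is why hist() complained that 'x' must be numeric. The indices have to be computed from the abundances first. Below is a minimal base R sketch using the standard formulas, with the first five rows of your 104D7 output typed in by hand; it is not necessarily how the tutorial you are following does it.

```r
# First rows of the 104D7 table, typed in by hand for illustration.
mydata <- data.frame(
  Species   = c("Streptococcus", "Atopobium", "Granulicatella", "Gemella", "Actinomyces"),
  Abundance = c(4893, 280, 211, 172, 169)
)

# Relative abundance of each taxon.
p <- mydata$Abundance / sum(mydata$Abundance)

# Shannon index: H = -sum(p * log(p)), using the natural logarithm.
shannon <- -sum(p * log(p))

# Simpson index in its 1 - sum(p^2) form.
simpson <- 1 - sum(p^2)

# Observed richness: number of taxa with a non-zero count.
richness <- sum(mydata$Abundance > 0)

c(shannon = shannon, simpson = simpson, richness = richness)
```

Each index is a single number per sample, so a histogram only makes sense once you have values from several samples. Beta diversity and PCA also compare samples with each other, so they need more than one file. If you have the vegan package installed, its diversity() function computes the Shannon and Simpson indices as well.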

Thank you very much!

You're very welcome. Don't hesitate to post new questions on RStudio Community if you have any. Also, it would greatly help to learn how to post code in your questions. You can do so by moving to a new line and pressing: Ctrl + Shift + C

# This is a line of code