I am new to R and having problems running analyses on data imported from a CSV file

jod14139 · December 14, 2020, 3:37pm

Good day. I am new to R and am currently using RStudio. I managed to import my CSV data into RStudio, but when I try to use R to determine alpha diversity, beta diversity, and other analyses, I get the error "Error in "104D0"$shannon : $ operator is invalid for atomic vectors":

hist("104D0"$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in "104D0"$shannon : $ operator is invalid for atomic vectors
hist("104D0"$simpson, main="Simpson diversity", xlab="", breaks=10)
Error in "104D0"$simpson : $ operator is invalid for atomic vectors
hist("104D0"$chao, main="Chao richness", xlab="", breaks=15)
Error in "104D0"$chao : $ operator is invalid for atomic vectors
hist("104D0"$ace, main="ACE richness", xlab="", breaks=15)
Error in "104D0"$ace : $ operator is invalid for atomic vectors

I will thus be happy to get help on how to convert my csv data into an appropriate vector format in which the various statistical computations can be done.

Thanks for your help!

gueyenono · December 14, 2020, 3:59pm

Hi @jod14139,

Welcome to RStudio Community! I suspect that you are getting these error messages because you use quotation marks (") instead of backticks (`). Try:

hist(`104D0`$shannon, main="Shannon diversity", xlab="", breaks=10)

This is because variable names should normally not begin with a number, as 104D0 does.
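
To make the distinction concrete, here is a minimal sketch. The data frame and its values are made up for illustration and are not taken from the poster's file. A quoted "104D0" is only a character string, so `$` cannot extract anything from it, whereas backticks refer to an object with that name, which has to exist in the workspace first.

```r
# A quoted name is a character string, i.e. an atomic vector.
x <- "104D0"
is.atomic(x)

# `$` only works on lists and data frames, so this raises an error.
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(x$shannon, error = function(e) conditionMessage(e))
print(msg)

# Backticks let you refer to an object whose name starts with a digit,
# but the object must already exist. The values below are hypothetical.
`104D0` <- data.frame(shannon = c(1.2, 1.5, 1.9))
shannon_values <- `104D0`$shannon
print(shannon_values)
```

If no object called 104D0 has been created, the backtick form fails with an "object not found" error instead.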

jod14139 · December 14, 2020, 9:18pm

Thank you sir, I just tried that and it returned this:

hist(104D0$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in hist(104D0$shannon, main = "Shannon diversity", xlab = "", :
object '104D0' not found
hist(104D0.csv$shannon, main="Shannon diversity", xlab="", breaks=10)

104D0 is my CSV file. It contains two columns: the first is the species column and the second is the abundance column. I want to calculate alpha diversity and beta diversity and run a principal component analysis. I do not know if the problem is with my data format.

jod14139 · December 14, 2020, 9:20pm

I wish I could upload my 104D0 file for you to see but I am unable to do so because I am new here

williaml · December 14, 2020, 9:37pm

Can you provide a reproducible example?

FAQ: What's a reproducible example (`reprex`) and how do I create one?

Why reprex? Getting unstuck is hard. Your first step here is usually to create a reprex, or reproducible example. The goal of a reprex is to package your code, and information about your problem so that others can run it and feel your pain. Then, hopefully, folks can more easily provide a solution.

What's in a Reproducible Example? Parts of a reproducible example:

background information - Describe what you are trying to do. What have you already done?
complete set up - include any library() calls and data to reproduce your issue.
data for a reprex: Here's a discussion on setting up data for a reprex
make it run - include the minimal code required to reproduce your error on the data…

gueyenono · December 14, 2020, 9:37pm

@jod14139,

You specifically forgot to add the backticks as specified in my solution. I am not sure what kind of keyboard you use, but the backtick is the following symbol in parentheses: (`). You must use it:

hist(`104D0`$shannon, main="Shannon diversity", xlab="", breaks=10)

But, the easiest solution is to change the name of your variable. When you import your dataset, just store it into a different variable name. Assuming that your data is in a csv file, you can just import it into a variable called, for example, my_data.
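
As a rough sketch of that suggestion: the file name on disk can start with a digit, but the object it is read into is easier to work with if it has a syntactic name. The temporary file and its two rows below stand in for the real CSV and are assumptions for illustration only.

```r
# Stand-in for the real file: a small comma-separated CSV in a temp folder.
csv_path <- file.path(tempdir(), "104D0.csv")
writeLines(c("Species,Abundance",
             "Streptococcus,46927",
             "Mycobacterium,11077"), csv_path)

# The file name may start with a digit; the variable name should not.
my_data <- read.csv(csv_path)
str(my_data)

# make.names() shows how R itself would turn "104D0" into a syntactic name.
safe_name <- make.names("104D0")
print(safe_name)
```

Note that the assignment has to call an import function such as read.csv(); assigning the quoted file name on its own only stores a character string.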

jod14139 · December 15, 2020, 1:15am

Thank you Sir, I did the following; please see below:

my_data1 = "104D0_1"
my_data1
[1] "104D0_1"
hist(my_data1$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in my_data1$shannon : $ operator is invalid for atomic vectors

mydata1 <- read.table("G:/CSV.files/104D0.csv", header=TRUE, sep=",", row.names=NULL)

mydata1
Species.Abundance
1 Streptococcus;46927
2 Mycobacterium;11077
3 Atopobium;8951
4 Granulicatella;4685
5 Actinomyces;4016
6 Catonella;1904
7 Homo;1000
8 Macaca;688
9 unclassified (derived from Bacteria);568
10 Eubacterium;530
11 Clostridium;259
12 Gemella;253
13 Lactobacillus;124
14 Sorghum;115
15 Rothia;112
16 Bifidobacterium;99
17 unclassified (derived from Clostridiales);85
18 Bacillus;81
19 Canis;81
20 Ruminococcus;80
21 Schistosoma;80
22 Enterococcus;77
23 Danio;72
24 Neisseria;67
25 Pan;64
26 Lactococcus;62
27 unclassified (derived from Siphoviridae);62
28 Staphylococcus;61
29 unclassified (derived from Clostridiales Family XI. Incertae Sedis);58
30 Ciona;56
31 Mobiluncus;54
32 Veillonella;54
33 Prevotella;45
34 Peptostreptococcus;44
35 Olsenella;41
36 Loa;39
37 Nonionella;38
38 Oribacterium;37
39 Oryza;34
40 Shuttleworthia;33
41 Fusobacterium;32
42 Rattus;32
43 Moniezia;31
44 Collinsella;27
45 unclassified (derived from Erysipelotrichaceae);27
46 Bacteroides;24
47 Xylosandrus;23
48 Slackia;22
49 Coprococcus;22
50 Pongo;22
hist(mydata1$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in hist.default(mydata1$shannon, main = "Shannon diversity", xlab = "", :
'x' must be numeric

I used the (`) symbol instead of the (") symbol and it did not work. However, when I re-imported the CSV file and named it mydata1, I no longer got the same error; it now says that 'x' must be numeric. Could you please help?

Thanks a lot for your help

jod14139 · December 15, 2020, 1:18am

Dear Williaml,

I redid it as shown below:

mydata1 <- read.table("G:/CSV.files/104D0.csv", header=TRUE, sep=",", row.names=NULL)

hist(mydata1$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in hist.default(mydata1$shannon, main = "Shannon diversity", xlab = "", :
'x' must be numeric

I am now getting the error above that the "x" in 'xlab' should be numeric....I hope this clarifies my position?

Regards,

gueyenono · December 15, 2020, 1:33am

This is how you can share your data with us so that we can better help you. After importing your data with:

mydata1 <-read.table("G:/CSV.files/104D0.csv", header=TRUE, sep=",", row.names=NULL)

Run the following code:

dput(mydata1)

Then copy and share the console output with us.
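
For context, dput() prints R code that rebuilds an object, which is why its output is convenient to paste into a post. Below is a minimal sketch with a small hand-typed data frame (an assumption for illustration, not the real file). Note that dput must be called with parentheses and an argument; typing dput on its own only prints the function's source.

```r
# Small example data typed by hand for illustration.
df <- data.frame(
  Species   = c("Streptococcus", "Atopobium"),
  Abundance = c(4893L, 280L)
)

# dput() writes a structure(...) call that recreates the object.
dput(df)

# Round trip: capture that text and evaluate it to get the data frame back.
txt <- capture.output(dput(df))
df_rebuilt <- eval(parse(text = txt))
print(all.equal(df, df_rebuilt))
```

Anyone reading the thread can paste the structure(...) output into their own session and work with the same data.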

jod14139 · December 15, 2020, 1:37am

Alright. Thank you very much. Please see below:

dput(mydata1)
structure(list(Species.Abundance = c("Streptococcus;46927", "Mycobacterium;11077",
"Atopobium;8951", "Granulicatella;4685", "Actinomyces;4016",
"Catonella;1904", "Homo;1000", "Macaca;688", "unclassified (derived from Bacteria);568",
"Eubacterium;530", "Clostridium;259", "Gemella;253", "Lactobacillus;124",
"Sorghum;115", "Rothia;112", "Bifidobacterium;99", "unclassified (derived from Clostridiales);85",
"Bacillus;81", "Canis;81", "Ruminococcus;80", "Schistosoma;80",
"Enterococcus;77", "Danio;72", "Neisseria;67", "Pan;64", "Lactococcus;62",
"unclassified (derived from Siphoviridae);62", "Staphylococcus;61",
"unclassified (derived from Clostridiales Family XI. Incertae Sedis);58",
"Ciona;56", "Mobiluncus;54", "Veillonella;54", "Prevotella;45",
"Peptostreptococcus;44", "Olsenella;41", "Loa;39", "Nonionella;38",
"Oribacterium;37", "Oryza;34", "Shuttleworthia;33", "Fusobacterium;32",
"Rattus;32", "Moniezia;31", "Collinsella;27", "unclassified (derived from Erysipelotrichaceae);27",
"Bacteroides;24", "Xylosandrus;23", "Slackia;22", "Coprococcus;22",
"Pongo;22")), class = "data.frame", row.names = c(NA, -50L))

gueyenono · December 15, 2020, 1:45am

@jod14139 Your data is in a very odd format. Let's do this instead: can you upload your file to any cloud storage service (Google Drive, Dropbox, OneDrive, ...) and share the link here? I think that will be easier.

jrkrideau · December 15, 2020, 1:52am

There is a problem with the column names. You only have one column called Species.Abundance when it looks like you need two columns named Species and Abundance.

Have a look at the csv file and see if it is using a (.) rather than a (;) as a separator in the first line.

I manually edited the earlier data you supplied and changed
Species.Abundance to Species;Abundance

and read in the data using :

dat1  <-  read.csv("jod.csv", sep = ";")

It looks okay.
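
A small sketch of what is probably happening, using the first three rows posted earlier as inline text rather than a file: when a semicolon-separated file is read with sep = ",", each line becomes a single field, and R rewrites the header Species;Abundance into the syntactic name Species.Abundance.

```r
# First rows of the posted data, as they would appear in a
# semicolon-separated file.
raw <- "Species;Abundance
Streptococcus;46927
Mycobacterium;11077
Atopobium;8951"

# Wrong separator: everything lands in one character column.
one_col <- read.csv(text = raw, sep = ",")
names(one_col)   # the header has been converted by make.names()
ncol(one_col)

# Matching separator: two columns, with Abundance parsed as numbers.
two_col <- read.csv(text = raw, sep = ";")
str(two_col)
```

So the file itself may be fine; the separator passed to the import function just has to match the one used in the file.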

jod14139 · December 15, 2020, 2:00am

Alright Sir,

Please use this link:

jod14139 · December 15, 2020, 2:03am

@jrkrideau,
You're right. I actually have the data in two columns, Species and Abundance. I then saved it from Excel in CSV format, so I am quite surprised that it came out that way after I imported it into R. However, when I check the original files, the header still shows as Species;Abundance.

What could be the cause of this, and how do I change it as you did? Should I use the command you just showed in your response, since the original files look OK, or should I revert the files to Excel format?

jod14139 · December 15, 2020, 2:11am

Hello Sir,

I just did as you advised and got the response below:

dat1 <- read.csv("104D0.csv", sep = ";")
104D0
Error: unexpected symbol in "104D0"
"104D0"
[1] "104D0"
dat1
Species
1 Streptococcus
2 Mycobacterium
3 Atopobium
4 Granulicatella
5 Actinomyces
6 Catonella
7 Homo
8 Macaca
9 unclassified (derived from Bacteria)
10 Eubacterium
11 Clostridium
12 Gemella
13 Lactobacillus
14 Sorghum
15 Rothia
16 Bifidobacterium
17 unclassified (derived from Clostridiales)
18 Bacillus
19 Canis
20 Ruminococcus
21 Schistosoma
22 Enterococcus
23 Danio
24 Neisseria
25 Pan
26 Lactococcus
27 unclassified (derived from Siphoviridae)
28 Staphylococcus
29 unclassified (derived from Clostridiales Family XI. Incertae Sedis)
30 Ciona
31 Mobiluncus
32 Veillonella
33 Prevotella
34 Peptostreptococcus
35 Olsenella
36 Loa
37 Nonionella
38 Oribacterium
39 Oryza
40 Shuttleworthia
41 Fusobacterium
42 Rattus
43 Moniezia
44 Collinsella
45 unclassified (derived from Erysipelotrichaceae)
46 Bacteroides
47 Xylosandrus
48 Slackia
49 Coprococcus
50 Pongo
Abundance
1 46927
2 11077
3 8951
4 4685
5 4016
6 1904
7 1000
8 688
9 568
10 530
11 259
12 253
13 124
14 115
15 112
16 99
17 85
18 81
19 81
20 80
21 80
22 77
23 72
24 67
25 64
26 62
27 62
28 61
29 58
30 56
31 54
32 54
33 45
34 44
35 41
36 39
37 38
38 37
39 34
40 33
41 32
42 32
43 31
44 27
45 27
46 24
47 23
48 22
49 22
50 22
hist(dat1$shannon, main="Shannon diversity", xlab="", breaks=10)
Error in hist.default(dat1$shannon, main = "Shannon diversity", xlab = "", :
'x' must be numeric
dput
function (x, file = "", control = c("keepNA", "keepInteger",
"niceNames", "showAttributes"))
{
if (is.character(file))
if (nzchar(file)) {
file <- file(file, "wt")
on.exit(close(file))
}
else file <- stdout()
.Internal(dput(x, file, .deparseOpts(control)))
}
<bytecode: 0x000001c745e4efe8>
<environment: namespace:base>

The columns separated fine, but the alpha diversity (Shannon index) calculation is still not going through.
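
A likely reason for this error, sketched on a few of the rows above: dat1 has no column named shannon, so dat1$shannon is NULL, and hist() rejects anything that is not numeric. The message refers to the first argument x of hist(), not to xlab. The data frame is rebuilt by hand here, and plot = FALSE is used only so the sketch runs without opening a graphics window.

```r
# A few rows of the imported data, rebuilt by hand for illustration.
dat1 <- data.frame(
  Species   = c("Streptococcus", "Mycobacterium", "Atopobium"),
  Abundance = c(46927, 11077, 8951)
)

names(dat1)                   # only Species and Abundance exist
missing_col <- dat1$shannon   # asking for a column that does not exist
is.null(missing_col)

# hist() on NULL fails with the error seen in the thread.
msg <- tryCatch(hist(missing_col, plot = FALSE),
                error = function(e) conditionMessage(e))
print(msg)

# An existing numeric column is accepted.
h <- hist(dat1$Abundance, plot = FALSE)
class(h)
```

A shannon column only exists once a Shannon index has been computed and stored, which is a separate step from importing the counts.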

gueyenono · December 15, 2020, 2:17am

@jod14139 The solution for importing the data the right way is to use the read.csv2() function. It should work.

mydata <- read.csv2("104D7.csv")

However, there are problems with your code for plotting histograms. What exactly are you trying to do?
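
For reference, read.csv2() is the variant of read.csv() whose defaults are sep = ";" and dec = ",", the convention used for CSV files in locales where the comma is the decimal mark. A short sketch on inline text, with a few hand-typed rows in the same format as the file (assumed for illustration):

```r
# Hand-typed rows in the same semicolon-separated layout as the file.
raw <- "Species;Abundance
Streptococcus;4893
Atopobium;280
Granulicatella;211"

# read.csv2() assumes ';' between fields and ',' as the decimal mark.
mydata <- read.csv2(text = raw)
str(mydata)

# Equivalent call with the two arguments spelled out explicitly.
same <- read.csv(text = raw, sep = ";", dec = ",")
print(all.equal(mydata, same))
```

Either form should work for this file; read.csv2() is simply shorter to type.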

jod14139 · December 15, 2020, 2:22am

Excellent @gueyenono!!
You're right!!! I tried your new code and I got this:

mydata3 <-read.csv2("104D7.csv")
"104D7.csv"
[1] "104D7.csv"
mydata3
Species Abundance
1 Streptococcus 4893
2 Atopobium 280
3 Granulicatella 211
4 Gemella 172
5 Actinomyces 169
6 Lactococcus 89
7 Homo 73
8 Eubacterium 51
9 Macaca 36
10 unclassified (derived from Clostridiales) 29
11 Bifidobacterium 20
12 Clostridium 11
13 Rothia 9
14 Mycobacterium 9
15 Lactobacillus 9
16 Catonella 9
17 Roseburia 9
18 Candida 9
19 Veillonella 8
20 Canis 7
21 Bacillus 6
22 Oribacterium 6
23 Solobacterium 6
24 Pseudomonas 6
25 unclassified (derived from Bacteria) 6
26 Rattus 6
27 Oryza 6
28 Staphylococcus 5
29 Enterococcus 5
30 Danio 5
31 Sorghum 5
32 Fusobacterium 4
33 Neisseria 4
34 Haemophilus 4
35 Candidatus Phytoplasma 4
36 Schistosoma 4
37 Brassica 4
38 Collinsella 3
39 Prevotella 3
40 Peptostreptococcus 3
41 Bulleidia 3
42 Xylosandrus 3
43 Aegla 3
44 Pan 3
45 Pongo 3
46 Ricinus 3
47 Corynebacterium 2
48 Propionibacterium 2
49 Cyanothece 2
50 Listeria 2

I am trying to calculate diversity, abundance, and richness of my species in my microbiome data. To do this, I need to find their alpha & beta diversities as well as principal component analyses.
I am following tutorials for doing this from this website: Microbiota Analysis in R
It's the only website I have now.

gueyenono · December 15, 2020, 2:27am

I see. Now that the data has been imported properly, you can run your analysis on it.
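
As a hedged starting point rather than a full microbiome workflow, alpha diversity for a single sample can be computed directly from the Abundance column with the textbook formulas: Shannon H = -sum(p * log(p)) and Simpson 1 - sum(p^2), where p is each taxon's share of the total. The helper names below are hypothetical, and only the first five rows of the imported table are used. With one sample there is a single value per index, so a histogram of Shannon values would need one such value for each of several samples; beta diversity and principal component analysis likewise compare multiple samples.

```r
# First five rows of the imported table, entered by hand for illustration.
mydata3 <- data.frame(
  Species   = c("Streptococcus", "Atopobium", "Granulicatella",
                "Gemella", "Actinomyces"),
  Abundance = c(4893, 280, 211, 172, 169)
)

# Hypothetical helpers implementing the standard definitions.
shannon_index <- function(counts) {
  p <- counts[counts > 0] / sum(counts)   # relative abundances, zeros dropped
  -sum(p * log(p))
}
simpson_index <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}
richness <- function(counts) sum(counts > 0)   # observed number of taxa

shannon_index(mydata3$Abundance)
simpson_index(mydata3$Abundance)
richness(mydata3$Abundance)
```

If the vegan package is installed, its diversity() function provides the same Shannon and Simpson indices and may be more convenient once there are several samples.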

jod14139 · December 15, 2020, 2:31am

Thank you very much!

gueyenono · December 15, 2020, 2:34am

You're very welcome. Don't hesitate to post new questions on RStudio Community if you have any. Also, it would greatly help to learn how to post code in your questions. You can do so by moving to a new line and pressing: Ctrl + Shift + C

# This is a line of code