Problems with type of data, I am stuck reading data

shiny

#1

Hi. I have a very simple problem: when an Excel .xlsx file is converted to a txt or csv file and loaded into R, I cannot start producing my graphics, because R rejects the data with "data are not numeric".
Here is the script etc.:

> setwd("~/Desktop")
> library(ggplot2)
> filename <- "report2120.txt"
> my_data <- read.csv(filename, sep = "/t", header = TRUE)
Error in scan(file, what = "", sep = sep, quote = quote, nlines = 1, quiet = TRUE,  : 
  invalid 'sep' value: must be one byte
> my_data <- read.csv(filename, sep = "\t", header=TRUE)
> head(my_data)
                      ID Sulfur.dioxide Trimethylene.oxide X1.Propene..2.methyl.
1               rt (min)       3,194533           3,366183              3,349017
2 1-2ndlf1_040817_42.CDF   29088,291016       66303,851563          85989,312500
3 2-2ndlf1_150817_54.CDF   16776,804688       51661,789063         316874,437500
4 3-2ndlf1_240817_78.CDF   15850,670898      128605,125000         237439,984375
5 4-2ndlf2_040817_43.CDF   26271,083984       32317,853516          65547,023438
6 5-2ndlf2_150817_55.CDF   23765,802734      103366,726563         178750,453125
  Butane..2.methyl.       Acetone       Pentane Cyclopropane..ethylidene.
1          3,675183      3,778167      3,823950                  3,904050
2     297094,093750  82714,937500 100953,789063               7684,515625
3      62074,398438 116608,132813  42800,843750              18679,378906
4    1138562,375000  79097,937500 208050,593750              16209,337891
5     207188,984375 107744,523438  83528,554688              11778,996094
6     452412,531250 117449,390625 124497,210938              12994,667969
  Ethane..1.1.2.trichloro.1.2.2.trifluoro. Hexane..3.3.4.4.tetrafluoro. Methylene.chloride
1                                 4,064267                     4,132933           4,121483
2                             29344,169922                 18852,832031       23710,513672
3                             13301,733398                 16979,384766       98455,101563
4                            162439,812500                 29683,806641       24439,671875
5                             25597,494141                 16580,197266       14555,024414
6                             45374,042969                 23357,460938       24726,378906
  Carbon.disulfide Silanol..trimethyl. Cyclopropane..1.2.dimethyl...trans.         Urea
1         4,213050            4,396150                            4,470533     4,590700
2     13242,083984        39976,644531                        38549,113281  3202,225586
3      9064,365234        23474,128906                         4310,464355  3640,418457
4     22462,945313        47547,753906                       111134,203125  5730,806641
5     11252,458008        55003,242188                        34506,441406  2981,637451
6     19146,050781        22598,697266                        47463,332031 11173,574219
  Oxirane..2..1.1.dimethylethyl..3.methyl. Butane..2.methyl..1   X2.Butanone Furan..2.methyl.
1                                 4,682250            4,836733      4,802400         4,928283
2                             17261,277344        13726,242188  30403,994141      6670,623047
3                             52825,183594        46651,050781 536198,750000     18998,705078
4                             20459,929688        22859,890625  11091,367188      7335,367188
5                             19796,232422        17718,019531  37722,152344      8314,204102
6                             55074,734375        23672,992188  94077,328125      5960,987793
> is.numeric(my_data)
[1] FALSE

Can anyone give me some advice? I am not a novice R user, but I am not very technical...
Thanks, Ole


#2

Hi! Welcome!

At first it looked like your columns were getting merged together, but I take it back! Now that I've edited your post to format the console output properly, it looks to me like the file imported OK. But I can't tell what type the variables are without seeing the output of str(my_data). You seem to be in a locale where commas are used as the decimal separator, but if R doesn't know that, it will have read those numbers as text (and even worse, in R versions before 4.0.0 read.csv() converts all text to factors by default, so they may actually be factors).

Can you provide the output of str(my_data)? To format it properly, just select the pasted text and click the little </> button at the top of the posting box.

You might take a look at the other functions in the read.table() family (which includes read.csv()). They are all wrappers around read.table() with different default arguments. You can of course set all the arguments yourself, but if one of the other sets of defaults matches your situation, it might be simpler to use that. For instance, read.delim2() looks like it could be a better fit for your data, since it defaults to tab-separated fields and a comma as the decimal mark.
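To make the read.delim2() suggestion concrete, here is a minimal sketch. The temporary file and its three columns are a made-up miniature of the report file, not the real data, so treat the names and values as assumptions.

```r
# Hypothetical miniature of the file: tab-separated, comma as decimal mark
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tSulfur dioxide\tAcetone",
  "rt (min)\t3,194533\t3,778167",
  "1-2ndlf1_040817_42.CDF\t29088,291016\t82714,937500"
), tmp)

# read.delim2() defaults: header = TRUE, sep = "\t", dec = ","
demo <- read.delim2(tmp)

# Check each column rather than the data frame as a whole
col_is_numeric <- sapply(demo, is.numeric)
print(col_is_numeric)
```

The ID column should come back as text and the measurement columns as numeric, which is what the plotting code needs.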

By the way, is.numeric(my_data) returned FALSE, but that actually doesn't mean that the variables in your data frame aren't numeric. That code is asking if the data frame object as a whole is numeric, when you want to ask about the variables inside the data frame (a data frame is just a fancy list, and each variable is a list element).

# A data frame with only numeric variables
mydata <- data.frame(
  x = c(1:10),
  y = c(11:20)
)

# Asking if the data frame as a whole is numeric doesn't work
is.numeric(mydata)
#> [1] FALSE

# You can ask about specific variables
is.numeric(mydata$x)
#> [1] TRUE

# Or you can examine the structure
str(mydata)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ x: int  1 2 3 4 5 6 7 8 9 10
#>  $ y: int  11 12 13 14 15 16 17 18 19 20

# Or you can apply the `is.numeric()` function to each variable in turn
lapply(mydata, is.numeric)
#> $x
#> [1] TRUE
#> 
#> $y
#> [1] TRUE

Created on 2018-07-12 by the reprex package (v0.2.0).
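If the data are already loaded and re-reading the file is inconvenient, the text columns can probably be converted in place. This is only a sketch: my_demo is a hypothetical stand-in for my_data, and comma_to_numeric() is a helper invented here, not a base R function.

```r
# Hypothetical stand-in for a data frame whose numbers were read as text
my_demo <- data.frame(
  ID = c("rt (min)", "1-2ndlf1_040817_42.CDF"),
  Acetone = c("3,778167", "82714,937500"),
  Pentane = factor(c("3,823950", "100953,789063")),
  stringsAsFactors = FALSE
)

# Which columns are numeric right now?
print(sapply(my_demo, is.numeric))

# Helper (made up for this example): swap the decimal comma for a point,
# going through as.character() so that factors are handled as well
comma_to_numeric <- function(x) {
  as.numeric(sub(",", ".", as.character(x), fixed = TRUE))
}

# Convert every column except the ID column
value_cols <- setdiff(names(my_demo), "ID")
my_demo[value_cols] <- lapply(my_demo[value_cols], comma_to_numeric)

str(my_demo)
```

Going through as.character() matters for factor columns, because calling as.numeric() directly on a factor returns the internal level codes rather than the printed values.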


#3

Hi, as Mara suggested, you have an issue with decimal separators. Try adding dec = "," on the read.csv line to tell R that your decimal separator is a comma:

my_data <- read.csv(filename, sep = "\t", header=TRUE, dec =",")
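A quick way to see what dec = "," changes is to read the same text twice. The two-row string below is an invented sample in the same shape as the file, read through the text argument so no file is needed.

```r
# Invented sample: tab-separated, comma as decimal mark
raw <- "ID\tAcetone\nrt (min)\t3,778167\nsample_a\t82714,937500"

# Without dec, "3,778167" is not recognised as a number
without_dec <- read.csv(text = raw, sep = "\t", header = TRUE)

# With dec = ",", the same field is parsed as numeric
with_dec <- read.csv(text = raw, sep = "\t", header = TRUE, dec = ",")

# Text (character, or factor in older R versions) versus numeric
print(class(without_dec$Acetone))
print(class(with_dec$Acetone))
```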

#4

If you want sep = "\t", you should probably use read.table, not read.csv, which is a version of the former with a preset sep = "," (plus a few other defaults). And if those commas are decimal separators, Fer is right: you need to tell R by setting dec = ",".
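For comparison, a sketch of the read.table() version, again on a made-up temporary file. One thing to watch: read.table() defaults to header = FALSE, so the header has to be requested explicitly, whereas read.csv() assumes it.

```r
# Made-up sample file in the same layout as the report
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tAcetone\tPentane",
  "rt (min)\t3,778167\t3,823950",
  "sample_a\t82714,937500\t100953,789063"
), tmp)

# read.table(): header, separator and decimal mark all set by hand
via_table <- read.table(tmp, header = TRUE, sep = "\t", dec = ",")

# read.csv() with the separator and decimal mark overridden
via_csv <- read.csv(tmp, sep = "\t", dec = ",")

# On this simple file the two calls should give the same data frame
print(all.equal(via_table, via_csv))
```

The two functions also differ in their quote, fill and comment.char defaults, so on a file whose fields contain apostrophes or # characters the results could diverge; that is an assumption worth checking against the real file.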