Excel column of numbers read as a list not vector in R

I'm new to R and need a hand turning an Excel spreadsheet into a ggplot, please!

I have an Excel spreadsheet with two columns: RSeq value (number) and tumour_type (Normal/Cancer).
I'm trying to convert the first column to a vector variable:
rseq=FILENAME[ ,1]
but this always saves a list, not a vector, and I'm not sure why! This makes it difficult to convert to a data frame and make a box plot.
Secondly, I convert the second column to a factor with:
tumour_type_factor=factor(tumour_type, levels=c( "Normal", "Cancer"))
But when I run the factor it comes up with
"N/A
levels = Normal, Cancer"
so clearly it can't read the values in the column properly.

When I try to generate a box plot with ggplot it returns: default method not implemented for type 'list'

I have managed to do it right one time but can't replicate it! Am I importing the data wrongly from Excel? I have tried both .xlsx and .csv formats.

Any help much appreciated! Thanks!

It's quite likely that this is the problem. Can you tell us how you're importing the Excel file into R?

I find the Excel .xlsx file in the Files section and click Import Dataset; the viewer confirms that the first column is double and the second is character.
First row as names option is ticked.
This is the code displayed in the code preview:
library(readxl)
FILENAME <- read_excel("~/PhD/TCGA Analysis/FILENAME.xlsx")
View(FILENAME)

Then I just click import.

Data import seems fine. I'm not so sure about your data transformations though. I don't see what purpose the vector rseq serves. You shouldn't need to convert tumour_type to a factor either; geom_boxplot() works fine with character vectors.
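
On the list-versus-vector question: read_excel() returns a tibble, and single-bracket subsetting of a tibble (FILENAME[ ,1]) gives back a one-column tibble, which is a list underneath. That is probably why rseq is not a plain vector. Here is a minimal sketch, assuming a small made-up tibble as a stand-in for your imported file (the values are invented for illustration):

```r
library(tibble)

# Hypothetical stand-in for the imported spreadsheet;
# read_excel() returns a tibble like this one.
FILENAME <- tibble(
  Rseq = c(14, 35, 47, 32),
  tumour_type = c("Normal", "Cancer", "Normal", "Cancer")
)

# Single brackets keep the result a (one-column) tibble, i.e. a list
rseq_tbl <- FILENAME[, 1]
is.list(rseq_tbl)     # TRUE
is.numeric(rseq_tbl)  # FALSE

# Double brackets (or $) extract the column as a plain numeric vector
rseq <- FILENAME[[1]]
is.numeric(rseq)      # TRUE
identical(rseq, FILENAME$Rseq)  # TRUE
```

If you do need the column on its own, FILENAME[[1]] or FILENAME$Rseq should give you a vector; with dplyr loaded, pull(FILENAME, 1) does the same.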

I was able to generate a boxplot from some dummy numbers for Rseq with the code below. Can you see if it works with your data?

library(readxl)
library(ggplot2)

FILENAME <- read_excel("~/FILENAME.xlsx")
print(FILENAME)
#> # A tibble: 14 x 2
#>     Rseq tumour_type
#>    <dbl> <chr>      
#>  1    14 Normal     
#>  2    35 Cancer     
#>  3    47 Normal     
#>  4    32 Cancer     
#>  5    40 Normal     
#>  6    24 Normal     
#>  7    34 Cancer     
#>  8    14 Normal     
#>  9    41 Cancer     
#> 10    12 Cancer     
#> 11    28 Normal     
#> 12    49 Normal     
#> 13    14 Cancer     
#> 14    44 Normal

ggplot(FILENAME, aes(x = tumour_type, y = Rseq)) + 
  geom_boxplot()

Created on 2020-05-17 by the reprex package (v0.3.0)

Oh fab, this is great! When I learnt to do this we had to extract the data from a larger spreadsheet which wasn't tidy. So because my data is already tidy, it goes into ggplot fine?

Just another quick question: how do I change the x axis so it reads Normal then Cancer?

Thanks so much!

I'm not sure what your original data looks like but yes, ggplot2 always plays nicer with tidy data sets.

Re-ordering the x-axis labels requires converting tumour_type to a factor and specifying the order of its levels. However, this can be done on the fly in the aes() call without modifying the data itself.

ggplot(FILENAME, aes(x = factor(tumour_type, 
                                levels = c("Normal", "Cancer")), y = Rseq)) + 
  geom_boxplot() +
  xlab("tumour_type")

Created on 2020-05-18 by the reprex package (v0.3.0)
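
One thing to watch with factor(): any value that doesn't exactly match one of the levels becomes NA. That may be what produced the N/A you saw earlier, though I can't tell without seeing the original column. A small sketch with made-up values (the third entry deliberately has different case and a trailing space):

```r
# Hypothetical values; "normal " does not exactly match the level "Normal"
tumour_type <- c("Normal", "Cancer", "normal ", "Cancer")

f <- factor(tumour_type, levels = c("Normal", "Cancer"))

levels(f)      # level order controls the x-axis order in ggplot2
sum(is.na(f))  # counts values that matched no level

# Inspect the distinct raw values to spot mismatches before setting levels
unique(tumour_type)
```

If sum(is.na(f)) is greater than zero on your real data, comparing unique(tumour_type) against the levels you typed should show which spellings differ.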

Great this all worked fine!

Thanks very much,
Cait
