Error: Columns `x`, `y` must be 1d atomic vectors or lists

Ihnwigboji · October 1, 2018, 7:09pm

I am trying to visualize a dataset using ggplot2 but am getting the above error. Here is the code I am using:

```r
library(readxl)   # read_excel() comes from readxl, so load it first
library(ggplot2)
library(tidyverse)
library(plotly)

Spectral <- read_excel("spectralData02.xlsx", sheet = 1, col_names = TRUE)
(packageVersion("ggplot2") <= "2.2.1")
Spectral %>% tidyr::gather("id", "value", 2:638)  # reshape wide to long
Spectral[1]
dim(Spectral)
df <- ggplot(Spectral, aes(Spectral[1], Spectral[3])) +
  geom_line() +
  scale_x_continuous(limits = c(100, 1000)) +
  scale_y_continuous(limits = c(0, 2000))
df
```

Here is what the dataset Spectral looks like after the gather() step:

```
# A tibble: 633,178 x 3
   `Wavelength (nm)` id                                 value
               <dbl> <chr>                              <dbl>
 1              338. Brw_MISP_09_PL_199_11:00_2017_REFL  6.46
 2              340. Brw_MISP_09_PL_199_11:00_2017_REFL  6.36
 3              341. Brw_MISP_09_PL_199_11:00_2017_REFL  6.44
 4              342. Brw_MISP_09_PL_199_11:00_2017_REFL  6.39
 5              344  Brw_MISP_09_PL_199_11:00_2017_REFL  6.39
 6              346. Brw_MISP_09_PL_199_11:00_2017_REFL  6.4
 7              347  Brw_MISP_09_PL_199_11:00_2017_REFL  6.37
 8              348. Brw_MISP_09_PL_199_11:00_2017_REFL  6.37
 9              350. Brw_MISP_09_PL_199_11:00_2017_REFL  6.34
10              351. Brw_MISP_09_PL_199_11:00_2017_REFL  6.4
# ... with 633,168 more rows
```

mara · October 1, 2018, 7:41pm

It's hard to tell without code formatting or the source data, but because of the way single-bracket notation works in R, Spectral[1] and Spectral[3] are actually data frames, as opposed to the 1d atomic vectors or lists that ggplot2 expects (see the vectors section of R for Data Science, for example).

Here's the difference between iris[1] and iris[,1]:

```r
library(tidyverse)
head(iris[1])
#>   Sepal.Length
#> 1          5.1
#> 2          4.9
#> 3          4.7
#> 4          4.6
#> 5          5.0
#> 6          5.4
head(iris[,1])
#> [1] 5.1 4.9 4.7 4.6 5.0 5.4
class(iris[1])
#> [1] "data.frame"
class(iris[,1])
#> [1] "numeric"
```

Created on 2018-10-01 by the reprex package (v0.2.1.9000)
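As a small supplement to the iris example, here is a sketch (using only the built-in iris data) of the base R forms that always return one column as a plain vector: double brackets, by position or by name, and $. It is the single bracket with one index that keeps the data frame wrapper.

```r
# Three ways to pull one column of iris out as a plain numeric vector
by_position <- iris[[1]]               # double bracket, by position
by_name     <- iris[["Sepal.Length"]]  # double bracket, by name
by_dollar   <- iris$Sepal.Length       # $ with the column name

class(by_position)
#> [1] "numeric"
length(by_position)
#> [1] 150

# Single bracket keeps the data frame wrapper, even for one column
class(iris[1])
#> [1] "data.frame"
```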

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

```r
install.packages("reprex")
```

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

```r
reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")
```

For pointers specific to the community site, check out the reprex FAQ.

Ihnwigboji · October 1, 2018, 8:35pm

I used reprex as you suggested and tried the method you showed for distinguishing between data frames and vectors/lists:

```r
library(readxl)
library(ggplot2)
library(tidyverse)
library(reprex)
Spectral <- read_excel("spectralData02.xlsx", sheet = 1, col_names = TRUE)
Spectral %>% tidyr::gather("id", "value", 2:638)
#> # A tibble: 633,178 x 3
#>    `Wavelength (nm)` id                                 value
#>                <dbl> <chr>                              <dbl>
#>  1              338. Brw_MISP_09_PL_199_11:00_2017_REFL  6.46
#>  2              340. Brw_MISP_09_PL_199_11:00_2017_REFL  6.36
#>  3              341. Brw_MISP_09_PL_199_11:00_2017_REFL  6.44
#>  4              342. Brw_MISP_09_PL_199_11:00_2017_REFL  6.39
#>  5              344  Brw_MISP_09_PL_199_11:00_2017_REFL  6.39
#>  6              346. Brw_MISP_09_PL_199_11:00_2017_REFL  6.4
#>  7              347  Brw_MISP_09_PL_199_11:00_2017_REFL  6.37
#>  8              348. Brw_MISP_09_PL_199_11:00_2017_REFL  6.37
#>  9              350. Brw_MISP_09_PL_199_11:00_2017_REFL  6.34
#> 10              351. Brw_MISP_09_PL_199_11:00_2017_REFL  6.4
#> # ... with 633,168 more rows
Spectral[1]
#> # A tibble: 994 x 1
#>    `Wavelength (nm)`
#>                <dbl>
#>  1              338.
#>  2              340.
#>  3              341.
#>  4              342.
#>  5              344 
#>  6              346.
#>  7              347 
#>  8              348.
#>  9              350.
#> 10              351.
#> # ... with 984 more rows
dim(Spectral)
#> [1] 994 638
df <- ggplot(Spectral, aes(Spectral[,1], Spectral[,3])) +
  geom_line() +
  scale_x_continuous(limits = c(100, 1000)) +
  scale_y_continuous(limits = c(0, 2000))
df
#> Error: Columns `x`, `y` must be 1d atomic vectors or lists
```

mara · October 1, 2018, 10:27pm

I actually don't recommend using that notation for ggplot2; I was just trying to illustrate why you can't use single-bracket subsetting to pass columns as vectors.

You use reprex by wrapping it around the code that you run. Please take a minute to look at either the FAQ or the quick demo of reprex in the webinar.

Since you haven't assigned your tidied data frame to anything, there's a disconnect between what is printed when you run Spectral %>% tidyr::gather("id", "value", 2:638) and the data frame you're passing to ggplot(), which is presumably still in whatever format Spectral was in before tidying.
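A minimal sketch of that disconnect, using a small made-up wide data frame (the names wide, sample_a, sample_b, and long are hypothetical stand-ins, not columns from the real spreadsheet): the unassigned gather() call only prints its result, so wide keeps its original shape, and only the assigned copy has the id and value columns.

```r
library(tidyr)

# Hypothetical stand-in for the wide spreadsheet: one wavelength column
# plus one column per sample
wide <- data.frame(
  wavelength = c(338, 340),
  sample_a   = c(6.46, 6.36),
  sample_b   = c(6.44, 6.39)
)

# Printed to the console, but the result is not stored anywhere
gather(wide, "id", "value", 2:3)

dim(wide)  # unchanged: still the wide shape
#> [1] 2 3

# Assigning the result keeps the long (tidy) version
long <- gather(wide, "id", "value", 2:3)
dim(long)
#> [1] 4 3
```

In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(); the equivalent call would be pivot_longer(wide, cols = 2:3, names_to = "id", values_to = "value"), and its result likewise has to be assigned to be kept.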

Ihnwigboji · October 2, 2018, 2:27am

Thank you Mara for the insight. I assigned my tidied data to p, but it did not really make any difference: the format was the same as before I assigned it. I also tried using reprex; here is the code again with the error message:

```r
setwd("C:/users/Research/Documents/SpectraLibrary")
library(readxl)
Spectral <- read_excel("spectralData02.xlsx", sheet=1, col_name = TRUE)
library(ggplot2)
library(tidyverse)
library(reprex)
p <- Spectral %>% tidyr::gather("id", "value", 2:638) 
head(p)
#> # A tibble: 6 x 3
#>   `Wavelength (nm)` id                                 value
#>               <dbl> <chr>                              <dbl>
#> 1              338. Brw_MISP_09_PL_199_11:00_2017_REFL  6.46
#> 2              340. Brw_MISP_09_PL_199_11:00_2017_REFL  6.36
#> 3              341. Brw_MISP_09_PL_199_11:00_2017_REFL  6.44
#> 4              342. Brw_MISP_09_PL_199_11:00_2017_REFL  6.39
#> 5              344  Brw_MISP_09_PL_199_11:00_2017_REFL  6.39
#> 6              346. Brw_MISP_09_PL_199_11:00_2017_REFL  6.4
p[,1]
#> # A tibble: 633,178 x 1
#>    `Wavelength (nm)`
#>                <dbl>
#>  1              338.
#>  2              340.
#>  3              341.
#>  4              342.
#>  5              344 
#>  6              346.
#>  7              347 
#>  8              348.
#>  9              350.
#> 10              351.
#> # ... with 633,168 more rows
dim(p)
#> [1] 633178      3
df <- ggplot(p, aes(p[,1], p[,3])) +
  geom_line() +
  scale_x_continuous(limits = c(100, 1000)) +
  scale_y_continuous(limits = c(0, 2000))
df
#> Error: Columns `x`, `y` must be 1d atomic vectors or lists
```

Created on 2018-10-01 by the reprex package (v0.2.1)
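One detail in the output above is worth spelling out: p comes from read_excel() and gather(), so it is a tibble, and p[,1] prints as "# A tibble: 633,178 x 1". Unlike a base data frame, a tibble does not drop to a vector when a single column is selected with [, 1], which is why switching from Spectral[1] to Spectral[,1] did not change the error. A small sketch with a made-up three-row tibble (small and base_df are hypothetical names):

```r
library(tibble)

# Made-up three-row stand-in for p, which has 633,178 rows
small <- tibble(
  `Wavelength (nm)` = c(338, 340, 341),
  value             = c(6.46, 6.36, 6.44)
)

# On a tibble, [, 1] still returns a one-column tibble, which aes() rejects
class(small[, 1])
#> [1] "tbl_df"     "tbl"        "data.frame"

# [[ ]] and $ return the column itself as a plain numeric vector
class(small[[1]])
#> [1] "numeric"
class(small$value)
#> [1] "numeric"

# A base data frame drops to a vector when one column is selected with [, 1]
base_df <- as.data.frame(small)
class(base_df[, 1])
#> [1] "numeric"
```

Even so, as mara notes, the usual ggplot2 style is to name the columns inside aes() rather than pass extracted vectors.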

mara · October 2, 2018, 9:21am

You're almost there with the reprex, and then we can run your code to help you out. Because no one else has your hard drive, we don't have your data. We don't need all of it, but if you could include a small sample (20 or so rows should work) using dput() or one of the other options described in the FAQ, your reproducible example will be self-contained!
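As a sketch of what that looks like: dput() prints R code that recreates an object, so its output can be pasted straight into a reprex. The tibble below is a hypothetical stand-in built from the rounded values printed above; with the real data, something like dput(head(p, 20)) would be the call to run.

```r
library(tibble)

# Hypothetical sample using the rounded values printed above
sample_rows <- tibble(
  `Wavelength (nm)` = c(338, 340, 341),
  id                = "Brw_MISP_09_PL_199_11:00_2017_REFL",
  value             = c(6.46, 6.36, 6.44)
)

# Prints a structure(list(...)) call that can be pasted into a reprex
dput(sample_rows)

# Evaluating the printed code rebuilds an equivalent tibble
rebuilt <- eval(parse(text = capture.output(dput(sample_rows))))
```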

mara · October 2, 2018, 9:34am

Here's a reprex with just the values from head() using the datapasta package for easy tibble pasting.

I didn't use the limits you've specified above, since they make the line harder to see, but you should be able to add those easily. Note that because Wavelength.(nm) is not a syntactically valid variable name, I had to use backticks around it.

For basic ggplot syntax, you might want to take a look at the R graphics cookbook, and/or the R for Data Science chapter on data visualization.

```r
suppressPackageStartupMessages(library(tidyverse))
yourdata <- tibble::tribble(
  ~`Wavelength.(nm)`,                                  ~id, ~value,
                 338, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.46,
                 340, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.36,
                 341, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.44,
                 342, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.39,
                 344, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.39,
                 346, "Brw_MISP_09_PL_199_11:00_2017_REFL",    6.4
  )

ggplot(yourdata, aes(`Wavelength.(nm)`, value)) +
  geom_line()
```

Created on 2018-10-02 by the reprex package (v0.2.1.9000)
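Building on that example, here is a sketch that renames the non-syntactic column once (so aes() no longer needs backticks) and adds back the axis limits from the original question. The new name wavelength_nm and the object name plt are illustrative choices, not part of the original data or code.

```r
library(ggplot2)
library(tibble)
suppressPackageStartupMessages(library(dplyr))

yourdata <- tribble(
  ~`Wavelength.(nm)`,                                  ~id, ~value,
                 338, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.46,
                 340, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.36,
                 341, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.44,
                 342, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.39,
                 344, "Brw_MISP_09_PL_199_11:00_2017_REFL",   6.39,
                 346, "Brw_MISP_09_PL_199_11:00_2017_REFL",    6.4
)

# Rename once so later code can use a syntactic name without backticks
# (wavelength_nm is an illustrative choice)
yourdata <- rename(yourdata, wavelength_nm = `Wavelength.(nm)`)

plt <- ggplot(yourdata, aes(wavelength_nm, value)) +
  geom_line() +
  # Axis limits from the original question; with only six sample rows the
  # line is a short segment near the bottom of the panel
  scale_x_continuous(limits = c(100, 1000)) +
  scale_y_continuous(limits = c(0, 2000))

plt
```

With the full tidied data, the same pattern applies: assign the gather() result first, then refer to its columns by name inside aes().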

Ihnwigboji · October 2, 2018, 3:40pm

Thank you very much Mara for your continued assistance. Let me try exactly what you did with the whole dataset, and then get back to you.

Thank you once again.

Ihnwigboji · October 3, 2018, 4:49am

Thank you very much; it worked just fine for the whole dataset.