Hi Everybody,

I am trying to create a histogram using the variable "height" from a GitHub diabetes dataset. Instead, I received the error message "'data' must be a data frame". Could anyone explain what this message means and what I should do to resolve the problem?

library(tidyverse)
diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector.

Created on 2022-02-03 by the reprex package (v2.0.1)

Hello, I am trying to make a histogram of height in the diabetes dataset using R. However, I ran into a problem. When I run the code below, it tells me that "'data' must be a data frame, or other object coercible by fortify(), not an integer vector". Does anyone know why this error happens and how I can solve it?

library(reprex)
library(ggplot2)
diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector.

Created on 2022-02-04 by the reprex package (v2.0.0)

Hello, I am trying to create a histogram with the height variable from the diabetes dataset located on GitHub, but instead I am receiving the error: data must be a data frame, or other object coercible by fortify(), not an integer vector. How do I correct this error so that I can create the intended histogram?

library(tidyverse)
diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector.

Created on 2022-02-05 by the reprex package (v2.0.1)

I am creating a histogram from a diabetes.csv file. The error message seems to be applicable for any dataset. I was expecting the histogram to display heights but ended up with an error message indicating 'data' must be a data frame. Does anyone know how I can solve this?

library(tidyverse)
diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector.

I am looking to create a histogram illustrating heights within the diabetes dataset. This code generates an error message saying that the data must be a data frame or other object coercible(?) by fortify(), not an integer vector. Can anyone help me figure out what's needed for the desired result?

library(tidyverse)
diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector.

I'm expecting the inside of the bars to be green, but only the outlines of the bars are green. I also don't understand the warning about the 5 removed rows. Is it implying that some of the values were infinite?

library(tidyverse)

diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes["height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram(color = "green")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 5 rows containing non-finite values (stat_bin).

Created on 2022-02-08 by the reprex package (v2.0.1)

The data frame just_height contains 5 NA values, which you can see by comparing its row count before and after na.omit():

nrow(just_height)           # the data frame has 403 rows
#> [1] 403
nrow(na.omit(just_height))  # only 398 rows remain after dropping NAs
#> [1] 398

That's what causes the non-finite values warning. To color the inside of the bars, use fill rather than color (color only sets the outline). Addressing these two issues might look like this:

ggplot(na.omit(just_height), aes(x = height)) + 
  geom_histogram(fill = "green", color = "black", bins = 30)
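Two related options, sketched on a small hypothetical `toy` data frame that stands in for the diabetes heights: `is.na()` counts the missing values directly, and `na.rm = TRUE` in `geom_histogram()` drops them quietly instead of warning, so the data doesn't have to be filtered first.

```r
library(ggplot2)

# Hypothetical toy data with two missing heights
toy <- data.frame(height = c(62, NA, 68, 71, NA, 65))

# is.na() flags each missing value; sum() counts the TRUEs
sum(is.na(toy$height))
#> [1] 2

# fill colors the inside of the bars, color draws the outline;
# na.rm = TRUE removes the NAs without the "Removed rows" warning
p <- ggplot(toy, aes(x = height)) +
  geom_histogram(fill = "green", color = "black", bins = 30, na.rm = TRUE)
p
```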

Good luck in the course!


Try leaving out the comma:
just_height <- diabetes["height"]
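Leaving out the comma works because `diabetes["height"]` indexes the data frame like a list, and list-style `[` always returns a data frame, whereas `diabetes[, "height"]` uses matrix-style indexing, which simplifies a single column to a vector. A small check on a hypothetical stand-in data frame:

```r
# Hypothetical stand-in for the diabetes data
df <- data.frame(height = c(62L, 68L, 71L))

is.data.frame(df["height"])     # list-style: stays a data frame
#> [1] TRUE
is.data.frame(df[, "height"])   # matrix-style: drops to a vector
#> [1] FALSE
class(df[, "height"])
#> [1] "integer"
```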

Hi @cwright1, as the title says, we're just practicing here. No need to answer questions.

Thank you very much! :smiley:

diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error in ggplot(just_height, aes(x = height)): could not find function "ggplot"

Created on 2022-02-08 by the reprex package (v2.0.1)

Hi!
I was trying to create a histogram using this code and received an error in ggplot saying that it could not find function "ggplot". I have loaded tidyverse with library(tidyverse), and it said it attached the ggplot2 package, but somehow my code is still giving me this error. How can I get this package to run? Thanks!

Hi @malcolm - practicing answers.

Best

Hello!
I wrote some code in R and was expecting to create a histogram using the height variable from my dataset "diabetes". The code was running okay until I called the ggplot() function. I got an error saying that data must be a data frame, not an integer vector. How can I fix this error?
Thank you for your help.

diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
library(tidyverse)
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector.

Created on 2022-02-08 by the reprex package (v2.0.1)

Hi all!
I received this error in my code, and I can't seem to figure out why. I'm trying to create a histogram of the height variable, and I have installed the tidyverse package, so I am not sure why it cannot find the function "ggplot". Any ideas on how to fix this? Thanks!

diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error in ggplot(just_height, aes(x = height)): could not find function "ggplot"

Created on 2022-02-08 by the reprex package (v2.0.1)

I am creating a histogram from a diabetes.csv file. Does anyone know what went wrong here?

library(tidyverse)
diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(na.omit(just_height), aes(x = height)) + 
  geom_histogram(bins=30)
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector.

I was expecting the data to produce a histogram that displayed the heights of the individuals surveyed. Why wasn't the data in an acceptable data frame?

diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error in ggplot(just_height, aes(x = height)): could not find function "ggplot"

Created on 2022-02-09 by the reprex package (v2.0.1)


Hi everyone,
I am trying to build a histogram with the GitHub diabetes dataset using the variable "height". But I receive an error saying that data must be a data frame, or other object coercible by fortify(), not an integer vector. What should I do?

library(tidyverse)
diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes[, "height"]
ggplot(just_height, aes(x = height)) + 
  geom_histogram()
#> Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector.

Created on 2022-02-09 by the reprex package (v2.0.1)

Hi, I am making a histogram displaying age using diabetes data found on GitHub. However, I keep getting an error message stating that age is not found, even though I see it in the R environment. How do I fix this error?

library(tidyverse)
diabetes <- read.csv("https://raw.githubusercontent.com/malcolmbarrett/au-stats412-612-01-reading_data/master/diabetes.csv")
just_height <- diabetes["height"]
ggplot(just_height, aes(x = age)) + 
  geom_histogram()
#> Error in FUN(X[[i]], ...): object 'age' not found

Created on 2022-02-10 by the reprex package (v2.0.1)

Thanks, everyone, for attempting reprexes, and welcome to RStudio Community!

Now, to fix the code!

First, a few of you missed an important detail: you didn't load ggplot2, so R couldn't find the ggplot() function. Remember, reprex runs R in a completely fresh session, so even if you've loaded a package in RStudio, it won't be available unless you include library() in your reprex code.
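One quick way to see what a fresh session looks like: before `library()` is called, ggplot2 is not on R's search path, so `ggplot()` can't be found. A sketch, assuming ggplot2 is installed and the code is run in a brand-new session, as reprex does:

```r
# In a brand-new session, ggplot2 is installed but not attached
"package:ggplot2" %in% search()
#> [1] FALSE

# Attaching it puts ggplot() where R can find it
library(ggplot2)
"package:ggplot2" %in% search()
#> [1] TRUE
exists("ggplot")
#> [1] TRUE
```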

Technically, we could also make this example more minimal. The problem is not actually with the diabetes dataset, so we could use a built-in dataset to show the problem. However, our example is still reproducible (since the data is on GitHub and the code does read it correctly). Since you may not be sure if it's the data or code that's the problem, it's reasonable to include that dataset.
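As a sketch of that more minimal version, the same failure can be reproduced with the built-in `cars` data instead of the GitHub file. Because `cars$speed` is stored as a double rather than an integer, the message will likely name a different vector type, and its exact wording varies across ggplot2 versions, so this example just captures the error with `tryCatch()`:

```r
library(ggplot2)

# Matrix-style subsetting of one column returns a plain numeric vector
just_speed <- cars[, "speed"]

# ggplot() rejects the vector; tryCatch() keeps the error object for inspection
result <- tryCatch(
  ggplot(just_speed, aes(x = speed)) + geom_histogram(),
  error = function(e) e
)
inherits(result, "error")
#> [1] TRUE
conditionMessage(result)
```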

It turns out that the real issue is a quirk of base R. read.csv() returns a regular data frame, and when we select a single column of a data frame with [, ], R simplifies the result to a vector rather than keeping it as a data frame:

x <- data.frame(a = 1:5)

x
#>   a
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5

x[, "a"]
#> [1] 1 2 3 4 5

is.data.frame(x[, "a"])
#> [1] FALSE

ggplot() expects a data frame, not a vector. When you're dealing with base R, the solution is to set drop = FALSE, which keeps the subset as a data frame.

x[, "a", drop = FALSE]
#>   a
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5

is.data.frame(x[, "a", drop = FALSE])
#> [1] TRUE
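This simplification only happens when the selection comes down to a single column; selecting two or more columns with `[, ]` already returns a data frame, so `drop = FALSE` matters only in the one-column case. A quick check on a hypothetical two-column data frame `x2`:

```r
x2 <- data.frame(a = 1:5, b = 6:10)

is.data.frame(x2[, c("a", "b")])   # two columns: stays a data frame
#> [1] TRUE
is.data.frame(x2[, "b"])           # one column: dropped to a vector
#> [1] FALSE
```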

Using this approach will fix our issue. I don't actually need the diabetes dataset here, so instead, I'll use the built-in cars dataset and make a histogram of the speed variable.

library(ggplot2)
just_speed <- cars[, "speed", drop = FALSE]
ggplot(just_speed, aes(x = speed)) + 
  geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
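Since ggplot() picks columns out of the data frame through `aes()`, another option is to skip the subsetting step entirely and pass the whole data frame. Setting `binwidth` also answers the `stat_bin()` message; the value of 2 below is an arbitrary choice, not a recommendation. A sketch:

```r
library(ggplot2)

# aes(x = speed) selects the column, so no subsetting is needed
p <- ggplot(cars, aes(x = speed)) +
  geom_histogram(binwidth = 2)  # explicit bin width; 2 is an arbitrary choice
p
```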

Notably, the original code used read.csv(), which returns a regular data frame, but readr::read_csv() returns a tibble, a special case of the data frame. Tibbles don't have this behavior: subsetting them with [ always returns a tibble, so we don't need drop = FALSE:

library(tidyverse)
y <- tibble(a = 1:5)
y[, "a"]
#> # A tibble: 5 x 1
#>       a
#>   <int>
#> 1     1
#> 2     2
#> 3     3
#> 4     4
#> 5     5
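Tibbles only change how `[` behaves. Extracting a column with `[[` or `$` still returns a bare vector, on a tibble just as on a regular data frame, so those forms won't work as the `data` argument either. A quick check:

```r
library(tibble)

y <- tibble(a = 1:5)

is.data.frame(y[, "a"])   # [ on a tibble: still a tibble
#> [1] TRUE
is.data.frame(y[["a"]])   # [[ extracts the column as a vector
#> [1] FALSE
is.data.frame(y$a)        # so does $
#> [1] FALSE
```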

Using a tibble also solves our problem:

library(tidyverse)
cars <- as_tibble(cars)
just_speed <- cars[, "speed"]
ggplot(just_speed, aes(x = speed)) + 
  geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
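Within the tidyverse, `dplyr::select()` is another way to keep just the column you want; it returns a data frame whether the input is a regular data frame or a tibble, so it can be piped straight into `ggplot()`. A sketch (the `binwidth` value is again arbitrary):

```r
library(dplyr)
library(ggplot2)

# select() returns a data frame, never a bare vector
just_speed <- select(cars, speed)
is.data.frame(just_speed)
#> [1] TRUE

# %>% binds more tightly than +, so the piped result becomes ggplot()'s data
cars %>%
  select(speed) %>%
  ggplot(aes(x = speed)) +
  geom_histogram(binwidth = 2)
```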
