Is it possible to add a Gaussian curve to a histogram plot?

Hello R community.
I have three questions regarding a histogram plot. I'm trying to add a line to it, but I can't get it to work.
I wrote this:

normality_data<- read.table("/home/kgee/Desktop/data", fill=TRUE)

r<-hist(normality_data$V1,span,right=FALSE,col="pink",
main = "frequency distribution plot" ,cex.main=1.5,
xlab=" Expected values ",cex.lab=1.3,xaxt="n")
axis(1, at=r$mids, labels=levels(span))

which works.

r contains:
breaks :-> double [171]
counts :-> integer [170]
density :-> double [170]
mids :-> double [170]
xname :-> character [1]
equidist :-> logical [1]

and normality_data contains one column of 10000 entries.

I tried these:

points(seq(min(span), max(span), length.out=500),
               dnorm(seq(min(span), max(span), length.out=500),
                     mean(span), sd(span)), type="l", col="blue")

 xfit<-seq(min(normality_data),max(normality_data),length=40)
        yfit<dnorm(xfit,mean=mean(normality_data$V1),sd=sd(normality_data$V1))
 yfit <- yfit*diff(r$mids)*length(normality_data$V1)
 lines(xfit, yfit, col="blue", lwd=2) 


lines(r, lty = 8, border = "black")
        lines (span,r$breaks)
    

lines(density(),col="black",lwd=4)

First question: Is it possible to add a line to this kind of plot? (I'm trying not to use ggplot, so I'm wondering whether, and how, I can draw the Gaussian curve on this plot.)
Second question: Is it possible to add information on the x axis like in the image below (μ+2σ, etc.)?
image
Third question: If the answer to the first question is yes, is it possible to combine multiple lines in one plot?
Thanks in advance.

Hi @KGee: I'm sure there are folks who could help you find a base R solution, but I was curious to know whether you'd be open to a solution that uses the tidyverse package (which is what I'm more familiar with). In either case, could you post a sample of your data?

One way to do this would be to apply the dput() function to your data and then paste the output here, between a pair of triple backticks (```), like this:

```
[paste output of dput(head(normality_data, 50)) here]
```

That would help folks recreate your work, and so be better able to help you.

Sure! But first of all, thank you for the response.
This is a sample of my input:

 dput(head(normality_data, 50))
structure(list(V1 = c(49119L, 2799L, 19533L, 19059L, 24123L, 
-8553L, -3921L, 2199L, 14559L, 26403L, -2787L, -40191L, -273L, 
-53265L, 20661L, -4113L, 16359L, 7911L, 10083L, -10731L, -8661L, 
16701L, 10605L, 16107L, -2223L, -30225L, -12333L, 1761L, 22755L, 
-3525L, 14649L, 17451L, -28893L, 6351L, -11073L, 27279L, 20769L, 
13521L, -399L, -9393L, 21255L, -165L, -7431L, 8409L, 28287L, 
11331L, -18675L, -4713L, -15567L, -1749L)), .Names = "V1", row.names = c(NA, 
50L), class = "data.frame")

You're welcome! I just realized you specifically asked for a non-ggplot solution, so I'm afraid I won't be able to help further, but I did have a question to confirm what you're looking for: Should the Gaussian use the sample mean and standard deviation, or did you have something else in mind?

I'm sorry if I didn't express myself well. I tried to be analytical and maybe I "missed the target". I will upload my output and give an example of what I want.
This is my output:


and I want to add in my output:

  1. The curve
  2. and, if it is possible, the x axis with the Greek letters (above my axis, as a supplementary axis), as in the [picture](http://www.leansixsigmadefinition.com/wp-content/uploads/2019/04/gaussian_distribution.png)
  3. Because I have multiple plots like my output, I was wondering if there is a way to combine them (only the curves) in one graph.

Sorry -- I missed your definition of xfit in your original post, which makes it clear you want the mean and standard deviation of the Gaussian to come from the sample.

  1. and 2. should be possible, but wanted to make sure I understood what you meant in 3.: One plot of several Gaussians?

1 --> I want to connect the histogram with a line so as to have the Gaussian curve.
2 --> I would like to add σ, 2σ, 3σ, etc. on an x axis above my actual x axis.
3 --> Exactly.

Here's a specific solution for Q1:

normality_data <- c(49119L, 2799L, 19533L, 19059L, 24123L, -8553L, -3921L, 2199L, 14559L, 26403L, -2787L, -40191L, -273L, -53265L, 20661L, -4113L, 16359L, 7911L, 10083L, -10731L, -8661L, 16701L, 10605L, 16107L, -2223L, -30225L, -12333L, 1761L, 22755L, -3525L, 14649L, 17451L, -28893L, 6351L, -11073L, 27279L, 20769L, 13521L, -399L, -9393L, 21255L, -165L, -7431L, 8409L, 28287L, 11331L, -18675L, -4713L, -15567L, -1749L)

hist(x = normality_data,
     breaks = 10,
     freq = FALSE,
     right = FALSE,
     col = "pink",
     main = "Histogram of Normality Data",
     xlab = "Normality Data",
     xaxt = "n")
curve(expr = dnorm(x = t,
                   mean = mean(x = normality_data),
                   sd = sd(x = normality_data)),
      from = min(normality_data),
      to = max(normality_data),
      n = 500,
      add = TRUE,
      xname = "t",
      col = "blue")
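
If the histogram has to stay on the count scale (the original `hist()` call in the question did not set `freq = FALSE`), one possible approach is to rescale the density by the number of observations times the bin width. This is only a sketch: `vals` is made-up stand-in data, and `bin_width` and `scale_factor` are illustrative names that assume equally spaced breaks.

```r
# Sketch: overlay a normal curve on a histogram drawn on the count scale.
set.seed(1)
vals <- rnorm(1000, mean = 0, sd = 20000)  # made-up stand-in data

# Counts on the y axis (freq = TRUE is the default for equidistant breaks)
h <- hist(vals, breaks = 20, right = FALSE, col = "pink",
          main = "Histogram on the count scale", xlab = "Values")

# Assumption: all bins share one width, so the first difference is enough
bin_width <- diff(h$breaks)[1]
scale_factor <- length(vals) * bin_width

# density * n * bin width gives the expected count per bin
curve(dnorm(t, mean = mean(vals), sd = sd(vals)) * scale_factor,
      from = min(vals), to = max(vals), n = 500,
      add = TRUE, xname = "t", col = "blue", lwd = 2)
```

The same factor explains why a density curve drawn without rescaling looks like a flat line at zero on a count-scale histogram.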

For Q3, you can add as many curve calls as you want afterwards to add more plots, if that is the question. I may have misinterpreted.
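
As a rough illustration of that idea for Q3, the sketch below draws only the fitted curves for several data sets in one plot. The three samples, the names in `samples`, and the colours are assumptions made up for the example; they are not the data from this thread.

```r
# Sketch: several fitted normal curves in one plot (curves only, no bars).
set.seed(42)
samples <- list(                      # hypothetical data sets
  a = rnorm(500, mean = 0,     sd = 15000),
  b = rnorm(500, mean = 5000,  sd = 20000),
  c = rnorm(500, mean = -3000, sd = 10000)
)
cols <- c("blue", "red", "darkgreen")

x_range <- range(unlist(samples))
# Peak height of each fitted curve is dnorm at its own mean; take the largest
y_max <- max(sapply(samples, function(s) dnorm(0, mean = 0, sd = sd(s))))

# Empty plotting region that is large enough for every curve
plot(NA, xlim = x_range, ylim = c(0, y_max),
     xlab = "Value", ylab = "Density", main = "Fitted normal curves")

for (i in seq_along(samples)) {
  s <- samples[[i]]
  curve(dnorm(t, mean = mean(s), sd = sd(s)),
        from = x_range[1], to = x_range[2], n = 500,
        add = TRUE, xname = "t", col = cols[i], lwd = 2)
}
legend("topright", legend = names(samples), col = cols, lwd = 2)
```

Setting up the empty plot first means no single data set dictates the axis limits.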

Q2 is not clear to me.

  1. If it is to add vertical or horizontal lines in the plot, that is doable. Use segments, as you want.
  2. If it is to add specific tick marks on the horizontal axis, you can use axis.
  3. If it is to do both of the above and to use custom labels instead of the values at ticks, then I do not know a solution.

Hope this helps, at least partially. :slight_smile:


I know you're looking for a non-ggplot solution, @KGee, but just in case you're curious or might change your mind, here's a start, without the text from your linked image:

normality_data <- 
  structure(list(V1 = c(49119L, 2799L, 19533L, 19059L, 24123L, 
                        -8553L, -3921L, 2199L, 14559L, 26403L, -2787L, -40191L, -273L, 
                        -53265L, 20661L, -4113L, 16359L, 7911L, 10083L, -10731L, -8661L, 
                        16701L, 10605L, 16107L, -2223L, -30225L, -12333L, 1761L, 22755L, 
                        -3525L, 14649L, 17451L, -28893L, 6351L, -11073L, 27279L, 20769L, 
                        13521L, -399L, -9393L, 21255L, -165L, -7431L, 8409L, 28287L, 
                        11331L, -18675L, -4713L, -15567L, -1749L)),
            .Names = "V1", row.names = c(NA, 50L), class = "data.frame")
### end of 'structure()' command
library(tidyverse)

# collect mean, standard deviation points, and max value of desired gaussian
mu <- normality_data %>% pull(V1) %>% mean()
sigma <- normality_data %>% pull(V1) %>% sd()
sigmas <- seq(mu - 3 * sigma, mu + 3 * sigma, sigma)
gauss_top <- dnorm(mu, mean = mu, sd = sigma)

# set heights of lines to add at each sigma
sigma_heights <- gauss_top + abs(-3:3) * gauss_top / 8

normality_data %>% 
  ggplot(aes(V1)) +
  geom_histogram(aes(y = ..density..)) +
  scale_x_continuous(
    sec.axis = 
      dup_axis(
        name = NULL,
        breaks = sigmas,
        labels = 
          c(quote(mu - 3*sigma), 
            quote(mu - 2*sigma), 
            quote(mu - sigma), 
            quote(mu), 
            quote(mu + sigma), 
            quote(mu + 2*sigma) ,
            quote(mu + 3*sigma) 
          )
      )
  ) +
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma)) +
  geom_segment(
    data = 
      tibble(sigma = sigmas, height = sigma_heights),
    aes(x = sigma, xend = sigma, y = 0, yend = height)
  )
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-03-09 by the reprex package (v0.3.0)


Thank you very much for the answer. It works. I have my curve now :smiley: Also, thanks for the suggestions. I'll have to "play" with segments and axis then. Maybe I wasn't clear about what I was looking for and made it a bit complicated, but as I saw, @dromano gave me exactly my Q2 output.
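
For anyone who wants to stay in base R, the `axis` and `segments` route mentioned in this thread might look roughly like the sketch below. It is not taken from any of the posts: `vals` is simulated stand-in data, and `ticks` and `sigma_labels` are illustrative names. The Greek labels rely on plotmath expressions passed to `axis()`.

```r
# Sketch: base R histogram with a fitted normal curve, a secondary top axis
# labelled in units of sigma, and vertical guide lines at each tick.
set.seed(7)
vals <- rnorm(1000, mean = 0, sd = 20000)  # simulated stand-in data

mu <- mean(vals)
sigma <- sd(vals)
ticks <- mu + (-3:3) * sigma               # mu - 3 sigma, ..., mu + 3 sigma

hist(vals, breaks = 20, freq = FALSE, right = FALSE, col = "pink",
     main = "", xlab = "Values", xlim = range(c(vals, ticks)))
curve(dnorm(t, mean = mu, sd = sigma),
      from = min(ticks), to = max(ticks), n = 500,
      add = TRUE, xname = "t", col = "blue", lwd = 2)

# plotmath: symbols mu and sigma are drawn as Greek letters, `*` juxtaposes
sigma_labels <- expression(mu - 3 * sigma, mu - 2 * sigma, mu - sigma, mu,
                           mu + sigma, mu + 2 * sigma, mu + 3 * sigma)
axis(side = 3, at = ticks, labels = sigma_labels)  # side 3 is the top axis

# Dashed vertical lines from the x axis up to the curve at each tick
segments(x0 = ticks, y0 = 0, x1 = ticks, y1 = dnorm(ticks, mu, sigma),
         lty = 2)
```

The title is left empty here so that it does not collide with the top axis; `axis()` may also drop some labels if they would overlap on a narrow plot.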

You gave me EXACTLY what I was looking for, even if you used ggplot. Thanks a lot. I will try to adapt the ggplot scale_x_continuous code to fit my histogram. At least you helped me understand the logic behind sigma. But once again, thanks a lot!!!
