Plotting multiple dnorm curves in a single plot or in facet_wrap?

Hi all,

I have a data frame such as:

label MEAN SD
a 2.065674 0.09413228
b 1.901538 0.09447044
c 1.915913 0.08947252
d 1.982803 0.08588928
e 1.879955 0.100583
f 2.013313 0.08939728
g 2.024716 0.08721181
h 2.049955 0.1066361
i 1.987234 0.08739622
j 1.914976 0.0890025
k 1.917258 0.07917008
l 1.997179 0.09614308
m 1.896392 0.09042831
n 1.989861 0.08303668
o 2.036034 0.09770369
p 2.04184 0.09616447
q 1.81552 0.09031452
r 2.144952 0.1205006
s 1.761111 0.0778553
t 1.80824 0.09179853
u 2.141951 0.1218049
v 2.063237 0.08188441
w 2.077294 0.1216493
s 1.860908 0.1058656

I want to plot all the dnorm curves, first on a single plot and then with facet_wrap, but I can't find an easy solution.
Any help?

Thanks



Hi @mslider, please post a reproducible example of your data.

Do you need to facet by label?

Hi,

I have already posted the dataset; see the data frame above.
I don't have any code, which is why I posted my question. I'm just looking for help writing a piece of code to plot the curves.

mslider

The first request, plotting all of the densities in one plot, can easily be done with purrr::pmap() as explained by the fabulous Albert Rapp in this blog post with toddler drawing in ggplot2:

# Set up some data (simulated)
Data <- tibble::tibble(
  label = letters[1:26],
  mean  = rnorm(26,2,.5),
  # abs() keeps every sd positive; dnorm() returns NaN for a negative sd
  sd    = abs(rnorm(26,0.1,0.05))
)
# load ggplot2
library('ggplot2')
# create the density curves
layers <- Data |>
  purrr::pmap(~ stat_function(
    mapping = aes(col = ..1),
    fun = dnorm, args = list(mean = ..2, sd = ..3),
    xlim = c(0,3)
  ))
# plot all at once
ggplot() +
  layers

The second one is less obvious and also more tedious, but works as follows:

### With facet_wrap
Data |>
  # extract mean and sd rowwise
  dplyr::rowwise() |>
  # add the density and corresponding xvals as list columns
  dplyr::mutate(
    xvals   = list(seq.default(0,3,length.out = 100)),
    density = list(dnorm(seq.default(0,3,length.out = 100),
                         mean = mean, sd = sd))) |>
  # unnest the list columns
  tidyr::unnest(cols = c(xvals, density)) |>
  ggplot() + 
  geom_line(aes(xvals,density, col = label)) +
  facet_wrap(~label)

If you omit facet_wrap(~label) at the end, you get the all-at-once plot as well. Note that this version draws the curves with geom_line() rather than stat_function(), which may (or may not) be less preferable for drawing actual functions.

Kind regards

Hi,

I don't understand why, in the first plot, you recompute the mean and sd with:

mean  = rnorm(26,2,.5),
 sd    = rnorm(26,0.1,0.05)

The mean and sd are already fixed in the dataset:
label MEAN SD
a 2.065674 0.09413228
b 1.901538 0.09447044
c 1.915913 0.08947252
d 1.982803 0.08588928
e 1.879955 0.100583
....


Because you did not provide it in a usable form, e.g. with dput(), which would make it easy to work with your actual data.
Since I am not willing to copy and paste it and build the data myself, or to load the (incorrect) data with read.table(), I just made up sample data to have a quickly usable data.frame. You can simply ignore this part and use the rest for your problem, or provide your data in an appropriate way so that I can create it in my environment.

Hi,
I have tried the code below to make the plot using facet_wrap, but the curves are not displayed:

library(tibble)
library("dplyr")
df <- data.frame(SDMEAN)
df <- tibble::rownames_to_column(df, "chr")

df
   chr     MEAN         SD
1   1p   2.0644  0.1004449
2   1q 1.895831 0.09315114
3   2q 1.915312 0.08752741
4   3q 1.982091 0.08549012
5   4q 1.875531 0.09817661
6   5q 2.010669 0.09015205
7   6q 2.025745  0.0879155
8   7p 2.045913  0.1037064
9   7q 1.985089 0.08629556
10  8q 1.910797 0.08815759
11  9q 1.913497 0.08001214
12 11p 1.995751 0.09542187
13 12p 1.894449 0.08874608
14 13q 1.989719 0.08154688
15 14q 2.033255 0.09670343
16 15q 2.040735 0.09386254
17 16q 1.814215  0.0896026
18 17p  2.14339  0.1192855
19 17q  1.75875 0.07775115
20 18q 1.803201 0.09080507
21 19p 2.138235  0.1212603
22 20q 2.127806  0.1104077
23 22q 2.074763  0.1211784
24  Xp 1.857411   0.105657

library(ggplot2)

pg <- ggplot(df, aes(x = chr))
pg <- pg + geom_density()
pg <- pg + stat_function(fun=dnorm, colour='red', args=list(mean=df$MEAN, sd=df$SD))
pg <- pg + facet_wrap(~chr)

Something is missing, but I don't know where.

thanks

Preliminary
Again, you did not provide your data in a usable form, so I only copied your first three rows.
To share your data in a way that everyone willing to help you can use, do this in the future:

# how to reliably share your data
dput(input)
#> structure(list(chr = c("1p", "1q", "2q"), MEAN = c(2.0644, 1.895831, 
#> 1.915312), SD = c(0.1004449, 0.09315114, 0.08752741)), class = "data.frame", row.names = c(NA, 
#> -3L))

As you can see from your code, you do get something like a dnorm plot, but it contains all three means and sds (in my case) at once and jumps up and down. That's your problem: since you define the same plot for all groups, facet_wrap() just prints the same plot for every label.

# load ggplot2
library("ggplot2")

# your code
ggplot(input, aes(x = chr)) +
  # geom_density() + # unnecessary
  stat_function(fun=dnorm, colour='red', args=list(mean=input$MEAN, sd=input$SD)) +
  facet_wrap(~chr)

Created on 2022-12-14 by the reprex package (v2.0.1)
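
The up-and-down jumps can be reproduced without ggplot2. The following is a minimal sketch in base R, assuming the three rows from the dput() output above; the variable names x, mixed and by_hand are made up for illustration. When dnorm() receives whole vectors for mean and sd, it recycles them along x, so each x value is evaluated under a different distribution.

```r
# First three rows of the data shown in the dput() output
MEAN <- c(2.0644, 1.895831, 1.915312)
SD   <- c(0.1004449, 0.09315114, 0.08752741)

# Six x values, passed together with the full parameter vectors
x <- seq(1.8, 2.1, length.out = 6)
mixed <- dnorm(x, mean = MEAN, sd = SD)

# The same values computed one by one: parameter set i is paired
# with x value i, and the three parameter sets repeat in a cycle
by_hand <- c(
  dnorm(x[1], MEAN[1], SD[1]), dnorm(x[2], MEAN[2], SD[2]),
  dnorm(x[3], MEAN[3], SD[3]), dnorm(x[4], MEAN[1], SD[1]),
  dnorm(x[5], MEAN[2], SD[2]), dnorm(x[6], MEAN[3], SD[3])
)

# Compare the vectorised call with the element-by-element version
all.equal(mixed, by_hand)
```

Neighbouring points on the plotted line therefore belong to different normal curves, which is why the line zigzags.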

As I have shown in my last posting, this is how you actually should do it with facet_wrap():

# again as in my previous post
input |>
  # extract mean and sd rowwise
  dplyr::rowwise() |>
  # add the density and corresponding xvals as list columns
  dplyr::mutate(
    xvals   = list(seq.default(0,3,length.out = 100)),
    density = list(dnorm(seq.default(0,3,length.out = 100),
                         mean = MEAN, sd = SD))) |>
  # unnest the list columns
  tidyr::unnest(cols = c(xvals, density)) |>
  ggplot() + 
  geom_line(aes(xvals,density, col = chr)) +
  facet_wrap(~ chr)


Kind regards

Of course. Here is my data as a CSV file named test.csv.
The file format is as follows:

label;MEAN;SD
A;2.0644;0.1004449
B;1.895831;0.09315114
C;1.915312;0.08752741
D;1.982091;0.08549012
E;1.875531;0.09817661
F;2.010669;0.09015205
G;2.025745;0.0879155
H;2.045913;0.1037064
I;1.985089;0.08629556
J;1.910797;0.08815759
K;1.913497;0.08001214
L;1.995751;0.09542187
M;1.894449;0.08874608
N;1.989719;0.08154688
O;2.033255;0.09670343
P;2.040735;0.09386254
Q;1.814215;0.0896026
R;2.14339;0.1192855
S;1.75875;0.07775115
T;1.803201;0.09080507
U;2.138235;0.1212603
V;2.127806;0.1104077
W;2.074763;0.1211784
X;1.857411;0.105657

### Then I use this R code to load the data:
data=read.table("test.csv", header = TRUE, sep = ";", dec = ".",row.names=1)
df <- tibble::rownames_to_column(data, "labels")
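
For reference, here is a minimal sketch of how data in this CSV layout could be fed into the facet_wrap() code from the earlier reply. It assumes only the first three rows of test.csv, inlined as text so the example is self-contained; csv_text, xgrid and curves are hypothetical names. Reading the file without row.names = 1 keeps label as an ordinary column, so the rownames_to_column() step is not needed.

```r
library(ggplot2)

# Assumption: first three rows of test.csv, inlined for a self-contained example
csv_text <- "label;MEAN;SD
A;2.0644;0.1004449
B;1.895831;0.09315114
C;1.915312;0.08752741"

# Without row.names = 1, label is read as a regular column
df <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ".")

# Shared, deterministic grid of x values (hypothetical name)
xgrid <- seq(0, 3, length.out = 100)

curves <- df |>
  # work one row (one label) at a time
  dplyr::rowwise() |>
  # store the grid and the matching densities as list columns
  dplyr::mutate(
    xvals   = list(xgrid),
    density = list(dnorm(xgrid, mean = MEAN, sd = SD))
  ) |>
  dplyr::ungroup() |>
  # expand the list columns into one row per (label, x) pair
  tidyr::unnest(cols = c(xvals, density))

ggplot(curves) +
  geom_line(aes(xvals, density, col = label)) +
  facet_wrap(~ label)
```

With the real file, replacing text = csv_text by the file name "test.csv" should be the only change needed, assuming the file matches the format shown above.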

Hello mslider,

I am a bit puzzled right now and just want to ask you a question: what do you expect me to answer (the data "problem" was only a minor side note for your future requests, not the main issue)? Your question was answered in my posts above, technically twice. If you have any problems with the code, please provide the error messages you receive, or describe what does not behave as you expected.

Kind regards

The problem here is that we generate random values (seq.default(0,3,length.out = 100)) to calculate the density.
Is it possible to use dplyr::colwise() if the input data has this format:

A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X
0.00;1.95;1.88;2.02;2.02;1.96;2.04;2.00;1.90;2.07;0.00;1.91;1.99;0.00;2.02;1.98;1.84;1.93;0.00;0.00;2.03;1.94;1.93;1.90
0.00;2.05;2.09;1.98;1.98;2.04;1.96;1.99;2.08;1.93;1.98;2.06;1.97;0.00;1.99;2.02;1.95;2.02;0.00;1.93;1.96;2.00;1.97;2.04
0.00;2.05;1.82;2.04;1.92;1.97;1.98;1.86;1.90;2.04;1.90;1.86;1.90;0.00;1.96;1.86;0.00;1.95;1.91;1.93;1.95;1.95;1.95;1.95
0.00;1.95;1.75;2.01;1.82;2.13;2.02;2.16;2.06;2.00;1.85;2.03;1.92;0.00;2.05;2.14;1.75;2.30;1.73;1.72;2.27;2.21;2.14;1.96

Here we need to compute the mean and sd of each column, then use the values from that column to compute dnorm, such as: density = list(dnorm(columnA, mean = mean(columnA, na.rm = TRUE), sd = sd(columnA, na.rm = TRUE)))
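
One possible way to do what is described here is sketched below. It is only an illustration under several assumptions: it uses columns B to E of the four rows shown above (column A contains only 0.00, which would give an sd of zero), it leaves the 0.00 entries untouched because the thread does not say whether they are missing values, and the names csv_text, wide, long, stats and fitted are made up. The idea is to reshape the wide table to long format, compute one mean and sd per column, and evaluate dnorm() at each observed value.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Assumption: columns B to E of the four rows shown above
csv_text <- "B;C;D;E
1.95;1.88;2.02;2.02
2.05;2.09;1.98;1.98
2.05;1.82;2.04;1.92
1.95;1.75;2.01;1.82"

wide <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ".")

# Long format: one row per (column label, observed value)
long <- wide |>
  pivot_longer(everything(), names_to = "label", values_to = "value")

# One mean and sd per original column
stats <- long |>
  group_by(label) |>
  summarise(MEAN = mean(value, na.rm = TRUE),
            SD   = sd(value, na.rm = TRUE))

# Density of each observed value under its own column's normal curve
fitted <- long |>
  left_join(stats, by = "label") |>
  mutate(density = dnorm(value, mean = MEAN, sd = SD))

ggplot(fitted, aes(value, density)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ label)
```

With only a few observations per column the resulting lines are coarse, because the curve is evaluated only where data exists. The stats table has the same label/MEAN/SD shape as the summary data used earlier in the thread, so it could also be passed to the grid-based code instead.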

Generating the range is basically the same as what stat_function does, if you look at the source code of StatFunction on GitHub. So why would you not do that?

We are not simulating random values as you have said. We create the range with seq.default() and the corresponding density values (which are not random at all) with dnorm(). There is exactly 0 randomness in this code. Maybe you are misunderstanding one of the base R functions for the normal distribution?
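
A quick check of this point, as a minimal sketch in base R (the mean and sd are the first row of the data posted above; run1, run2, draw1 and draw2 are illustrative names): seq.default() and dnorm() give identical results on every call, whereas rnorm() is the function that actually draws random numbers.

```r
# Deterministic grid and densities: the same every time
xvals <- seq.default(0, 3, length.out = 100)
run1 <- dnorm(xvals, mean = 2.0644, sd = 0.1004449)
run2 <- dnorm(xvals, mean = 2.0644, sd = 0.1004449)
identical(run1, run2)

# rnorm() draws random samples, so two calls differ
draw1 <- rnorm(100, mean = 2.0644, sd = 0.1004449)
draw2 <- rnorm(100, mean = 2.0644, sd = 0.1004449)
identical(draw1, draw2)
```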

In the density formula density = list(dnorm ..., how do I replace seq.default(0,3,length.out = 100) with the list of values from each column?

I think we are running in circles. There is no need to use density or whatever you are trying to do. You can just use the working code from above, which does precisely what you asked for: it plots every density specified by the mean and standard deviation in your data.frame and facets by the chr column.

If you don't want 0 and 3 as placeholder values, you can use something like MEAN - 3*SD and MEAN + 3*SD to adjust the range dynamically around the center of each distribution. But that's it. I think you are really overcomplicating this with what you are trying to do (or I am totally misunderstanding what you say, but I cannot make any more sense of it).
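
The MEAN - 3*SD to MEAN + 3*SD idea could look roughly like this. It is a sketch only, assuming the three-row input data.frame from the dput() output earlier in the thread; curves, m, s and x are hypothetical names, and scales = "free_x" is an optional choice so that each panel zooms in on its own range.

```r
library(ggplot2)

# Three-row example data, as in the dput() output
input <- data.frame(
  chr  = c("1p", "1q", "2q"),
  MEAN = c(2.0644, 1.895831, 1.915312),
  SD   = c(0.1004449, 0.09315114, 0.08752741)
)

# Build one grid per row, centred on that row's mean
curves <- do.call(rbind, lapply(seq_len(nrow(input)), function(i) {
  m <- input$MEAN[i]
  s <- input$SD[i]
  x <- seq(m - 3 * s, m + 3 * s, length.out = 100)
  data.frame(chr = input$chr[i], xvals = x,
             density = dnorm(x, mean = m, sd = s))
}))

ggplot(curves, aes(xvals, density, colour = chr)) +
  geom_line() +
  facet_wrap(~ chr, scales = "free_x")
```

Three standard deviations on each side cover about 99.7% of a normal distribution, so each curve is drawn almost down to zero at both ends.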
