# How to create this plot in ggplot?

Hi, first I would like to add a legend to this plot so that it looks like the target picture. This is how far I have got:

``````
x3 <- seq(0, 20, by = 0.001)
x <- rgamma(1000, scale = 1, shape = 2.3)
d <- c(x+max(x), max(x)-x)
# Empirical CDF of d, zoomed to 5-20
plot(ecdf(d), do.points=FALSE, verticals=TRUE, xlim=c(5,20), main="eCDF(d) between 5-20")

# Normal CDF with the same mean and standard deviation as d
lines(x3, pnorm(x3, mean=mean(d), sd=sqrt(var(d))), lty=3, col = "red")
``````

which gives me this:

Please help me add a legend to this plot with two entries, "normal" and "examined", as in the picture above.
Second, I would like to recreate this base R plot in ggplot. How do I do it?
regards,

Hi, I tried to recreate this plot in ggplot:

``````
x1 <- seq(5.001, 20, by = 0.001)

x3 <- seq(0, 20, by = 0.001)

x2 <- seq(5, 20, by = 0.01)

x <- rgamma(1000, scale = 1, shape = 2.3)

d <- c(x+max(x), max(x)-x)

ggplot(d1, aes(d)) + stat_ecdf(geom = "line", color = "blue", size=0.5) + xlim(5, 20) +
ggplot(x3,aes(x2)) +
stat_function(fun = pnorm, args = list(mean=mean(d), sd=sqrt(var(d)))) + ylab("Fn(x)")
``````

But what I have accomplished was two separate ggplots.

When I tried to do it on a single plot I kept receiving errors. How can I superimpose these plots or merge them into one? In other words, how do I draw two curves (one eCDF and one CDF) on a single plot?
Any help will be greatly appreciated.

You've called ggplot twice.

I think of ggplot as saying get a piece of graph paper. So you asked for two pieces of graph paper so you drew two plots.

ggplot() +
geom_line() + #draw a line on the graph
geom_line() # draw another line on the same graph.

So in your case:

ggplot ()+
stat_ecdf () +
stat_function()

I don't quite understand the maths of what you are doing, so I can't comment on the actual plot. But if x2 needs to be in an aes for stat_function, put the aes inside it: stat_function(aes(x2)).
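A minimal sketch of the single-`ggplot()` version. It assumes the data is wrapped in a data frame called `d1` with one column `d` (the thread only defines `d1` later on), and the seed is an assumption added so the sketch is repeatable. `sd(d)` is used as shorthand for `sqrt(var(d))`.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is repeatable
x  <- rgamma(1000, scale = 1, shape = 2.3)
d  <- c(x + max(x), max(x) - x)
d1 <- data.frame(d = d)  # assumed definition of d1

# One ggplot() call = one sheet of graph paper; both layers are drawn on it
p <- ggplot(d1, aes(d)) +
  stat_ecdf(geom = "line", colour = "blue") +
  stat_function(fun = pnorm, args = list(mean = mean(d), sd = sd(d))) +
  ylab("Fn(x)")
print(p)
```

Because `stat_function()` inherits the x aesthetic from `ggplot()`, it needs no data or `aes()` of its own here.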

As for the base R legend: a lot of people simply use a short line and text. Have you tried using legend()?
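A sketch of what `legend()` could look like for the base plot from the question. The entry labels "examined" and "normal" come from the original request; the position `"topleft"`, the seed, and `bty = "n"` are assumptions chosen for illustration.

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable
x  <- rgamma(1000, scale = 1, shape = 2.3)
d  <- c(x + max(x), max(x) - x)
x3 <- seq(0, 20, by = 0.001)

plot(ecdf(d), do.points = FALSE, verticals = TRUE, xlim = c(5, 20),
     main = "eCDF(d) between 5-20")
lines(x3, pnorm(x3, mean = mean(d), sd = sd(d)), lty = 3, col = "red")

# One legend entry per line; lty and col must match the drawing calls above
legend("topleft", legend = c("examined", "normal"),
       lty = c(1, 3), col = c("black", "red"), bty = "n")
```

`legend()` does not read anything from the plot, so the line types and colours have to be repeated by hand.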

Why are you trying to draw the same thing in base and in ggplot?

Thank you,
So I have done:

``````
ggplot(d1, aes(d)) + stat_ecdf(geom = "line", color = "blue", size=0.5) + xlim(5, 20) +
stat_function(fun = pnorm, args = list(mean=mean(d), sd=sqrt(var(d)))) + ylab("Fn(x)") + ggtitle("eCDF(d) between 5-20")
``````

which gives me this:

I got a warning as well: "Warning message: Removed 22 rows containing non-finite values (stat_ecdf)."
I do not know how important this is.

Now, my question is how to add a legend to this plot describing those lines, e.g. line1 (blue), line2 (black)?

Because I want to learn how to do this in both.
regards,

This shows that 22 rows of data can't be plotted. Since you are learning both, you are presumably trying to figure out which is best. Did base R tell you it couldn't plot 22 values? Thought not! That's not because it could plot them (they are non-finite); it just didn't tell you.

OK, fixes:

1. Look at the data and understand why. I suspect it's to do with points at zero, in which case filtering those out would avoid the warning.
2. Or you say it's just a warning and carry on.
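One possible cause, offered as an assumption since the thread never confirms it: `xlim(5, 20)` sets scale limits, and ggplot2 turns values outside the scale limits into `NA` before the stat is computed, so every `d` outside 5-20 would be dropped with this kind of warning. Base R's `xlim=` only zooms. The sketch below counts the affected values and zooms with `coord_cartesian()` instead; the seed and the definition of `d1` are assumed.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is repeatable
x  <- rgamma(1000, scale = 1, shape = 2.3)
d  <- c(x + max(x), max(x) - x)
d1 <- data.frame(d = d)  # assumed definition of d1

# Values that xlim(5, 20) would discard before the ECDF is computed
n_outside <- sum(d < 5 | d > 20)
n_outside

# coord_cartesian() zooms without discarding data, like xlim= in base plot()
p_zoom <- ggplot(d1, aes(d)) +
  stat_ecdf(geom = "line") +
  coord_cartesian(xlim = c(5, 20))
print(p_zoom)
```

With `xlim()` the ECDF is computed on the truncated data, so the curve itself can shift; with `coord_cartesian()` it is computed on all of `d` and only the view changes.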

OK. Legends. Ggplot can't make up its mind what to call them: legends or guides. And ggplot puts them outside the plot by default. Let's start with that.

Add this to the plot

+scale_colour_manual("Legend title", values = c("ECDF" = "red", "FUNC" = "blue"))

Now, in each of the stat_ layers, add colour = "the name of the line" inside aes() and remove the fixed colour, e.g.

``````
stat_ecdf(geom = "line", size=0.5, aes(colour="ECDF")) +

ggplot(d1, aes(d)) +
stat_ecdf(geom = "line", size=0.5, aes(colour="ECDF") +
xlim(5, 20) +
stat_function(fun = pnorm, args = list(mean=mean(d), sd=sqrt(var(d))), aes(colour="FUNC") +
ylab("Fn(x)") +
ggtitle("eCDF(d) between 5-20")+
scale_colour_manual("Legend title", values = c("ECDF" = "red", "FUNC" = "blue"))
``````

If that works then you can position the legend inside the plot, using relative coordinates between 0 and 1:

• theme(legend.position = c(.05, .95))
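A sketch of the full plot with the legend moved inside the panel. The coordinates `c(0.05, 0.95)`, the `legend.justification` setting, the seed, and the definition of `d1` are assumptions for illustration. As far as I know, recent ggplot2 releases (3.5.0 onwards) prefer `legend.position = "inside"` together with `legend.position.inside` and may warn about the numeric form used here.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is repeatable
x  <- rgamma(1000, scale = 1, shape = 2.3)
d  <- c(x + max(x), max(x) - x)
d1 <- data.frame(d = d)  # assumed definition of d1

p <- ggplot(d1, aes(d)) +
  stat_ecdf(geom = "line", aes(colour = "ECDF")) +
  stat_function(fun = pnorm, args = list(mean = mean(d), sd = sd(d)),
                aes(colour = "FUNC")) +
  scale_colour_manual("Legend title",
                      values = c("ECDF" = "red", "FUNC" = "blue")) +
  # Relative coordinates: (0, 0) is bottom-left, (1, 1) is top-right
  theme(legend.position = c(0.05, 0.95),
        legend.justification = c(0, 1))
print(p)
```

`legend.justification = c(0, 1)` anchors the legend by its top-left corner, so it stays inside the panel when placed near that corner.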

Thank you very much for your time,
I have used your code and added a few parentheses (RStudio was waiting for them):

``````
ggplot(d1, aes(d)) +
stat_ecdf(geom = "line", size=0.5, aes(colour="ECDF")) +
xlim(5, 20) +
stat_function(fun = pnorm, args = list(mean=mean(d), sd=sqrt(var(d))), aes(colour="FUNC")) +
ylab("Fn(x)") +
ggtitle("eCDF(d) between 5-20")+
scale_colour_manual("Legend title", values = c("ECDF" = "red", "FUNC" = "blue"))
``````

that gives me:

If you compare it with the previous plot, they are different:

Why is that? The lines are a bit off.

Yes, I was curious, because base R did not say anything about it. Thank you for the explanation.

You know rgamma is generating random numbers? So if you re-run the x <- rgamma step, you get new numbers:

``````
> x <- rgamma(1000, scale = 1, shape = 2.3)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1052  1.1477  2.0469  2.3103  3.0727 12.5775
> x <- rgamma(1000, scale = 1, shape = 2.3)
> summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.01617  1.17184  1.91189  2.30493  3.08347 11.37628
``````

Your code isn't fully reproducible, as you don't define d1.

If you want to stop random numbers being "random", call set.seed(1000) before generating the values and it will keep them constant:

``````
> set.seed(1000)
> x <- rgamma(1000, scale = 1, shape = 2.3)
> summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.04251  1.17330  1.99847  2.30789  2.98665 11.24993
> set.seed(1000)
> x <- rgamma(1000, scale = 1, shape = 2.3)
> summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.04251  1.17330  1.99847  2.30789  2.98665 11.24993
``````

Yes, thank you, you are right:

``````
options(scipen=99999)
set.seed(2222)

x <- rgamma(1000, scale = 1, shape = 2.3)

d <- c(x+max(x), max(x)-x)

d1 <- as.data.frame(d)
``````

Thank you again for the solution.
