create a ggplot using two lists

I am trying to create a histogram of two lists using ggplot. At the moment I am using the base function hist. Please can someone explain how to do this with ggplot? I would like the y axis to show the density.

data1=data.matrix(samples1)
data2=data.matrix(samples_cp)

hist(data1[, 4], freq = FALSE, col = "blue", main = "1990 Prediction")
hist(data2[, 4], freq = FALSE, col = "red", add = TRUE)

The easiest way to plot the two distributions in ggplot is to combine the two data sets. How to combine them depends on how your data are structured. If you explain more about your data, someone might provide more specific advice.

library(ggplot2)
DF1 <- data.frame(A = rep("Dat1", 20), Value = rnorm(20))
DF2 <- data.frame(A = rep("Dat2", 20), Value = rnorm(20, 0.5, 1))
AllDat <- rbind(DF1, DF2)
ggplot(AllDat, aes(x = Value, fill = A)) + geom_histogram(binwidth = 0.2, position = "dodge")

Created on 2020-02-23 by the reprex package (v0.3.0)

Other possibilities might be to mimic adding two histograms, as you did, or to use frequency polygons instead. I'll use @FJCC's definitions of DF1 and DF2 to illustrate.

The first results in one histogram obscuring the other:

DF1 <- data.frame(A = rep("Dat1", 20), Value = rnorm(20))
DF2 <- data.frame(A = rep("Dat2", 20), Value = rnorm(20, 0.5, 1))

library(ggplot2)
ggplot() +
  geom_histogram(aes(Value, y = ..density.., fill = 'DF1'), data = DF1) +
  geom_histogram(aes(Value, y = ..density.., fill = 'DF2'), data = DF2)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


By changing the transparency of the bars, this might be improved, or you could add position = "dodge" as @FJCC does. Frequency polygons allow for a different kind of visibility:

  
ggplot() +
  geom_freqpoly(aes(Value, y = ..density.., color = 'DF1'), data = DF1) +
  geom_freqpoly(aes(Value, y = ..density.., color = 'DF2'), data = DF2)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-02-23 by the reprex package (v0.3.0)
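
The transparency idea mentioned above can be sketched roughly as follows. This is only an illustration on the same kind of simulated data: the seed, binwidth = 0.25 and alpha = 0.5 are arbitrary choices of mine, and after_stat(density) is the newer spelling of ..density.., which assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(1)  # fixed seed so the simulated data are reproducible
DF1 <- data.frame(A = rep("Dat1", 20), Value = rnorm(20))
DF2 <- data.frame(A = rep("Dat2", 20), Value = rnorm(20, 0.5, 1))

# Two overlaid density histograms, one layer per data frame.
# alpha < 1 makes the bars semi-transparent so neither hides the other;
# a shared binwidth keeps the bars of both layers the same width.
p_overlay <- ggplot() +
  geom_histogram(aes(Value, y = after_stat(density), fill = "DF1"),
                 data = DF1, binwidth = 0.25, alpha = 0.5) +
  geom_histogram(aes(Value, y = after_stat(density), fill = "DF2"),
                 data = DF2, binwidth = 0.25, alpha = 0.5)
p_overlay
```

Setting the binwidth explicitly also silences the `stat_bin()` message shown in the output above.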

Thanks for your answer. The data is in a list format. Is it possible to convert straight to data frame?

It may be easy to convert your data from a list to a data frame. In fact, a data frame is basically a list. But we would have to know the structure of the list first. Try running the commands

str(samples1)
str(samples_cp)

When you paste the result into your reply, please place three back ticks, ```, on the lines before and after the output so that it gets formatted as code.

str(samples1)

List of 4
 $ : 'mcmc' num [1:1000, 1:4] 0.437 0.461 0.471 0.562 0.511 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:4] "a1" "mu" "sd" "ypred"
  ..- attr(*, "mcpar")= num [1:3] 2002 4000 2
 $ : 'mcmc' num [1:1000, 1:4] 0.496 0.614 0.399 0.467 0.585 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:4] "a1" "mu" "sd" "ypred"
  ..- attr(*, "mcpar")= num [1:3] 2002 4000 2
 $ : 'mcmc' num [1:1000, 1:4] 0.665 0.509 0.353 0.563 0.568 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:4] "a1" "mu" "sd" "ypred"
  ..- attr(*, "mcpar")= num [1:3] 2002 4000 2
 $ : 'mcmc' num [1:1000, 1:4] 0.521 0.52 0.493 0.649 0.532 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:4] "a1" "mu" "sd" "ypred"
  ..- attr(*, "mcpar")= num [1:3] 2002 4000 2
 - attr(*, "class")= chr "mcmc.list"

str(samples_cp)

List of 4
 $ : 'mcmc' num [1:1000, 1:5] 0.206 0.292 0.154 0.251 0.108 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:5] "a1" "b1" "mu" "sd" ...
  ..- attr(*, "mcpar")= num [1:3] 10020 30000 20
 $ : 'mcmc' num [1:1000, 1:5] 0.306 0.16 0.317 0.204 0.367 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:5] "a1" "b1" "mu" "sd" ...
  ..- attr(*, "mcpar")= num [1:3] 10020 30000 20
 $ : 'mcmc' num [1:1000, 1:5] 0.369 0.133 0.133 0.181 0.235 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:5] "a1" "b1" "mu" "sd" ...
  ..- attr(*, "mcpar")= num [1:3] 10020 30000 20
 $ : 'mcmc' num [1:1000, 1:5] 0.165 0.318 0.254 0.264 0.254 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:5] "a1" "b1" "mu" "sd" ...
  ..- attr(*, "mcpar")= num [1:3] 10020 30000 20
 - attr(*, "class")= chr "mcmc.list"

I think

DF_samples1 <- dplyr::bind_rows(samples1)

will give you the data frame you want from that data set and you can run a similar command with the other list. You need to have the dplyr package, of course. You can then add a column to each data frame with a command like

DF_samples1$Source <- "Samples1"

Finally, bind the two data frames with

AllData <- dplyr::bind_rows(DF_samples1, DF_samples_cp)

When running: DF_samples1 <- dplyr::bind_rows(samples1)

I am getting an error message:

Error: Argument 1 must be a data frame or a named atomic vector, not a mcmc.list

This output is an mcmc.list, presumably from fitting a model with Bayesian techniques (that's my guess, since you haven't told us how it was produced). You will need special tools to process the output and visualize the results. Check out the {MCMCvis} and {coda} packages on CRAN, which are designed to handle mcmc.list output.
HTH
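
As a rough sketch of what {coda} offers here: its as.matrix() method for mcmc.list objects stacks the chains into one matrix, which can then be turned into a data frame. The object samples_demo and the helper make_chain() below are hypothetical stand-ins built from random numbers, shaped like the str() output above (4 chains, 1000 draws, 4 parameters); they are not the real sampler output.

```r
library(coda)

set.seed(1)  # fixed seed so the simulated draws are reproducible

# Hypothetical helper: one fake chain with the same shape and
# start/end/thin settings as the chains in str(samples1)
make_chain <- function() {
  draws <- matrix(rnorm(4000), ncol = 4,
                  dimnames = list(NULL, c("a1", "mu", "sd", "ypred")))
  mcmc(draws, start = 2002, end = 4000, thin = 2)
}

# Hypothetical stand-in for samples1
samples_demo <- mcmc.list(make_chain(), make_chain(),
                          make_chain(), make_chain())

# Stack the four chains row-wise, then convert to a data frame
draws_all <- as.matrix(samples_demo)
DF_demo <- as.data.frame(draws_all)
dim(DF_demo)
```

After this step the columns can be selected by name (for example DF_demo$ypred) rather than by position, which is safer when two outputs have different numbers of columns.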

Yes, it is the MCMC output of a Bayesian model.

Try using

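# data.matrix() calls as.matrix() on objects that are not data frames;
# with coda loaded, that stacks the chains of an mcmc.list into one matrix
DF_samples1 <- as.data.frame(data.matrix(samples1))
DF_samples_cp <- as.data.frame(data.matrix(samples_cp))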

DF_samples1$Source <- "samples1"
DF_samples_cp$Source <- "samples_cp"

AllData <- dplyr::bind_rows(DF_samples1, DF_samples_cp)

I think you can then use ggplot as previously explained.
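
For completeness, here is a minimal sketch of that last plotting step with density on the y axis. The AllData below is simulated, not the real combined data, and it assumes both sources have a column named ypred (the str() output for samples_cp truncates its fifth column name, so that is a guess). The means, standard deviations, bins and alpha values are arbitrary, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(1)  # fixed seed so the simulated data are reproducible

# Hypothetical stand-in for the AllData built with dplyr::bind_rows();
# only the two columns needed for the plot are simulated here
AllData <- rbind(
  data.frame(Source = "samples1",   ypred = rnorm(4000, 0.5, 0.1)),
  data.frame(Source = "samples_cp", ypred = rnorm(4000, 0.3, 0.1))
)

# position = "identity" overlays the two histograms instead of stacking them;
# after_stat(density) scales each group so that its bars have total area 1
p_pred <- ggplot(AllData, aes(x = ypred, y = after_stat(density), fill = Source)) +
  geom_histogram(bins = 30, position = "identity", alpha = 0.5) +
  labs(title = "1990 Prediction", x = "ypred", y = "Density")
p_pred
```

Mapping the column by name means it does not matter that the two outputs have different numbers of columns.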

Thanks for your help
