Histogram and density curve

I have two columns: one represents depth and the other represents a study site. Both columns have 1000 rows, so there are 1000 site entries and 1000 depth values. The first 500 rows belong to a different site than the other 500.

How can I split the data to get two histograms, each with a normal density curve, with one panel per site? I used the code below, but it puts all the data in the same histogram.

library(lattice)
histogram(~depth, data = oceans, type = 'density', main = "Ocean depth",
          panel = function(x, ...){
            panel.histogram(x, col = "blue", ...)
            panel.densityplot(x, lwd = 2, col = 'red')
          })

This picture is not my work; it just shows the idea, but I only need it for two sites.

R
https://www.google.com/url?sa=i&url=http%3A%2F%2Fzoonek2.free.fr%2FUNIX%2F48_R%2F04.html&psig=AOvVaw10s_1OQQejjBEOFXMLLege&ust=1604677327139000&source=images&cd=vfe&ved=0CAIQjRxqFwoTCMDX46Di6-wCFQAAAAAdAAAAABAD

Hi!
You can add a grouping variable (in this case, the column that contains the site information) to the histogram formula like this: histogram(~numeric | grouping, data=...).
Here is an example based on how I understood the structure of your oceans data frame.


library(lattice)
oceans <- data.frame(depth = sample(100:10000, 1000, replace = TRUE),
                     Site = rep(c("Site A", "Site B"), each = 500))
head(oceans)
#>   depth   Site
#> 1  5685 Site A
#> 2  3390 Site A
#> 3  5016 Site A
#> 4  9608 Site A
#> 5  7843 Site A
#> 6  5412 Site A
histogram(~depth|Site, data = oceans, type= 'density',main="Ocean depth",
          panel = function(x, ...){
            panel.histogram(x, col = "blue", ...)
            panel.densityplot(x, lwd = 2, col = 'red')
          })

Created on 2020-11-05 by the reprex package (v0.3.0)


Thanks, but the sites are identified by row ID: 1 to 499 and 500 to 1000. Each number is a depth measured at place 1 or place 2. Place 1 has 500 values and place 2 has another 500; the sites are not labelled with letters as such.

I am not quite sure I understand your data.
Could you please provide a small dataset that has the same structure as your original data? Something like 3-5 observations for each site would probably be enough.

Sure, 1 to 1000 is the data

Screenshot (36)

So rows 1:500 correspond to the first site and rows 501:1000 are the values for the second site, right?

Yes, that's right.

Perfect!

Then you can just add a grouping variable, i.e. a column that indicates the site of the observation to your oceans data and then use the code I have shown you before.

If you just use this code:

oceans$Site<-rep(c("Site A","Site B"),each=500)

You will get a column called Site which contains 500 times "Site A" and 500 times "Site B". You can, of course, use other names or numbers for the site labels.
Here's the whole example:

library(lattice)
#generate dummy data
set.seed(123) 
oceans<-data.frame(ID=1:1000,
                   depth=rnorm(1000,mean = 100,sd=25))

head(oceans)
#>   ID     depth
#> 1  1  85.98811
#> 2  2  94.24556
#> 3  3 138.96771
#> 4  4 101.76271
#> 5  5 103.23219
#> 6  6 142.87662
# add Site column as grouping variable
oceans$Site<-rep(c("Site A","Site B"),each=500)

#plot the histogram
histogram(~depth|Site, data = oceans, type= 'density',main="Ocean depth",
          scales=list(alternating=FALSE),
          panel = function(x, ...){
            panel.histogram(x, col = "blue", ...)
            panel.densityplot(x, lwd = 2, col = 'red')
          }
          )

Created on 2020-11-05 by the reprex package (v0.3.0)
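
A side note on the grouping column: rep(..., each = 500) only works because the rows are ordered by site. Here is a minimal sketch of two ways to build the column and a quick check that they agree, assuming (as confirmed above) that rows 1:500 are the first site and rows 501:1000 the second. The "Site A"/"Site B" labels and the ID-based variant are just illustrative choices, not something taken from your real data.

```r
# Dummy data with the same shape as the example above
set.seed(123)
oceans <- data.frame(ID = 1:1000,
                     depth = rnorm(1000, mean = 100, sd = 25))

# Option 1: label by row position; each = 500 repeats every label
# 500 times in a row (A, A, ..., B, B, ...)
oceans$Site <- rep(c("Site A", "Site B"), each = 500)

# Option 2 (assumption: ID runs from 1 to 1000 in row order): label by ID
site_from_id <- ifelse(oceans$ID <= 500, "Site A", "Site B")

# Check that both approaches agree and count the rows per site
identical(oceans$Site, site_from_id)
table(oceans$Site)
```

If the rows were ever shuffled, the ID-based version would still put each observation in the right site, while the position-based one would not.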


Works great. Quick question: what do you recommend for selecting the right values for mean and sd?

height=rnorm(926,mean = 100,sd=25))

I used the rnorm() function to generate some dummy data since I did not have access to your actual data. The mean and standard deviation I used there were rather arbitrary; I just chose 100 and 25 because I thought that might be an appropriate magnitude for depth data.

But since you have your own data with actual measurements, I don't see a reason why you would want to use rnorm().
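
With real measurements, mean and sd only come into play if you want to overlay a fitted normal curve instead of the kernel density estimate that panel.densityplot() draws. Here is a sketch of that variant, assuming your oceans data frame has depth and Site columns like in the example. It uses lattice's panel.mathdensity() with the mean and standard deviation computed from the data in each panel, so you never pick the values by hand. The dummy data below only stands in for your real measurements.

```r
library(lattice)

# Stand-in data; replace this with your own oceans data frame
set.seed(123)
oceans <- data.frame(depth = rnorm(1000, mean = 100, sd = 25),
                     Site = rep(c("Site A", "Site B"), each = 500))

p <- histogram(~depth | Site, data = oceans, type = "density",
               main = "Ocean depth",
               panel = function(x, ...) {
                 panel.histogram(x, col = "blue", ...)
                 # Normal curve using this panel's own mean and sd
                 panel.mathdensity(dmath = dnorm, col = "red", lwd = 2,
                                   args = list(mean = mean(x), sd = sd(x)))
               })
print(p)
```

Inside the panel function, x contains only the depths of the site being drawn, so each panel gets its own curve.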


Thank you for your time, was a great help in this learning process.
