 # Histogram and density curve

I have two columns: one holds depth and the other identifies the study site. Both columns have 1000 rows, so 1000 site entries and 1000 depth values. The first 500 rows belong to one site and the other 500 to a different site.

How can I split the data to get two histograms, each with a normal density curve, with one panel per site? I used the code below, but it puts all the data in the same histogram.

library(lattice)
histogram(~depth, data = oceans, type = "density", main = "Ocean depth",
          panel = function(x, ...) {
            panel.histogram(x, col = "blue", ...)
            panel.densityplot(x, lwd = 2, col = "red")
          })

This picture is not my work; it just shows the idea, but I only need it for two sites.

Hi!
You can add a grouping variable (in this case the column that contains the site information) to the histogram like this: `histogram(~numeric | grouping, data=...)`.
Here is an example based on how I understood the structure of your `oceans` dataframe.

``````
library(lattice)
# dummy data: 1000 random depths; rows 1-500 are Site A, rows 501-1000 are Site B
oceans <- data.frame(depth = sample(100:10000, 1000, replace = TRUE),
                     Site = rep(c("Site A", "Site B"), each = 500))
head(oceans)
#>   depth   Site
#> 1  5685 Site A
#> 2  3390 Site A
#> 3  5016 Site A
#> 4  9608 Site A
#> 5  7843 Site A
#> 6  5412 Site A
# one panel per site: histogram with a density curve drawn on top
histogram(~depth | Site, data = oceans, type = "density", main = "Ocean depth",
          panel = function(x, ...) {
            panel.histogram(x, col = "blue", ...)
            panel.densityplot(x, lwd = 2, col = "red")
          })
``````

Created on 2020-11-05 by the reprex package (v0.3.0)

Thanks, but the sites are not labeled with letters as such. The IDs just run 1 to 499 and 500 to 1000. Each number is a depth measured in place 1 or place 2: place 1 has 500 values and place 2 has another 500.

I am not quite sure how to understand your data.
Could you please provide a small dataset that has the same structure as your original data? Something like 3-5 observations for each site would probably be enough.

Sure, 1 to 1000 is the data.

So rows `1:500` correspond to the first site and rows `501:1000` are the values for the second site, right?

Yes, that's right.

Perfect!

Then you can just add a grouping variable, i.e. a column that indicates the site of the observation to your oceans data and then use the code I have shown you before.

If you just use this code:

``````
oceans$Site <- rep(c("Site A", "Site B"), each = 500)
``````

You will get a column called `Site` which contains 500 times `"Site A"` and 500 times `"Site B"`. You can, of course, use other names or numbers for the site labels.
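
If your data already has a running ID column, as you described, the site label can also be derived from the ID instead of relying on row order. A minimal sketch, assuming the column is called `ID` and that IDs up to 500 belong to the first site (the dummy data and the labels `"Site 1"` and `"Site 2"` are just placeholders):

```r
# Hypothetical stand-in for your data: 1000 rows with IDs 1..1000
oceans <- data.frame(ID = 1:1000,
                     depth = rnorm(1000, mean = 100, sd = 25))

# IDs up to 500 -> first site, everything else -> second site
oceans$Site <- factor(ifelse(oceans$ID <= 500, "Site 1", "Site 2"))

# check that each site received 500 rows
table(oceans$Site)
```

This keeps working even if the rows get reordered later, because the label depends on the ID and not on the row position.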
Here's the whole example:

``````
library(lattice)
# generate dummy data
set.seed(123)
oceans <- data.frame(ID = 1:1000,
                     depth = rnorm(1000, mean = 100, sd = 25))
head(oceans)
#>   ID     depth
#> 1  1  85.98811
#> 2  2  94.24556
#> 3  3 138.96771
#> 4  4 101.76271
#> 5  5 103.23219
#> 6  6 142.87662
# add Site column as grouping variable
oceans$Site <- rep(c("Site A", "Site B"), each = 500)

# plot the histogram, one panel per site
histogram(~depth | Site, data = oceans, type = "density", main = "Ocean depth",
          scales = list(alternating = FALSE),
          panel = function(x, ...) {
            panel.histogram(x, col = "blue", ...)
            panel.densityplot(x, lwd = 2, col = "red")
          }
)
``````

Created on 2020-11-05 by the reprex package (v0.3.0)
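
One note on the curve: `panel.densityplot()` draws a kernel density estimate, not a fitted normal curve. If a normal curve is what you are after, one option is lattice's `panel.mathdensity()` with `dnorm`, using the mean and standard deviation of the data in each panel. A sketch on the same kind of dummy data (swap in your real `oceans` data; the object name `p` is only for illustration):

```r
library(lattice)

# dummy data as before; replace with your real measurements
set.seed(123)
oceans <- data.frame(ID = 1:1000,
                     depth = rnorm(1000, mean = 100, sd = 25))
oceans$Site <- rep(c("Site A", "Site B"), each = 500)

p <- histogram(~depth | Site, data = oceans, type = "density",
               main = "Ocean depth",
               scales = list(alternating = FALSE),
               panel = function(x, ...) {
                 panel.histogram(x, col = "blue", ...)
                 # normal curve using this panel's own mean and sd
                 panel.mathdensity(dmath = dnorm, col = "red", lwd = 2,
                                   args = list(mean = mean(x), sd = sd(x)))
               })
print(p)
```

Because `mean(x)` and `sd(x)` are computed inside the panel function, each site gets its own curve.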

Works great. Quick question: how do you recommend choosing the right values for `mean` and `sd`?

`height = rnorm(926, mean = 100, sd = 25)`

I used the `rnorm()` function to generate some dummy data since I did not have access to your actual data. The mean and standard deviation I used there were rather arbitrary; I just chose 100 and 25 because I thought that might be an appropriate magnitude for depth data.

But since you have your own data with actual measurements, I don't see the reason why you want to use `rnorm`.
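
To make that concrete: with real measurements, the mean and standard deviation are not something you choose; you can compute them from the data, per site. A small sketch, where the `rnorm()` call only stands in for your real depth values and the names `site_mean` and `site_sd` are made up for this example:

```r
# stand-in data; with real data, skip this step and use your own data frame
set.seed(123)
oceans <- data.frame(depth = rnorm(1000, mean = 100, sd = 25),
                     Site  = rep(c("Site A", "Site B"), each = 500))

# mean and standard deviation of depth for each site
site_mean <- tapply(oceans$depth, oceans$Site, mean)
site_sd   <- tapply(oceans$depth, oceans$Site, sd)
site_mean
site_sd
```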

Thank you for your time; this was a great help in my learning process.
