Histogram and density curve

anon15837621 · November 5, 2020, 4:05pm

I have two columns: one holds depth and the other identifies a study site. Both columns have 1000 rows (1000 site entries and 1000 depth values). The first 500 rows belong to one site and the other 500 to a different site.

How can I split the data to get two histograms, one panel per site, each with a normal density curve? I used the code below, but it puts all the data in the same histogram.

library(lattice)
histogram(~depth, data = oceans, type= 'density',main="Ocean depth",
panel = function(x, ...){
panel.histogram(x, col = "blue", ...)
panel.densityplot(x, lwd = 2, col = 'red')
})

This picture is not my work; it just shows the idea, but I only need it for two sites.

https://www.google.com/url?sa=i&url=http%3A%2F%2Fzoonek2.free.fr%2FUNIX%2F48_R%2F04.html&psig=AOvVaw10s_1OQQejjBEOFXMLLege&ust=1604677327139000&source=images&cd=vfe&ved=0CAIQjRxqFwoTCMDX46Di6-wCFQAAAAAdAAAAABAD

jms · November 5, 2020, 4:40pm

Hi!
You can add a grouping variable (in this case the column that contains the site information) to the histogram like this: histogram(~numeric | grouping, data=...).
Here is an example based on how I understood the structure of your oceans dataframe.


library(lattice)
oceans<-data.frame(depth=sample(100:10000,1000,replace = T),
                   Site=rep(c("Site A","Site B"),each=500))
head(oceans)
#>   depth   Site
#> 1  5685 Site A
#> 2  3390 Site A
#> 3  5016 Site A
#> 4  9608 Site A
#> 5  7843 Site A
#> 6  5412 Site A
histogram(~depth|Site, data = oceans, type= 'density',main="Ocean depth",
          panel = function(x, ...){
            panel.histogram(x, col = "blue", ...)
            panel.densityplot(x, lwd = 2, col = 'red')
          })

Created on 2020-11-05 by the reprex package (v0.3.0)

anon15837621 · November 5, 2020, 4:45pm

Thanks, but the IDs of the sites run from 1 to 499 and from 500 to 1000. Each number is a depth measured at place 1 or place 2. Place 1 has 500 values and place 2 has another 500; the sites are not labeled with letters as such.

jms · November 5, 2020, 4:50pm

I am not quite sure how to understand your data.
Could you please provide a small dataset that has the same structure as your original data? Something like 3-5 observations for each site would probably be enough.

anon15837621 · November 5, 2020, 4:55pm

Sure, 1 to 1000 is the data

[Image: Screenshot (36)]

jms · November 5, 2020, 5:06pm

So rows 1:500 correspond to the first site and rows 501:1000 are the values for the second site, right?

anon15837621 · November 5, 2020, 5:08pm

Yes, that's right.

jms · November 5, 2020, 5:33pm

Perfect!

Then you can just add a grouping variable, i.e. a column that indicates the site of each observation, to your oceans data and then use the code I showed you before.

If you just use this code:

oceans$Site<-rep(c("Site A","Site B"),each=500)

You will get a column called Site which contains 500 times "Site A" and 500 times "Site B". You can, of course, use other names or numbers for the site labels.
Here's the whole example:

library(lattice)
#generate dummy data
set.seed(123) 
oceans<-data.frame(ID=1:1000,
                   depth=rnorm(1000,mean = 100,sd=25))

head(oceans)
#>   ID     depth
#> 1  1  85.98811
#> 2  2  94.24556
#> 3  3 138.96771
#> 4  4 101.76271
#> 5  5 103.23219
#> 6  6 142.87662
# add Site column as grouping variable
oceans$Site<-rep(c("Site A","Site B"),each=500)

#plot the histogram
histogram(~depth|Site, data = oceans, type= 'density',main="Ocean depth",
          scales=list(alternating=FALSE),
          panel = function(x, ...){
            panel.histogram(x, col = "blue", ...)
            panel.densityplot(x, lwd = 2, col = 'red')
          }
          )

Created on 2020-11-05 by the reprex package (v0.3.0)
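
If you would rather derive the site from the ID column than rely on row order, something like the following might work. This is only a sketch: it assumes the IDs run from 1 to 1000 and that IDs up to 500 belong to the first site, and the labels "Site 1" and "Site 2" are made up for illustration.

```r
# Dummy data with the same structure as above (assumption: ID 1..1000)
set.seed(123)
oceans <- data.frame(ID = 1:1000,
                     depth = rnorm(1000, mean = 100, sd = 25))

# Assign the site from the ID: 1..500 -> first site, 501..1000 -> second site
oceans$Site <- factor(ifelse(oceans$ID <= 500, 1, 2),
                      labels = c("Site 1", "Site 2"))

# Check that each site received 500 observations
table(oceans$Site)
```

The resulting Site column can be used in histogram(~depth|Site, ...) exactly like the one created with rep().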

anon15837621 · November 5, 2020, 6:12pm

Works great. Quick question: what do you recommend for choosing the correct numbers for mean and sd?

height=rnorm(926,mean = 100,sd=25))

jms · November 5, 2020, 6:28pm

I used the rnorm() function to generate some dummy data since I did not have access to your actual data. The mean and standard deviation I used there were rather arbitrary; I just chose 100 and 25 because I thought that might be an appropriate magnitude for depth data.

But since you have your own data with actual measurements, I don't see the reason why you want to use rnorm.
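
If the goal is a normal curve whose mean and sd come from the measurements themselves, they can be estimated from the data rather than chosen by hand. Note that panel.densityplot() draws a kernel density estimate, not a normal curve. A minimal sketch of one alternative, assuming an oceans data frame with depth and Site columns as above (the dummy data here only stands in for the real measurements), uses lattice's panel.mathdensity() with dnorm:

```r
library(lattice)

# Dummy data standing in for the real measurements (assumption)
set.seed(123)
oceans <- data.frame(depth = rnorm(1000, mean = 100, sd = 25),
                     Site  = rep(c("Site A", "Site B"), each = 500))

# Mean and standard deviation of depth, estimated separately per site
site_mean <- tapply(oceans$depth, oceans$Site, mean)
site_sd   <- tapply(oceans$depth, oceans$Site, sd)
print(site_mean)
print(site_sd)

p <- histogram(~depth | Site, data = oceans, type = "density",
               main = "Ocean depth",
               panel = function(x, ...) {
                 panel.histogram(x, col = "lightblue", ...)
                 # Normal curve using the mean and sd of this panel's data
                 panel.mathdensity(dmath = dnorm, col = "red", lwd = 2,
                                   args = list(mean = mean(x), sd = sd(x)))
               })
print(p)
```

Inside the panel function, x contains only the depths of the site being drawn, so each panel gets its own fitted curve.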

anon15837621 · November 5, 2020, 6:32pm

Thank you for your time; it was a great help in this learning process.
