Mean for CTD profile

Hello,
I would like to make temperature and salinity profiles for each of my areas.
To do that, I need to average over the stations in each area, so that, for example, Area A1 has only one value at 10 m.
For instance, the Temp of A1 at 10 m would be the mean of the Temp of Station1 at 10 m and the Temp of Station2 at 10 m.
I would like to do that for all my stations, for both temperature and salinity.

library(tidyverse)

df <- tribble(
  ~Area, ~station, ~depth, ~Temp, ~Sal,
  "A1","St1",10, 2.5, 34.58,
  "A1","St1",20, 2.8, 34.78,
  "A1","St1",30, 2.6, 34.61,
  "A1","St2",10, 2.6,  34.25,
  "A1","St2",20, 2.48, 35.02,
  "A1","St2",30, 2.54, 34.74,
  "A2","St3",10, 2.8, 34.61,
  "A2","St3",20, 2.74,  35.13,
  "A2","St3",30, 3.05,  33.89,
  "A2","St4",10, 3.18,  34023,
  "A2","St4",20, 1.05,  34.11,
  "A2","St4",30, 2.06, 35.17,
  "A3","St5",10, 1.26,  35.06,
  "A3","St5",20, 3.15,  34.99,
  "A3","St5",30, 2.87,  35.32
)

I hope I was clear
Thanks

And if I have a "date" column with multiple dates for one station, how can I average my temperature/salinity by depth over all the dates of that station? I'd like to have a single row at 10 m, and that row would be the mean of all the 10 m values across all dates.

As a beginner to R, you may benefit from studying this useful book:
https://r4ds.had.co.nz/
Chapter 5 in particular teaches how to do these kinds of manipulations.

Your solution will be to group_by() Area and depth, then summarise(), taking the average, i.e. mean(), of temperature and the mean() of salinity.
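A minimal sketch of that suggestion, using only the Area A1 rows from the example data above. The `.groups = "drop"` argument (dplyr >= 1.0.0) is an addition of mine, not something from the thread; it simply returns an ungrouped result.

```r
library(dplyr)

# Area A1 rows from the example data in this thread
a1 <- tibble::tribble(
  ~Area, ~station, ~depth, ~Temp, ~Sal,
  "A1",  "St1",    10,     2.5,   34.58,
  "A1",  "St1",    20,     2.8,   34.78,
  "A1",  "St1",    30,     2.6,   34.61,
  "A1",  "St2",    10,     2.6,   34.25,
  "A1",  "St2",    20,     2.48,  35.02,
  "A1",  "St2",    30,     2.54,  34.74
)

# One row per Area x depth: the mean over all stations at that depth
a1_profile <- a1 %>%
  group_by(Area, depth) %>%
  summarise(AvgTemp = mean(Temp), AvgSal = mean(Sal), .groups = "drop")

print(a1_profile)
```

The 10 m row for A1 is then (2.5 + 2.6) / 2 = 2.55 for temperature, which is exactly the "mean of Station1 and Station2 at 10 m" asked for in the question.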


Thanks for your response.
That's what I did before, but when I then plot my profile with geom_line(), I get this:

df <- df %>%
  group_by(Area, depth) %>%
  summarise(AvgTemp = mean(Temp), AvgSal = mean(Sal))

df %>%
  ggplot(aes(x = AvgTemp, y = depth, color = Area)) +
  geom_line(linewidth = 2) +
  ylab(label = "Depth(m)") +
  xlab(label = "Temperature(°C)") +
  scale_y_reverse()

Using your example data and your code:

library(tidyverse)

df <- tribble(
  ~Area, ~station, ~depth, ~Temp, ~Sal,
  "A1","St1",10, 2.5, 34.58,
  "A1","St1",20, 2.8, 34.78,
  "A1","St1",30, 2.6, 34.61,
  "A1","St2",10, 2.6,  34.25,
  "A1","St2",20, 2.48, 35.02,
  "A1","St2",30, 2.54, 34.74,
  "A2","St3",10, 2.8, 34.61,
  "A2","St3",20, 2.74,  35.13,
  "A2","St3",30, 3.05,  33.89,
  "A2","St4",10, 3.18,  34023,
  "A2","St4",20, 1.05,  34.11,
  "A2","St4",30, 2.06, 35.17,
  "A3","St5",10, 1.26,  35.06,
  "A3","St5",20, 3.15,  34.99,
  "A3","St5",30, 2.87,  35.32
)

df <- df %>%
  group_by(Area, depth) %>%
  summarise(AvgTemp = mean(Temp), AvgSal = mean(Sal))

df %>%
  ggplot(aes(x = AvgTemp, y = depth, color = Area)) +
  geom_line(linewidth = 2) +
  ylab(label = "Depth(m)") +
  xlab(label = "Temperature(°C)") +
  scale_y_reverse()

I'm not seeing anything objectionable here.
Is the issue with your real data, i.e. that you have a greater variety of depths, so only a very few temperature values are averaged together at each depth, hence the variation in the line?
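Separately from the number of depths, one thing worth checking: geom_line() connects points in order of the x variable, and here x is AvgTemp. With the sample data, A1's means are 2.55 (10 m), 2.64 (20 m) and 2.57 (30 m), so geom_line() joins 10 m to 30 m and then to 20 m, which can make a profile look jagged. A minimal sketch of an alternative, assuming the intent is to join points from the surface downwards: sort by depth and use geom_path(), which connects points in row order. `linewidth` assumes ggplot2 >= 3.4.0 (older versions use `size`).

```r
library(dplyr)
library(ggplot2)

# A1 means per depth, as computed from the example data in this thread
profile <- tibble::tribble(
  ~Area, ~depth, ~AvgTemp,
  "A1",  10,     2.55,
  "A1",  20,     2.64,
  "A1",  30,     2.57
)

p <- profile %>%
  arrange(Area, depth) %>%          # put each area's profile in depth order
  ggplot(aes(x = AvgTemp, y = depth, color = Area)) +
  geom_path(linewidth = 2) +        # joins points in row order, i.e. by depth
  ylab("Depth (m)") +
  xlab("Temperature (°C)") +
  scale_y_reverse()

print(p)
```

If your ggplot2 version supports the `orientation` argument (3.3.0+), geom_line(orientation = "y") should give a similar result by ordering along the y axis instead of x.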


Indeed, my data frame has a wider variety of depths. And for the same station I can have several different dates, so I can have several values at 10 meters for the same station.
How can I adjust for this?

If I were you, I would decide on ranges of depth and sort the depths you have into those. The fewer ranges the better, as more values will be averaged within each one.

I would be careful to keep your full data and your summarised data separate, i.e.

df2 <- df %>%
  group_by(Area, depth) %>%
  summarise(AvgTemp = mean(Temp), AvgSal = mean(Sal))

so that you can continue to refer to df.
You should look at a histogram of df$depth to help decide on a binning strategy:

hist(df$depth)

assuming df is your un-summarised data


ggplot2 has some handy functions for this (see "Discretise numeric data into categorical", the cut_interval reference page on tidyverse.org), e.g. cut_width().
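A minimal sketch of that binning idea, assuming 10 m bins centred on 10, 20 and 30 m (`center = 10`); the `casts` data frame and its values are hypothetical. Each bin is plotted at the mean depth of the measurements that fell into it, and `n` shows how many values were averaged, which helps judge whether the bins are wide enough.

```r
library(dplyr)
library(ggplot2)  # for cut_width()

# Hypothetical measurements at uneven depths (illustrative values only)
casts <- tibble::tibble(
  Area  = "A1",
  depth = c(9, 11, 12, 19, 21, 28),
  Temp  = c(2.5, 2.6, 2.7, 2.8, 3.0, 2.2),
  Sal   = c(34.5, 34.6, 34.7, 34.8, 34.9, 35.0)
)

binned <- casts %>%
  # 10 m wide bins centred on multiples of 10: [5,15], (15,25], (25,35]
  mutate(depth_bin = cut_width(depth, width = 10, center = 10)) %>%
  group_by(Area, depth_bin) %>%
  summarise(
    depth   = mean(depth),   # plot each bin at its mean depth
    AvgTemp = mean(Temp),
    AvgSal  = mean(Sal),
    n       = n(),           # number of values averaged in the bin
    .groups = "drop"
  )

print(binned)
```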


Thanks, I see what you mean.
Of course I created a different data frame.
I did something like this:

df2 <- df %>%
  group_by(Station, depth) %>%
  summarise(AvgTemp = mean(Temp), AvgSAL = mean(SAL))

So now I have only one Temp value and one Sal value per depth for each station.
But I no longer have my Area column.
Can I just join the Area column from df to my df2 by station name?
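A minimal sketch of such a join, assuming each station belongs to exactly one area; the dates and values below are hypothetical. Building a one-row-per-station lookup with distinct() first means the join cannot duplicate rows of df2.

```r
library(dplyr)

# Hypothetical data: St1 sampled on two dates, St2 and St3 once
df <- tibble::tribble(
  ~Area, ~Station, ~date,        ~depth, ~Temp, ~SAL,
  "A1",  "St1",    "2020-06-01", 10,     2.4,   34.5,
  "A1",  "St1",    "2020-07-01", 10,     2.6,   34.7,
  "A1",  "St2",    "2020-06-01", 10,     2.6,   34.3,
  "A2",  "St3",    "2020-06-01", 10,     2.8,   34.6
)

# Mean over all dates for each station and depth (as in the post above)
df2 <- df %>%
  group_by(Station, depth) %>%
  summarise(AvgTemp = mean(Temp), AvgSAL = mean(SAL), .groups = "drop")

# One row per station: its area
station_area <- df %>% distinct(Station, Area)

# Attach the area to each station profile by station name
df2 <- df2 %>% left_join(station_area, by = "Station")

print(df2)
```

Under the same assumption, group_by(Area, Station, depth) would keep the Area column directly and avoid the join altogether.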

Are you doing something related to what we discussed previously, or something new?
I ask because you have replaced Area with Station without explaining why; if we are staying on the same track, this looks like a clear mistake.

Yes, I am still trying to do what we discussed.
Since I have several dates for each station, and therefore the same depth several times per station, I thought this manipulation would give me the average temperature and salinity at each depth (all dates included).
Then I just have to link my areas and average by depth over all the stations in each area.
Otherwise I really don't see any other option; I just want to make a salinity and temperature profile for my areas... I didn't think it would be so complicated :cry:
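A minimal sketch of that two-stage average, assuming each station belongs to one area (so Area can simply be kept in the first group_by()); the data are hypothetical. Averaging station means gives every station equal weight: in this data a pooled mean of all A1 rows at 10 m would be (2.4 + 2.6 + 2.6) / 3, about 2.53, because St1's two dates would count twice, whereas the two-stage mean is (2.5 + 2.6) / 2 = 2.55. Which one you want depends on whether stations or individual casts should carry equal weight.

```r
library(dplyr)

# Hypothetical data: St1 sampled on two dates, St2 and St3 once
df <- tibble::tribble(
  ~Area, ~Station, ~date,        ~depth, ~Temp, ~SAL,
  "A1",  "St1",    "2020-06-01", 10,     2.4,   34.5,
  "A1",  "St1",    "2020-07-01", 10,     2.6,   34.7,
  "A1",  "St2",    "2020-06-01", 10,     2.6,   34.3,
  "A2",  "St3",    "2020-06-01", 10,     2.8,   34.6
)

# Stage 1: one profile per station, averaging all dates at each depth
station_profiles <- df %>%
  group_by(Area, Station, depth) %>%
  summarise(Temp = mean(Temp), SAL = mean(SAL), .groups = "drop")

# Stage 2: one profile per area, averaging the station profiles
area_profiles <- station_profiles %>%
  group_by(Area, depth) %>%
  summarise(AvgTemp = mean(Temp), AvgSAL = mean(SAL),
            n_stations = n(), .groups = "drop")

print(area_profiles)
```

With real data at uneven depths, the same pipeline can group by a depth bin (e.g. from cut_width()) instead of the raw depth.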
