Hi there!
So, I'm working on this project where the premise is as follows:
There are 10 subjects in a study (or rather, in the simplified dummy example I created here; the code is pasted below for reproducibility). Each subject collected two samples per day (one in the morning before a meal, one at night after a meal) for two consecutive weeks. That is 14 days and 28 samples per subject, so 280 samples total in the data set.
Each day they collected a FASTED sample (upon waking) and a FED sample. It is important to note that the meal between collections differed between weeks.
In week 1, donors ate a meal without fish oil; in week 2, the meal included a serving of fish oil. Five biomarkers of interest were measured.
The end goal of this study is to do the following matched pairs t-tests per biomarker:
- AVG_FED_WK1 vs. AVG_FAST_WK1
- AVG_FED_WK2 vs. AVG_FAST_WK2
- AVG_FED_WK1 vs. AVG_FED_WK2 (aka AVG_FED_NO_FISH_OIL vs. AVG_FED_WITH_FISH_OIL)
- AVG_FAST_WK1 vs. AVG_FAST_WK2 (aka AVG_FAST_NO_FISH_OIL vs. AVG_FAST_WITH_FISH_OIL)
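For concreteness, here is a minimal sketch of what one of these comparisons looks like in R. The two vectors are simulated stand-ins for per-donor weekly averages of a single biomarker (the names, means, and standard deviations are made up for illustration), and the second call shows that a matched-pairs t-test is the same thing as a one-sample t-test on the within-donor differences.

```r
# Hypothetical per-donor weekly averages for ONE biomarker (simulated, not real data).
set.seed(42)  # arbitrary seed so the illustration is repeatable
n_donors <- 10
avg_fast_wk1 <- rnorm(n_donors, mean = 1.0, sd = 0.3)
avg_fed_wk1  <- avg_fast_wk1 + rnorm(n_donors, mean = 0.2, sd = 0.2)

# Matched-pairs t-test: each donor's FED average is paired with
# that same donor's FASTED average (position i in both vectors).
paired_result <- t.test(avg_fed_wk1, avg_fast_wk1, paired = TRUE)

# Equivalent formulation: one-sample t-test on the within-donor differences.
diff_result <- t.test(avg_fed_wk1 - avg_fast_wk1, mu = 0)

print(paired_result$p.value)
print(diff_result$p.value)
```

The pairing only works if both vectors list the donors in the same order, which is why the data need to end up with one value per donor per condition.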
Here is the code I produced that is referenced at the top of the post:
# I'm sure there's overall a much cleaner/better way to produce the following code
# but this is just what was immediately obvious to me.
# paste() is vectorised, so no loop is needed to build PSN_1 ... PSN_280
PARENT_SAMPLE_NAME <- paste("PSN", 1:280, sep = "_")
WEEK <- rep(c(rep("WEEK1", 14), rep("WEEK2", 14)), 10)
FED_FASTED <- rep(c("FASTED","FED"), 140)
FISH_OIL <- rep(c(rep("NO", 14), rep("YES", 14)), 10)
# 28 consecutive rows per subject: S01 x 28, then S02 x 28, ..., S10 x 28
SUBJECT_ID <- rep(sprintf("S%02d", 1:10), each = 28)
set.seed(123)  # arbitrary seed so the random dummy values are reproducible
X01 <- exp(rnorm(n = 280, mean = 0, sd = 1))
X02 <- exp(rnorm(n = 280, mean = 0, sd = 1))
X03 <- exp(rnorm(n = 280, mean = 0, sd = 1))
X04 <- exp(rnorm(n = 280, mean = 0, sd = 1))
X05 <- exp(rnorm(n = 280, mean = 0, sd = 1))
# Build the group labels from the vectors themselves; dummydf does not exist yet at this point
newColumn1 <- paste(WEEK, FED_FASTED, sep = "_")
newColumn2 <- paste(FED_FASTED, FISH_OIL, sep = "_")
dummydf <- data.frame(PARENT_SAMPLE_NAME,
SUBJECT_ID,
WEEK,
FED_FASTED,
FISH_OIL,
newColumn1,
newColumn2,
X01,
X02,
X03,
X04,
X05)
View(dummydf)
Here is the part I'm struggling with. I first need to average each subject's fed measurements within each week and each subject's fasted measurements within each week. So for each donor, I will now have four values:
AVG_FED_WK1,
AVG_FAST_WK1,
AVG_FED_WK2,
AVG_FAST_WK2 (or however you want to name them).
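One way this averaging step could be sketched in base R is with aggregate(), grouping by subject, week, and fed/fasted state. This is only an illustration under assumptions: the block rebuilds a compact version of the dummy data so it runs on its own, the seed is arbitrary, and avg_df is a made-up name.

```r
set.seed(1)  # arbitrary seed, only so the simulated values are repeatable

# Compact rebuild of the dummy data: 10 donors x 2 weeks x 7 days x 2 samples/day.
dummydf <- data.frame(
  SUBJECT_ID = rep(sprintf("S%02d", 1:10), each = 28),
  WEEK       = rep(rep(c("WEEK1", "WEEK2"), each = 14), times = 10),
  FED_FASTED = rep(c("FASTED", "FED"), times = 140)
)
for (marker in sprintf("X%02d", 1:5)) {
  dummydf[[marker]] <- exp(rnorm(280))
}

# Average every biomarker within each donor x week x fed/fasted cell.
# Each cell holds 7 measurements (one per day of that week).
avg_df <- aggregate(
  cbind(X01, X02, X03, X04, X05) ~ SUBJECT_ID + WEEK + FED_FASTED,
  data = dummydf,
  FUN  = mean
)

print(dim(avg_df))  # 40 rows (10 donors x 4 groups), 8 columns
print(head(avg_df))
```

The result is in "long" form: four rows per donor, one for each week and fed/fasted combination, with one averaged column per biomarker.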
I don't know how to go about doing this. Intuition tells me an apply()-type function could be used here, but maybe not. Perhaps there's a more efficient way with dplyr? If my understanding is correct, I need to produce a "collapsed" data set of sorts, where instead of 28 observations of each biomarker per subject, there are only 4, each being the average of the relevant observations in its group.
This would mean that the data set now shrinks from 280 rows to 40 rows (10 subjects × 4 averages), right? Furthermore, it seems I need to produce new columns within my data set to handle each of the two sets of t-tests (one column for the Week 1: Fed/Fasted and Week 2: Fed/Fasted t-tests, and another for the FED: Yes/No fish oil and FASTED: Yes/No fish oil t-tests).
Perhaps naively, I attempted this by using paste() to merge the two relevant columns into the group names I think I need. So now the question is: how do I "collapse" this data set and produce a set of averages per group?
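To make the question concrete, here is a rough sketch of the kind of collapse I have in mind, using the pasted group label with tapply() from the apply family. It is an illustration under assumptions rather than a settled approach: it handles a single biomarker (X01), rebuilds a small stand-in for the dummy data so it runs on its own, and the names group and avg_X01 are made up.

```r
set.seed(1)  # arbitrary seed for repeatable simulated values

# Compact stand-in for the dummy data, one biomarker only (X01).
dummydf <- data.frame(
  SUBJECT_ID = rep(sprintf("S%02d", 1:10), each = 28),
  WEEK       = rep(rep(c("WEEK1", "WEEK2"), each = 14), times = 10),
  FED_FASTED = rep(c("FASTED", "FED"), times = 140),
  X01        = exp(rnorm(280))
)

# Pasted group label, e.g. "WEEK1_FED".
group <- paste(dummydf$WEEK, dummydf$FED_FASTED, sep = "_")

# Rows = donors, columns = groups, cells = mean of X01 for that donor and group.
avg_X01 <- tapply(dummydf$X01, list(dummydf$SUBJECT_ID, group), mean)

# Donors are aligned by row, so each paired comparison is a pair of columns.
fed_vs_fast_wk1 <- t.test(avg_X01[, "WEEK1_FED"], avg_X01[, "WEEK1_FASTED"], paired = TRUE)
fed_wk1_vs_wk2  <- t.test(avg_X01[, "WEEK1_FED"], avg_X01[, "WEEK2_FED"], paired = TRUE)

print(dim(avg_X01))
print(fed_vs_fast_wk1$p.value)
print(fed_wk1_vs_wk2$p.value)
```

In this layout a single pasted label seems to be enough for both sets of comparisons, since week 1 is the no-fish-oil week and week 2 is the fish-oil week in this design.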
I apologize if anything is unclear. Please let me know if so and I'd be happy to provide further clarification. I appreciate you taking the time to read this question and consider its contents. Thank you!
All the best,
-Radon.