How to restructure data for comparing Gp/cond/event

Greetings,

I have two groups (gp1, gp2), two conditions (pos, neg), and two events (pre, post).
The conditions were counterbalanced over two visits, and each participant provided two scores (pre and post) during each visit (v1pre, v1post, v2pre, v2post).

Below is an example of the data.

My goal: compare group by condition by event.

However, never having done this before in R, I don't know how to structure the data or which commands to use for the actual analyses.

Any help with this would be greatly appreciated.

~ Jason the #rstatsnewbie

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyverse)
library(reshape2)
#> 
#> Attaching package: 'reshape2'
#> The following object is masked from 'package:tidyr':
#> 
#>     smiths
library(reprex)


zdat <- 
structure(list(
    ID = structure(1:10, 
                   .Label = c("IP004", "IP005", "IP007", "IP008", "IP009",
                           "IP010", "IP012", "IP013", "IP015", "IP016"), 
                   class = "factor"), 
    group = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), 
                      .Label = c("gp1", "gp2"), 
                      class = "factor"), 
    v1.cond = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L), 
        .Label = c("pos", "neg"), class = "factor"), 
    v2.cond = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L), 
                        .Label = c("pos", "neg"), 
                        class = "factor"), 
    v1pre = c(7, 1, 0, 0, 1, 0, 0, 4, 0, 6), 
    v1post = c(9, 5, 5, 5.5, 5, 2, 7, 9, 5, 6), 
    v2pre = c(1, 5, 1, 3, 5, 0, 3, 0, 5, 0), 
    v2post = c(5, 6, 4, 5.5, 5, 4, 4, 3, 7, 3)), 
    row.names = c(NA, -10L), 
    class = "data.frame")


#       Convert to long format
zdat.melt <- melt(zdat, id.vars = c("ID", "group", "v1.cond", "v2.cond")) 
colnames(zdat.melt)
#> [1] "ID"       "group"    "v1.cond"  "v2.cond"  "variable" "value"

names(zdat.melt)[names(zdat.melt)=="variable"] <- "event"
names(zdat.melt)[names(zdat.melt)=="value"] <- "score"

Created on 2020-12-20 by the reprex package (v0.3.0)

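library(dplyr)
# mean score per group x visit-1 condition x event
zdat.melt %>% select(-v2.cond) %>% group_by(group, v1.cond, event) %>%
  summarise(mean_score = mean(score))
# mean score per group x visit-2 condition x event
zdat.melt %>% select(-v1.cond) %>% group_by(group, v2.cond, event) %>%
  summarise(mean_score = mean(score))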

Thank you for the response.
I realize I didn't articulate my goals very well, or I don't understand how to organize the data. Do the data need to be in long or wide format for the comparisons?

Visit (1 or 2) is not of interest.
Variables of interest are:

  • Group
  • event (pre and post)
  • condition (pos and neg)

Planned comparisons are:

  • Group differences in pre-scores (collapsed across visits and conditions)
  • Group differences in post-scores (collapsed across visits and conditions)
  • Group differences in post-scores (condition = pos; collapsed across visits)
  • Group differences in post-scores (condition = neg; collapsed across visits)

~ Jason the #rstatsnewbie
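Since every comparison is some slice of group x condition x event, one option is to go fully long first: one row per participant, visit, and event, with that visit's condition attached. Either layout needed later is then one step away. A minimal sketch, assuming tidyr 1.0 or later (for pivot_longer()) rather than reshape2; zdat is re-created with character columns instead of factors to keep the block short, and long is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same values as zdat; plain character columns instead of factors
zdat <- data.frame(
  ID      = c("IP004", "IP005", "IP007", "IP008", "IP009",
              "IP010", "IP012", "IP013", "IP015", "IP016"),
  group   = rep(c("gp1", "gp2"), times = 5),
  v1.cond = c("pos", "neg", "neg", "neg", "neg", "neg", "neg", "pos", "neg", "pos"),
  v2.cond = c("neg", "pos", "pos", "pos", "pos", "pos", "pos", "neg", "pos", "neg"),
  v1pre   = c(7, 1, 0, 0, 1, 0, 0, 4, 0, 6),
  v1post  = c(9, 5, 5, 5.5, 5, 2, 7, 9, 5, 6),
  v2pre   = c(1, 5, 1, 3, 5, 0, 3, 0, 5, 0),
  v2post  = c(5, 6, 4, 5.5, 5, 4, 4, 3, 7, 3),
  stringsAsFactors = FALSE
)

long <- zdat %>%
  # "v1pre" -> visit = "1", event = "pre": one row per ID x visit x event
  pivot_longer(c(v1pre, v1post, v2pre, v2post),
               names_to = c("visit", "event"),
               names_pattern = "v(\\d)(pre|post)",
               values_to = "score") %>%
  # attach the condition that applied on that row's visit
  mutate(cond = ifelse(visit == "1",
                       as.character(v1.cond),
                       as.character(v2.cond))) %>%
  select(ID, group, visit, cond, event, score)

long
```

The as.character() calls matter if v1.cond and v2.cond are factors, as in the original zdat: ifelse() would otherwise return the underlying integer codes. Visit is kept so the counterbalancing can still be checked; select(-visit) drops it once it is no longer needed.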

Oh, a contingency table?

suppressPackageStartupMessages({
  library(dplyr)
  library(reshape)
})
zdat <-
  structure(list(
    ID = structure(1:10,
      .Label = c(
        "IP004", "IP005", "IP007", "IP008", "IP009",
        "IP010", "IP012", "IP013", "IP015", "IP016"
      ),
      class = "factor"
    ),
    group = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
      .Label = c("gp1", "gp2"),
      class = "factor"
    ),
    v1.cond = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L),
      .Label = c("pos", "neg"), class = "factor"
    ),
    v2.cond = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L),
      .Label = c("pos", "neg"),
      class = "factor"
    ),
    v1pre = c(7, 1, 0, 0, 1, 0, 0, 4, 0, 6),
    v1post = c(9, 5, 5, 5.5, 5, 2, 7, 9, 5, 6),
    v2pre = c(1, 5, 1, 3, 5, 0, 3, 0, 5, 0),
    v2post = c(5, 6, 4, 5.5, 5, 4, 4, 3, 7, 3)
  ),
  row.names = c(NA, -10L),
  class = "data.frame"
  )


#       Convert to long format

# convention disfavors using . as a name delimiter
zdat_melt <- melt(zdat, id.vars = c("ID", "group", "v1.cond", "v2.cond"))
colnames(zdat_melt)
#> [1] "ID"       "group"    "v1.cond"  "v2.cond"  "variable" "value"

# alternative renaming
colnames(zdat_melt) <- c("ID","group","v1_cond","v2_cond","event","score")

zdat_melt %>% 
  select(-c(v1_cond,v2_cond)) %>%
  filter(group == "gp1") %>%
  select(event,score) -> grp1

zdat_melt %>% 
  select(-c(v1_cond,v2_cond)) %>%
  filter(group == "gp2") %>%
  select(event,score) -> grp2


table(grp1)
#>         score
#> event    0 1 3 4 5 7 9
#>   v1pre  3 1 0 0 0 1 0
#>   v1post 0 0 0 0 3 1 1
#>   v2pre  0 2 1 0 2 0 0
#>   v2post 0 0 0 2 2 1 0
table(grp2)
#>         score
#> event    0 1 2 3 4 5 5.5 6 9
#>   v1pre  2 1 0 0 1 0   0 1 0
#>   v1post 0 0 1 0 0 1   1 1 1
#>   v2pre  3 0 0 1 0 1   0 0 0
#>   v2post 0 0 0 2 1 0   1 1 0

Created on 2020-12-20 by the reprex package (v0.3.0.9001)

Thank you. This has been very helpful regarding data manipulation.

I am struggling with how to run the actual statistics I want, or maybe I can't.

For example, everybody completed both conditions (pos, neg) and everyone has a pre and post for those conditions. Presently, I conceptualize the data like this (in my head), with a score for each of these:

  • gp1 pos pre
  • gp1 pos post
  • gp1 neg pre
  • gp1 neg post
  • gp2 pos pre
  • gp2 pos post
  • gp2 neg pre
  • gp2 neg post
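Those eight cells map directly onto a long layout. A minimal sketch, assuming dplyr 1.0 or later (for the .groups argument) and tidyr 1.0 or later. It rebuilds the long data from zdat (character columns, same values), computes the mean and n of each cell, and pivots back to one row per participant with columns named like pospre and negpost.

```r
library(dplyr)
library(tidyr)

zdat <- data.frame(
  ID      = c("IP004", "IP005", "IP007", "IP008", "IP009",
              "IP010", "IP012", "IP013", "IP015", "IP016"),
  group   = rep(c("gp1", "gp2"), times = 5),
  v1.cond = c("pos", "neg", "neg", "neg", "neg", "neg", "neg", "pos", "neg", "pos"),
  v2.cond = c("neg", "pos", "pos", "pos", "pos", "pos", "pos", "neg", "pos", "neg"),
  v1pre   = c(7, 1, 0, 0, 1, 0, 0, 4, 0, 6),
  v1post  = c(9, 5, 5, 5.5, 5, 2, 7, 9, 5, 6),
  v2pre   = c(1, 5, 1, 3, 5, 0, 3, 0, 5, 0),
  v2post  = c(5, 6, 4, 5.5, 5, 4, 4, 3, 7, 3),
  stringsAsFactors = FALSE
)

# Long layout: one row per participant x condition x event (visit dropped)
long <- zdat %>%
  pivot_longer(c(v1pre, v1post, v2pre, v2post),
               names_to = c("visit", "event"),
               names_pattern = "v(\\d)(pre|post)",
               values_to = "score") %>%
  mutate(cond = ifelse(visit == "1", v1.cond, v2.cond)) %>%
  select(ID, group, cond, event, score)

# The eight cells: mean score and n for each group x cond x event
cell_means <- long %>%
  group_by(group, cond, event) %>%
  summarise(mean_score = mean(score), n = n(), .groups = "drop")
cell_means

# One row per participant, one column per cond x event (pospre, negpost, ...)
wide <- long %>%
  pivot_wider(names_from = c(cond, event), values_from = score, names_sep = "")
wide
```

Because the conditions were counterbalanced, every participant lands in exactly one pos and one neg visit, so each of the eight cells holds five scores.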

My questions:

  • Is there a group difference in pos pre scores?
  • Is there a group difference in neg pre scores?
  • Is there a group difference in pos post scores?
  • Is there a group difference in neg post scores?

In writing this I guess I'm essentially trying to do a 2x2x2 ANOVA. I just have to figure out:

  • The correct format for the data
  • How to do the anova, which I'm researching.

Thanks to any and all for help or a point in the right direction.

~ Jason the #rstatsnewbie
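A 2x2x2 ANOVA on these data is a mixed design: group varies between participants, while condition and event vary within participants, since everyone contributes all four condition x event scores. A minimal sketch with base R's aov() and an Error() term, assuming that within-subject structure and rebuilding the long data from zdat. With five participants per group, treat the output as illustrative rather than conclusive.

```r
library(dplyr)
library(tidyr)

zdat <- data.frame(
  ID      = c("IP004", "IP005", "IP007", "IP008", "IP009",
              "IP010", "IP012", "IP013", "IP015", "IP016"),
  group   = rep(c("gp1", "gp2"), times = 5),
  v1.cond = c("pos", "neg", "neg", "neg", "neg", "neg", "neg", "pos", "neg", "pos"),
  v2.cond = c("neg", "pos", "pos", "pos", "pos", "pos", "pos", "neg", "pos", "neg"),
  v1pre   = c(7, 1, 0, 0, 1, 0, 0, 4, 0, 6),
  v1post  = c(9, 5, 5, 5.5, 5, 2, 7, 9, 5, 6),
  v2pre   = c(1, 5, 1, 3, 5, 0, 3, 0, 5, 0),
  v2post  = c(5, 6, 4, 5.5, 5, 4, 4, 3, 7, 3),
  stringsAsFactors = FALSE
)

long <- zdat %>%
  pivot_longer(c(v1pre, v1post, v2pre, v2post),
               names_to = c("visit", "event"),
               names_pattern = "v(\\d)(pre|post)",
               values_to = "score") %>%
  mutate(cond = ifelse(visit == "1", v1.cond, v2.cond)) %>%
  select(ID, group, cond, event, score) %>%
  # aov() needs the grouping variables as factors
  mutate(ID = factor(ID), group = factor(group),
         cond = factor(cond), event = factor(event))

# group is between-subjects; cond and event are within-subjects, so each
# within-subject effect is tested against its own participant-level stratum
fit <- aov(score ~ group * cond * event + Error(ID / (cond * event)), data = long)
summary(fit)
```

The four planned group comparisons correspond to the group effect within single cells; the overall ANOVA tells you whether group interacts with condition and event at all.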

In R it helps to recall school algebra: f(x) = y, where the three objects (in R, everything is an object) are:

x, which is what's at hand, in this case the contingency tables
y, which is what's desired, in this case the result of some statistical test
f, which is the function that returns y from x

Because objects, including functions, can contain other objects, working a problem in R is mainly an exercise in understanding what can be done with x, what y has to look like, and what function f is available or can be composed to perform the transformation. We say that f is composable: functions are called first-class objects because they can be the arguments to other functions, just like f(g(x)) in school algebra.
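Here is the f(g(x)) idea in R terms, as a small sketch. my_compose() is a hypothetical helper written for illustration, not a library function, and x holds the gp1 v1pre scores from zdat.

```r
x <- c(7, 0, 1, 0, 0)   # gp1 v1pre scores from zdat

# f(g(x)): round() applied to the output of mean()
round(mean(x), 1)
#> [1] 1.6

# Functions are objects, so a function can take other functions as
# arguments and return a new function (hypothetical helper)
my_compose <- function(f, g) function(x) f(g(x))

rounded_mean <- my_compose(function(v) round(v, 1), mean)
rounded_mean(x)
#> [1] 1.6
```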

Statistics benefits from a similar approach, which you started by framing the questions as

Is there a group difference in pos pre scores? (and so on)

That framing is a good place to start because it immediately raises another question:

Of course there is a difference: the groups aren't identical. So what kind of difference is of interest? Mean, median, range, quantiles?

Fortunately, the analysis is the same for all four questions. Start by reframing the question:

Are the differences between groups merely random?

This leads naturally to framing two dueling hypotheses, the null and the alternative:

$H_0$, the null hypothesis: there is no difference
$H_1$, the alternative hypothesis: there is a difference

How to tell?

We need a yardstick for inter-group differences, one that provides both a test statistic and a criterion for deciding whether that statistic lets us reject the null in favor of the alternative. The criterion is a significance level, called $\alpha$: if the p-value (the probability of a test statistic at least as extreme as the one observed, assuming $H_0$ is true) is below $\alpha$, we reject $H_0$; otherwise we fail to reject it, which is not the same as accepting it. The complement $1-\alpha$ is the confidence level, and $\alpha$ itself is, unfortunately, named the "significance" level. A principled approach requires selecting $\alpha$ in advance, and a conventional choice is $\alpha = 0.05$, which goes with a 95% confidence interval. (If you think about it, though, you wouldn't want to get into a drinking game where the loser has to pick one of four five-shot revolvers with a single bullet among them, point it at their head, and pull the trigger: one bullet in 20 chambers is a 0.05 chance. That's why I call $\alpha = 0.05$ passing the laugh test.)
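The decision rule and the revolver arithmetic in code, as a minimal sketch. decide() is a hypothetical helper, and the two score vectors are the v1pre scores for each group typed in from zdat.

```r
alpha <- 0.05   # fixed before looking at the data

# The revolver game: 1 bullet among 4 revolvers x 5 chambers
1 / (4 * 5)
#> [1] 0.05

# Decision rule (hypothetical helper): reject H0 only if p < alpha
decide <- function(p, alpha = 0.05) {
  if (p < alpha) "reject H0" else "fail to reject H0"
}

gp1 <- c(7, 0, 1, 0, 0)   # v1pre scores, gp1 (from zdat)
gp2 <- c(1, 0, 0, 4, 6)   # v1pre scores, gp2 (from zdat)
res <- t.test(gp1, gp2)   # Welch two-sample t-test by default
res$p.value
decide(res$p.value, alpha)
```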

Looking at this in terms of the data, we have a collection of scores from two groups to look at with respect to pos or neg and pre or post. Let's look at just one comparison: group1 vs group2 pre scores, using visit 1 (v1pre) as the example.

Scores can take the values c(0,1,2,3,4,5,5.5,6,7,9), and events can take the values c("v1pre", "v1post", "v2pre", "v2post"). Aligned on those score slots, the cross-tabulated v1pre counts should show:

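slots   <- c(0,1,2,3,4,5,5.5,6,7,9)  # every score observed in either group
v1pre1  <- c(3,1,0,0,0,0,0,0,1,0)    # group1: number of v1pre scores in each slot
v1pre2  <- c(2,1,0,0,1,0,0,1,0,0)    # group2: number of v1pre scores in each slot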

(We'll need to circle back later to fix x: the contingency tables show each group's scores separately, and because the two groups' scores don't completely overlap, their columns don't line up. For now, let's focus on finding the right test, using the two v1pre vectors.)
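One way to fix x without counting by hand is to give table() a factor whose levels are the full set of score slots, so slots a group never used show up as zeros and both groups line up. A sketch using the v1pre scores typed in from zdat:

```r
slots <- c(0, 1, 2, 3, 4, 5, 5.5, 6, 7, 9)   # every score seen in either group

gp1_v1pre <- c(7, 0, 1, 0, 0)   # zdat$v1pre for gp1
gp2_v1pre <- c(1, 0, 0, 4, 6)   # zdat$v1pre for gp2

# A factor with the full set of levels makes table() report every slot,
# so unobserved scores appear as 0 and both count vectors line up
v1pre1 <- as.vector(table(factor(gp1_v1pre, levels = slots)))
v1pre2 <- as.vector(table(factor(gp2_v1pre, levels = slots)))

rbind(slots, v1pre1, v1pre2)
```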

As before, let's see what kinds of differences exist, because if they're identical...

v1pre1  <- c(3,1,0,0,0,0,0,0,1,0) # group1
v1pre2  <- c(2,1,0,0,1,0,0,1,0,0) # group2
v1pre1 - v1pre2
#>  [1]  1  0  0  0 -1  0  0 -1  1  0

Created on 2020-12-20 by the reprex package (v0.3.0.9001)

So, good, they are different somehow. But different how?

Here we need one of the foundational tools of statistics: aggregation, the reduction of many observations to a single number. It could be anything (evens and odds, say), but it's hard to argue with the mean.
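Several of those reductions in base R, applied to the v1pre scores from zdat (typed in here so the block stands alone):

```r
v1pre <- c(7, 1, 0, 0, 1, 0, 0, 4, 0, 6)   # zdat$v1pre
group <- rep(c("gp1", "gp2"), times = 5)   # zdat$group

gp1 <- v1pre[group == "gp1"]   # 7 0 1 0 0
mean(gp1)                      # 1.6
median(gp1)                    # 0
range(gp1)                     # 0 7
quantile(gp1, c(0.25, 0.75))   # 25%: 0, 75%: 1

# The same reduction applied to each group at once
means <- tapply(v1pre, group, mean)
means                          # gp1 1.6, gp2 2.2
```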

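R has a built-in function, t.test(), that gives the probability (the p-value) of seeing a difference in means at least as large as the observed one if the true difference were zero. I'm going to fudge the data a little, because the means of the two actual vectors are identical: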

# actual data from the example (padded to equal length)
v1pre1  <- c(3,1,0,0,0,0,0.0,0,1,0) # group1
v1pre2  <- c(2,1,0,0,1,0,0,1,0,0) # group2
mean(v1pre1) == mean(v1pre2)
#> [1] TRUE

# tweaked to make means differ to illustrate the test
v1pre1  <- c(2,1,1,0,0,0,0.0,0,1,0) # group1
v1pre2  <- c(3,1,0,0,0,1,0.0,1,0,0) # group2
mean(v1pre1) == mean(v1pre2)
#> [1] FALSE

t.test(v1pre1,v1pre2)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  v1pre1 and v1pre2
#> t = -0.26414, df = 16.493, p-value = 0.7949
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -0.9006395  0.7006395
#> sample estimates:
#> mean of x mean of y 
#>       0.5       0.6

Created on 2020-12-20 by the reprex package (v0.3.0.9001)

This tells us that we cannot reject $H_0$, the hypothesis that the true difference in means is zero (p = 0.79, well above $\alpha = 0.05$).

Phil Spector, from Cal, has a great explainer for the t-test

What if you didn't want the mean but the score-weighted mean?
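One answer, as a sketch. Each slot-count vector holds 5 observations spread over 10 slots, so its plain mean is always 0.5 and says nothing about the scores themselves. Weighting each slot's score by its count with base R's weighted.mean() recovers the group mean of the scores, and rep() expands the counts back into individual scores that t.test() can compare. The group 2 counts below are recomputed from zdat's v1pre column.

```r
slots  <- c(0, 1, 2, 3, 4, 5, 5.5, 6, 7, 9)
v1pre1 <- c(3, 1, 0, 0, 0, 0, 0, 0, 1, 0)   # gp1: count of v1pre scores per slot
v1pre2 <- c(2, 1, 0, 0, 1, 0, 0, 1, 0, 0)   # gp2: count of v1pre scores per slot

# Score-weighted mean: each slot's score weighted by how often it occurs
weighted.mean(slots, v1pre1)   # 1.6
weighted.mean(slots, v1pre2)   # 2.2

# Expanding the counts back into individual scores lets t.test()
# compare the scores themselves rather than the slot counts
scores1 <- rep(slots, times = v1pre1)   # 0 0 0 1 7
scores2 <- rep(slots, times = v1pre2)   # 0 0 1 4 6
t.test(scores1, scores2)
```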


Thank you doesn’t seem close to adequate to express my gratitude for the time and care you’ve put into answering my questions. I will take the rest of today and work through your response to be more precise and articulate about my questions and how to use R to answer them.

Thank you again.
~ Jason the #rstatsnewbie


Greetings,

I’ve tried to incorporate everything I learned in this thread into the attached file. In the end I am able to do the t-tests I want. However, I am fairly confident a better way exists; I will just have to keep looking and go through trial and error.

I am completely open to suggestions on what's wrong and where to look for improvements.

~ Jason the #rstatsnewbie

---
title: "forum_help"
author: "Jason"
date: "12/22/2020"
output:
  html_document:
    toc: yes
    number_sections: yes
editor_options:
  chunk_output_type: console
---


#```{r setup, include=FALSE}
#knitr::opts_chunk$set(echo = TRUE)
#```

# LIBRARIES
#```{r Libraries, eval=TRUE, echo=TRUE, results=FALSE, collapse=TRUE, message=FALSE}

suppressPackageStartupMessages({
	library(dplyr)
	library(tidyverse)
	library(reshape)
	library(reprex)
	library(here)
	here()
	dr_here(show_reason = FALSE)
})

#```

#		CONTINGENCY TABLE
#```{r ContingencyTable, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

zdat <-
  structure(list(
    ID = structure(1:10,
      .Label = c(
        "IP004", "IP005", "IP007", "IP008", "IP009",
        "IP010", "IP012", "IP013", "IP015", "IP016"
      ),
      class = "factor"
    ),
    group = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
      .Label = c("gp1", "gp2"),
      class = "factor"
    ),
    v1.cond = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L),
      .Label = c("pos", "neg"), class = "factor"
    ),
    v2.cond = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L),
      .Label = c("pos", "neg"),
      class = "factor"
    ),
    v1pre = c(7, 1, 0, 0, 1, 0, 0, 4, 0, 6),
    v1post = c(9, 5, 5, 5.5, 5, 2, 7, 9, 5, 6),
    v2pre = c(1, 5, 1, 3, 5, 0, 3, 0, 5, 0),
    v2post = c(5, 6, 4, 5.5, 5, 4, 4, 3, 7, 3)
  ),
  row.names = c(NA, -10L),
  class = "data.frame"
  )

#       Convert to long format

# convention disfavors using . as a name delimiter
zdat_melt <- melt(zdat, id.vars = c("ID", "group", "v1.cond", "v2.cond"))
colnames(zdat_melt)

# alternative renaming
colnames(zdat_melt) <- c("ID","group","v1_cond","v2_cond","event","score")

#```


#		POSITIVE PRE
## Gp1 Pos Pre
#```{r Gp1PosPre, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

#				VISIT-1
#		positive and pre only
gp1v1pos_pre <-  zdat_melt %>% dplyr::select( - v2_cond)  %>% 
	filter(group == "gp1") %>% 
	filter(v1_cond == "pos") %>% 
	filter(event == "v1pre")
gp1v1pos_pre$event <- str_replace_all(gp1v1pos_pre$event, c("v1pre" = "pre"))

#	Change name to facilitate the full join later
names(gp1v1pos_pre)[names(gp1v1pos_pre) == 'v1_cond'] <- 'cond'

#	Tibble needed for joining later
gp1v1pos_pre <- as_tibble(gp1v1pos_pre)

################	VISIT-2		################
#		positive and pre only
gp1v2pos_pre <-  zdat_melt %>% dplyr::select( - v1_cond)  %>% 
	filter(group == "gp1") %>% 
	filter(v2_cond == "pos") %>% 
	filter(event == "v2pre")
gp1v2pos_pre$event <- str_replace_all(gp1v2pos_pre$event, 
									  c("v2pre" = "pre"))

#	Change name to facilitate the full join later
names(gp1v2pos_pre)[names(gp1v2pos_pre) == 'v2_cond'] <- 'cond'

#	Tibble needed for joining later
gp1v2pos_pre <- as_tibble(gp1v2pos_pre)


##############		JOIN GROUPS		############
#	Join the two tables representing Gp1 positive pre scores
gp1_pos_pre <- full_join(gp1v1pos_pre, gp1v2pos_pre)	

#		CLEAN UP: REMOVE OLD DATA
# rm(gp1v1pos_pre, gp1v2pos_pre)

#```


## Gp2 Pos Pre
#```{r Gp2PosPre, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

####			GROUP-2 POSITIVE PRE	####
#				VISIT-1
#		positive and pre only
gp2v1pos_pre <-  zdat_melt %>% dplyr::select( - v2_cond)  %>% 
	filter(group == "gp2") %>% 
	filter(v1_cond == "pos") %>% 
	filter(event == "v1pre")
gp2v1pos_pre$event <- str_replace_all(gp2v1pos_pre$event,
									  c("v1pre" = "pre"))

#	Change name to facilitate the full join later
names(gp2v1pos_pre)[names(gp2v1pos_pre) == 'v1_cond'] <- 'cond'

#	Tibble needed for joining later
gp2v1pos_pre <- as_tibble(gp2v1pos_pre)


################	VISIT-2		################
#		positive and pre only
gp2v2pos_pre <-  zdat_melt %>% dplyr::select( - v1_cond)  %>% 
	filter(group == "gp2") %>% 
	filter(v2_cond == "pos") %>% 
	filter(event == "v2pre")
gp2v2pos_pre$event <- str_replace_all(gp2v2pos_pre$event, 
									  c("v2pre" = "pre"))

#	Change name to facilitate the full join later
names(gp2v2pos_pre)[names(gp2v2pos_pre) == 'v2_cond'] <- 'cond'

#	Tibble needed for joining later
gp2v2pos_pre <- as_tibble(gp2v2pos_pre)


##############		JOIN GROUPS		############
#	Join the two tables representing Gp2 positive pre scores

gp2_pos_pre <- full_join(gp2v1pos_pre, gp2v2pos_pre)	

#		CLEAN UP: REMOVE OLD DATA
# rm(gp2v1pos_pre, gp2v2pos_pre)

#```


#		POSITIVE POST
## Gp1 Pos Post
#```{r Gp1PosPost, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

########		VISIT-1		################
#		positive and post only

gp1v1pos_post <-  zdat_melt %>% dplyr::select( - v2_cond) %>% 
	filter(group == "gp1") %>% 
	filter(v1_cond == "pos") %>% 
	filter(event == "v1post")
gp1v1pos_post$event <- str_replace_all(gp1v1pos_post$event,
									  c("v1post" = "post"))

#	Change name to facilitate the full join later
names(gp1v1pos_post)[names(gp1v1pos_post) == 'v1_cond'] <- 'cond'

#	Tibble needed for joining later
gp1v1pos_post <- as_tibble(gp1v1pos_post)


########		VISIT-2		################
#		positive and post only

gp1v2pos_post <-  zdat_melt %>% dplyr::select( - v1_cond)  %>% 
	filter(group == "gp1") %>% 
	filter(v2_cond == "pos") %>% 
	filter(event == "v2post")
gp1v2pos_post$event <- str_replace_all(gp1v2pos_post$event, 
									  c("v2post" = "post"))

#	Change name to facilitate the full join later
names(gp1v2pos_post)[names(gp1v2pos_post) == 'v2_cond'] <- 'cond'

#	Tibble needed for joining later
gp1v2pos_post <- as_tibble(gp1v2pos_post)


##############		JOIN GROUPS		############
#	Join the two tables representing Gp1 positive post scores

gp1_pos_post <- full_join(gp1v1pos_post, gp1v2pos_post)

#		CLEAN UP: REMOVE OLD DATA
#rm(gp1v1pos_post, gp1v2pos_post)

#```

## Gp2 Pos Post

#```{r gp2PosPost, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

########		VISIT-1		################
#		positive and post only
gp2v1pos_post <-  zdat_melt %>% dplyr::select( - v2_cond) %>% 
	filter(group == "gp2") %>% 
	filter(v1_cond == "pos") %>% 
	filter(event == "v1post")
gp2v1pos_post$event <- str_replace_all(gp2v1pos_post$event,
									   c("v1post" = "post"))

#	Change name to facilitate the full join later
names(gp2v1pos_post)[names(gp2v1pos_post) == 'v1_cond'] <- 'cond'

#	Tibble needed for joining later
gp2v1pos_post <- as_tibble(gp2v1pos_post)

################	VISIT-2		################
#		positive and post only
gp2v2pos_post <-  zdat_melt %>% dplyr::select( - v1_cond)  %>% 
	filter(group == "gp2") %>% 
	filter(v2_cond == "pos") %>% 
	filter(event == "v2post")
gp2v2pos_post$event <- str_replace_all(gp2v2pos_post$event, 
									   c("v2post" = "post"))

#	Change name to facilitate the full join later
names(gp2v2pos_post)[names(gp2v2pos_post) == 'v2_cond'] <- 'cond'

#	Tibble needed for joining later
gp2v2pos_post <- as_tibble(gp2v2pos_post)

##############		JOIN GROUPS		############
#	Join the two tables representing Gp2 positive post scores

gp2_pos_post <- full_join(gp2v1pos_post, gp2v2pos_post)

#		CLEAN UP: REMOVE OLD DATA
# rm(gp2v1pos_post, gp2v2pos_post)

#```


#		NEGATIVE PRE
## Gp1 Neg Pre
#```{r Gp1negPre, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

########		VISIT-1		################
#		negative and pre only
gp1v1neg_pre <-  zdat_melt %>% dplyr::select( - v2_cond)  %>% 
	filter(group == "gp1") %>% 
	filter(v1_cond == "neg") %>% 
	filter(event == "v1pre")
gp1v1neg_pre$event <- str_replace_all(gp1v1neg_pre$event,
									  c("v1pre" = "pre"))

#	Change name to facilitate the full join later
names(gp1v1neg_pre)[names(gp1v1neg_pre) == 'v1_cond'] <- 'cond'

#	Tibble needed for joining later
gp1v1neg_pre <- as_tibble(gp1v1neg_pre)

################	VISIT-2		################
#		negative and pre only
gp1v2neg_pre <-  zdat_melt %>% dplyr::select( - v1_cond)  %>% 
	filter(group == "gp1") %>% 
	filter(v2_cond == "neg") %>% 
	filter(event == "v2pre")
gp1v2neg_pre$event <- str_replace_all(gp1v2neg_pre$event, 
									  c("v2pre" = "pre"))

#	Change name to facilitate the full join later
names(gp1v2neg_pre)[names(gp1v2neg_pre) == 'v2_cond'] <- 'cond'

#	Tibble needed for joining later
gp1v2neg_pre <- as_tibble(gp1v2neg_pre)

##############		JOIN GROUPS		############
#	Join the two tables representing Gp1 negative pre scores
gp1_neg_pre <- full_join(gp1v1neg_pre, gp1v2neg_pre)	

#		CLEAN UP: REMOVE OLD DATA
# rm(gp1v1neg_pre, gp1v2neg_pre)

#```


## Gp2 Neg Pre
#```{r gp2negPre, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

########		VISIT-1		################
#		negative and pre only

gp2v1neg_pre <-  zdat_melt %>% dplyr::select( - v2_cond)  %>% 
	filter(group == "gp2") %>% 
	filter(v1_cond == "neg") %>% 
	filter(event == "v1pre")
gp2v1neg_pre$event <- str_replace_all(gp2v1neg_pre$event,
									  c("v1pre" = "pre"))

#	Change name to facilitate the full join later
names(gp2v1neg_pre)[names(gp2v1neg_pre) == 'v1_cond'] <- 'cond'

#	Tibble needed for joining later
gp2v1neg_pre <- as_tibble(gp2v1neg_pre)

################	VISIT-2		################
#		negative and pre only

gp2v2neg_pre <-  zdat_melt %>% dplyr::select( - v1_cond)  %>% 
	filter(group == "gp2") %>% 
	filter(v2_cond == "neg") %>% 
	filter(event == "v2pre")
gp2v2neg_pre$event <- str_replace_all(gp2v2neg_pre$event, 
									  c("v2pre" = "pre"))

#	Change name to facilitate the full join later
names(gp2v2neg_pre)[names(gp2v2neg_pre) == 'v2_cond'] <- 'cond'

#	Tibble needed for joining later
gp2v2neg_pre <- as_tibble(gp2v2neg_pre)

##############		JOIN GROUPS		############
#	Join the two tables representing Gp2 negative pre scores

gp2_neg_pre <- full_join(gp2v1neg_pre, gp2v2neg_pre)	

#		CLEAN UP: REMOVE OLD DATA
# rm(gp2v1neg_pre, gp2v2neg_pre)

#```

#		NEGATIVE POST
## Gp1 Neg post
#```{r Gp1negpost, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

########		VISIT-1		################
#		negative and post only

gp1v1neg_post <-  zdat_melt %>% dplyr::select( - v2_cond) %>% 
	filter(group == "gp1") %>% 
	filter(v1_cond == "neg") %>% 
	filter(event == "v1post")
gp1v1neg_post$event <- str_replace_all(gp1v1neg_post$event,
									   c("v1post" = "post"))

#	Change name to facilitate the full join later
names(gp1v1neg_post)[names(gp1v1neg_post) == 'v1_cond'] <- 'cond'

#	Tibble needed for joining later
gp1v1neg_post <- as_tibble(gp1v1neg_post)

################	VISIT-2		################
#		negative and post only

gp1v2neg_post <-  zdat_melt %>% dplyr::select( - v1_cond)  %>% 
	filter(group == "gp1") %>% 
	filter(v2_cond == "neg") %>% 
	filter(event == "v2post")
gp1v2neg_post$event <- str_replace_all(gp1v2neg_post$event, 
									   c("v2post" = "post"))

#	Change name to facilitate the full join later
names(gp1v2neg_post)[names(gp1v2neg_post) == 'v2_cond'] <- 'cond'

#	Tibble needed for joining later
gp1v2neg_post <- as_tibble(gp1v2neg_post)

##############		JOIN GROUPS		############
#	Join the two tables representing Gp1 negative post scores

gp1_neg_post <- full_join(gp1v1neg_post, gp1v2neg_post)

#		CLEAN UP: REMOVE OLD DATA
# rm(gp1v1neg_post, gp1v2neg_post)

#```


## Gp2 Neg post
#```{r gp2negpost, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

########		VISIT-1		################
#		negative and post only

gp2v1neg_post <-  zdat_melt %>% dplyr::select( - v2_cond) %>% 
	filter(group == "gp2") %>% 
	filter(v1_cond == "neg") %>% 
	filter(event == "v1post")
gp2v1neg_post$event <- str_replace_all(gp2v1neg_post$event,
									   c("v1post" = "post"))

#	Change name to facilitate the full join later
names(gp2v1neg_post)[names(gp2v1neg_post) == 'v1_cond'] <- 'cond'

#	Tibble needed for joining later
gp2v1neg_post <- as_tibble(gp2v1neg_post)

################	VISIT-2		################
#		negative and post only

gp2v2neg_post <-  zdat_melt %>% dplyr::select( - v1_cond)  %>% 
	filter(group == "gp2") %>% 
	filter(v2_cond == "neg") %>% 
	filter(event == "v2post")
gp2v2neg_post$event <- str_replace_all(gp2v2neg_post$event, 
									   c("v2post" = "post"))

#	Change name to facilitate the full join later
names(gp2v2neg_post)[names(gp2v2neg_post) == 'v2_cond'] <- 'cond'

#	Tibble needed for joining later
gp2v2neg_post <- as_tibble(gp2v2neg_post)

##############		JOIN GROUPS		############
#	Join the two tables representing Gp2 negative post scores

gp2_neg_post <- full_join(gp2v1neg_post, gp2v2neg_post)

#		CLEAN UP: REMOVE OLD DATA
# rm(gp2v1neg_post, gp2v2neg_post)

#```

#		JOINING
##		By GROUPS
#```{r Gp1FullJoin, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

zgp1_neg <- full_join(gp1_neg_post, gp1_neg_pre)

zgp1_pos <- full_join(gp1_pos_post, gp1_pos_pre)

# zgp1_full <- full_join(zgp1_neg, zgp1_pos)


zgp2_neg <- full_join(gp2_neg_post, gp2_neg_pre)

zgp2_pos <- full_join(gp2_pos_post, gp2_pos_pre)

# zgp2_full <- full_join(zgp2_neg, zgp2_pos)


# zz_full <- full_join(zgp1_full, zgp2_full)

# 
# rm(zgp1_full, zgp2_full, zgp1_neg, zgp1_pos)
# rm(zgp2_neg, zgp2_pos)

#```

##		By POSITIVE
#```{r PosPre, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

####		Pos Pre 		####
zzz_pos_pre <- full_join(gp1_pos_pre, gp2_pos_pre)
# dplyr::rename(new = old) spelled out, since reshape::rename (c(old = "new")) masks it
zzz_pos_pre <- dplyr::rename(zzz_pos_pre, pospre = score)

####		Pos Post		####
zzz_pos_post <- full_join(gp1_pos_post, gp2_pos_post)
zzz_pos_post <- dplyr::rename(zzz_pos_post, pospost = score)

colnames(zzz_pos_pre)
colnames(zzz_pos_post)

####			POSITIVE FULL		####
##		CREATING A DATA FRAME POSITIVE PRE & POST BY GROUP
z <- zzz_pos_post %>% dplyr::select("ID", "pospost")

zzzz_pos_full <- full_join(zzz_pos_pre, z)

zzzz_pos_full <- zzzz_pos_full %>% dplyr::select( -c(cond, event))

colnames(zzzz_pos_full)

#```

##		By NEGATIVE
#```{r NegPre, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

###			NEG Pre				####
zzz_neg_pre <- full_join(gp1_neg_pre, gp2_neg_pre)

zzz_neg_pre <- dplyr::rename(zzz_neg_pre, negpre = score)

####		NEG POST			####
zzz_neg_post <- full_join(gp1_neg_post, gp2_neg_post)

zzz_neg_post <- dplyr::rename(zzz_neg_post, negpost = score)

####		NEG FULL		####
##		CREATING A DATA FRAME NEGATIVE PRE & POST BY GROUP
z2 <- zzz_neg_post %>% dplyr::select("ID", "negpost")

zzzz_neg_full <- full_join(zzz_neg_pre, z2)

zzzz_neg_full <- zzzz_neg_full %>% dplyr::select( -c(cond, event))

#```

##		FULL
#```{r FullMood, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

z_FULL <- full_join(zzzz_pos_full, zzzz_neg_full)

#```

#		T-TESTS
#```{r ttests, eval=TRUE, echo=TRUE, message=FALSE, comment = ""}

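####			T-TESTS: GROUP BY THE 4 CONDITIONS
# data = z_FULL instead of attach(); full argument names; paired = FALSE is the default
t.test(pospre ~ group, data = z_FULL, mu = 0, alternative = "two.sided",
       conf.level = 0.95, var.equal = FALSE)
t.test(pospost ~ group, data = z_FULL, mu = 0, alternative = "two.sided",
       conf.level = 0.95, var.equal = FALSE)
t.test(negpre ~ group, data = z_FULL, mu = 0, alternative = "two.sided",
       conf.level = 0.95, var.equal = FALSE)
t.test(negpost ~ group, data = z_FULL, mu = 0, alternative = "two.sided",
       conf.level = 0.95, var.equal = FALSE)
#```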




#------END---------
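Once z_FULL exists, the four t-tests can also be run in one pass. A minimal sketch: z_FULL is re-created here with values derived from zdat (one row per participant, ID column omitted), and outcomes, results, and summary_tbl are illustrative names. The Holm column from base R's p.adjust() is an optional extra, included because four tests are run on the same participants.

```r
# One row per participant, laid out like z_FULL (values derived from zdat)
z_FULL <- data.frame(
  group   = rep(c("gp1", "gp2"), times = 5),
  pospre  = c(7, 5, 1, 3, 5, 0, 3, 4, 5, 6),
  pospost = c(9, 6, 4, 5.5, 5, 4, 4, 9, 7, 6),
  negpre  = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0),
  negpost = c(5, 5, 5, 5.5, 5, 2, 7, 3, 5, 3)
)

outcomes <- c("pospre", "pospost", "negpre", "negpost")

# Build "pospre ~ group", etc., and run the same Welch t-test for each outcome
results <- lapply(outcomes, function(v) {
  t.test(reformulate("group", response = v), data = z_FULL,
         alternative = "two.sided", var.equal = FALSE)
})
names(results) <- outcomes

# One row per comparison
summary_tbl <- data.frame(
  outcome = outcomes,
  t       = sapply(results, function(r) unname(r$statistic)),
  df      = sapply(results, function(r) unname(r$parameter)),
  p_value = sapply(results, function(r) r$p.value)
)
summary_tbl$p_holm <- p.adjust(summary_tbl$p_value, method = "holm")
summary_tbl
```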

"Better" is a relative term. For a beginner in R, breaking the process down into minute steps, as done here, is very helpful for making everything fully transparent.

For a next step, I suggest taking the repetitive code and turning it into functions. This can be done easily in RStudio with Code | Extract Function.

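#	Change name to facilitate the full join later
names(gp2v2neg_pre)[names(gp2v2neg_pre) == 'v2_cond'] <- 'cond'

# The same step as a function: old and new are used as variables (not the
# literal strings 'y' and 'z'), and the modified data frame is returned
set_cond_name <- function(x, old, new) {
  names(x)[names(x) == old] <- new
  x
}

gp2v2neg_pre <- set_cond_name(gp2v2neg_pre, "v2_cond", "cond")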



Thank you. Knowing how to improve my code is always appreciated. I will use this as a way to start learning about functions in R.

Cheers,
~ Jason the #rstatsnewbie
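Following the suggestion to turn the repetitive code into functions, one possible next step is a single function that replaces each visit-1 block, visit-2 block, and full_join() trio. A sketch under stated assumptions: get_cell() is a hypothetical helper, and zdat_melt is rebuilt with tidyr's pivot_longer() (same columns as the melt() version) so the block runs on its own.

```r
library(dplyr)
library(tidyr)

zdat <- data.frame(
  ID      = c("IP004", "IP005", "IP007", "IP008", "IP009",
              "IP010", "IP012", "IP013", "IP015", "IP016"),
  group   = rep(c("gp1", "gp2"), times = 5),
  v1.cond = c("pos", "neg", "neg", "neg", "neg", "neg", "neg", "pos", "neg", "pos"),
  v2.cond = c("neg", "pos", "pos", "pos", "pos", "pos", "pos", "neg", "pos", "neg"),
  v1pre   = c(7, 1, 0, 0, 1, 0, 0, 4, 0, 6),
  v1post  = c(9, 5, 5, 5.5, 5, 2, 7, 9, 5, 6),
  v2pre   = c(1, 5, 1, 3, 5, 0, 3, 0, 5, 0),
  v2post  = c(5, 6, 4, 5.5, 5, 4, 4, 3, 7, 3),
  stringsAsFactors = FALSE
)

# Same shape as zdat_melt: ID, group, v1_cond, v2_cond, event, score
zdat_melt <- zdat %>%
  dplyr::rename(v1_cond = v1.cond, v2_cond = v2.cond) %>%
  pivot_longer(c(v1pre, v1post, v2pre, v2post),
               names_to = "event", values_to = "score")

# Hypothetical helper: one group x condition x event cell, collapsed
# across visits (the visit-1 rows plus the visit-2 rows)
get_cell <- function(data, grp, condition, ev) {
  v1 <- filter(data, group == grp, v1_cond == condition, event == paste0("v1", ev))
  v2 <- filter(data, group == grp, v2_cond == condition, event == paste0("v2", ev))
  bind_rows(v1, v2) %>%
    transmute(ID, group, cond = condition, event = ev, score)
}

gp1_pos_pre <- get_cell(zdat_melt, "gp1", "pos", "pre")

# All eight cells at once
cells <- expand.grid(grp = c("gp1", "gp2"), condition = c("pos", "neg"),
                     ev = c("pre", "post"), stringsAsFactors = FALSE)
all_cells <- bind_rows(Map(get_cell, list(zdat_melt),
                           cells$grp, cells$condition, cells$ev))
all_cells
```

all_cells is already in the long layout, so the per-cell t-tests can work from it directly instead of from a chain of full_join() calls.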
