How to print the proportion of values in a list where P < 0.05?

So in other words,

  1. I have a sample size of 1 to 29.
  2. I am looping through each sample size 1000 times.
  3. I am working out the proportion of P-values < 0.05 over the 1000 iterations.
  4. I am doing step 3 for each sample size.

Thus: N = 1, Proportion of P<0.05= ...., N=2, Proportion of P<0.05= ...., N=3, Proportion of P<0.05= ......,

When I print the ANOVA table I get:

             Df Sum Sq Mean Sq F value   Pr(>F)    

x1 1 6281 6281 43.017 0.000177 ***
groupcategory 1 114 114 0.781 0.402535
x1:groupcategory 1 32 32 0.223 0.649708
Residuals 8 1168 146

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

And I want to select only the 3rd P value (0.649708)...

But the function data5 <- data4$"Pr(>F)"[3] gives me 'NULL' ?!

And I'm not sure why

To answer the immediate question, you want to generate 29 values of

sum(d$data6<0.05)/nrow(d)

I would then define a vector with 29 elements before the two for loops

FracP <- vector(mode = "numeric", length = 29)

You can then store the calculated value with

FracP[i] <- sum(d$data6<0.05)/nrow(d)

putting that where you currently store that value in the variable data3.

It is not a good idea to use the variable i as the index of both for loops. That is confusing at best and I expect it will cause weird loop behavior.

I do not see where you use x, your sample size, anywhere.

Concerning getting the desired P value, data4 is a list with one element that is a data frame. Try using,

data4[[1]]["x1:groupcategory", "Pr(>F)"]

FracP[i] <- sum(d$data6<0.05)/nrow(d)

The purpose of the [I] is to...? Or should it just be FracP <- sum(d$data6<0.05)/nrow(d)

since FracP <- vector(mode="numeric",length=29)

Thanks so much!

Oh I see, so the FracP[I] allows us to store the proportion value as an index in the vector FracP.

I wrote: x <- 1:29 in the first paragraph - I wanted to iterate through 29 values, and so I stated for (I in x) as I wanted the loop to apply to each numerical value in 1 to 29.

I have modelled this simulation on a binomial distribution, where y = rbinom(e, y, z) is the number of correct responses across Y trials. The number of trials per subject is 60.
e refers to the number of observations per subject, which in this case is 6. Therefore y is the number of correct responses across 60 trials per subject. Z is the probability of success per trial which I have defined in the first paragraph.

Therefore, I want to run the loop when considering 2 subjects, 3 subjects, 4 subjects, 5 subjects until I get to 29 subjects.

I have appended my R script below:

coef1 <- 5
coef2 <- -0.03
coef3 <- 10
coef4 <- -0.05
distances <- c(60,90,135,202.5,303.75,455.625)
x1 <- distances
z <- coef1 + coef2x1
z1 <- coef3 + coef4
x1
n_trials <- 60
x <- 1:29
i <- 1:1000
e <- 6
groupcategory = c(1,1,1,1,1,1,2,2,2,2,2,2)
d = data.frame(x=rep(0,1000)
FracP <- vector(mode="numeric",length=29)

for (i in x) {
for (j in i){
pr = 1/(1+exp(-z))
y = rbinom(e,n_trials,pr)
df0 = data.frame(x1, y)
pr = 1/(1 + exp(-z1))
y = rbinom(e,n_trials,pr1)
df1 = data.frame(x1, y)
df5 = rbind(df1,df0)
df6 = cbind(df5,groupcategory)
data2 <- aov(y~x1+groupcategory+x1:groupcategory,data=df6)
data4 <- summary(data2)
data5 <- data4[[1]]["x1:groupcategory","Pr(>F)"]
data6 <- data.frame(data5)
cbind(data6,d)
}
FracP[i] <- sum(d$data6<0.05)/nrow(d)
}

If a for loop starts with

for (j in i)

and i is just a number, the for loop will run only once. Do you mean

for (j in 1:i)

yes! Oops!

And I changed the first paragraph again by amending the d=data.frame command

d = data.frame(x=rep(0,1000))

as I want to put the 1000 iterations into d

would that be the same for 1:x

so at the end of the command, the object 'FracP' should the list of proportions for each N number.

i <- 1:1000

and so for (j in 1:i) will complete the loop 1000x times?

Now I got tangled up. I missed that you defined i as 1:1000 and as the index of the outside for loop. Change one of those so that there is no conflict.

OneTo1000 <- 1:1000
OneTo29 <- 1:29
.
.
.
for ( i in OneTo29) {
  for (j in OneTo1000) {
.
.

Do not prefiill the data frame with zeros. You are using rbind to build the data frame and your data will be stacked under all of those zeros.

library(tidyverse)

coef1 <- 5
coef2 <- -0.03
coef3 <- 5
coef4 <- -0.03
distances <- c(60,90,135,202.5,303.75,455.625)
x1 <- distances
z <- coef1 + coef2*x1
z1 <- coef3 + coef4*x1
n_trials <- 60
oneto29 <- 29
oneto1000 <- 10
e <- 6 
groupcategory = c(1,1,1,1,1,1,2,2,2,2,2,2)
d = NULL
FracP <- vector(mode="numeric",length=29)
pr = 1/(1+exp(-z))
pr1 = 1/(1+exp(-z1))

for (i in 1:oneto29) {
  for (j in 1:oneto1000){
    df <- c()
    for (k in 1:oneto29){
      rbind(df, data.frame(x1 = rep(distances, 2),
                     y = c(rbinom(e,n_trials,pr), rbinom(e,n_trials,pr1)),
                     groupcategory = groupcategory, id = c(rep(k*2-1, 6), rep(k*2, 6))))
    }
    df_aov <- aov(y~x1*groupcategory+Error(id/(x1*groupcategory)),data=df)
    df_aov_sum <- summary(df_aov)
    pvalue <- df_aov_sum[[5]]
    pvalue[[1]]["groupcategory","Pr(>F)"]
    d = rbind(d,data.frame(pvalue))
  }

  count <- plyr::ldply(d,function(c) sum(c<=0.05))
  FracP[i] <- count$V1/oneto1000 
  d <- NULL
}

This is my renewed script at the moment. In other words, a vector of twenty-nine P-values will be printed at the end. With each P-value corresponding to a particular sample size. The first P-value corresponding to 1 subject, the 2nd P-value corresponding to 2-subjects and so on. The y =rbinom(x,y,z) command only considers the proportion of correct trials for a single subject, and so I manipulated the formula above to consider 2-subjects, then 3-subjects, on each iteration!

Does that look right?

A few things look wrong to me.

  1. Shouldn't the loop over k run from 1 to i?
for (k in 1:i){

Looping over 1:29 every time seems to defeat the purpose of having the variable i

  1. The result of the rbind() inside of the k for loop is not assigned to any variable. Should it be assigned to df?
df <- rbind(df, data.frame(x1 = rep(distances, 2), #added assignment
                           y = c(rbinom(e,n_trials,pr), rbinom(e,n_trials,pr1)),
                           groupcategory = groupcategory, id = c(rep(k*2-1, 6), rep(k*2, 6))))
  1. The subsetting of pvalue to get the desired value is not assigned to any variable. Maybe you mean
pvalue <- pvalue[[1]]["groupcategory","Pr(>F)"]

When you say (K in 1:i)[

is it because you only want to store the data for that particular given sample size on each round and not every sample size?

Yes, that is my understanding but it is your code. Be sure that anything I suggest is correct for your application.

df <- rbind(df, data.frame(x1 = rep(distances, 2),
y = c(rbinom(e,n_trials,pr), rbinom(e,n_trials,pr1)),
groupcategory = groupcategory, id = c(rep(k, 12))))

When I enter this formula, I get:

x1 y groupcategory id
1 60.000 59 1 1
2 90.000 50 1 1
3 135.000 43 1 1
4 202.500 16 1 1
5 303.750 0 1 1
6 455.625 0 1 1
7 60.000 60 2 2
8 90.000 60 2 2
9 135.000 60 2 2
10 202.500 60 2 2
11 303.750 60 2 2
12 455.625 59 2 2

But surely the ID number should be the same for the 12 values...

Yes, you should get a single id value and that is what I get if I comment out the lines for looping and run a single iteration of that code.

coef1 <- 5
coef2 <- -0.03
coef3 <- 5
coef4 <- -0.03
distances <- c(60,90,135,202.5,303.75,455.625)
x1 <- distances
z <- coef1 + coef2*x1
z1 <- coef3 + coef4*x1
n_trials <- 60
oneto29 <- 29
oneto1000 <- 10
e <- 6 
groupcategory = c(1,1,1,1,1,1,2,2,2,2,2,2)
d = NULL
FracP <- vector(mode="numeric",length=29)
pr = 1/(1+exp(-z))
pr1 = 1/(1+exp(-z1))

#for (i in 1:oneto29) {
#  for (j in 1:oneto1000){
    df <- c()
    k <- 1
#    for (k in 1:i){ #changed to 1:i
      #df <- rbind(df, data.frame(x1 = rep(distances, 2), #added assignment
      #                     y = c(rbinom(e,n_trials,pr), rbinom(e,n_trials,pr1)),
      #                     groupcategory = groupcategory, id = c(rep(k*2-1, 6), rep(k*2, 6))))
      df <- rbind(df, data.frame(x1 = rep(distances, 2),
                                 y = c(rbinom(e,n_trials,pr), rbinom(e,n_trials,pr1)),
                                 groupcategory = groupcategory, id = c(rep(k, 12))))
    #}
df
#>         x1  y groupcategory id
#> 1   60.000 57             1  1
#> 2   90.000 53             1  1
#> 3  135.000 45             1  1
#> 4  202.500 15             1  1
#> 5  303.750  0             1  1
#> 6  455.625  0             1  1
#> 7   60.000 56             2  1
#> 8   90.000 56             2  1
#> 9  135.000 45             2  1
#> 10 202.500 15             2  1
#> 11 303.750  0             2  1
#> 12 455.625  0             2  1

Created on 2020-02-03 by the reprex package (v0.3.0)

1 Like

Hey thanks!

I am now plotting a heat map, showing how the proportion of P-values (power) changes depending on the coefficient 3 and coefficient 4.

library(tidyverse)

n_people <- c(2:20)
coef1 <- 5
coef2 <- -0.05
coef3 <- 5
coef4 <- -0.02
distances <- c(60,90,135,202.5,303.75,455.625)
n_trials <- 60
oneto29 <- 29
oneto1000 <- 5
e <- 6
groupcategory = c(1,1,1,1,1,1,2,2,2,2,2,2)

x1 <- distances
z <- coef1 + coef2*x1
Datarray <- array(dim=c(length(coef3s), length(coef4s),length(n_people)))

coef3s <- c(1:11)
coef4s <- seq(from = -0.01, to = -0.1, length.out = 10)

coef3_counter =1
for (coef3 in coef3s) {
coef4_counter =1
for (coef4 in coef4s) {
z1 <- coef3 + coef4*x1
d = NULL
pr = 1/(1+exp(-z))
pr1 = 1/(1+exp(-z1))

counter=1
for (i in n_people) {
  for (j in 1:oneto1000){
    df <- c()
    for (k in 1:i){
      df <- rbind(df, data.frame(x1 = c(rep(distances, 2)),
                                 y = c(rbinom(e,n_trials,pr), rbinom(e,n_trials,pr1)),
                                 groupcategory = groupcategory, id = c(rep(k,12))))     
      # y = c(rbinom(e,n_trials,pr), rbinom(e,n_trials,pr1)),
      #groupcategory = groupcategory, id = c(rep(k*2-1, 6), rep(k*2, 6))))
    }
    df_aov <- aov(y~x1*groupcategory+Error(id/(x1*groupcategory)),data=df)
    df_aov_sum <- summary(df_aov)
    pvalue <- df_aov_sum[[5]]
    pvalue <- pvalue[[1]]["groupcategory","Pr(>F)"]
    d = rbind(d,data.frame(pvalue))
  }
  count <- plyr::ldply(d,function(c) sum(c<=0.05))
  Datarray[coef3_counter,coef4_counter,counter] <- count$V1/oneto1000
  counter = counter +1
  d = NULL 
  
  
}
coef4_counter = coef4_counter + 1

}
coef3_counter = coef3_counter + 1
}

hv <- heatmap(object2, xlab = "Coef3", ylab = "Coef4",main = "Sample Size N=16",cex.axis=1.2)

I am unable to capp the legend values for my correlation matrix below: (?)

ggcorrplot(object2, aes(x=Coef3,y=Coef4), limits=FALSE)

It produces the correlation matrix, but the legend has values less than 0, and I only want it to have values 0 and above? Also, the y and x axis are also not labelling?

@FJCC

This question seems to be completely different from the preceding question. You should start a new thread, especially since I know nothing about the ggcorrplot() function and may have little to contribute. In that post, please make it clear how to produce the object you are trying to graph. In the above post, I do not see how to make object2. Since your question is about plotting only positive correlations and changing the plot labeling, the object you plot can be a small, simple correlation matrix produced with a few lines of code. You will attract more and better answers if your post is easy to understand because it has a minimal example.

1 Like

I am sorry for causing you trouble! I will create a new thread immediately! @FJCC

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.