Variable names within a loop

Hi there,
I am switching from Stata, so please be kind.
I have a dataset of ratings in multiple categories and want to analyse them. My loop worked well, but now I want to know whether the inter-rater agreement is better when we exclude the first two observations of each rater.
Whatever I do, simple code works well until I put it inside a loop. In Stata I could use a foreach loop, and wherever I write `i' it would use the current variable. This seems to be different in R. Can you give me a hint how the code should look so that every "i" is "v1" in the first round, "v2" in the second, and so on?

Thank you

library(dplyr)  # %>%, select(), mutate()
library(tidyr)  # spread()

df.raw <- data.frame(
  raterid = c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10),rep(6,10)),
  videoid = c(1,2,3,4,5,6,7,8,9,10),
  num = sample(1:10),
  v1 = sample(1:2, 3, replace=TRUE),
  v2 = sample(1:2, 3, replace=TRUE),
  v3 = sample(1:2, 3, replace=TRUE),
  v4 = sample(1:2, 3, replace=TRUE),
  v5 = sample(1:2, 3, replace=TRUE),
  v6 = sample(1:2, 3, replace=TRUE)
  )

#works
df<- df.raw %>%
    select("raterid", "videoid", v1) %>%
    spread("raterid", v1)
# doesn't work
n <- c("v1") #Later, I want to insert all dimensions here.
for(i in n){
  df <- df.raw %>%
    mutate(i = replace(i, num < 3, NA)) %>%
  select("raterid", "videoid", i) #%>% works after problem is solved
  #spread("raterid", i)
}

Edit: Kind of Reprex & minor changes

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

Hi @nirgrahamuk,

I have added an example. In my real data, "num" is not identical in every block of ten rows, but each block of ten rows contains the values 1 to 10. My actual problem, however, is the difference between the working single call and the loop.

Hi @Rapha,

I tried some tidyverse "magic" with purrr, nesting, gathering and spreading

If I understand you correctly, this is the desired outcome

desired_outcome_for_v1 <- df.raw %>%
    select("raterid", "videoid", v1) %>%
    spread("raterid", v1)

Gathering to get the names v1 to v6 into rows:

> df.raw %>% 
+     gather(key = "v_name", value = "rating", v1:v6)
# A tibble: 360 x 5
   raterid videoid   num v_name rating
     <dbl>   <dbl> <int> <chr>   <int>
 1       1       1     2 v1          2
 2       1       2     9 v1          2
 3       1       3     5 v1          1
 4       1       4     3 v1          2
 5       1       5     7 v1          2
 6       1       6    10 v1          1
 7       1       7     1 v1          2
 8       1       8     4 v1          2
 9       1       9     6 v1          1
10       1      10     8 v1          2
# ... with 350 more rows

Then grouping and nesting:

> df.raw %>% 
+     gather(key = "v_name", value = "rating", v1:v6) %>% 
+     group_by(v_name) %>% 
+     nest()
# A tibble: 6 x 2
  v_name data             
  <chr>  <list>           
1 v1     <tibble [60 x 4]>
2 v2     <tibble [60 x 4]>
3 v3     <tibble [60 x 4]>
4 v4     <tibble [60 x 4]>
5 v5     <tibble [60 x 4]>
6 v6     <tibble [60 x 4]>

And then, finally, applying your select and spread to each sub-dataframe

nested_outcome <- df.raw %>% 
    gather(key = "v_name", value = "rating", v1:v6) %>% 
    group_by(v_name) %>% 
    nest() %>% 
    mutate(DESIRED_TABLE = map(data, ~.x %>% 
                                   select(raterid, videoid, rating) %>% 
                                   spread(raterid, rating)))
nested_outcome


# A tibble: 6 x 3
  v_name data              DESIRED_TABLE    
  <chr>  <list>            <list>           
1 v1     <tibble [60 x 4]> <tibble [10 x 7]>
2 v2     <tibble [60 x 4]> <tibble [10 x 7]>
3 v3     <tibble [60 x 4]> <tibble [10 x 7]>
4 v4     <tibble [60 x 4]> <tibble [10 x 7]>
5 v5     <tibble [60 x 4]> <tibble [10 x 7]>
6 v6     <tibble [60 x 4]> <tibble [10 x 7]>

And checking if correct (hopefully)

# check for desired outcome
all(nested_outcome$DESIRED_TABLE[[1]] == desired_outcome_for_v1)

Or in a wider form:

nested_outcome_wide <- nested_outcome %>% 
    select(-data) %>% 
    spread(v_name, DESIRED_TABLE)

# A tibble: 1 x 6
  v1                v2                v3                v4                v5                v6               
  <list>            <list>            <list>            <list>            <list>            <list>           
1 <tibble [10 x 7]> <tibble [10 x 7]> <tibble [10 x 7]> <tibble [10 x 7]> <tibble [10 x 7]> <tibble [10 x 7]>

Is that the result you are looking for? Because I was not quite sure what your desired outcome was.

Hi @smichal,

thank you, that's impressive and well explained. I got similar results with my for() loop, but it only saved the results temporarily. This will cut down a bunch of code later on.
However, my problem with the code came later. I want to exclude all observations from the first two videos (num <= 2) of every rater in every dimension. I tried:

for(i in n){
  df <- df.raw %>%
    mutate(i = replace(i, num < 3, NA)) %>%
    select("raterid", "videoid", i)
}

Here "n" holds the variables v1 to v6.
My loop generated tables in which the whole column v1 was NA, not just the observations from the first two videos. When I used the actual variable name "v1" instead of "i" it worked, but that defeats the purpose of a loop. So my actual question was how to insert "i" properly as an identifier of a variable rather than as a character string.
I still want to know that, but I like your solution. Hence a new question: how do I add a function that replaces the rating value with NA if the video is one of the first two (num <= 2)?

nested_outcome <- df.raw %>% 
    gather(key = "v_name", value = "rating", v1:v6) %>% 
    group_by(v_name) %>% 
    nest() %>% 
    mutate(DESIRED_TABLE = map(data, ~.x %>% 
                                   select(raterid, videoid, rating) %>% 
                                   spread(raterid, rating)))

And if you or anybody else is bored: I used a for() loop to replace all "-98" values with NA in my actual dataset. It worked, but it looks overly complicated. Is there a better way?

n <- c(1,2,3,4,5,6)
for(i in n){
   nested_outcome[[3]][[i]][nested_outcome[[3]][[i]] == -98] <- NA
}
Thank you
Rapha
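
On the -98 question, one option is to recode the values before the tables are reshaped, so that no indexing into the nested list is needed. A minimal sketch, assuming dplyr 1.0 or later; the small data frame ratings and the name cleaned are made up for illustration and are not the actual dataset:

```r
library(dplyr)

# Hypothetical miniature of the ratings data; -98 marks a missing rating.
ratings <- data.frame(
  raterid = c(1, 1, 2, 2),
  videoid = c(1, 2, 1, 2),
  v1      = c(1, -98, 2, 1),
  v2      = c(-98, 2, 2, -98)
)

# na_if() turns every value equal to -98 into NA;
# across() applies it to each rating column in one step.
cleaned <- ratings %>%
  mutate(across(v1:v2, ~ na_if(.x, -98)))

cleaned
```

In the long format produced by gather(), the same idea would be a single mutate(rating = na_if(rating, -98)) placed before group_by().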

I think you would modify smichal's solution like so:

(df <- df.raw %>%
    gather(key = "v_name", value = "rating", v1:v6) %>%
    mutate(rating = ifelse(num <= 2, NA, rating)) %>%
    group_by(v_name) %>%
    nest() %>%
    mutate(DESIRED_TABLE = map(data, ~.x %>%
                                   select(raterid, videoid, rating) %>%
                                   spread(raterid, rating))))

That is, insert a mutate between the gather and the grouping.
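
As for the original question of using the loop variable as a column identifier: inside mutate(), a bare i on the left-hand side is taken as the literal column name "i", and on the right-hand side it is only the character string stored in i (for example "v1"), not the column itself. A minimal sketch of one way around this, assuming a reasonably recent dplyr and tidyr: look the column up with .data[[i]] and set its name with !!i :=. The simplified data, the names vars_to_loop and results, and the use of pivot_wider() (the newer counterpart of spread()) are assumptions for illustration, not taken from the thread.

```r
library(dplyr)
library(tidyr)

set.seed(1)
# Simplified stand-in for df.raw: 6 raters x 10 videos, two rating columns.
df.raw <- data.frame(
  raterid = rep(1:6, each = 10),
  videoid = rep(1:10, times = 6),
  num     = rep(sample(1:10), times = 6),
  v1      = sample(1:2, 60, replace = TRUE),
  v2      = sample(1:2, 60, replace = TRUE)
)

vars_to_loop <- c("v1", "v2")  # hypothetical name; extend to all dimensions
results <- list()              # keeps one wide table per variable

for (i in vars_to_loop) {
  results[[i]] <- df.raw %>%
    # .data[[i]] reads the column whose name is stored in i;
    # !!i := writes the result back under that same name.
    mutate(!!i := replace(.data[[i]], num < 3, NA)) %>%
    select(raterid, videoid, all_of(i)) %>%
    pivot_wider(names_from = raterid, values_from = all_of(i))
}

results[["v1"]]
```

Storing each table in a named list keeps the results after the loop finishes, instead of overwriting df on every pass.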

Hi @Rapha,

I'm getting a bit puzzled about what you actually want to achieve.

Is it something like this?

# explicit
df.raw %>% 
    mutate(v1 = ifelse(df.raw$num < 3, NA, v1),
           # ... and so on
           v6 = ifelse(df.raw$num < 3, NA, v6))

If yes, you could use mutate_at in the first place without my lengthy nested solution:

df.raw %>% 
    # Apply a function to the variables v1 to v6
    # The "." in the function refers back to df.raw at the beginning of the pipe
    mutate_at(vars(v1:v6), function(.x){ifelse(.$num < 3, NA, .x)})

# A tibble: 60 x 9
   raterid videoid   num    v1    v2    v3    v4    v5    v6
     <dbl>   <dbl> <int> <int> <int> <int> <int> <int> <int>
 1       1       1     5     2     1     2     1     2     2
 2       1       2     1    NA    NA    NA    NA    NA    NA
 3       1       3    10     1     2     1     2     1     1
 4       1       4     4     2     1     2     1     2     2
 5       1       5     6     2     2     1     1     1     2
 6       1       6     3     1     2     1     2     1     1
 7       1       7     9     2     1     2     1     2     2
 8       1       8     2    NA    NA    NA    NA    NA    NA
 9       1       9     8     1     2     1     2     1     1
10       1      10     7     2     1     2     1     2     2
# ... with 50 more rows
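
In current dplyr releases mutate_at() is superseded by across(). A minimal sketch of the same replacement in that style, assuming dplyr 1.0 or later; the small data frame toy and the name masked are made up for illustration:

```r
library(dplyr)

# Hypothetical miniature data: num is the order in which a rater saw the videos.
toy <- data.frame(
  raterid = c(1, 1, 1, 2, 2, 2),
  num     = c(2, 3, 1, 1, 2, 3),
  v1      = c(1, 2, 2, 1, 1, 2),
  v2      = c(2, 2, 1, 1, 2, 1)
)

# Inside across(), .x is the current rating column and num is
# looked up in the same data frame, so no df$num is needed.
masked <- toy %>%
  mutate(across(v1:v2, ~ ifelse(num < 3, NA, .x)))

masked
```

Because num is resolved inside the data frame, this version does not depend on the "." of the pipe.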