Iterate through rows

dplyr
tidyr
purrr

#1

hello! i have data where respondents have a 0-1 relative ranking of item preference; i want to convert this data to respondent-level ranks. for example:

matrix(runif(9), nrow = 3)
a b c
0.2117986 0.4388764 0.4204525
0.5432499 0.9655715 0.9454483
0.7874891 0.3301020 0.4823072

I’d like the data above to become:

a b c
3 1 2
3 1 2
1 3 2

My hunch was to use pmap and rank, but I consistently receive an error having to do with unused arguments (not sure if it’s coming from using pmap or rank).

Help? Should I be using a different function or package? Would rather not use apply, though I know how to.


#2

Maybe there is a shorter way, but this should work.

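library(tibble)  # as_tibble(), rowid_to_column()
library(tidyr)
library(dplyr)
# name the columns so the reshaped result has a, b, c
proba <- matrix(runif(9), nrow = 3, dimnames = list(NULL, c("a", "b", "c")))

proba %>%
  as_tibble() %>%
  rowid_to_column(var = "rowid") %>%
  gather(key = letter, value = prob, -rowid) %>%  # long format: one row per cell
  group_by(rowid) %>%
  mutate(rank = rank(-prob)) %>%  # negate so the largest value gets rank 1
  select(-prob) %>%
  spread(key = letter, value = rank)  # back to one row per respondent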
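
The same reshape, rank, reshape idea can also be sketched with pivot_longer() and pivot_wider(), the newer tidyr replacements for gather() and spread(). This is only a sketch: it assumes tidyr 1.0.0 or later, and the column names a, b, c, the seed, and the object name ranks are made up for illustration. It ranks on -prob so that the largest value in each row gets rank 1, as in the output requested in the question.

```r
library(tibble)
library(tidyr)
library(dplyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
proba <- matrix(runif(9), nrow = 3, dimnames = list(NULL, c("a", "b", "c")))

ranks <- proba %>%
  as_tibble() %>%
  rowid_to_column(var = "rowid") %>%
  # long format: one row per (respondent, item) pair
  pivot_longer(-rowid, names_to = "letter", values_to = "prob") %>%
  group_by(rowid) %>%
  mutate(rank = rank(-prob)) %>%  # rank within each respondent, largest first
  ungroup() %>%
  select(-prob) %>%
  # back to wide format: one row per respondent
  pivot_wider(names_from = letter, values_from = rank)

ranks
```

Each row of ranks should then hold the ranks 1 to 3 for that respondent, alongside the rowid column.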


#3

There are some oddities here because rank needs a non-list vector, but pmap_df (really dplyr::bind_rows) needs a list, so you need something like

library(tidyverse)

df <- tibble(a = c(0.2117986, 0.5432499, 0.7874891), 
             b = c(0.4388764, 0.9655715, 0.330102), 
             c = c(0.4204525, 0.9454483, 0.4823072))

pmap_df(df, ~as.list(rank(-c(...))))
#> # A tibble: 3 x 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1     3     1     2
#> 2     3     1     2
#> 3     1     3     2

#4

wow. thanks! it certainly would get the job done but seems suuuuper excessive


#5

This worked perfectly! Can you explain what the c(...) is indicating? I understand you negate it so that the largest numbers are ranked first.


#6

The pmap_df line could be rewritten more verbosely like this:

temp_fun <- function(...) {
  arg_vector <- c(...)
  rank_vector <- rank(-arg_vector)
  rank_list <- as.list(rank_vector)
  return(rank_list)
}

pmap_df(df, temp_fun)

The ... is the argument list that pmap_df passes to temp_fun. So, for the first row, pmap_df calls temp_fun as:

temp_fun(a = 0.2117986, b = 0.4388764, c = 0.4204525)

The function then passes that argument list to c, which converts it to a vector. The rest of the processing is more standard.
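
To make that concrete, here is a small sketch that calls the helper by hand for the first row, without pmap_df. The helper is repeated (slightly condensed) so the snippet runs on its own; first_row is just an illustrative name.

```r
# Condensed copy of temp_fun from the post above
temp_fun <- function(...) {
  arg_vector <- c(...)        # named numeric vector with elements a, b, c
  as.list(rank(-arg_vector))  # negate so the largest value gets rank 1
}

# What pmap_df effectively does for the first row of df
first_row <- temp_fun(a = 0.2117986, b = 0.4388764, c = 0.4204525)
str(first_row)  # a named list: a = 3, b = 1, c = 2
```

The names a, b, c survive both c() and rank(), which is how the columns of the final data frame get their names.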


#7

I like @alistaire’s solution, but I also like using these interesting questions to show off lesser known functions.

purrr::transpose() is a pretty neat one if you’ve never used it. Essentially for this data frame it turns each row into its own named list, and then combines the list-rows together in one master list. The documentation says it turns a list ‘inside-out’.

magrittr::multiply_by() is also nice for multiplying by a number in a pipe chain.

library(dplyr)
library(purrr)
library(magrittr)

df <- tibble(a = c(0.2117986, 0.5432499, 0.7874891), 
             b = c(0.4388764, 0.9655715, 0.330102), 
             c = c(0.4204525, 0.9454483, 0.4823072))

df %>%
  transpose() %>%
  map(~unlist(.x) %>% multiply_by(-1) %>% rank() %>% as.list()) %>%
  bind_rows()
#> # A tibble: 3 x 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1  3.00  1.00  2.00
#> 2  3.00  1.00  2.00
#> 3  1.00  3.00  2.00
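
If it helps to see what transpose() does before the ranking step, this sketch inspects its result for the same data. The name rows is only for illustration.

```r
library(tibble)
library(purrr)

df <- tibble(a = c(0.2117986, 0.5432499, 0.7874891),
             b = c(0.4388764, 0.9655715, 0.330102),
             c = c(0.4204525, 0.9454483, 0.4823072))

rows <- transpose(df)

length(rows)    # one list element per row of df
str(rows[[1]])  # the first row as a named list with elements a, b, c
```

Each element of rows is what the map() call above receives as .x, which is why unlist(.x) is needed before rank().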

#8

@davis if you tweak your example to use map_df then bind_rows is no longer needed (in the interests of further showing off some of the lesser known functions! :smile: )

library(dplyr)
library(purrr)
library(magrittr)

df <- tibble(a = c(0.2117986, 0.5432499, 0.7874891), 
             b = c(0.4388764, 0.9655715, 0.330102), 
             c = c(0.4204525, 0.9454483, 0.4823072))

df1 <- df %>%
  transpose() %>%
  map(~unlist(.x) %>% multiply_by(-1) %>% rank() %>% as.list()) %>%
  bind_rows()

df2 <- df %>%
  transpose() %>%
  map_df(~unlist(.) %>% multiply_by(-1) %>% rank() %>% as.list())

identical(df1, df2)
#> [1] TRUE

#9

c(...) collects the parameters passed to the function into a vector. If you don’t, and just wrote rank(...) (ignoring the negation for a second), R will splice each parameter into the call, calling

rank(a = 0.2117986, b = 0.4388764, c = 0.4204525)
#> Error in rank(a = 0.2117986, b = 0.4388764, c = 0.4204525): unused arguments (a = 0.2117986, b = 0.4388764, c = 0.4204525)

which fails, as there aren’t such parameters in rank. Even unnamed, the values won’t work nicely for the first three parameters that do exist:

rank(0.2117986, 0.4388764, 0.4204525)
#> Error in match.arg(ties.method): 'arg' must be NULL or a character vector

Instead, c collects the dots into a vector, which is passed to the first x parameter of rank, akin to

rank(c(a = 0.2117986, b = 0.4388764, c = 0.4204525))
#> a b c 
#> 1 3 2

Why that result has to be coerced to a list for bind_rows to work is unclear. It’s true that it’s rare to have a vector that should be row-bound into a data frame (usually the parameters are data frames or lists so they can handle multiple types), but it’s entirely possible, as this example shows. In base R, rbind handles the cases identically, returning matrices:

rbind(c(a = 1, b = 2, c = 3), 
      c(a = 4, b = 5, c = 6))
#>      a b c
#> [1,] 1 2 3
#> [2,] 4 5 6

rbind(list(a = 1, b = 2, c = 3), 
      list(a = 4, b = 5, c = 6))
#>      a b c
#> [1,] 1 2 3
#> [2,] 4 5 6

Given that bind_rows—like all dplyr verbs—always returns a data frame, I’m not sure there’s a rationale for not handling this case.


#10

transpose would definitely work–I didn’t think of it but it strikes me as being kind of inelegant…but I suppose beggars can’t be choosers