relocate: rearranges column order

August · September 21, 2020, 5:26pm

Hi All,

I was doing rowwise two proportion z-tests, which I then wanted to join to the tibble the results were built from. So I left join these together (in the same pipe), and then relocate the results to the other side of the data frame as you can see in the code below...

results_df %>%
    dplyr::left_join(dat_df, by = c("formula"="row_number")) %>%
    dplyr::relocate(6:length(results_df), after = last_col())  
  })

The problem is, that the relocate messes the order of the columns (the left join does not (tested)), even though I only want to move the first six columns and keep the order of the others.
Here is the the order of the col names before creating the prop.tests...

[1] "day"               "ITEM76222_0"       "ITEM76222_1"       "ITEM78454_0"       "ITEM78454_1"       "LOG76222_0"       
 [7] "LOG76222_1"        "LOG78454_0"        "LOG78454_1"        "can"               "canlog"            "tot_sales_np"     
[13] "tot_sales_pr1"     "tot_sales_pr2"     "log_tot_sales_np"  "log_tot_sales_pr1" "log_tot_sales_pr2" "row_number"

After the proptests (following the left join), we can see the order is maintained.

[1] "row_number"        "prop_NL_NP1"       "prop_NH_NP1"       "base_chi"          "base_pval1"        "day"              
 [7] "ITEM76222_0"       "ITEM76222_1"       "ITEM78454_0"       "ITEM78454_1"       "LOG76222_0"        "LOG76222_1"       
[13] "LOG78454_0"        "LOG78454_1"        "can"               "canlog"            "tot_sales_np"      "tot_sales_pr1"    
[19] "tot_sales_pr2"     "log_tot_sales_np"  "log_tot_sales_pr1" "log_tot_sales_pr2"

But in the last image after doing relocate we can see the order is broken after the column tot_sales_pr_1, where it should have tot_sales_pr_2 it has the log version.

[1] "day"               "ITEM76222_0"       "ITEM76222_1"       "ITEM78454_0"       "ITEM78454_1"       "LOG76222_0"       
 [7] "LOG76222_1"        "LOG78454_0"        "LOG78454_1"        "can"               "canlog"            "tot_sales_np"     
[13] "tot_sales_pr1"     "log_tot_sales_pr2" "row_number"        "prop_NL_NP1"       "prop_NH_NP1"       "base_chi"         
[19] "base_pval1"        "tot_sales_pr2"     "log_tot_sales_np"  "log_tot_sales_pr1"

Why is this this happening, and can it be avoided?

It is worth pointing out that I prefer this to be done in one pipe, but I suppose it is not necessary. I have added the original code slice to you can see what I am doing in case it helps.

df_list2 <- purrr::pmap(list(df_list, NP_conditions, PR__conditions, item_names), function(first, second, third, forth){
  if(second == 2 & third == 2 | forth[[2]] == forth[[3]]) 
    purrr::pmap(first,~{prop.test(x = c(..2, ..4), n = c(..12, ..12), correct = "FALSE")}) %>%
    purrr::map_df(broom::tidy, .id = 'formula') %>%
    mutate(formula = as.integer(formula)) %>%
    dplyr::select(-one_of(c('parameter','conf.low','conf.high','method','alternative'))) %>%
    dplyr::rename(prop_NL_NP1 = estimate1 , prop_NH_NP1 = estimate2 , 
                  base_chi = statistic, base_pval1 = p.value) %>%
    dplyr::left_join(first, by = c("formula"="row_number")) %>%
    dplyr::rename(row_number=formula) %>%
    dplyr::relocate(6:length(first), after = last_col()) 
})

simple repex of df in df_list

dat <- data.frame(Date = sample(seq(as.Date('2019/01/01'), as.Date('2020/01/01'), by="day"), 100),
                  item_A1 = rnorm(n = 100, mean = 1000, sd = 2),
                  item_B1 = rnorm(n = 100, mean = 500, sd = 2),
                  item_B2 = rnorm(n = 100, mean = 700, sd = 2)) %>%
       mutate(tot1=item_A1 + item_B1, tot2=item_A1 + item_B2)

Hope that is enough information. Please let me know if you need more? Or some clarity.

smichal · September 21, 2020, 7:13pm

Could you check with your full dataset if you can use select instead of relocate?

This is an example:

library(tidyverse)
# this is the result after the left_join. Say, columns 2:4 come from left_join
df <- tibble(one = 1, two = 2, three = 3, four = 4, five = 5, six = 6)
df
#> # A tibble: 1 x 6
#>     one   two three  four  five   six
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     3     4     5     6
  
# image columns 2:4 are from left_join
df %>% 
  # select the complement of 2:4 (i.e. everything but 2:4), then 2:4 afterwards
  select(!2:4, 2:4)
#> # A tibble: 1 x 6
#>     one  five   six   two three  four
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     5     6     2     3     4

^{Created on 2020-09-21 by the reprex package (v0.3.0)}

August · September 22, 2020, 3:03pm

Hi smichal,

That worked perfectly. I had used the following as a work around instead...

#function required for reordering
movetolast <- function(data, move) {
  data[c(setdiff(names(data), move), move)]
}

#followed by adding the following to the pipe (which is the renamed statistical output from the prop.tests
movetolast(c("row_number", "prop_NL_NP1", "prop_NH_NP1", "base_chi", "base_pval1"))

I'm much happier using the dplyr solution, thank you! But still it would be nice to know why it is happening , odd one.

francisbarton · September 25, 2020, 7:13pm

Hi @August

A couple of suggestions:

As I understand it, you have a 19-col results_df, which you left_join to an original dat_df (presumably of length 4 as the joined table is of length 22. The relocate command you give moves columns 6:19 (19 is length of results_df) to the end of the joined df. That's columns day to tot_sales_pr2... but these are cols 1:14 in your third list. This is because you've mis-specified the .after command: it has a dot at the beginning. It seems that relocate has ignored your after without giving an error (?) and performed its default behaviour, which is move the specified columns to the start of the df.
I am guessing instead of specifying column 19 (length(results_df)) you maybe meant to specify the last column of the current (joined) df. You can do this by using the dplyr::last_col() shortcut, or probably alternatively by using length(.) but I haven't checked this.
I also couldn't work out from your original example how you had got row_number in your joined df, as it was formula on the LHS of your by vector. I am wondering if dplyr::right_join() might be an appropriate alternative function to look at?

system · October 2, 2020, 7:13pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.