organizing columns by percent true

help · August 24, 2022, 6:44pm

I have data that looks like the following:

name life_period yrs_school yrs_school1 yrs_school2....... yrs_school1_match ...
Ana     3             12         12        11                   TRUE

Imagine I have 50 different columns, from yrs_school1 to yrs_school50_ila; each has some arbitrary suffix at the end of its name. I want to see whether yrs_school equals each of these other columns. The results are stored in columns named like yrs_school1_match: yrs_school1_match is TRUE if yrs_school == yrs_school1. There are 50 match columns like the one shown above. I would like to reorder the yrs_school1 to yrs_school50_ila columns by the percentage of TRUE values in their match columns, ignoring any NA when computing the percentage. The columns should go from highest % TRUE to lowest % TRUE. Note that name, life_period, and yrs_school should stay at the front of the output data frame. Is this possible to do in the tidyverse? Thank you.

nirgrahamuk · August 24, 2022, 10:23pm

library(tidyverse)

(mydata <- tibble::tribble(
  ~name, ~life_per, ~yrs_school, ~yrs_school1, ~yrs_school2, ~yrs_school3, ~yrs_school4, ~yrs_school5, ~yrs_school6,
  "Anna", 3, 5L, 17L, 8L, 1L, 5L, 5L, 11L,
  "Anna", 3, 2L, 20L, 2L, 19L, 2L, 15L, 9L,
  "Anna", 3, 1L, 1L, 3L, 5L, 9L, 20L, 20L,
  "Anna", 3, 6L, 17L, 5L, 15L, 6L, 6L, 6L,
  "Anna", 3, 8L, 1L, 13L, 6L, 8L, 1L, 5L,
  "Anna", 3, 7L, 3L, 7L, 16L, 18L, 6L, 11L
))

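# Add one logical *_match column per comparison column
# (everything starting with "yrs_school" except yrs_school itself).
(mydat_with_match <- mydata |> mutate(
  across(
    .cols = starts_with("yrs_school") & !ends_with("yrs_school"),
    .fns = ~ yrs_school == .x,
    .names = "{.col}_match"
  )
))

# Share of TRUE values in each *_match column, ignoring NA.
(summary_of_match <- summarise(
  mydat_with_match,
  across(
    .cols = starts_with("yrs_school") & ends_with("match"),
    .fns = ~ mean(.x, na.rm = TRUE)
  )
))

# Sort the match columns from highest to lowest share, then strip the
# "_match" suffix to recover the names of the original columns.
(order_to_use <- pivot_longer(summary_of_match, cols = everything()) |>
  arrange(desc(value)) |>
  pull(name) |>
  str_replace_all("_match", ""))

# Move the comparison columns, in that order, to just after yrs_school.
(fin <- mydat_with_match |> relocate(all_of(order_to_use),
  .after = "yrs_school"
))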
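
To illustrate what na.rm = TRUE does in the summarise step above, here is a minimal base R sketch. The vectors are made-up toy values, not the poster's data; they only show how an NA in a comparison affects the share of TRUE.

```r
# Toy values (assumed for illustration): one match, one mismatch, one missing.
yrs_school  <- c(12L, 10L, NA)
yrs_school1 <- c(12L, 11L, 9L)

# Comparing with NA yields NA, so is_match is TRUE, FALSE, NA.
is_match <- yrs_school == yrs_school1

# Without na.rm the mean is NA; with na.rm = TRUE the NA is dropped
# and the share is computed over the two remaining values.
share_with_na    <- mean(is_match)
share_without_na <- mean(is_match, na.rm = TRUE)
```

Note that with na.rm = TRUE the denominator is the number of non-missing comparisons, not the number of rows.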

help · August 24, 2022, 11:47pm

Hi nirgrahamuk, thank you. What is the pull(name) doing exactly? I am not sure I get why it's there.

nirgrahamuk · August 25, 2022, 7:43am

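It gets a single column from a data frame as a vector. Here, pull(name) returns the name column created by pivot_longer(), i.e. the match column names, as a character vector.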
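
A small sketch of the difference between pull() and select(), using a made-up one-row summary (toy_summary and its values are assumptions for illustration, shaped like summary_of_match in the thread):

```r
library(dplyr)
library(tidyr)

# Hypothetical summary: one share per match column.
toy_summary <- tibble(
  yrs_school1_match = 0.2,
  yrs_school2_match = 0.9
)

# pivot_longer() names its output columns "name" and "value" by default.
long <- pivot_longer(toy_summary, cols = everything())

name_column_df <- select(long, name) # still a one-column data frame
name_vector    <- pull(long, name)   # a plain character vector
```

The vector form is what string helpers such as str_replace_all() and selection helpers such as all_of() expect.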

Andrzej · August 25, 2022, 3:09pm

Hi Nir, what does the tilde style mean here? And my second question: what does .x denote here? Usually it means "every element of a vector". Is that also the case here?

nirgrahamuk · August 25, 2022, 7:18pm

from the purrr docs:

For unary functions, ~ .x + 1 is equivalent to function(.x) .x + 1

Base R (4.1 and later) introduced another shorthand for anonymous functions, which here would be

 .fns = \(x) mean(x, na.rm = TRUE)

Both this and the tilde form just mean

   .fns = function(x) mean(x, na.rm = TRUE)
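
As a rough check that the three spellings behave the same, here is a sketch on a made-up two-column tibble (toy and its values are assumptions for illustration; the \(x) form needs R 4.1 or later). Inside across(), .x stands for the whole current column rather than a single element, which is why mean() returns one value per column.

```r
library(dplyr)

# Toy logical columns, standing in for the *_match columns.
toy <- tibble(a = c(TRUE, FALSE, NA), b = c(TRUE, TRUE, FALSE))

# The same summary written with the tilde, the \(x) shorthand,
# and a full function definition.
with_tilde    <- summarise(toy, across(everything(), ~ mean(.x, na.rm = TRUE)))
with_lambda   <- summarise(toy, across(everything(), \(x) mean(x, na.rm = TRUE)))
with_function <- summarise(toy, across(everything(), function(x) mean(x, na.rm = TRUE)))
```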

Andrzej · August 25, 2022, 8:31pm

Thank you, I am always a bit confused about .x, .y, .data etc.
