Creating summary with recoding variables


Hi all, I have a data frame.

The first four columns are categories; the last four columns are calculated variables.
The calculated variables can take the values 1, 2, 3, 4 or 1-9.

I am trying to create a dynamic function like
function(data, calculation_var, grouping_var)

I also want dynamic recoding to create new groups, for example:
db$new_var <- recode(db$region,c(1,2)~"A",c(4,5)~"B",c(6,7,8)~"C")

The new variable can be A, B, C, ..., N,
but I am stuck on where and how to start.

The required output for Col2 should look like this:

     A     B     C
1    12%   41%   23%
2    7%    10%   6%
3    34%   16%   9%
4    47%   33%   62%
N    53    56    119

The % values are the percentage of occurrences of each value within each category; N is the total number of responses in that category.

Note: please provide the simplest possible solution, as I am new to R, so that I can modify it or apply a theme to it going forward.
Please let me know if any more explanation is required.

Are you familiar at all with tidyverse / dplyr?
You might be in danger of reinventing the wheel here, since the point of these hugely popular packages is to enable fairly easy summarisations with an easy-to-use syntax.

Yes, I am pretty familiar with both, but I need an approach to get started; the rest of the modifications I can do myself.

library(tidyverse)
set.seed(42)
(input_df <- tibble(
  id = 1:20,
  region = sample.int(5, 20, replace = TRUE),
  gender = sample.int(2, 20, replace = TRUE),
  sector = sample.int(3, 20, replace = TRUE),
  col1 = sample.int(6, 20, replace = TRUE),
  col2 = sample.int(7, 20, replace = TRUE),
  col3 = sample.int(8, 20, replace = TRUE),
  col4 = sample.int(16, 20, replace = TRUE)
) %>% mutate(across(starts_with("col"), ~ ifelse(. > 4, NA, .))) %>%
  mutate(across(starts_with("col"), forcats::as_factor)))


(recoded_df <- mutate(input_df,
  newvar = case_when(
    between(region, 1, 2) ~ "A",
    between(region, 4, 5) ~ "B",
    between(region, 6, 7) ~ "not seen",
    TRUE ~ "region3"
  )
))

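# Count each col1 value within each recoded group
(long_counts <- recoded_df %>% group_by(col1, newvar) %>%
  summarise(n = n()))
# Total number of responses per group (the denominator for the percentages)
(total_col_counts <- group_by(long_counts, newvar) %>% summarise(sum_n = sum(n)))
# Attach the totals and format the column percentages
(long_counts_x <- left_join(
  long_counts,
  total_col_counts,
  by = "newvar"
) %>% mutate(col_pcnt = paste0(round(100 * n / sum_n, digits = 2), "%")))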



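# Spread the groups into columns, one row per col1 value
(tidied_df <- pivot_wider(long_counts_x, id_cols = col1, names_from = newvar, values_from = col_pcnt))

# One-row table of totals, as character so it can be stacked under the percentages
(summary_row <- pivot_wider(total_col_counts, names_from = newvar, values_from = sum_n, values_fn = as.character))

(collated_df <- bind_rows(tidied_df, cbind(col1 = "Totals:", summary_row)))

# Replace NA cells (combinations that never occur) with empty strings
(cleaned_df <- mutate(collated_df,
                      across(everything(), ~ if_else(is.na(.), "", .))))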
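
Since the question asked for something of the form function(data, calculation_var, grouping_var), the steps above could probably be wrapped up along these lines. This is only a sketch: the name summarise_pct and the toy data are made up for illustration, it drops missing values of the calculated variable (as the asker's own code does, unlike the steps above), and it assumes a dplyr/tidyr recent enough to support the {{ }} syntax.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not from the thread): column percentages of `calc_var`
# within each level of `group_var`, followed by a row of group totals.
summarise_pct <- function(data, calc_var, group_var) {
  counts <- data %>%
    filter(!is.na({{ calc_var }})) %>%          # assumption: ignore missing responses
    count({{ group_var }}, {{ calc_var }}) %>%  # n per group / value combination
    group_by({{ group_var }}) %>%
    mutate(total = sum(n),
           pct = paste0(round(100 * n / total), "%")) %>%
    ungroup()

  # One row per value, one column per group, percentages as text
  pct_table <- counts %>%
    select({{ calc_var }}, {{ group_var }}, pct) %>%
    rename(value = {{ calc_var }}) %>%
    pivot_wider(names_from = {{ group_var }}, values_from = pct,
                values_fill = list(pct = "")) %>%
    arrange(value) %>%
    mutate(value = as.character(value))

  # Single row holding the total number of responses per group
  totals <- counts %>%
    distinct({{ group_var }}, total) %>%
    mutate(total = as.character(total)) %>%
    pivot_wider(names_from = {{ group_var }}, values_from = total)

  bind_rows(pct_table, bind_cols(tibble(value = "N"), totals))
}

# Toy data, invented for this example
toy <- tibble(g = c("A", "A", "A", "B", "B"),
              v = c(1, 1, 2, 2, NA))
res <- summarise_pct(toy, v, g)
print(res)
```

With the data frames from this thread the call would be something like summarise_pct(recoded_df, col2, newvar).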

I am getting this error:
Error in across(starts_with("col"), ~ifelse(. > 4, NA, .)) :
could not find function "across"

I also tried installing dplyr and tidyverse via devtools, but I am still getting the error.

You could use mutate_at() instead of mutate() with across().
To use across(), you would need to install the development version of dplyr from GitHub.
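
As a rough sketch of the mutate_at() route: the two across() steps from the earlier code could be written as below. Note that with mutate_at() the starts_with() call has to be wrapped in vars(); using it bare is what triggers a "must be used within a selecting function" error. The data here is a cut-down version of the example above, and base as.factor() stands in for forcats::as_factor() just to keep the example to one package.

```r
library(dplyr)

set.seed(42)
# Cut-down version of the example data (only two calculated columns)
input_df <- tibble(
  id   = 1:20,
  col1 = sample.int(6, 20, replace = TRUE),
  col2 = sample.int(7, 20, replace = TRUE)
)

recoded <- input_df %>%
  # values above 4 become NA, for every column whose name starts with "col"
  mutate_at(vars(starts_with("col")), ~ ifelse(. > 4, NA, .)) %>%
  # then convert those same columns to factors
  mutate_at(vars(starts_with("col")), as.factor)

print(recoded)
```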

Thanks for your consistent replies.

I am still getting an error: Error: starts_with() must be used within a selecting function.

For a single variable I have created the code below. Is there any solution where I can update something in my current function?

  # Drop rows where the variable is missing
  data <- data[!is.na(data[[var]]), ]
  # Frequency table with columns Var1 (value) and Freq (count)
  T1 <- as.data.frame(table(data[[var]]))
  all <- sum(T1[, 2]) # total number of responses
  T1 <- T1 %>% mutate(
    !!Name_of_variable := as.character(Var1),
    "Percent" = format(round(Freq * 100 / all, 1), nsmall = 1),
    "N" = as.numeric(Freq)
  ) %>%
    select(!!Name_of_variable, "Percent", "N")
  names(T1)[2] <- "  " # update the name of Header in double quotes
  # mask_m() is defined elsewhere (not shown here)
  T1[, 2] <- sapply(T1[, 2], function(x) ifelse(mask_m(x, all) == "--", "--", paste0(mask_m(x, all), "%")))

  T1 <- T1 %>% select(-N)

  # Put the total as the first row
  T1 <- rbind(c("N", all), T1)
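
One way the single-variable code above might be extended to a grouping variable, without needing across() at all, is a two-way table() plus prop.table() in base R. This is only a sketch under assumptions: pct_by_group and the toy data are invented for illustration, and the mask_m() step is left out because that helper is not shown in the thread.

```r
# Hypothetical helper: percentages of `var` within each level of `group`,
# with the group totals as the first row (base R only).
pct_by_group <- function(data, var, group) {
  data <- data[!is.na(data[[var]]), ]                    # drop missing responses
  counts <- table(data[[var]], data[[group]])            # rows: values, columns: groups
  pct <- round(100 * prop.table(counts, margin = 2), 1)  # column percentages
  out <- matrix(sprintf("%.1f%%", as.vector(pct)),
                nrow = nrow(pct), dimnames = dimnames(pct))
  rbind(N = colSums(counts), out)                        # totals first, then percentages
}

# Toy data, invented for this example
toy <- data.frame(g = c("A", "A", "A", "B", "B"),
                  v = c(1, 1, 2, 2, NA))
res <- pct_by_group(toy, "v", "g")
print(res)
```

The result is a character matrix, so it can be turned into a data frame with as.data.frame() if the rest of the function expects one.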
 
