add parameter to display stats

str_guru · November 16, 2022, 5:05pm

I want to add one more parameter to my function that controls which stats are displayed, such as q25, q75, mean, and median.
For example, if I want to display only the mean and median, I would pass a parameter like stats = c(mean, median).

The function is working fine, but I also want a parameter that lets me display only the stats I need.

tt2(data = listd,var = "sale",Name_of_variable = "listd",decimal = TRUE,stats=c(mean,median))

library(dplyr)  # %>%, filter(), summarise(), mutate(), select()
library(knitr)  # knit_print()

df <- data.frame(Name = c("asdf","kjhgf","cvbnm","rtyui","cvbnm","jhfd","cvbnm","sdfghj","cvbnm","dfghj","cvbnm"),
                 sale=c(27,28,27,16,14,25,14,14,19,18,28),
                 city=c("CA","TX","MN","NY","TX","MT","HU","KL","TX","SA","TX"),
                 Dept = c("HH","MM","NN","MM","AA","VV","MM","HU","JJ","MM","ZZ"))


df1<- df
df$cc1<-1
df2<- subset(df, Dept == 'MM')
df$cc2<-ifelse(df$Dept == 'MM',1,NA)
lst<-list(df$cc1, df$cc2)
listd<-list("ALL" = df1, "MM" =df2)

# I want to run my function on listd so that I can get a combined summary for all variables in listd
tt2<-function(data,var,footer,Name_of_variable,decimal){
  for (d in 1:length(data)) {
    cat('\n\n#### ', names(data)[d], '\n\n')
    md<-data[[d]]
    table_list<-list()
    for (i in 1:length(d))
      table_list[[i]]<-t1(md,var,footer,decimal,Name_of_variable)
    tt<- do.call(rbind,table_list)
  } 
  cat(knit_print(tt))
  cat('\n\n')
}
t1<-function(dataset,var,Suff,decimal,Name_of_variable){
  numdig <- if (decimal == TRUE) {1} else {0}
  var <- rlang::parse_expr(var) 
summ_tab1<- dataset %>% filter(!is.na(!!var)) %>%   summarise(
  q25 = format(round(quantile(!! var,  type=6, probs = seq(0, 1, 0.25), na.rm=TRUE)[2],digits = numdig),nsmall = numdig),
  Median = format(round(quantile(!! var, type=6, probs = seq(0, 1, 0.25), na.rm=TRUE)[3],digits = numdig),nsmall = numdig),
  Average = format(round( mean(!! var, na.rm=TRUE),digits = numdig),nsmall = numdig),
  q75 = format(round(quantile(!! var, type=6, probs = seq(0, 1, 0.25), na.rm=TRUE)[4],digits = numdig) ,nsmall = numdig),
  N = sum(!is.na(!!var)))
summ_tab<-summ_tab1 %>%  
  mutate(" "=!!Name_of_variable,
         q25 = q25,
         Median =Median,
         Average =Average,
         q75 = q75)%>%
  dplyr::rename(
    `25th percentile` = q25,
    `75th percentile` = q75)%>%select(" ",N,everything())
summ_tab1
}


tt2(data = listd,var = "sale",Name_of_variable = "listd",decimal = TRUE)

FactOREO · November 16, 2022, 7:00pm

Hello,

First of all, the function you posted gives me an error from cat(), which cannot handle lists. I rewrote your t1() function to do what you want:

library(dplyr)  # %>%, filter(), pull()

t1 <- function(dataset, var, Suff, decimal, stats = 'all'){
  # Suff is not used here?
  # replaced Name_of_variable with the name taken from var

  # check that the provided stats vector is valid
  valid <- c('N','min','q25','median','mean','q75','max')
  if( !all(stats %in% c(valid, 'all')) ) stop('valid stats are\nN, min, q25, median, mean, q75, max or a single all')
  if('all' %in% stats) stats <- valid
  # unnecessary brackets removed
  numdig <- if (decimal == TRUE) 1 else 0
  var <- rlang::parse_expr(var)

  ### new version
  filtered_data <- dataset %>%
    filter(!is.na(!!var))
  x <- filtered_data %>% pull(!!var)

  ### Summary statistics
  # vector length 1
  N         <- nrow(filtered_data)
  names(N)  <- 'N'
  # named vector length 5; the middle quantile is the median
  # note: this uses the default type = 7; pass type = 6 to match your original code
  qs        <- quantile(x, probs = seq.default(0,1,0.25))
  names(qs) <- c('min','q25','median','q75','max')
  # vector length 1
  mn        <- mean(x)
  names(mn) <- 'mean'
  ### Combine them and put them in display order
  vals_woN  <- format(round(c(qs, mn), digits = numdig), nsmall = numdig)
  vals_woN  <- vals_woN[c('min','q25','median','mean','q75','max')]

  ### the selection part
  keep <- names(vals_woN) %in% stats
  if ('N' %in% stats){
    # if N is wanted, push it to the front
    c(as.character(var), N, vals_woN[keep])
  } else {
    c(as.character(var), vals_woN[keep])
  }
}

The output looks like this with your data.frame df from above:

t1(dataset = df, var = 'sale', decimal = TRUE,stats = 'all')
#>             N    min    q25 median   mean    q75    max 
#> "sale"   "11" "14.0" "15.0" "19.0" "20.9" "27.0" "28.0"

Since you only care about the display, I did not bother coercing the values back to numbers and combined everything as characters (which is what format() returns anyway). If you only want to go through the variable 'sale', you can just use lapply() on your listd list object. It will look like this:

lapply(listd, t1, var = 'sale', decimal = TRUE, stats = 'all')
#> $ALL
#>             N    min    q25 median   mean    q75    max 
#> "sale"   "11" "14.0" "15.0" "19.0" "20.9" "27.0" "28.0" 
#> 
#> $MM
#>             N    min    q25 median   mean    q75    max 
#> "sale"    "4" "14.0" "15.5" "17.0" "19.0" "20.5" "28.0"
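
To show just the selection step in isolation (the part your question is about), here is a minimal base-R sketch. The helper name pick_stats() and its digits argument are made up for illustration and are not part of your code. It uses quantile()'s default type = 7, so the quartiles can differ from your type = 6 version.

```r
# Hypothetical helper (not from the thread): compute every statistic once,
# then keep only the ones requested through `stats`.
pick_stats <- function(x, stats = "all", digits = 1) {
  valid <- c("N", "min", "q25", "median", "mean", "q75", "max")
  if ("all" %in% stats) stats <- valid
  unknown <- setdiff(stats, valid)
  if (length(unknown) > 0) {
    stop("unknown stats: ", paste(unknown, collapse = ", "))
  }
  x  <- x[!is.na(x)]
  # unname() so the elements below get clean names such as "q25"
  qs <- unname(quantile(x, probs = c(0.25, 0.5, 0.75)))
  vals <- c(min = min(x), q25 = qs[1], median = qs[2],
            mean = mean(x), q75 = qs[3], max = max(x))
  out <- c(N = as.character(length(x)),
           format(round(vals, digits), nsmall = digits))
  out[stats]  # subsetting by name returns the stats in the requested order
}

sale <- c(27, 28, 27, 16, 14, 25, 14, 14, 19, 18, 28)
pick_stats(sale, stats = c("mean", "median"))
#>   mean median 
#> "20.9" "19.0"
```

Note that the stats are passed as strings, c("mean", "median"), not as bare function names; that keeps the validity check a simple comparison against a list of allowed names.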

Your second function does not make sense to me: it does not loop over a vector of (numeric) variables for the statistics, and it does not give the same (correct) output as lapply(), but an error and odd formatting instead.
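
If what tt2() is meant to produce is one combined table across the elements of listd, a possible starting point is to build one row per element and bind the rows together. This is only a sketch under my own assumptions: one_row() is a hypothetical helper, the data is a reduced copy of your df with just the two columns needed, and the values are kept numeric rather than formatted.

```r
# Assumption: reduced copy of the thread's data (only the columns used here).
df <- data.frame(
  sale = c(27, 28, 27, 16, 14, 25, 14, 14, 19, 18, 28),
  Dept = c("HH", "MM", "NN", "MM", "AA", "VV", "MM", "HU", "JJ", "MM", "ZZ")
)
listd <- list(ALL = df, MM = subset(df, Dept == "MM"))

# Hypothetical helper: one row of numeric summary statistics for one column.
one_row <- function(d, var) {
  x <- d[[var]][!is.na(d[[var]])]
  data.frame(N = length(x), mean = round(mean(x), 1), median = median(x))
}

# One row per list element, bound together and labelled with the list names.
rows <- lapply(listd, one_row, var = "sale")
combined <- cbind(group = names(rows), do.call(rbind, rows))
rownames(combined) <- NULL
combined
#>   group  N mean median
#> 1   ALL 11 20.9     19
#> 2    MM  4 19.0     17
```

The resulting data.frame can then be handed to whatever table printer you use in your report.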

Maybe this already solves your issue, or it is at least to some extent a starting point for further investigation.

Kind regards
