Problem with dplyresque operations on columns.

Hi,

I have a tibble with an unspecified number of numeric columns and I would like to do two things in a dplyr-pipe way:

  • multiply each column by a different number, and
  • calculate a few characteristics (like mean, median, sd, etc) of every column.

E.g.

dat <- tribble(~A1, ~A2, ~A3,
2, 3, 4,
10, 4, 8)

multi <- c(3, 10, 17)

  • now multiply A1 by 3, A2 by 10 and A3 by 17, and afterwards
  • calculate mean, median, sd, etc of the columns A1, A2 and A3.

Is there a nice way to do something like that in a general way?
Thanks for the help.

Sonobox.


library(tidyverse)

dat <- tribble(~A1, ~A2, ~A3,
               2, 3, 4,
               10, 4, 8)

multi <- c(3, 10, 17)

# multiply column i by multi[i], then summarise every column
dat |>
  map2_dfc(multi, \(x, y) x * y) |>
  summarise_all(list(mean = mean,
                     median = median))
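
If one row per column is easier to read than the wide `A1_mean`, `A1_median`, ... layout, a possible variation is to reshape before summarising. This is only a sketch: it assumes `multi` is in the same order as the columns of `dat`, and the names `scaled` and `res` are made up for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)

dat <- tribble(~A1, ~A2, ~A3,
               2, 3, 4,
               10, 4, 8)

multi <- c(3, 10, 17)

# Assumption: multi[i] belongs to the i-th column of dat.
# map2() keeps the column names, so the list converts back to a tibble.
scaled <- map2(dat, multi, \(x, m) x * m) |>
  as_tibble()

# Long format: one row per original column, one column per statistic.
res <- scaled |>
  pivot_longer(everything(), names_to = "column") |>
  group_by(column) |>
  summarise(mean = mean(value),
            median = median(value),
            sd = sd(value))

res
```

Adding another statistic is then just one more argument to `summarise()`.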

@nirgrahamuk gave the dplyresque answer; here is a more succinct version that relies on dplyr only to generate the motivating data.

dat <- as.matrix(dplyr::tribble(~A1, ~A2, ~A3,
               2, 3, 4,
               10, 4, 8))

mult_by <- c(3,10,7)

DescTools::Desc(dat[,1:3]*c(3,10,7))
#> ------------------------------------------------------------------------------ 
#> dat[, 1:3] * c(3, 10, 7) (matrix, array)
#> 
#> Summary: 
#> n: 235, rows: 2, columns: 3
#> 
#> Pearson's Chi-squared test:
#>   X-squared = 55.283, df = 2, p-value = 9.896e-13
#> Log likelihood ratio (G-test) test of independence:
#>   G = 61.144, X-squared df = 2, p-value = 5.285e-14
#> Mantel-Haenszel Chi-squared:
#>   X-squared = 32.913, df = 1, p-value = 9.639e-09
#> 
#> Phi-Coefficient        0.485
#> Contingency Coeff.     0.436
#> Cramer's V             0.485
#> 
#>                                        
#>                 A1     A2     A3    Sum
#>                                        
#> A     freq       6     21     40     67
#>       perc    2.6%   8.9%  17.0%  28.5%
#>       p.row   9.0%  31.3%  59.7%      .
#>       p.col   5.7%  63.6%  41.7%      .
#>                                        
#> B     freq     100     12     56    168
#>       perc   42.6%   5.1%  23.8%  71.5%
#>       p.row  59.5%   7.1%  33.3%      .
#>       p.col  94.3%  36.4%  58.3%      .
#>                                        
#> Sum   freq     106     33     96    235
#>       perc   45.1%  14.0%  40.9% 100.0%
#>       p.row      .      .      .      .
#>       p.col      .      .      .      .
#> 
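
One thing to watch in the product above: `matrix * vector` recycles the vector down the columns in column-major order, which is why the table shows A1 as 6 and 100 rather than one multiplier per column. If the goal is to scale A1, A2 and A3 each by its own multiplier, `sweep()` over the column margin is one way to do it in base R. A minimal sketch, assuming the multipliers from the question (3, 10, 17) rather than the 7 used in the reprex; `recycled` and `scaled` are names chosen for illustration.

```r
# Same data as above, built as a plain matrix
dat <- matrix(c(2, 10, 3, 4, 4, 8), nrow = 2,
              dimnames = list(NULL, c("A1", "A2", "A3")))

multi <- c(3, 10, 17)

# Recycling runs down the columns: column A1 becomes 2*3 and 10*10
recycled <- dat * multi

# sweep() over margin 2 multiplies column j by multi[j]:
# column A1 becomes 2*3 and 10*3
scaled <- sweep(dat, 2, multi, `*`)

recycled
scaled
```

An equivalent idiom is `t(t(dat) * multi)`, which relies on the same recycling rule after transposing.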

Roll your own

# rows
apply(dat,1,sd)
#> [1] 1.00000 3.05505
# columns
apply(dat,2,sd)
#>        A1        A2        A3 
#> 5.6568542 0.7071068 2.8284271

# function to do both

calc_sd_both <- function(x) c(apply(x,1,sd),apply(x,2,sd))

calc_sd_both(dat)
#>                            A1        A2        A3 
#> 1.0000000 3.0550505 5.6568542 0.7071068 2.8284271

Created on 2023-02-04 by the reprex package (v2.0.1)
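
The same roll-your-own idea extends to several statistics per column, which is closer to what the question asks for. A rough sketch: `col_stats()` is a hypothetical helper name, not a function from any package, and the default set of statistics is just an assumption.

```r
# col_stats() is a hypothetical helper, not part of any package.
# It applies each function in `funs` to every column of a matrix and
# returns a matrix with one row per column and one column per statistic.
col_stats <- function(x, funs = list(mean = mean, median = median, sd = sd)) {
  sapply(funs, function(f) apply(x, 2, f))
}

dat <- matrix(c(2, 10, 3, 4, 4, 8), nrow = 2,
              dimnames = list(NULL, c("A1", "A2", "A3")))

res <- col_stats(dat)
res
```

Passing a different named list, for example `list(min = min, max = max)`, changes which statistics are computed.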
