Global normalization of one column

Hi there,

I would like to apply a global normalization to one column of a tibble. I used mutate_at() with a normalization function as follows:

normalize2 <- function(x, na.rm = TRUE) x / max(x, na.rm = na.rm)
mutate_at('avg', normalize2) %>% 

It did normalize, but within subsets defined by other columns, so the "normalized" avg column contains multiple 1s.
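A minimal sketch reproducing this symptom, assuming (hypothetically) that the tibble was grouped by some other column, called grp here. In a grouped tibble, max() is evaluated within each group, so every group ends up with its own 1. The data and the column name grp are made up for illustration.

```r
library(dplyr)

# Hypothetical data; 'grp' stands in for whatever column the tibble was grouped by
df <- tibble(grp = c("a", "a", "b", "b"), avg = c(2, 4, 5, 10)) %>%
  group_by(grp)

normalize2 <- function(x, na.rm = TRUE) x / max(x, na.rm = na.rm)

# max() is taken within each group, so each group is scaled to its own maximum of 1
grouped <- df %>% mutate_at("avg", normalize2)
grouped$avg   # 0.5 1.0 0.5 1.0
```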

I googled a bit and found the following code, which worked:

mutate_at('avg', ~(scale(.) %>% as.vector)) 

but I don't understand the general logic of mutate_at then. Could someone please explain why this code worked, and how to make my version work the same way, without taking the other columns into account?
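A sketch of the underlying rule, using hypothetical data: mutate_at() does not inspect the other columns itself; what matters is whether the tibble is grouped. If it is, the function (normalize2 or the scale() lambda alike) is called once per group; if not, once on the whole column. So it may be worth checking whether the tibble was still grouped when the scale() version appeared to work.

```r
library(dplyr)

# Hypothetical grouped data
df <- tibble(grp = c("a", "a", "b", "b"), avg = c(2, 4, 5, 10)) %>%
  group_by(grp)

# Grouped: scale() standardizes each group separately
per_group <- df %>% mutate_at("avg", ~ as.vector(scale(.)))
per_group$avg    # -0.7071068  0.7071068 -0.7071068  0.7071068

# Ungrouped: scale() sees all four values at once
whole <- df %>% ungroup() %>% mutate_at("avg", ~ as.vector(scale(.)))
mean(whole$avg)  # 0, up to floating-point error
sd(whole$avg)    # 1
```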

More importantly, what is the general way to tell mutate() to apply a function to a whole column rather than within subsets defined by other columns? This seems dangerous, at least to me, because my first impression was that it would simply be applied to the whole column.

Thanks,
ZC

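Your function normalize2 and the scale() function calculate different things: normalize2 divides by the maximum, while scale() centers to mean 0 and scales to standard deviation 1. Below are comparisons of normalize2, scale() and a normalize3 function that matches scale().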
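For reference, the two transformations written out, with x the avg column and n its length. In a grouped tibble, the maximum, the mean and s are computed over the rows of each group instead of the whole column.

$$
\text{normalize2}(x)_i = \frac{x_i}{\max_j x_j}, \qquad
\text{scale}(x)_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n} (x_j - \bar{x})^2}
$$

With avg = 3:7 as in the examples: the maximum is 7, so the first value becomes 3/7 ≈ 0.4286; the mean is 5 and s = √(10/4) ≈ 1.5811, so the first value becomes (3 − 5)/1.5811 ≈ −1.2649, matching the first row of each output.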

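I am not sure what you mean by "global normalization". Do the examples below do that? If so, and your data frame is grouped (for example by an earlier group_by()), you may need to run ungroup() on it before you normalize the avg column: on a grouped data frame, mutate_at() applies the function separately within each group.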

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
DF <- data.frame(A = 1:5, B = 2:6, avg = 3:7)
normalize2 <- function(x, na.rm = TRUE) x / max(x, na.rm = na.rm)
DF %>% mutate_at('avg', normalize2) #scale to the max value
#>   A B       avg
#> 1 1 2 0.4285714
#> 2 2 3 0.5714286
#> 3 3 4 0.7142857
#> 4 4 5 0.8571429
#> 5 5 6 1.0000000
DF %>% mutate_at('avg', ~(scale(.) %>% as.vector)) # scale to mean = 0 sd = 1
#>   A B        avg
#> 1 1 2 -1.2649111
#> 2 2 3 -0.6324555
#> 3 3 4  0.0000000
#> 4 4 5  0.6324555
#> 5 5 6  1.2649111
DF %>% mutate_at('avg', ~ scale(.)) #prints the same, but scale() returns a one-column matrix; as.vector() above turns it back into a plain vector
#>   A B        avg
#> 1 1 2 -1.2649111
#> 2 2 3 -0.6324555
#> 3 3 4  0.0000000
#> 4 4 5  0.6324555
#> 5 5 6  1.2649111

normalize3 <- function(x, na.rm = TRUE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
DF %>% mutate_at('avg', normalize3) #manual version of mean = 0 sd = 1
#>   A B        avg
#> 1 1 2 -1.2649111
#> 2 2 3 -0.6324555
#> 3 3 4  0.0000000
#> 4 4 5  0.6324555
#> 5 5 6  1.2649111

Created on 2020-08-16 by the reprex package (v0.3.0)
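A base-R sketch of why the first scale() version wraps the result in as.vector(): scale() always returns a matrix carrying the centering and scaling values as attributes, and as.vector() strips those. It also checks that normalize3 (with na.rm passed through) gives the same numbers, keeping a missing value in place. The vector x with an NA is made up for illustration.

```r
# scale() always returns a matrix, with the centering and scaling values as attributes
x <- c(3, 4, NA, 6, 7)
s <- scale(x)
is.matrix(s)                 # TRUE (a 5 x 1 matrix)
attr(s, "scaled:center")     # mean of the non-missing values, 5

# as.vector() drops the dim and the attributes, leaving a plain numeric vector
v <- as.vector(s)
is.matrix(v)                 # FALSE

# normalize3 gives the same values, with the NA left in place
normalize3 <- function(x, na.rm = TRUE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
all.equal(normalize3(x), v)  # TRUE
```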

Thanks a lot for the detailed reply. Sorry for the misleading normalize2 function; I should have used the same operation in both versions. Yes, ungroup() did work. The tibble I am using comes from a full_join() of two tibbles. I had used group_by() on one of them, and I guess the joined tibble kept that grouping.
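A sketch of that situation with hypothetical tibbles: when the grouped tibble is the first (x) argument, the result of full_join() inherits its grouping, so a later mutate_at() works per group until ungroup() is called. The names x, y, id and grp are made up for illustration.

```r
library(dplyr)

# Hypothetical tibbles: x was grouped earlier, y was not
x <- tibble(id = 1:4, grp = c("a", "a", "b", "b")) %>% group_by(grp)
y <- tibble(id = 1:4, avg = c(2, 4, 5, 10))

# The join result inherits x's grouping
joined <- full_join(x, y, by = "id")
group_vars(joined)      # "grp"

# Drop the grouping first so max() sees the whole avg column
normalized <- joined %>%
  ungroup() %>%
  mutate_at("avg", ~ .x / max(.x, na.rm = TRUE))

group_vars(normalized)  # character(0)
normalized$avg          # 0.2 0.4 0.5 1.0
```

If later steps need the grouping again, a group_by(grp) can be added after the mutate_at().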

Thanks,
ZC
