# How can I write a loop in R for the following problem?

I want to write a for loop for my problem. I want to do column-based normalization within each year group. The loop should first filter one year, normalize each column with my function `lapply(tmp[2:3], function(tmp) bestNormalize(tmp, standardize = TRUE, quiet = TRUE))`, then move on to the next year, and so on, saving the results to a list. My data look like this:

| Year | Score 1 | Score 2 |
|------|---------|---------|
| 2012 | 34 | 45 |
| 2012 | 41 | 46 |
| 2013 | 31 | 44 |
| 2013 | 44 | 33 |
| 2014 | 35 | 56 |
| 2014 | 42 | 21 |

I wrote this, but it only gives me the final year. I am a newbie and could not find an example similar to my case. Can someone help me?

``````
i=2012
for (i in 1:3){
  tmp = newdf[newdf$Year==i+2011,]
  abc = lapply(tmp[2:3], function(tmp) bestNormalize(tmp , standardize=TRUE, quiet = TRUE))
  print(abc)

}

``````
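In the loop above, `abc` is overwritten on every iteration, so after the loop only the last year's result is left. Below is a minimal sketch of the same loop that stores each year in a named list instead. It assumes a data frame `newdf` with columns `Year`, `Score1` and `Score2` (assumed names). To keep it runnable without the package, a plain z-score (`zscore`, a hypothetical helper) stands in for `bestNormalize()`. With the real package, the later replies in this thread use `bestNormalize(x, out_of_sample = FALSE, quiet = TRUE)$x.t`.

```r
# Example data; the column names Year, Score1, Score2 are assumptions
newdf <- data.frame(
  Year   = c(2012, 2012, 2013, 2013, 2014, 2014),
  Score1 = c(34, 41, 31, 44, 35, 42),
  Score2 = c(45, 46, 44, 33, 56, 21)
)

# Hypothetical stand-in normalizer (plain z-score) so the sketch runs without bestNormalize
zscore <- function(x) (x - mean(x)) / sd(x)

years <- sort(unique(newdf$Year))
results <- vector("list", length(years))  # one slot per year
names(results) <- years                    # slots named "2012", "2013", "2014"

for (yr in years) {
  tmp <- newdf[newdf$Year == yr, ]         # rows for this year only
  # normalize columns 2 and 3 separately, then store the result instead of overwriting it
  results[[as.character(yr)]] <- as.data.frame(lapply(tmp[2:3], zscore))
}

results
```

Looping directly over the years (`for (yr in years)`) avoids the `i + 2011` arithmetic and keeps working if more years are added.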

Hello,

Could you give some more details or the desired outcome? Do you want to standardize (e.g. subtract the mean and divide by the standard deviation) all values in the data.frame (Score 1 and Score 2 together) within a given year? Or Score 1 and Score 2 separately within each year group?

I assume the result would be a list with a data.frame for every year, containing 2 columns (Score 1 and Score 2, normalized?). But maybe you can clarify it a bit so I can think of an optimal solution.
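The two readings differ only in which values the mean and standard deviation are computed over. Here is a small sketch for the 2013 rows, using a plain z-score as an assumed stand-in for the actual normalization (`zscore` is a hypothetical helper):

```r
# The 2013 rows from the question
yr2013 <- data.frame(score_1 = c(31, 44), score_2 = c(44, 33))

zscore <- function(x) (x - mean(x)) / sd(x)

# Reading A: each column standardized on its own within the year
per_column <- as.data.frame(lapply(yr2013, zscore))

# Reading B: all values of the year pooled, then standardized together
pooled <- zscore(c(yr2013$score_1, yr2013$score_2))

per_column
pooled
```

With only two rows per year, reading A always gives values of plus or minus 0.7071068 (that is, 1/sqrt(2)), whichever order the values come in.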

Thanks and kind regards

As a first try, maybe this is what you want:

``````
library(collapse)
#> collapse 1.8.6, see ?`collapse-package` or ?`collapse-documentation`
#>
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#>
#>     D

data <- data.frame(
year = c(2012,2012,2013,2013,2014,2014),
score_1 = c(34,41,31,44,35,42),
score_2 = c(45,46,44,33,56,21)
)

# Stand-in for bestNormalize(): a plain z-score (subtract the mean, divide by the sd)
bestNormalize <- function(x, standardize = TRUE, quiet = TRUE) {
  result <- (x - mean(x)) / sd(x)
  return(result)
}

data |>
fgroup_by(year) |>
fmutate(
score_1_norm = bestNormalize(score_1),
score_2_norm = bestNormalize(score_2)
) |>
fungroup() |>
rsplit(~ year)
#> $`2012`
#>   score_1 score_2 score_1_norm score_2_norm
#> 1      34      45   -0.7071068   -0.7071068
#> 2      41      46    0.7071068    0.7071068
#>
#> $`2013`
#>   score_1 score_2 score_1_norm score_2_norm
#> 1      31      44   -0.7071068    0.7071068
#> 2      44      33    0.7071068   -0.7071068
#>
#> $`2014`
#>   score_1 score_2 score_1_norm score_2_norm
#> 1      35      56   -0.7071068    0.7071068
#> 2      42      21    0.7071068   -0.7071068
``````

Created on 2022-08-18 by the reprex package (v2.0.1)

The result is a list named by year. The values are normalized with the function defined above.
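For comparison, a base R sketch of the same idea without collapse: `split()` makes the per-year list and `lapply()` normalizes each score column inside each piece. Again, a plain z-score (hypothetical helper `zscore`) stands in for `bestNormalize()`. Unlike the collapse version, this replaces the score columns in place instead of adding `_norm` columns.

```r
data <- data.frame(
  year    = c(2012, 2012, 2013, 2013, 2014, 2014),
  score_1 = c(34, 41, 31, 44, 35, 42),
  score_2 = c(45, 46, 44, 33, 56, 21)
)

zscore <- function(x) (x - mean(x)) / sd(x)

score_cols <- c("score_1", "score_2")

# split() returns a list of data frames named "2012", "2013", "2014"
by_year <- lapply(split(data, data$year), function(d) {
  d[score_cols] <- lapply(d[score_cols], zscore)  # normalize each column within the year
  d
})

by_year
```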

Kind regards

I found I had to set the `out_of_sample` parameter: with only 2 entries per variable per year, there was not enough data for the k-fold cross-validation. I used `$x.t` to get just the transformed data.

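``````
library(tidyverse)
library(bestNormalize)

in_df <- tribble(~Year, ~Score1, ~Score2,
                 2012, 34, 45,
                 2012, 41, 46,
                 2013, 31, 44,
                 2013, 44, 33,
                 2014, 35, 56,
                 2014, 42, 21)

# out_of_sample = FALSE: 2 rows per year are too few for cross-validation;
# $x.t holds the transformed values.
# Note: dplyr >= 1.1.0 prefers reframe() when summarise() returns several rows per group.
in_df |> group_by(Year) |>
  summarise(across(starts_with("Score"),
                   ~ bestNormalize(.x, quiet = TRUE,
                                   out_of_sample = FALSE)$x.t))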

``````
``````
# A tibble: 6 x 3
# Groups:   Year 
   Year Score1 Score2
  <dbl>  <dbl>  <dbl>
1  2012 -0.707 -0.707
2  2012  0.707  0.707
3  2013 -0.707  0.707
4  2013  0.707 -0.707
5  2014 -0.707  0.707
6  2014  0.707 -0.707
``````
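If a list with one tibble per year is needed rather than one grouped tibble, one option within the tidyverse is `group_split()`. This is a minimal sketch that uses `mutate()`, which keeps every row per group, and a plain z-score (hypothetical helper `zscore`) in place of `bestNormalize()` so it runs without that package:

```r
library(dplyr)

in_df <- tibble(
  Year   = c(2012, 2012, 2013, 2013, 2014, 2014),
  Score1 = c(34, 41, 31, 44, 35, 42),
  Score2 = c(45, 46, 44, 33, 56, 21)
)

zscore <- function(x) (x - mean(x)) / sd(x)

per_year <- in_df |>
  group_by(Year) |>
  mutate(across(starts_with("Score"), zscore)) |>  # normalize within each year
  group_split()                                    # one tibble per year, in Year order

per_year
```

With the real package, `zscore` would be replaced by `~ bestNormalize(.x, quiet = TRUE, out_of_sample = FALSE)$x.t`, as in the code above.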

Hi, thank you. A "list with a data.frame for every year, containing 2 columns" is exactly what I want, but each score needs to be normalized within each year group.

Then you can use the code above, or this version with bestNormalize (I didn't know it was an actual package):

``````
library(collapse)
#> collapse 1.8.6, see ?`collapse-package` or ?`collapse-documentation`
#>
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#>
#>     D
library(bestNormalize)

data <- data.frame(
year = c(2012,2012,2013,2013,2014,2014),
score_1 = c(34,41,31,44,35,42),
score_2 = c(45,46,44,33,56,21)
)

data |>
fgroup_by(year) |>
fsummarise(
score_1 = bestNormalize(score_1, out_of_sample = FALSE, quiet = TRUE)$x.t,
score_2 = bestNormalize(score_2, out_of_sample = FALSE, quiet = TRUE)$x.t) |>
rsplit(~ year)
#> $`2012`
#>      score_1    score_2
#> 1 -0.7071068 -0.7071068
#> 2  0.7071068  0.7071068
#>
#> $`2013`
#>      score_1    score_2
#> 1 -0.7071068  0.7071068
#> 2  0.7071068 -0.7071068
#>
#> $`2014`
#>      score_1    score_2
#> 1 -0.7071068  0.7071068
#> 2  0.7071068 -0.7071068
``````

Created on 2022-08-18 by the reprex package (v2.0.1)

The result is a list (as required) with the standardized outputs for every year group, as above. As @nirgrahamuk mentioned, `out_of_sample = FALSE` has to be set as well.

Kind regards
