Loop with multiple groups

Hyto · June 3, 2022, 4:01pm

Hi,
I am trying to build a loop that calculates GDP at date t from the value at t-1 and the growth rate. However, I would like the loop to do the computation for each country independently. I could not find an easy way to do this, and my skills are not yet developed enough to use lapply on my own. Do you have any hints on how to do this?

The loop I have for now:

for (i in 2:length(iso)) {
  GDP_value[i] <- (GDP_value[i-1])*(1+GDP_growth[i])
}

What I want is for the loop to start again whenever iso takes a new value, so that it does not use the previous country's GDP value.

dvetsch75 · June 3, 2022, 4:11pm

Do you have another vector that stores countries? It isn't clear to me from your reprex how you know which country you are calculating GDP for.

Hyto · June 3, 2022, 4:17pm

I am sorry, I was not very clear indeed.
My data is structured as follows: a first column named iso with the country names, a second with the year, a third with the GDP growth for each country in each year, and finally the GDP value in the first year of the period. For example, I have the GDP value of England in 2016 and its growth, and I calculate the 2017 GDP value using the loop. However, when the country changes, the loop still uses the GDP value from the previous row (that is, the GDP value of another country), which is my problem.

myData = data.frame(
  iso=c("AE", "AE", "AF", "AF"),
  year=c(2016, 2017, 2016, 2017),
  GDP_growth=c(0.1, 0.2, 0.3, 0.4),
  GDP_value=c(10, NA, 10, NA)
)

EDIT:
I tried the following idea, but without success.

myData = data.frame(
  iso=c("AE", "AE", "AF", "AF"),
  year=c(2016, 2017, 2016, 2017),
  GDP_growth=c(0.1, 0.2, 0.3, 0.4),
  GDP_value=c(10, NA, 10, NA)
)

my_function = function(x){
  sapply(2:length(x), function(GDP_growth){
    lag(x)*(1+GDP_growth)
  })
}

myData <- myData %>% 
  group_by(iso) %>%
  mutate(GDP_value=my_function(GDP_value))

dvetsch75 · June 6, 2022, 2:28pm

Thanks for this - I would definitely suggest NOT using a loop. Looping over a dataframe typically leads to code that is slow and difficult to debug. I would use dplyr::group_by, dplyr::lag, and dplyr::case_when. It isn't clear to me what year the GDP_growth variable applies to, so you may need to remove the lag around GDP_growth.

But the basic premise is that you first need to group by country, so that growth is calculated within each group instead of across the whole dataset. Then you use case_when so that you don't accidentally change GDP_value in the rows where you already know the value. After that, it's just a matter of doing the multiplication with the appropriate lags. Hope that helps.
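To see why the grouping matters, here is a small sketch comparing lag() with and without group_by() on the same toy data. The column names prev_ungrouped and prev_grouped are made up for illustration. Without grouping, the first AF row picks up the last AE value; with grouping, it gets NA because each country starts fresh.

```r
library(dplyr)

# Same toy data as in the thread.
myData <- data.frame(
  iso = c("AE", "AE", "AF", "AF"),
  year = c(2016, 2017, 2016, 2017),
  GDP_growth = c(0.1, 0.2, 0.3, 0.4)
)

# lag() over the whole column: row 3 (AF 2016) sees AE's 2017 growth.
ungrouped <- myData %>%
  mutate(prev_ungrouped = lag(GDP_growth))

# lag() within each country: the first row of every group is NA.
grouped <- myData %>%
  group_by(iso) %>%
  mutate(prev_grouped = lag(GDP_growth)) %>%
  ungroup()

ungrouped$prev_ungrouped  # NA 0.1 0.2 0.3
grouped$prev_grouped      # NA 0.1  NA 0.3
```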

library(dplyr)

myData <- data.frame(
    iso=c("AE", "AE", "AF", "AF"),
    year=c(2016, 2017, 2016, 2017),
    GDP_growth=c(0.1, 0.2, 0.3, 0.4),
    GDP_value=c(10, NA, 10, NA)
)

myData %>% 
    group_by(iso) %>% 
    mutate(
        GDP_value = case_when(
            !is.na(GDP_value) ~ GDP_value,
            is.na(GDP_value) ~ (1 + lag(GDP_growth)) * lag(GDP_value)
        )
    )
#> # A tibble: 4 x 4
#> # Groups:   iso [2]
#>   iso    year GDP_growth GDP_value
#>   <fct> <dbl>      <dbl>     <dbl>
#> 1 AE     2016        0.1        10
#> 2 AE     2017        0.2        11
#> 3 AF     2016        0.3        10
#> 4 AF     2017        0.4        13

Created on 2022-06-06 by the reprex package (v1.0.0)
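One caveat with the lag() approach above: lag(GDP_value) only sees values that are already in the column, so with more than two years per country the third year onward would stay NA. A possible way around that is to compound the growth rates with cumprod(). This is only a sketch, and it assumes that each country's first row holds the known starting value and that rows can be sorted by year. The three-year data below is invented for illustration.

```r
library(dplyr)

# Hypothetical three-year example (values invented for illustration).
myData <- data.frame(
  iso = c("AE", "AE", "AE", "AF", "AF", "AF"),
  year = c(2016, 2017, 2018, 2016, 2017, 2018),
  GDP_growth = c(0.1, 0.2, 0.3, 0.5, 0.0, 1.0),
  GDP_value = c(10, NA, NA, 20, NA, NA)
)

result <- myData %>%
  arrange(iso, year) %>%
  group_by(iso) %>%
  mutate(
    # Growth of year t-1 is applied to get year t (same convention as the
    # lag(GDP_growth) code above). The first row gets a factor of 1.
    GDP_value = first(GDP_value) * cumprod(1 + lag(GDP_growth, default = 0))
  ) %>%
  ungroup()

result$GDP_value  # 10.0 11.0 13.2 20.0 30.0 30.0
```

If the growth rate in a row applies to that same row's year instead (as in the original loop), the cumprod() argument could be replaced with c(1, 1 + GDP_growth[-1]).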

Hyto · June 9, 2022, 3:15pm

Thank you for your answer!
