Trouble with linear model loop

Hello!

I'm trying to run a linear model of the effect of year on the natural log (number of measles cases +1) from 194 different countries individually so that I can gain the slope from the linear model from each of the 194 countries that span between 1980-1996, so that I can make a histogram to plot these on, but I'm having trouble.

I know it needs to start with something like:

for(i in 1:194) {

but I really can't get my head around how to do it, my data that I'm importing from excel looks a little like this:

(https://i.stack.imgur.com/XFwTh.png)

(the years range from 1980-1996 in total) I've been told that it would be easier to make a loop if I convert the names of the countries into numbers, so I can do the (i in 1:194) but I don't know how because I'm very new to R.

I'm very new to R and any help on this would be greatly appreciated. Thank you!

I would not use a for loop at all. Below are two ways, one with the map function from the purrr package and another with the lapply function from base R. The first step is to use the split function to make subsets of the data, one for every value of CName. This returns a list, which is a kind of vector, where each element is a data frame. There will be one data frame for every CName. Then you can use either map() or lapply() to walk along that list and apply the lm() function. In either case, you will get a list of results from lm(), one for each country. Then you can apply the coef() function to those fits to get the fit coefficient. In this case you want the second fit coefficient, which is the slope. That slope is labeled Year, meaning it is the fit coefficient associated with the Year.
I quickly invented some data, so the CNames are just letters and the Cases numbers are random, resulting in crazy slope values. Also, I fit to the raw cases values, not to log(Cases+1). You would want to make a column containing the log(Cases+1) values beforehand.

set.seed(1)
DF <- data.frame(Row = 1:25, 
                 CName = rep(LETTERS[1:5], 5),
                 Year = rep(1980:1984, each = 5),
                 Cases = runif(25, min = 1000, max = 5000))
DF
#>    Row CName Year    Cases
#> 1    1     A 1980 2062.035
#> 2    2     B 1980 2488.496
#> 3    3     C 1980 3291.413
#> 4    4     D 1980 4632.831
#> 5    5     E 1980 1806.728
#> 6    6     A 1981 4593.559
#> 7    7     B 1981 4778.701
#> 8    8     C 1981 3643.191
#> 9    9     D 1981 3516.456
#> 10  10     E 1981 1247.145
#> 11  11     A 1982 1823.898
#> 12  12     B 1982 1706.227
#> 13  13     C 1982 3748.091
#> 14  14     D 1982 2536.415
#> 15  15     E 1982 4079.366
#> 16  16     A 1983 2990.797
#> 17  17     B 1983 3870.474
#> 18  18     C 1983 4967.624
#> 19  19     D 1983 2520.141
#> 20  20     E 1983 4109.781
#> 21  21     A 1984 4738.821
#> 22  22     B 1984 1848.570
#> 23  23     C 1984 3606.695
#> 24  24     D 1984 1502.220
#> 25  25     E 1984 2068.883

library(purrr)
DF2 <- DF %>% split(.$CName) %>% map( ~ lm(Cases ~ Year, data = .)) %>% 
  map(function(x){tmp <- coef(x); tmp[2]}) 
DF2
#> $A
#>     Year 
#> 375.0811 
#> 
#> $B
#>      Year 
#> -218.8078 
#> 
#> $C
#>     Year 
#> 195.4996 
#> 
#> $D
#>      Year 
#> -725.7537 
#> 
#> $E
#>     Year 
#> 338.6946
#Same thing with lapply
Fits <- DF %>% split(.$CName) %>% 
  lapply(FUN = function(x) lm(Cases ~ Year, data = x)) %>% 
  lapply(FUN = function(x) {tmp <- coef(x); tmp[2]})
Fits
#> $A
#>     Year 
#> 375.0811 
#> 
#> $B
#>      Year 
#> -218.8078 
#> 
#> $C
#>     Year 
#> 195.4996 
#> 
#> $D
#>      Year 
#> -725.7537 
#> 
#> $E
#>     Year 
#> 338.6946

Created on 2019-12-13 by the reprex package (v0.2.1)

This is another way to do it (I'm using gapminder dataset as sample data)

library(gapminder) # loaded just for sample data
library(tidyverse)

gapminder %>% 
  group_nest(country) %>% 
  mutate(model = map(data, ~lm(lifeExp ~ year, data = .x)),
         slope = map_dbl(model, ~coef(.x)[2])) %>% 
  ggplot(aes(x = slope)) +
  geom_histogram(binwidth = 0.05)

Having just learned about group_modify(), that was my first thought. It requires a data frame as input, hence the use of tidy() from the broom package to organize the output from lm() into a data frame. It would probably help to run the first three lines (without the last %>%) to see what the tidy lm output looks like. I too used gapminder as an example, but with gdpPercap as the independent variable (just because I am an economist).

library(dplyr)
library(broom)  # for tidy() function
library(gapminder)  # for sample data

gapminder %>% 
  group_by(country) %>% 
    group_modify(~ tidy(lm(lifeExp ~ gdpPercap, data = .))) %>% 
  ungroup() %>% 
  filter(term == "gdpPercap") %>% 
  select(country, estimate)
#> # A tibble: 142 x 2
#>    country      estimate
#>    <fct>           <dbl>
#>  1 Afghanistan -0.00224 
#>  2 Albania      0.00444 
#>  3 Algeria      0.00714 
#>  4 Angola      -0.00103 
#>  5 Argentina    0.00187 
#>  6 Australia    0.000524
#>  7 Austria      0.000450
#>  8 Bahrain      0.00142 
#>  9 Bangladesh   0.0325  
#> 10 Belgium      0.000447
#> # … with 132 more rows

Created on 2019-12-14 by the reprex package (v0.3.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.