Time series per period mean

Hi, I have the following dataset:

# A tibble: 134,644 x 3
    `1y`  `5y` date      
   <dbl> <dbl> <date>    
 1     3     6 2009-01-01
 2     2    NA 2009-01-01
 3    -1    NA 2009-01-01
 4     3    NA 2009-01-01
 5    -5    NA 2009-01-01
 6     3    -2 2009-01-01
 7    NA    NA 2009-01-01
 8     5    NA 2009-01-01
 9     0     5 2009-01-01
10     0    NA 2009-01-01
# ... with 134,634 more rows

It is quarterly data, so the dates are 2009-01-01, 2009-04-01, 2009-07-01, 2009-10-01, 2010-01-01, and so on. The columns have the following classes:

sapply(uk_data, class)
       1y        5y      date 
"numeric" "numeric"    "Date" 

For each variable, I am trying to get some descriptive statistics per period (mean, median, sd, skewness, etc.), ignoring the NAs. I will later substitute the mean of each period for the NAs, so I was thinking of adding two columns with the period means to the dataset, substituting them for the NAs, and finally computing the remaining descriptive statistics. I have run the following code:

uk_data %>%
  group_by(date) %>%
  summarise_at(vars("1y"),
               list(name = mean("1y")))

However, I get the following error:

Error in `FUN()`:
! expecting a one sided formula, a function, or a function name.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
In mean.default("1y") : argument is not numeric or logical: returning NA

<error/rlang_error>
Error in `FUN()`:
! expecting a one sided formula, a function, or a function name.
Backtrace:
1. uk_data %>% group_by(date) %>% ...
2. dplyr::summarise_at(., vars("1y"), list(name = mean("1y")))
3. dplyr:::manip_at(...)
4. dplyr:::as_fun_list(.funs, .env, ..., .caller = .caller)
5. dplyr:::map(...)
6. base::lapply(.x, .f, ...)
7. dplyr FUN(X[[i]], ...)
Run `rlang::last_trace()` to see the full context.

<error/rlang_error>
Error in `FUN()`:
! expecting a one sided formula, a function, or a function name.
Backtrace:
    x
1. +-uk_data %>% group_by(date) %>% ...
2. \-dplyr::summarise_at(., vars("1y"), list(name = mean("1y")))
3.   \-dplyr:::manip_at(...)
4.     \-dplyr:::as_fun_list(.funs, .env, ..., .caller = .caller)
5.       \-dplyr:::map(...)
6.         \-base::lapply(.x, .f, ...)
7.           \-dplyr FUN(X[[i]], ...)
8.             \-rlang::abort("expecting a one sided formula, a function, or a function name.")

Is anyone able to help? Thanks in advance!

Hi there,

You can choose to ignore the NA values when summarising your data, so you don't have to fill in the missing values. Of course, if you like, you can still do this as an in-between step; for the mean, the results are identical:

library(tidyverse)

set.seed(670) # Only needed for reproducibility
myData <- data.frame(id = 1:2,
                     value = as.numeric(sample(c(NA, 1:5), 10, replace = TRUE))) %>%
  arrange(id)


# Fill in missing values with the group mean, then summarise
myData %>%
  group_by(id) %>%
  mutate(value = ifelse(is.na(value), mean(value, na.rm = TRUE), value)) %>%
  summarise(value = mean(value))
#> # A tibble: 2 x 2
#>      id value
#>   <int> <dbl>
#> 1     1  1.33
#> 2     2  2.5

# Directly calculate the mean, ignoring missing values
myData %>%
  group_by(id) %>%
  summarise(value = mean(value, na.rm = TRUE))
#> # A tibble: 2 x 2
#>      id value
#>   <int> <dbl>
#> 1     1  1.33
#> 2     2  2.5

Created on 2022-03-18 by the reprex package (v2.0.1)
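The two results match because replacing NAs with the group mean does not move the group mean. That no longer holds for spread statistics, which matters if sd or skewness will be computed after imputing. A quick base-R check on a hypothetical toy vector (not taken from the question's data):

```r
# Hypothetical toy vector with two missing values
x <- c(1, 2, NA, 6, NA)

# Mean imputation: replace each NA with the mean of the observed values (3)
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

mean(x, na.rm = TRUE)  # 3
mean(x_imputed)        # 3: the mean is unchanged
sd(x, na.rm = TRUE)    # sqrt(7), about 2.65
sd(x_imputed)          # sqrt(3.5), about 1.87: the imputed values add no spread
```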

If you would like to impute missing values, the mice package is a great option as well!
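For the simple per-period approach described in the question (descriptive statistics per date ignoring NAs, then replacing each NA with the mean of its own period), plain dplyr may be enough. This is a sketch under assumptions: the toy values below are hypothetical, and skew() is a hypothetical helper computing the plain moment-based sample skewness (packages such as moments provide a ready-made skewness() function).

```r
library(dplyr)

# Hypothetical stand-in for uk_data: two quarters, with some NAs
uk_data <- tibble(
  `1y` = c(3, 2, -1, NA, 4, NA, 6, 8),
  `5y` = c(6, NA, -2, 5, NA, 1, 3, NA),
  date = as.Date(rep(c("2009-01-01", "2009-04-01"), each = 4))
)

# Hypothetical helper: moment-based sample skewness, dropping NAs
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Descriptive statistics per period, ignoring NAs
# (output columns are named like `1y_mean`, `1y_median`, ...)
period_stats <- uk_data %>%
  group_by(date) %>%
  summarise(across(c(`1y`, `5y`),
                   list(mean   = ~ mean(.x, na.rm = TRUE),
                        median = ~ median(.x, na.rm = TRUE),
                        sd     = ~ sd(.x, na.rm = TRUE),
                        skew   = skew)),
            .groups = "drop")

# Replace each NA with the mean of its own period; row order is preserved
uk_imputed <- uk_data %>%
  group_by(date) %>%
  mutate(across(c(`1y`, `5y`), ~ coalesce(.x, mean(.x, na.rm = TRUE)))) %>%
  ungroup()
```

Computing the statistics on the raw data, before imputation, avoids the shrinking of sd that mean imputation causes.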

Hope this helps,
PJ

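You get this error message because `1y` is a non-syntactic variable name, and you have to reference it between backticks (i.e. `1y`), not quotes. Otherwise, R thinks you are trying to calculate the mean of the literal character string "1y", which is not possible: mean("1y") returns NA with the warning you saw. Note also that list(name = mean("1y")) runs mean() straight away and hands summarise_at() its result (that NA) instead of a function, which is what triggers "expecting a one sided formula, a function, or a function name". Pass the function itself instead, e.g. list(name = mean), or list(name = ~ mean(.x, na.rm = TRUE)) to skip the NAs.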
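A minimal sketch of the corrected call, run on a small hypothetical stand-in for uk_data. It keeps summarise_at() from the question and also shows the across() form; the output name mean and the suffix _mean are arbitrary choices.

```r
library(dplyr)

# Hypothetical stand-in for uk_data
uk_data <- tibble(
  `1y` = c(3, 2, -1, NA, 4, 8),
  date = as.Date(rep(c("2009-01-01", "2009-04-01"), each = 3))
)

# Backticks for the column, and a function (here a lambda) inside list(),
# so summarise_at() calls it on each group instead of receiving NA
res_at <- uk_data %>%
  group_by(date) %>%
  summarise_at(vars(`1y`), list(mean = ~ mean(.x, na.rm = TRUE)))

# The same with across(), which supersedes summarise_at() from dplyr 1.0.0 on
res_across <- uk_data %>%
  group_by(date) %>%
  summarise(across(`1y`, ~ mean(.x, na.rm = TRUE), .names = "{.col}_mean"))
```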
