I'm interested in applying outlier detection models to my database of temporal information to identify some interesting data points.
I'm following along to @robjhyndman examples in his vignettes for the tsfeatures package which can be found at the bottom of this page as well as his research paper which can be found here
# INSTALL PACKAGES
pacman::p_load(tsfeatures, dplyr, broom, tidyverse, tictoc, forecast, factoextra, dbscan, ggfortify, rainbow, cowplot)
# GET DATA
yahoo <- yahoo_data()
# CREATE TIME SERIES FEATURES
tic()
hwl <- bind_cols(
tsfeatures(yahoo, c("acf_features", "entropy", "lumpiness", "flat_spots", "crossing_points")),
tsfeatures(yahoo, "stl_features", s.window = "periodic", robust = TRUE),
tsfeatures(yahoo, "max_kl_shift", width = 48),
tsfeatures(yahoo, c("mean", "var"), scale = FALSE, na.rm = TRUE),
tsfeatures(yahoo, c("max_level_shift", "max_var_shift"), trim = TRUE)
) %>%
dplyr::select(
mean, var, x_acf1, trend, linearity, curvature,
seasonal_strength, peak, trough, entropy, lumpiness, spike, max_level_shift, max_var_shift, flat_spots,
crossing_points, max_kl_shift, time_kl_shift
)
toc()
# PERFORM PCA
hwl_pca <- hwl %>%
na.omit() %>%
prcomp(scale = TRUE)
# GET A SUMMARY OF THE PCA
summary(hwl_pca)
# SCREE PLOT OF THE PRINCIPAL COMPONENTS
fviz_eig(hwl_pca)
# PLOT THE FIRST TWO PRINCIPAL COMPONENTS AND THEIR LOADINGS
autoplot(hwl_pca,loadings = TRUE, loadings.label = TRUE)
# EXTRACT THE PCA SCORES
pc1 <- hwl_pca$x[,1]
pc2 <- hwl_pca$x[,2]
pc_matrix <- cbind(pc1, pc2)
According to this article from Rob's website he uses the Rainbow package to calculate the density rankings but it's at this stage that I struggle.
The rainbow package requires the two main PCs to be converted to functional data (either fds or fts format) and I believe the fds function does that which I've completed below:
x_var <- seq(1,nrow(pc_matrix))
functional_data <- rainbow::fds(x = x_var, y = pc_matrix, xname = "xvar", yname = "yvar")
fboxplot(data = functional_data, plot.type = "functional", alpha = 0.5)
When I run the fxboxplot function I get the following error:
[1] "not enough data points"
Error in if (pcbag$is.one.dim == TRUE) { : argument is of length zero
Checking the structure of the functional data input I can see that it contains vectors of length 1748 so I'm unsure what the error is referring to:
str(functional_data)
List of 4
$ x : int [1:1748] 1 2 3 4 5 6 7 8 9 10 ...
$ y : num [1:1748, 1:2] 1.58 -1.23 -2.73 -1.07 -3.09 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "pc1" "pc2"
$ xname: chr "xvar"
$ yname: chr "yvar"
- attr(*, "class")= chr "fds"
If anyone has experience with converting data to functional form I'd be very grateful for any help that could be shared.