Hi R Community,
First post, so forgive me if I break a few rules or if this is not the clearest example.
I have a few large datasets (~3.6 million rows each) which I am looking to subset based upon a few conditions.
The data has 3 columns we are interested in: X (numeric, a stationary time series), Activity (logical), and SignalStrength (numeric, between 0 and 100).
The column X occasionally contains NA values; the other two columns do not.
I was looking for ways to find all sub-series within X that are of a specific length (in my case 600,000 rows) that do not contain NA values.
I looked into na.contiguous; however, this only gives me the longest such series, not all of them. The other method I had been thinking about was some form of "rolling window" approach.
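
To make the idea concrete, here is a minimal sketch of both approaches in base R, run on a tiny made-up vector rather than the real data. The names x, win, runs and valid_starts are hypothetical, and win = 3 stands in for the 600,000-row length. rle() finds every NA-free run (not just the longest one), and a cumulative count of NAs gives every window start whose window contains no NA.

```r
# Toy vector standing in for column X (made-up values, not the real data)
x <- c(0.1, 0.2, NA, 0.3, 0.4, 0.5, 0.6, NA, NA, 0.7, 0.8, 0.9)
win <- 3  # window length; would be 600000 for the real data

# 1) All NA-free runs: run-length encode the "is not NA" indicator
r <- rle(!is.na(x))
ends <- cumsum(r$lengths)
starts <- ends - r$lengths + 1
keep <- r$values & r$lengths >= win   # non-NA runs that are long enough
runs <- data.frame(start = starts[keep], end = ends[keep])

# 2) Rolling window: count the NAs inside every window of length `win`
#    using differences of a cumulative NA count (no explicit loop)
na_cum <- c(0, cumsum(is.na(x)))
n_start <- length(x) - win + 1
na_in_win <- na_cum[(win + 1):length(na_cum)] - na_cum[1:n_start]
valid_starts <- which(na_in_win == 0)  # windows x[i:(i + win - 1)] with no NA

print(runs)
print(valid_starts)
```

Each valid start i corresponds to the rows i:(i + win - 1). Note that these windows overlap; if non-overlapping series are needed, the runs table could instead be cut into consecutive blocks of length win.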
The reason I want to extract multiple series is that I will be comparing each of these series to find the ones which have the lowest number of TRUE values for 'Activity' and the highest average 'SignalStrength'.
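
As a rough sketch of that comparison, again on made-up data: the data frame df, the helper roll_sum() and the window length win = 3 are all hypothetical, and the ranking rule (fewest TRUE values first, ties broken by higher mean signal) is only one assumed way of combining the two criteria.

```r
# Toy data frame with the three columns described above (made-up values)
df <- data.frame(
  X = c(0.1, 0.2, NA, 0.3, 0.4, 0.5, 0.6, NA, NA, 0.7, 0.8, 0.9),
  Activity = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE,
               FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  SignalStrength = c(30, 31, 32, 40, 42, 44, 46, 35, 35, 20, 22, 24)
)
win <- 3  # window length; would be 600000 for the real data

# Sum of v over every window of length `win`, via cumulative sums
roll_sum <- function(v, win) {
  cs <- c(0, cumsum(v))
  cs[(win + 1):length(cs)] - cs[1:(length(v) - win + 1)]
}

scores <- data.frame(
  start       = seq_len(nrow(df) - win + 1),
  n_na        = roll_sum(is.na(df$X), win),        # NAs in the window
  n_active    = roll_sum(df$Activity, win),        # TRUE count in the window
  mean_signal = roll_sum(df$SignalStrength, win) / win
)

# Keep NA-free windows, then rank: fewest TRUE first, then highest mean signal
scores <- scores[scores$n_na == 0, ]
ranked <- scores[order(scores$n_active, -scores$mean_signal), ]
print(ranked)
```

Because everything is computed from cumulative sums, the cost grows linearly with the number of rows, which should be manageable for a few million rows.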
I'm comfortable with the last two filtering functions, but am struggling with the initial subsetting based upon NAs and length.
Any help would be greatly appreciated. Here is a glimpse of what the data looks like:
   X_716557 Activity_716557 SignalStrength_716557
 1    0.104               0                  31.6
 2    0.083               0                  31.6
 3    0.002               0                  31.6
 4    -0.06               0                  31.6
 5   -0.048               0                  31.6
 6    0.002               0                  31.6
 7    0.021               0                  31.8
 8    0.002               0                  31.8
 9    -0.01               0                  31.8
10    0.002               0                  31.8
11    0.016               0                  31.8
12    0.007               0                  31.8
13   -0.009               0                  31.8
14   -0.012               0                  31.8
15   -0.004               0                  31.8
16   -0.001               0                  31.8
17   -0.004               0                  31.8
18   -0.004               0                  31.8
19       NA               0                  31.8
20       NA               0                  31.8