How to not filter a data frame after a threshold

I have time series data about different stocks. Now I need to filter them according to a threshold (for example threshold = 7). If I now apply the filter function on the data then it filters all values above 7. But what I would need is that the function filters just once per stock. Which means that once a stock exceeded this limit, it should be included in the set, even if it falls below this limit again sometime afterwards.

Stock 1 time series ( 1 2 3 3 4 5 5 6 6 4 5 )
Stock 2 time series ( 6 7 8 9 9 7 8 9 10 12)
Stock 3 time series ( 3 4 5 6 7 8 9 5 4 3 6 )

Stock 1 should be filtered out because its below the threshold
Stock 2 should be in the subset after exceeding the threshold
Stock 3 should also be in the subset after exceeding the threshold. But after having exceeded this value it falls below that threshold again. But nevertheless it has exceeded it once, hence it should be in the final dataset. So (7 8 9 5 4 3 6) should be in the set even though the threshold is 7.

Sorry that I didn't make my point clear in first place.

You re right! I tried to make it clearer.

Are all your stock series in one table?

Yes they are in long format in one table.

In that case, it would be helpful if you could post a sample of the table by following the instructions from my previous post, including the triple backticks -- could you do that?

This would be a picture of my df.

I'm a little confused -- this looks like a single 'stock', but you were trying to identify several.

In the picture it is indeed a single stock. My df is in long format so all my stock time series are listed as in the photo above. So the next stock time series would be below that section in the picture but in the same table

In order for folks to be able to help you, it's important to provide data samples that they can easily copy and paste so they can reproduce your context. Could you follow the instructions I posted earlier so a sample of your data is available for copying?

```
<--- paste output of dput(head(your_data_frame, 50)) here
```

I would like to. The point is I am not really allowed to do so :confused:
I tried to make it clear with the example I gave above.
Do you know if there is a variation of the filter function that does the this?

You could provide mock up data with a similar structure, the idea is to have a proper REPRoducible EXample (reprex) we can all work with.

If you're concerned about the identity of the stocks or their prices, you could try to apply the following function to your table:

library(tidyverse)
my_fake_names <- 1:10

my_fake_prices <- 
  my_real_prices %>% 
  select(1:10) %>% 
  rename_all(~ my_fake_names) %>% 
  mutate_if(is.numeric, function(x){x + rnorm(x, mean(x), sd(x))})

Threshold should be 7

#>              t        value  Stock 
#> 1          1         3         x
#> 2          2         4         x
#> 3          3         5         x
#> 4          4         6         x
#> 5          5         6         x  
#> 6          1         5         y  
#> 6          2         6         y
#> 6          3         7         y
#> 6          4         8         y
#> 6          5         9         y
#> 6          1         6         z
#> 6          2         7         z
#> 6          3         8         z
#> 6          4         5         z
#> 6          5         4         z

I now tried to fill a frame with mock data.

Good! Now could you follow the instructions I gave above?

```
<---- paste the output of dput(your_new_table) here
```

Stock z exceeds the threshold of 7 but falls below it at t = 4. But it have exceeded it once so stock z should be in the subset with (7, 8, 5, 4)

The output should look like

#> 1            t        value  Stock 
#> 2          3         7         y
#> 3          4         8         y
#> 4          5         9         y
#> 5          2         7         z
#> 6          3         8         z
#> 7          4         5         z
#> 8          5         4         z

Have you taken a look into the guide I gave you? that is not copy/paste friendly and makes things harder for people trying to help. I'm going to make an extra effort this time but don't expect people to do this very often, the polite thing to do here is to provide a proper reprex for your question.

library(tidyverse)

sample_df <- data.frame(
  stringsAsFactors = FALSE,
                 t = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
             value = c(3, 4, 5, 6, 6, 5, 6, 7, 8, 9, 6, 7, 8, 5, 4),
             Stock = c("x","x","x","x","x","y",
                       "y","y","y","y","z","z","z","z","z")
)

sample_df %>% 
    group_by(Stock) %>% 
    mutate(flag = if_else(value >= 7, TRUE, NA)) %>% 
    fill(flag, .direction = "down") %>% 
    filter(flag) %>% 
    select(-flag)
#> # A tibble: 7 x 3
#> # Groups:   Stock [2]
#>       t value Stock
#>   <dbl> <dbl> <chr>
#> 1     3     7 y    
#> 2     4     8 y    
#> 3     5     9 y    
#> 4     2     7 z    
#> 5     3     8 z    
#> 6     4     5 z    
#> 7     5     4 z

Created on 2020-03-15 by the reprex package (v0.3.0.9001)

@andresrcs and @dromano have already (a) requested that you post a reprex, (b) been kind enough to provide guidance for how you can easily do so and (c) taken efforts to assist you despite the fact that have not posted one. By making just a little more effort, you could have made it so much easier for others to assist you. Please bear this in mind in the future.

@andresrcs has already taken a stab at answering your question. The solution posted, however, will only return observations for stocks that exceeded the threshold value after they have done so . Perhaps this is what you want, but from the manner in which you initially phrased your question, it seems that what you actually want is to:

  1. Identify all stocks that have ever exceeded the threshold value,
  2. Retain all of the observations for such stocks, and
  3. Discard all of the observations for all other stocks.

If this is in deed the case, the following should give you what you need (i.e. it will return all values for stocks y and z and remove all values for stock x). Note that I have used the same sample data used by @andresrcs:

# load library
library(tidyverse)

# create some mock data (similar output would have been produced if you used
# dput() on your data)
sample_df <- structure(
  list(
    t = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
    value = c(3, 4, 5, 6, 6, 5, 6, 7, 8, 9, 6, 7, 8, 5, 4),
    Stock = c("x", "x", "x", "x", "x", "y", "y", "y", "y", "y", "z", "z", "z", "z", "z")), 
  class = "data.frame", 
  row.names = c(NA, -15L)
)

# take sample_df %>% group by each stock identifier %>% retain each group's
# observations iff any of the values in the series have ever exceeded 7 %>%
# ungroup
sample_df %>% 
  group_by(Stock) %>% 
  filter(any(value > 7)) %>% 
  ungroup()
#> # A tibble: 10 x 3
#>        t value Stock
#>    <dbl> <dbl> <chr>
#>  1     1     5 y    
#>  2     2     6 y    
#>  3     3     7 y    
#>  4     4     8 y    
#>  5     5     9 y    
#>  6     1     6 z    
#>  7     2     7 z    
#>  8     3     8 z    
#>  9     4     5 z    
#> 10     5     4 z

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.