HELP! How to flag consecutive non-zero values

test_r · January 9, 2020, 2:30pm

Hi, I am trying to create a flag for a large database.
There are multiple monthly and if a customer has an outstanding debit for consecutive months, I need a flag "Y". If not then "N" (no flag).

Please see a sample data base:
Name Jan Feb Mar April
a 0 100 150 0
b 100 0 200 0
c 100 100 100 0

Flags expected:

Name Flag
a Y
b N
c Y

Could anyone help me with the same?

Best and thanks in advance

jlacko · January 9, 2020, 4:34pm

Your database seems to be in de-normalised format (not tidy in tidy data speak). I suggest to normalize it as a first step.

Then, assuming there are no gaps in months, it should be a simple case of dplyr::lag().

For normalizing your data (turning it from wide to long format) consider tidyr::pivot_longer().

For an example on dplyr::lag() identifying duplicates (a bit different from your use case - you need both current and previous value >0, but it is a start) consider this post Remove 3rd iteration of duplicate

andresrcs · January 9, 2020, 5:05pm

To exemplify Jindra's solution (and make a reprex for your question).

sample_df <- data.frame(stringsAsFactors=FALSE,
                        Name = c("a", "b", "c"),
                        Jan = c(0, 100, 100),
                        Feb = c(100, 0, 100),
                        Mar = c(150, 200, 100),
                        April = c(0, 0, 0)
)

library(tidyverse)

sample_df %>% 
    pivot_longer(cols = Jan:April, names_to = "month", values_to = "value") %>% 
    mutate(month = as_factor(month)) %>% 
    arrange(Name, month) %>% 
    group_by(Name) %>% 
    mutate(flag = if_else(value > 0 & lag(value) > 0, TRUE, FALSE)) %>% 
    summarise(flag = any(flag, na.rm = TRUE))
#> # A tibble: 3 x 2
#>   Name  flag 
#>   <chr> <lgl>
#> 1 a     TRUE 
#> 2 b     FALSE
#> 3 c     TRUE

^{Created on 2020-01-09 by the reprex package (v0.3.0.9000)}

test_r · January 10, 2020, 7:54am

Hi,

Thank you so much. It worked perfectly. Appreciate the time!

test_r · January 13, 2020, 8:44am

Hey,
Again, thanks for your help earlier. I hope I can trouble you for one more question please.

As my dataset it very large with some repeating names. It is possible to have the result "flag" as a new column in the first data frame?

sample_df <- data.frame(stringsAsFactors=FALSE,
Name = c("a", "b", "c", "b", "d"),
Jan = c(0, 100, 100, 80, 100),
Feb = c(100, 0, 100, 80, 0),
Mar = c(150, 200, 100, 0, 0),
April = c(0, 0, 0, 0, 100)
)

New dataset result:
Name Jan Feb Mar April flag
a ........ TRUE
b ........ TRUE
c ......... TRUE
d .......... FALSE

Thanks again!

andresrcs · January 13, 2020, 1:40pm

One easy way would be to join the result with the original dataframe


library(tidyverse)

sample_df %>% 
    pivot_longer(cols = Jan:April, names_to = "month", values_to = "value") %>% 
    mutate(month = as_factor(month)) %>% 
    arrange(Name, month) %>% 
    group_by(Name) %>% 
    mutate(flag = if_else(value > 0 & lag(value) > 0, TRUE, FALSE)) %>% 
    summarise(flag = any(flag, na.rm = TRUE)) %>% 
    right_join(sample_df, by = "Name")
#> # A tibble: 3 x 6
#>   Name  flag    Jan   Feb   Mar April
#>   <chr> <lgl> <dbl> <dbl> <dbl> <dbl>
#> 1 a     TRUE      0   100   150     0
#> 2 b     FALSE   100     0   200     0
#> 3 c     TRUE    100   100   100     0

system · February 3, 2020, 1:40pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.