Create sub samples from a large data set with large data gaps

Hi there RStudio Community! I'm new here and would really appreciate any help with this.

I want to bring in a .csv with Time (mins) in 15-minute increments and Water Level (ft). Unfortunately, the dataset often has long stretches of time where the Water Level data is missing. I would like to automate bringing in the large dataset and then breaking it down into smaller datasets that do not include the empty (NA) values.

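# Example data: 30 time steps, repeating 6 readings followed by 4 missing values
water <- data.frame(time = 1:30, level = rep(c(1:6, NA, NA, NA, NA), 3))
# Start a new segment each time the data switches between present and missing (NA)
water <- within(water, segment <- 1 + cumsum(is.na(level) != is.na(dplyr::lag(level, n = 1, default = 0))))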
water
#>    time level segment
#> 1     1     1       1
#> 2     2     2       1
#> 3     3     3       1
#> 4     4     4       1
#> 5     5     5       1
#> 6     6     6       1
#> 7     7    NA       2
#> 8     8    NA       2
#> 9     9    NA       2
#> 10   10    NA       2
#> 11   11     1       3
#> 12   12     2       3
#> 13   13     3       3
#> 14   14     4       3
#> 15   15     5       3
#> 16   16     6       3
#> 17   17    NA       4
#> 18   18    NA       4
#> 19   19    NA       4
#> 20   20    NA       4
#> 21   21     1       5
#> 22   22     2       5
#> 23   23     3       5
#> 24   24     4       5
#> 25   25     5       5
#> 26   26     6       5
#> 27   27    NA       6
#> 28   28    NA       6
#> 29   29    NA       6
#> 30   30    NA       6
water_split <- split(water, water$segment)
water_split
#> $`1`
#>   time level segment
#> 1    1     1       1
#> 2    2     2       1
#> 3    3     3       1
#> 4    4     4       1
#> 5    5     5       1
#> 6    6     6       1
#> 
#> $`2`
#>    time level segment
#> 7     7    NA       2
#> 8     8    NA       2
#> 9     9    NA       2
#> 10   10    NA       2
#> 
#> $`3`
#>    time level segment
#> 11   11     1       3
#> 12   12     2       3
#> 13   13     3       3
#> 14   14     4       3
#> 15   15     5       3
#> 16   16     6       3
#> 
#> $`4`
#>    time level segment
#> 17   17    NA       4
#> 18   18    NA       4
#> 19   19    NA       4
#> 20   20    NA       4
#> 
#> $`5`
#>    time level segment
#> 21   21     1       5
#> 22   22     2       5
#> 23   23     3       5
#> 24   24     4       5
#> 25   25     5       5
#> 26   26     6       5
#> 
#> $`6`
#>    time level segment
#> 27   27    NA       6
#> 28   28    NA       6
#> 29   29    NA       6
#> 30   30    NA       6

Created on 2021-07-29 by the reprex package (v1.0.0)
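
Note that the split above still contains the NA-only segments (2, 4 and 6). If only the pieces with data are wanted, one option is to drop the NA rows before splitting. This is a minimal base R sketch on the same example data. It numbers the runs with `rle()` rather than `dplyr::lag()`, which is a substitution on my part and not part of the reprex. The name `water_ok` is made up for illustration.

```r
# Same example data as in the reprex
water <- data.frame(time = 1:30, level = rep(c(1:6, NA, NA, NA, NA), 3))

# Run-length encode the NA pattern: each run of consecutive
# NA or non-NA rows gets its own segment number
runs <- rle(is.na(water$level))
water$segment <- rep(seq_along(runs$lengths), runs$lengths)

# Drop the NA rows first, then split, so only segments with data remain
water_ok <- water[!is.na(water$level), ]
water_split <- split(water_ok, water_ok$segment)

names(water_split)
#> [1] "1" "3" "5"
```

One difference worth knowing: with `rle()` the numbering always starts at 1, whereas the `lag(..., default = 0)` version starts at 2 when the very first reading is NA. Either way the segments are separated correctly.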

Thank you. This looks like the logic I'm looking for. I'll try to apply it to my dataset!

This is exactly what I was looking for. Now I just need to save the outputs to a .csv file.
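
For saving, one way is to loop over the list and call `write.csv()` on each piece. This is a minimal sketch, assuming the example data from the reply above. The output folder and the `water_segment_<n>.csv` file names are hypothetical, and the segment column is rebuilt with base R `rle()` so the block runs on its own.

```r
# Rebuild the example data and segment numbers (base R stand-in for the reprex)
water <- data.frame(time = 1:30, level = rep(c(1:6, NA, NA, NA, NA), 3))
runs <- rle(is.na(water$level))
water$segment <- rep(seq_along(runs$lengths), runs$lengths)
water_split <- split(water, water$segment)

# Hypothetical output folder; swap in your own path
out_dir <- file.path(tempdir(), "water_segments")
dir.create(out_dir, showWarnings = FALSE)

for (seg in names(water_split)) {
  piece <- water_split[[seg]]
  # Skip segments that contain only missing values
  if (all(is.na(piece$level))) next
  out_file <- file.path(out_dir, paste0("water_segment_", seg, ".csv"))
  write.csv(piece, out_file, row.names = FALSE)
}

list.files(out_dir)
```

With the example data this writes one file each for segments 1, 3 and 5.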

Any ideas on how to generate plots automatically from these subgroups?

I am not exactly sure what you mean, but will this help?

library(ggplot2)
# Scatter plot of level against time, with one facet column per segment
ggplot(water, aes(time, level, colour = segment)) +
  geom_point() +
  facet_grid(. ~ segment)
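
If "automatically" means one plot file per subgroup, a loop with `ggsave()` is one possibility. This is a rough sketch on the same example data. The output folder, the `segment_<n>.pdf` file names and the axis labels are my own assumptions, and the segment column is rebuilt with base R `rle()` so the block runs on its own.

```r
library(ggplot2)

# Rebuild the example data and segment numbers
water <- data.frame(time = 1:30, level = rep(c(1:6, NA, NA, NA, NA), 3))
runs <- rle(is.na(water$level))
water$segment <- rep(seq_along(runs$lengths), runs$lengths)

# Keep only rows with data, so gap-only segments produce no plot
water_ok <- water[!is.na(water$level), ]

# Hypothetical output folder; swap in your own path
out_dir <- file.path(tempdir(), "water_plots")
dir.create(out_dir, showWarnings = FALSE)

for (piece in split(water_ok, water_ok$segment)) {
  seg <- piece$segment[1]
  p <- ggplot(piece, aes(time, level)) +
    geom_line() +
    geom_point() +
    labs(title = paste("Segment", seg), x = "Time", y = "Water level")
  # ggsave() picks the graphics device from the file extension
  ggsave(file.path(out_dir, paste0("segment_", seg, ".pdf")),
         plot = p, width = 6, height = 4)
}

list.files(out_dir)
```

Changing the extension to `.png` should give PNG files instead.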
