Multiple scatterplots in one graph & multiple facet_wrap plots

The "noisy" data I noticed earlier were cases where a meter_value seemed to oscillate between two values. Assuming a meter reading should never decrease, we can identify the readings that break this assumption:

library(data.table)
library(tidyverse)
library(lubridate)

# For a CSV this big, data.table::fread is *much* faster than read.csv
sensor_data <- 
  data.table::fread("Downloads/dataport_oct2015-mar2016_original.csv") %>% 
  
  # convert localminute to datetime (fread imports it as character)
  mutate(localminute = lubridate::as_datetime(localminute)) %>%
  
  # arrange each meter separately
  group_by(dataid) %>% 
    arrange(localminute, .by_group = TRUE) %>% 
    mutate(interval_hr = (localminute - lag(localminute)) / lubridate::dhours(1),
           meter_change = meter_value - lag(meter_value)) %>%
    ungroup()

# Which have noisy data? (I'm assuming a large negative change indicates an error)
noisy_readings <- 
  sensor_data %>%
  filter(meter_change < -100)   # Threshold of -100 ignores small decreases

# Full list of meters with at least one drop of more than 100
noisy_meters <- unique(noisy_readings$dataid)
# ADDED 2018-05-28

# For each meter with noisy data, define a time window on either side of the noise.
# Window in seconds: 60*60*24 is one day, so 60*60*24*3 is three days on each side
noisy_ranges <-
  noisy_readings %>%
  group_by(dataid) %>%
  summarize(min_range = min(localminute) - 60*60*24*3,
            max_range = max(localminute) + 60*60*24*3)

# Join the time window to the original data frame, and only keep rows near noise.
noisy_context <-
  sensor_data %>%
  left_join(noisy_ranges, by = "dataid") %>%   # explicit key avoids the join message
  filter(localminute >= min_range,
         localminute <= max_range)

# Plot all the examples of noise
ggplot(noisy_context, aes(localminute, meter_value, group = dataid, label = dataid)) +
  geom_point(shape = 1, alpha = 0.1) + 
  geom_line(alpha = 0.3) +
  geom_text(data = noisy_context %>% group_by(dataid) %>% top_n(1, localminute),
            hjust = -0.3, size = 3) +
  scale_y_continuous(labels = scales::comma) +
  scale_x_datetime(date_breaks = "1 day", date_labels = "%b\n%d")
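
Since the thread title asks about facet_wrap, here is a rough sketch of the same kind of plot with one panel per meter instead of overlaid lines. The `toy_context` data frame is made up for illustration and only stands in for `noisy_context`; `scales = "free_y"` is my assumption about what reads best when meters sit at very different levels.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for noisy_context (values are invented for illustration)
toy_context <- tibble(
  dataid = rep(c(1L, 2L), each = 4),
  localminute = rep(as.POSIXct("2015-12-08 00:00:00", tz = "UTC") + 86400 * 0:3,
                    times = 2),
  meter_value = c(10000, 11000, 10020, 10030, 50000, 50010, 55000, 50030)
)

# One panel per meter; free y-axes so each meter's oscillation stays visible
p <- ggplot(toy_context, aes(localminute, meter_value)) +
  geom_point(shape = 1) +
  geom_line(alpha = 0.3) +
  facet_wrap(~ dataid, scales = "free_y") +
  scale_y_continuous(labels = scales::comma) +
  scale_x_datetime(date_breaks = "1 day", date_labels = "%b\n%d")
p
```

With facets, the per-meter text labels are no longer needed because each panel strip already shows the dataid.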

Interesting. It seems a subset of meters went through a period, mostly Dec 8-13, where the readings oscillated between a value consistent with the trend and one roughly 10% higher.
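
One way to put a number on that "roughly 10%" impression is to compare each large drop with the reading just before it. This is only a sketch on invented data: `toy`, `prev_value`, and `excess_pct` are names I made up, and the -100 threshold is simply reused from the filter above.

```r
library(dplyr)

# Toy data (invented): one meter rising steadily, with two readings
# that sit roughly 10% above the trend
toy <- tibble(
  dataid = 1L,
  localminute = as.POSIXct("2015-12-08 00:00:00", tz = "UTC") + 3600 * 0:5,
  meter_value = c(10000, 10020, 11020, 10040, 11050, 10060)
)

jumps <- toy %>%
  group_by(dataid) %>%
  arrange(localminute, .by_group = TRUE) %>%
  mutate(prev_value = lag(meter_value),
         # how far the previous (high) reading sat above the current one, in percent
         excess_pct = 100 * (prev_value / meter_value - 1)) %>%
  ungroup() %>%
  # keep only the large drops, same threshold as for noisy_readings
  filter(meter_value - prev_value < -100)

jumps
```

On the real data the same mutate could be added to sensor_data, and a summary of excess_pct per meter would show whether the high readings really cluster near 10%.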

One meter briefly had a similar issue two weeks later.
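
To check timing like this without reading it off the plot, the large drops can be counted per meter and day. Again a sketch on invented data: `toy_noisy` stands in for `noisy_readings`, and `noisy_days`, `noisy_spans`, and `n_drops` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for noisy_readings (timestamps are invented)
toy_noisy <- tibble(
  dataid = c(1L, 1L, 2L, 2L),
  localminute = as.POSIXct(c("2015-12-08 03:00:00", "2015-12-09 10:00:00",
                             "2015-12-10 05:00:00", "2015-12-24 07:00:00"),
                           tz = "UTC")
)

# Number of large drops per meter per calendar day
noisy_days <- toy_noisy %>%
  mutate(day = as.Date(localminute)) %>%
  count(dataid, day, name = "n_drops")

# First and last noisy day for each meter; a late last_day flags a repeat episode
noisy_spans <- noisy_days %>%
  group_by(dataid) %>%
  summarize(first_day = min(day),
            last_day  = max(day),
            n_days    = n())

noisy_spans
```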
