The data.table package adds its own class to the data frames it creates, along with certain attributes related to memory management. This infrastructure is part of how data.table handles very large data frames so quickly. A data table is still a data frame underneath, so you generally don't need to worry about the extra layers of data.table attributes.
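You can see these extra layers for yourself. A quick sketch, assuming the data.table package is installed (the toy columns are just for illustration):

```r
library(data.table)

dt <- data.table(x = 1:3, y = letters[1:3])

# A data.table is still a data.frame underneath:
class(dt)
# "data.table" "data.frame"

# data.table adds an internal pointer attribute used for memory
# management (over-allocation of columns), alongside the usual
# names/row.names/class attributes:
names(attributes(dt))
```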
Yes, fread() tries to auto-detect column types on import, but it falls back to character when detection fails, as it did here. You can supply type information to fread() via its colClasses argument, but I find it more flexible to let lubridate handle the conversion after import (frankly, I find that lubridate does a better job of parsing date-times).
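Here's a sketch of both options; the tiny CSV written to a temp file just stands in for your real data file:

```r
library(data.table)
library(lubridate)

# Toy CSV standing in for the real data file:
tmp <- tempfile(fileext = ".csv")
writeLines(c("localminute,dataid",
             "2015-10-01 00:00:10-05,35"), tmp)

# Option 1: tell fread() the types up front via colClasses
d1 <- fread(tmp, colClasses = c(localminute = "character",
                                dataid      = "integer"))

# Option 2 (my preference): import first, then let lubridate parse
d2 <- fread(tmp)
d2$localminute <- as_datetime(d2$localminute)
```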
If you check the documentation for lubridate::as_datetime, you'll see that it parses into UTC by default. Your data have UTC timezone offsets, so this is a reasonable choice. You can choose later to format the date times for display in any time zone you want, but the default display will be in UTC, so:
- "2015-10-01 00:00:10-05" before parsing is displayed as
- "2015-10-01 05:00:10 UTC" after parsing
These two strings represent exactly the same date and time. Remember that all of these datetime strings are just a form of display formatting — internally, POSIXct datetimes are stored as the number of seconds since the beginning of 1970. If you want to learn more about the date-time parsing that's going on here, I recommend checking out the lubridate website and reading the Journal of Statistical Software paper on lubridate.
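You can verify the equivalence yourself. Base R's %z format can parse an offset too, though it wants the four-digit form (-0500 rather than -05), which is one reason lubridate's more forgiving parser is handy here:

```r
# Parse a timestamp with an explicit UTC offset (base R needs "-0500"):
t <- as.POSIXct("2015-10-01 00:00:10-0500",
                format = "%Y-%m-%d %H:%M:%S%z", tz = "UTC")

format(t, tz = "UTC", usetz = TRUE)
# "2015-10-01 05:00:10 UTC"

# Internally it is just seconds since 1970-01-01 00:00:00 UTC:
as.numeric(t)
```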
That line of code tells the as_datetime function from the lubridate package to parse localminute from character into POSIXct format.
packageName::functionName is how R represents namespaces — meaning it's how you tell R to look for a function in a specific package, even if that package is not loaded. Strictly speaking, I didn't have to use it here, since lubridate was already loaded at the beginning of my script. However, when I'm using one function from a package that has a similar name to a base function (here, lubridate::as_datetime vs base::as.Date and its variants), I tend to include the namespace just so it's very clear where I'm getting this function from. (Though in this case, maybe it wasn't so clear since you were unfamiliar with the namespace syntax!)
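The same syntax works with any installed package. A tiny example using base R's stats package (the function chosen is just for illustration):

```r
# Fully qualified call: look up median() in the stats package,
# whether or not that package has been attached with library():
stats::median(c(1, 5, 9))
# 5

# Identical to the unqualified call when stats is attached
# (it is, by default, in every R session):
median(c(1, 5, 9))
# 5
```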
I'm using the base R function order(), which can reorder vectors and data frames in a very powerful way, but unfortunately not a very readable way. Here's a breakdown of the steps, working from the inside out:
- order(sensor_data$meter_value) returns the row indices that would put the vector of meter values in ascending order
- sensor_data[order(sensor_data$meter_value), ] uses those indices to select the rows of the sensor_data data frame, sorted by meter value; it also selects all the columns, since the column specification (after the comma) is left blank — so basically, this sorts the whole data frame in ascending order of meter_value
- sensor_data[order(sensor_data$meter_value), ]$dataid selects just the dataid column from the data frame that has been sorted by meter_value
- unique(sensor_data[order(sensor_data$meter_value), ]$dataid) grabs just the unique values of dataid from the whole vector of dataIDs that was sorted by meter_value
- levels = unique(sensor_data[order(sensor_data$meter_value), ]$dataid) passes the result to the levels argument of factor()
The result of all this is that the dataid categories will be displayed in an order that keeps dataIDs with similar meter value ranges closer to each other than would be the case if you just put them in numeric order. You can skip this step (as I did in later code), and you'll get the dataid categories in numerical order.
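Here's the whole pipeline on a tiny made-up data frame, so you can watch the reordering happen at each step:

```r
# Toy stand-in for sensor_data:
sensor_data <- data.frame(
  dataid      = c(35, 2645, 1619),
  meter_value = c(900, 100, 500)
)

order(sensor_data$meter_value)
# 2 3 1  (the indices that sort meter_value ascending)

sorted <- sensor_data[order(sensor_data$meter_value), ]
sorted$dataid
# 2645 1619 35  (dataids in ascending meter_value order)

lv <- unique(sorted$dataid)
sensor_data$dataid <- factor(sensor_data$dataid, levels = lv)
levels(sensor_data$dataid)
# "2645" "1619" "35"
```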
mutate doesn't just add new variables, it also changes existing variables. If you give it a new name, it will make a new variable, but if you give it an existing variable name, it will replace that variable with the new definition you supply. So my mutate() statement converts old character localminute into new POSIXct localminute, and it converts old integer dataid into new factor dataid. The resulting data frame still has 3 variables: localminute, dataid, and meter_value.
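In miniature, assuming dplyr and lubridate are loaded (the toy columns mirror the real ones):

```r
library(dplyr)
library(lubridate)

d <- data.frame(
  localminute = "2015-10-01 00:00:10-05",  # character
  dataid      = 35L                        # integer
)

d <- d %>%
  mutate(localminute = as_datetime(localminute),  # overwrites in place
         dataid      = factor(dataid))            # overwrites in place

str(d)
# localminute is now POSIXct, dataid is now a factor;
# still the same columns, no new ones added
```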
If you're going to run dplyr code outside of a pipeline (not using %>%), then you need to supply the data frame as the first argument (the pipe normally supplies this for you). Have you read the dplyr introduction?
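Side by side, using filter() as the example verb (the toy data frame is just for illustration):

```r
library(dplyr)

d <- data.frame(dataid = c(35, 2645), meter_value = c(900, 100))

# Inside a pipeline: %>% supplies d as the first argument for you
d %>% filter(meter_value > 500)

# Outside a pipeline: you must supply the data frame yourself
filter(d, meter_value > 500)

# Both calls return the same one-row data frame
```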
Once you've defined a function, you can make many different plots with just one line:
- sensor_plot(data = sensor_data, data_id = c(2645, 1619, 2043)) makes a plot with 3 dataIDs.
- sensor_plot(data = sensor_data, data_id = c(35)) makes a plot with just dataID 35.
- sensor_plot(data = sensor_data, data_id = c(4874, 9295, 7030, 2575)) makes a plot with 4 dataIDs.
And so on! You might want to take a look at this guide to programming with ggplot2.
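For reference, a minimal sketch of what such a sensor_plot() function might look like — the column names match the ones we've been working with, but the geom and faceting choices here are just placeholders, not your exact plot:

```r
library(dplyr)
library(ggplot2)

sensor_plot <- function(data, data_id) {
  data %>%
    filter(dataid %in% data_id) %>%          # keep only the requested dataIDs
    ggplot(aes(x = localminute, y = meter_value)) +
    geom_line() +
    facet_wrap(~ dataid)                     # one panel per dataID
}

# Then each plot is one line, as above:
# sensor_plot(data = sensor_data, data_id = c(2645, 1619, 2043))
```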
Nope, you want to change the value supplied to the scales argument in facet_wrap(), like so: facet_wrap(~ dataid, scales = "fixed"). Have you taken a look at the documentation for facet_wrap, especially the examples? If you want to set your own y-axis limits, you would do that with a separate call to scale_y_continuous:
# ...earlier plot layers... +
facet_wrap(~ dataid, scales = "fixed") +
scale_y_continuous(limits = c(0, 300000))
# ...any further layers, each joined with +
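Put together as a runnable sketch, with toy data standing in for yours (the limits are the ones from the snippet above):

```r
library(ggplot2)

toy <- data.frame(
  localminute = rep(1:10, 2),
  dataid      = factor(rep(c(35, 2645), each = 10)),
  meter_value = c(1:10 * 1000, 1:10 * 20000)
)

p <- ggplot(toy, aes(x = localminute, y = meter_value)) +
  geom_line() +
  facet_wrap(~ dataid, scales = "fixed") +     # shared axes across panels
  scale_y_continuous(limits = c(0, 300000))    # your own y-axis limits
p
```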