Fitting of Date and Time in R Studio

Hey there,

I got a Problem with fitting of Date and Time. My problem is, that I have data of 20 Stationes, which all measure at different times but all in a 10 Min frequency. Now I want to calculate for each Station one single 10 Minute frequency. In my script (sorry I´m german, so my comments are german) I calculated this 10 Minute frequency for one station, with resulting in times 5, 15, 25, 35, 45, 55... I´m happy with that. BUT I got 59471 obs. with my new times and just 57486 obs. in my original data sheet. If I try to fit the new times to my data, my Measurements end on 2023-5-5 instead of 2023-5-18.

Hopefully someone can help me! :upside_down_face:

Aktivieren von "Readxl" ----

library(readxl)
install.packages("readxl")

install.packages("openxlsx")
library(openxlsx)

Leere Liste für angepasste Daten ----

angepassteDaten <- list()

Interation über Messstationen ----

for (station in 1) #Annahme der 20 Messststionen

Lade Ecxel-Datenblatt für aktuelle Messstation

data <- read_excel("C:\my file",
sheet = station)

#Bereinigung der Daten in Spalte Time
data$Time <- format(as.POSIXct(data$Time, format = "%H:%M:%S"), format = "%H:%M:%S")

Kombinieren von Datum und Zeit in Spalte "DateTime"

data$DateTime <- as.POSIXct(paste(data$Date, data$Time), format = "%Y-%m-%d %H:%M:%S")

Überprüfe und entferne ungültige Zeilen

data <- na.omit(data)

Überprüfung auf vorhandene Daten

if (nrow(data) > 0) {

Extraktion des Startzeit dür die aktuelle Station

startZeit <- min(data$DateTime)
endZeit <- max(data$DateTime)

Berechung Abstand zeischen den Messungen in Sekunden

messIntervall <- 10 * 60 # 10 Minuten in Sekunden

Erstellen einer Sequenz von Zeitpunkten, zu denen gemessen werden soll

messZeit <- seq(from = startZeit, to = endZeit, by = messIntervall, legth.out = nrow(data))

Runden der Messzeiten auf die nächstgelegenen 10-Minuten-Schritte

messZeitenGerundet <- round(messZeit, units = "mins")

Datenblatt für angepasste Daten

angepasstesBlatt <- data.frame(DateTime = messZeitenGerundet)

Speichern as XlSX

write.xlsx(angepasstesBlatt, file = "C:\ my file ")

}

Hi @Sophie_Tro
We don't have your data so its hard to check exactly what's happening, but try these changes (at first on just one station):

# Not needed
#data$Time <- format(as.POSIXct(data$Time, format = "%H:%M:%S"), format = "%H:%M:%S")

data$DateTime <- as.POSIXct(paste(data$Date, data$Time), format = "%Y-%m-%d %H:%M:%S")

which(is.na(data$DateTime))  # Find the position of any NAs in DateTime: equipment failure? 

# data <- na.omit(data)  # This will be shortening your dataframe if NAs exist

startZeit <- min(data$DateTime)
endZeit <- max(data$DateTime)

messIntervall <- 10 * 60 # 10 Minuten in Sekunden

# corrected 'legth.out' to 'length.out'
messZeit <- seq(from = startZeit, to = endZeit, by = messIntervall, length.out = nrow(data))

messZeitenGerundet <- round(messZeit, units = "mins")

angepasstesBlatt <- data.frame(DateTime = messZeitenGerundet)

library(tidyverse)
combined <- left_join(data, angepasstesBlatt, by=DateTime)
head(combined)  # See where the mis-matches occur.
1 Like

Hi,
thank you for your help, but the same mistake occurs, if I use the length.out = nrow(data), the obs fit. But not the measuring times. If I usw the start and end times, I get a lot more obs in my fitted DateTime. And If I use the last part and try to find missmatches, I only get a combined data sheet, where the times are not fitted but just the same as the original times, whilst the obs in angepasstesBlatt are fitte but too many. :thinking:

Did you find any NAs in the raw data?
I think I had the left_join around the wrong way. Try
combined <- left_join(angepasstesBlatt, data, by=DateTime)

1 Like

The left_join worked this time. I also found NAs, but far to many, as in the combined table, everything after 2022-04-07 10:05. Up to this point (2022-3-31 to 2022-4-7) I found NAs in between.
Edit: it seems like that some data is available, in between. But not much. Could this be due to the fact, that the measuring time changes in the originial data? It stays at a 10 Minute frequency. But changes from e.g. 10:50 to 10:37 for a few days

simply constructing 10min time series and attempting to join one of your stations measurements will surely only succeed when the station measurements were taken at the specific times (or at least to the minute, as you have 1 minute rounding applied) which is unlikely.
You probably need to be rounding your station timings to the nearest 5 minutes before trying to do do any joins

Example of the concept :

library(tidyverse)
(regular_series <- tibble(desired = seq(from=5,to=60,by=10)))

(odd_series <- tibble(odd=seq(from=3, to =60, by=10)))

(rounded_series <- mutate(odd_series,
                          round = round(odd/5,0)*5))

(fail_join <- left_join(regular_series,
                        odd_series,by=c("desired"="odd"),
                        keep=TRUE))

(success_join <- left_join(regular_series,
                           rounded_series,by=c("desired"="round"),
                           keep=TRUE))
``

Taking some time to provide a reprex will likely accelerate the degree to which forum users can assist you.

https://forum.posit.co/t/faq-how-to-do-a-minimal-reproducible-example-reprex-for-beginners/23061

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.