How to add data into all rows ?

Hi All,

I want to add "2020/01' to all the datasets so that I can calculate the time difference with library(lubridate) ymd_hms function.

The below is test row data but original data contains "date HH:MM:SS" and I want to calculate the time difference for those data. Since right now i can't find the function that allows to calculate with this format D HH:MM:SS, I want to add YYYY/MM/ so that i can use ymd_hms to get the time difference.

Any ideas ?
my.data <- c("1 03:08:09","2 18:24:39","3 19:19:5","4 04:13:02","5 04:10:26","6 04:10:26")

You can use the paste0() function.

my.data <- c("1 03:08:09","2 18:24:39","3 19:19:5","4 04:13:02","5 04:10:26","6 04:10:26")
my.data <- paste0("2020/01/", my.data)
lubridate::ymd_hms(my.data)
[1] "2020-01-01 03:08:09 UTC" "2020-01-02 18:24:39 UTC" "2020-01-03 19:19:05 UTC"
[4] "2020-01-04 04:13:02 UTC" "2020-01-05 04:10:26 UTC" "2020-01-06 04:10:26 UTC"

Thanks for the tips. However my original row data is data matrix that paste function do not seem to work.....

|time1|time2|time3|time4|
|01 02:52:12|01 03:00:25|01 03:08:09|01 03:35:20|
|13 15:58:57|13 17:56:45|13 18:24:39|13 18:57:01|
|17 19:02:37|17 19:12:51|17 23:19:59|18 0:31:09|
|12 04:01:57|12 04:08:18|12 04:13:02|12 04:37:57|

Above is the sample data and currently they are character type not numeric so I also need to change to numeric but as.numeric brings to NA for some reasons.

Do you have some ideas how we can
1 convert to numeric
2 add data to get time difference (we do not have to need to add data 2020/01/ if we can get time difference with the above datasets. But since some data the date is next day and brings negative value if we subtract the data.)

Thank you for your advice!

I can highly recommend the lubridate package (which is part of the tidyverse installation, but requires a separate loading with library(lubridate).

If you want to know more about this package then either type vignette("lubridate") in your console or visit the following site (see menu Intro)

I think you might find sections Parsing dates and times and Arithmetic with date times useful.

If the data are in a matrix, I would change it to a data frame to do the transformations. You can change it back to a matrix if you need the final object to be a matrix.
For the subtraction, do you need to subtract a single value from every value or is a different value used for every case?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
#Make data
VALS <- c('01 02:52:12','01 03:00:25','01 03:08:09','01 03:35:20',
  '13 15:58:57','13 17:56:45','13 18:24:39','13 18:57:01',
  '17 19:02:37','17 19:12:51','17 23:19:59','18 0:31:09',
  '12 04:01:57','12 04:08:18','12 04:13:02','12 04:37:57')
MAT <- matrix(VALS, nrow = 4, byrow = TRUE)
MAT
#>      [,1]          [,2]          [,3]          [,4]         
#> [1,] "01 02:52:12" "01 03:00:25" "01 03:08:09" "01 03:35:20"
#> [2,] "13 15:58:57" "13 17:56:45" "13 18:24:39" "13 18:57:01"
#> [3,] "17 19:02:37" "17 19:12:51" "17 23:19:59" "18 0:31:09" 
#> [4,] "12 04:01:57" "12 04:08:18" "12 04:13:02" "12 04:37:57"

#Change to POSIXct
DF <- as.data.frame(MAT)
DF <- DF %>% mutate(across(.f = function(V) ymd_hms(paste0("2020/01/", V))))

DF
#>                    V1                  V2                  V3
#> 1 2020-01-01 02:52:12 2020-01-01 03:00:25 2020-01-01 03:08:09
#> 2 2020-01-13 15:58:57 2020-01-13 17:56:45 2020-01-13 18:24:39
#> 3 2020-01-17 19:02:37 2020-01-17 19:12:51 2020-01-17 23:19:59
#> 4 2020-01-12 04:01:57 2020-01-12 04:08:18 2020-01-12 04:13:02
#>                    V4
#> 1 2020-01-01 03:35:20
#> 2 2020-01-13 18:57:01
#> 3 2020-01-18 00:31:09
#> 4 2020-01-12 04:37:57

Created on 2021-02-11 by the reprex package (v0.3.0)

1 Like

Thank you for the advise. But since the original data itself has more than 10k rows I do not think it would be ideal to do transformations.....

For the subtraction I am trying to subtract a different value for every case with combination of time1 , time2, time3 and time4.

One way I was trying to achieve on transforming to num is change all columns to num by using below
#get data
time <- read.csv("pasting original excel file name"
#remove date
time$time1=substr(time$time1,4,11)
#change to num
time$time1 <- as_hms(time$time2)
and do the same for all columns (time1 till time4)
and then subtract and this brings difference but the units is seconds and also they do not consider when the date has changed as we removed date on the first place.

Do you have any suggestions or ideas?

Thank you!!

Having 10000 rows is not a problem. The code below invents some data with four columns and 12000 rows. It runs with no problem on my little laptop. Give it a try.

Days <- sample(1:14, size = 48000, replace = TRUE)
Hours <- sample(0:23, size = 48000, replace = TRUE)
Minutes <- sample(0:59, size = 48000, replace = TRUE)
Seconds <-  sample(0:59, size = 48000, replace = TRUE)
TIME <- paste(Hours,Minutes, Seconds, sep = ":")

DataVec <- paste(Days, TIME)
head(DataVec)
MAT <- matrix(DataVec, ncol = 4)
DF <- as.data.frame(MAT)
head(DF)
library(dplyr)
library(lubridate)
DF <- DF %>% mutate(across(.f = function(V) ymd_hms(paste0("2020/01/", V))))
head(DF)