Stuck on splitting column into two columns in dataframe in RStudio

image

I am so stuck trying to split a column in an RStudio dataframe into two columns. In the attached screenshot, I'm trying to split column "Date_Time_UTC" (example: 1/1/2015 11:00) into a separate column for the date (ex. 1/1/2015) and time (ex. 11:00). This could be done by splitting the entire column at the space in between the date and time, but I cannot for the life of me figure out how to do this.

I would also be okay with modifying this column to just delete everything after the space (so that only the date remains), but this is less ideal.

ANY HELP would be so much appreciated!!
Thanks,
Kelly

Hi, Kelly. Welcome to the RStudio Community!

It might be overkill, but the tidyr package has a very handy separate() function to split data in variables. Assuming your Date_Time_UTC variable is just a character string, you can split it into parts like this:

tidyr::separate(
  data = df_temperature, 
  col = Date_Time_UTC, 
  sep = " ", 
  into = c("date", "time")
)

Alternately, tidyr has an extract() function that lets you use a regular expression groups to parse a column:

tidyr::extract(
  data = df_temperature, 
  col = Date_Time_UTC, 
  into = c("date", "time"), 
  "(\\S{10}) (\\S{5})"
)

Hope this gets you going in the right direction.

-- Robert

It's worthwhile using {lubridate} for all datetime work

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
meas <- ymd("2021-12-11") + hours(11)
the_date <- format_ISO8601(meas, usetz=FALSE, precision="ymd")
the_hour <- hour(meas)
meas
#> [1] "2021-12-11 11:00:00 UTC"
the_date
#> [1] "2021-12-11"
the_hour
#> [1] 11

Thank you so much, Robert, for your quick help! I am clearly new to learning R and R Studio :slight_smile: I tried the solution that you suggested using the separate function (although I changed the data = field to "data = df_temperature_2015" to match the dataframe name), and received this error:

Error in [.data.frame(x, x_vars) : undefined columns selected

I did originally rename the column names in this dataframe (it was a generic string of characters before I changed to "Date_Time_UTC").

Btw, I received the same error when trying the extract function.

Kelly

Woah, thank you technocrat! I was not aware of this library and this would probably be useful in translating the character string that is currently in my column "Date_Time_UTC" into a numeric / date value. As an end goal, I am trying to plot these temperature measurements over time but cannot do so currently with the date and time column in the format that it is. Would you be willing to advise how I might apply some of the functions from the lubridate library to the entire "Date_Time_UTC" column, using perhaps mutate() to paste the reformatted data into new columns?

Thanks,
Kelly

The reprex below illustrates how to add variables to a dataframe that reformat presentation of the date time string, in one case, and convert the hour component to integer on the other. It also shows that ggplot can deal with datetime objects as an axis without the need to convert from datetime to an integer.

suppressPackageStartupMessages({
  library(lubridate)
  library(ggplot2)
})
DF <- structure(list(
  Email = c(
    "email1@gmail.com", "email2@gmail.com", "email2@gmail.com",
    "email3@gmail.com", "email3@gmail.com"
  ), Join.Time = structure(c(
    as.POSIXct("2020-12-09 13:04:00"),
    as.POSIXct("2020-12-10 13:20:00"), as.POSIXct("2020-12-11 13:30:00"), as.POSIXct("2020-12-12 13:07:00"),
    as.POSIXct("2020-12-09 13:29:00")
  ), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"),
  Leave.Time = structure(c(
    as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-11 13:22:00"),
    as.POSIXct("2020-12-10 14:01:00"), as.POSIXct("2020-12-12 13:29:00"), as.POSIXct("2020-12-09 14:33:00")
  ),
  class = c("POSIXct", "POSIXt"), tzone = "America/New_York"
  )
),
.Names = c("Email", "Join_Time", "Leave_Time"),
row.names = c(NA, -5L), class = "data.frame"
)

begin <- floor_date(min(DF[,2]) - hours(1), unit = "hour")
end   <- floor_date(max(DF[,3]) + hours(1), unit = "hour")

DF
#>              Email           Join_Time          Leave_Time
#> 1 email1@gmail.com 2020-12-09 16:04:00 2020-12-09 16:25:00
#> 2 email2@gmail.com 2020-12-10 16:20:00 2020-12-11 16:22:00
#> 3 email2@gmail.com 2020-12-11 16:30:00 2020-12-10 17:01:00
#> 4 email3@gmail.com 2020-12-12 16:07:00 2020-12-12 16:29:00
#> 5 email3@gmail.com 2020-12-09 16:29:00 2020-12-09 17:33:00

DF$Engaged <- DF$Leave_Time - DF$Join_Time
DF$Date_Only <- format_ISO8601(DF$Join_Time, usetz=FALSE, precision="ymd")
DF$Hour_Only <- hour(DF$Leave_Time)

DF
#>              Email           Join_Time          Leave_Time    Engaged
#> 1 email1@gmail.com 2020-12-09 16:04:00 2020-12-09 16:25:00    21 mins
#> 2 email2@gmail.com 2020-12-10 16:20:00 2020-12-11 16:22:00  1442 mins
#> 3 email2@gmail.com 2020-12-11 16:30:00 2020-12-10 17:01:00 -1409 mins
#> 4 email3@gmail.com 2020-12-12 16:07:00 2020-12-12 16:29:00    22 mins
#> 5 email3@gmail.com 2020-12-09 16:29:00 2020-12-09 17:33:00    64 mins
#>    Date_Only Hour_Only
#> 1 2020-12-09        16
#> 2 2020-12-10        16
#> 3 2020-12-11        17
#> 4 2020-12-12        16
#> 5 2020-12-09        17

p <- ggplot(DF,aes(Date_Only,Engaged))
p + geom_point() +
    theme_minimal()
#> Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.

1 Like

Kelly, here's a more complete example of my original thought on splitting columns generally — though @technocrat is right that it makes more sense to familiarize yourself with lubridate for work with dates and times.

df_temperature_2015 <- data.frame(
  Date_Time_UTC = c(
    as.POSIXct("2020-12-09 13:04:00"),
    as.POSIXct("2020-12-10 13:20:00"), 
    as.POSIXct("2020-12-11 13:30:00"))
)

tidyr::separate(
  data = df_temperature_2015, 
  col = Date_Time_UTC, 
  sep = " ", 
  into = c("date", "time")
)

        date     time
1 2020-12-09 13:04:00
2 2020-12-10 13:20:00
3 2020-12-11 13:30:00
2 Likes

Thank you again technocrat for your help!! I learned a lot by trying out these code examples and newly downloading the lubridate package :slight_smile: take care!!

Kelly

1 Like

Thank you, Robert!! I was able to wrap up my project analysis last week and I so appreciate your help and quick responses. I couldn't have done it without you! Happy coding :slightly_smiling_face:

Kelly

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.