How to read times from a text file using readr

I use the read_delim() function from the readr package to read a text file. The file contains:

9:00/ aaaaa
9:01/ bbbbb
9:04/ ccccc
9:07/ ddddd
12:06/ eeeee
12:13/ fffff
3:25/ ggggg

If I use this code:
test1 <- read_delim("./test1.txt", delim = "/", col_names = FALSE)
then I get:

test1
# A tibble: 7 × 2
  X1    X2
1 9:00  " aaaaa"
2 9:01  " bbbbb"
3 9:04  " ccccc"
4 9:07  " ddddd"
5 12:06 " eeeee"
6 12:13 " fffff"
7 3:25  " ggggg"

I want the first column to be of type time, so I tried:
test1a <- read_delim("./test1.txt", delim = "/", col_names = FALSE, col_types = c("t", "c"))

But I get this warning:
Warning message:
One or more parsing issues, see problems() for details

test1a
# A tibble: 7 × 2
  X1    X2
1 09:00 " aaaaa"
2 09:01 " bbbbb"
3 09:04 " ccccc"
4 09:07 " ddddd"
5 NA    " eeeee"
6 12:13 " fffff"
7 03:25 " ggggg"

Could someone help me with this?


With the file contents you provided, this works for me:

library(readr)
library(dplyr)

result <- read_delim('test_file.txt', delim = '/', col_names = FALSE) |>
  # import the time column as character, then transform using lubridate
  mutate(col1 = lubridate::hm(X1))
#> Rows: 7 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "/"
#> chr (2): X1, X2
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
result
#> # A tibble: 7 × 3
#>   X1    X2       col1      
#>   <chr> <chr>    <Period>  
#> 1 9:00  " aaaaa" 9H 0M 0S  
#> 2 9:01  " bbbbb" 9H 1M 0S  
#> 3 9:04  " ccccc" 9H 4M 0S  
#> 4 9:07  " ddddd" 9H 7M 0S  
#> 5 12:06 " eeeee" 12H 6M 0S 
#> 6 12:13 " fffff" 12H 13M 0S
#> 7 3:25  " ggggg" 3H 25M 0S

Created on 2022-08-21 by the reprex package (v2.0.1)

Kind regards

It looks like read_delim() cannot read 12:06 as a time. Is it a bug? I know I can handle it with additional steps, but I wonder why the col_types setting does not work.
In fact, if I remove the 12:06 record, the first column is read as time correctly.
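The warning message itself points to problems(), which is a good first step for a question like "is it a bug?". Below is a minimal, self-contained sketch of calling it. It writes a hypothetical temporary file instead of ./test1.txt and assumes, as diagnosed further down this thread, that line 5 starts with an invisible U+FEFF character (written here as the escape "\ufeff"). The reported table includes the row, column, expected type and actual value; the actual value may look like a perfectly valid 12:06 because the extra character is invisible when printed.

```r
library(readr)

# Hypothetical stand-in for ./test1.txt.
# ASSUMPTION: line 5 begins with an invisible U+FEFF character.
lines <- c("9:00/ aaaaa", "9:01/ bbbbb", "9:04/ ccccc", "9:07/ ddddd",
           "\ufeff12:06/ eeeee", "12:13/ fffff", "3:25/ ggggg")
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(lines), path, useBytes = TRUE)

# "tc" is readr's compact spec: column 1 = time, column 2 = character
test1a <- read_delim(path, delim = "/", col_names = FALSE, col_types = "tc")

# One row per value that failed to parse
problems(test1a)
```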

Exactly, I don't know why this strange thing happens. I also tried specifying col_types = list(X1 = col_time(format = '%H:%M')) so that read_delim() does everything, but since this happens, I would recommend using lubridate instead.
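For what it's worth, the column specification itself does not seem to be the culprit. A minimal sketch, assuming a hypothetical clean copy of the file (the same seven lines with no hidden characters) written to a temporary path: both the documented compact form (a single string, one letter per column, here "tc") and an explicit cols() spec using col_time() with its default format read all seven times without NA.

```r
library(readr)

# Hypothetical clean copy of the file: same seven lines, no hidden characters
lines <- c("9:00/ aaaaa", "9:01/ bbbbb", "9:04/ ccccc", "9:07/ ddddd",
           "12:06/ eeeee", "12:13/ fffff", "3:25/ ggggg")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Compact spec: "t" = time, "c" = character
clean_compact <- read_delim(path, delim = "/", col_names = FALSE,
                            col_types = "tc")

# Equivalent explicit spec; col_time() uses its default format here
clean_explicit <- read_delim(path, delim = "/", col_names = FALSE,
                             col_types = cols(X1 = col_time(),
                                              X2 = col_character()))

clean_compact$X1
```

If both of these parse cleanly on your machine while the original file does not, the difference is in the file contents rather than in col_types.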


After copy-pasting the file contents from your first message, RStudio shows a red dot in front of 12:06.

And indeed:

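# copied from the first post (contains a hidden character, written here as \ufeff)
utf8ToInt('5 \ufeff12:06 " eeeee"')
#>  [1]    53    32 65279    49    50    58    48    54    32    34    32   101
#> [13]   101   101   101   101    34
# retyped by hand
utf8ToInt('5 12:06 " eeeee"')
#>  [1]  53  32  49  50  58  48  54  32  34  32 101 101 101 101 101  34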

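In the first one, note the 65279: that is U+FEFF, the zero-width no-break space (also used as a byte-order mark, BOM). It is invisible, so the value looks like an ordinary 12:06, but the time parser cannot read it. Deleting the character solves the problem.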
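Deleting the character by hand works for a single file; if the data come from a source that keeps producing it, a programmatic clean-up may be more convenient. A minimal sketch, again assuming a hypothetical temporary file as a stand-in for ./test1.txt: it flags lines containing any non-ASCII code point (this catches U+FEFF, but also legitimate accented text, so check the flagged lines before deleting anything), strips every U+FEFF with gsub(), and reads the cleaned copy with a time column.

```r
library(readr)

# Hypothetical stand-in for ./test1.txt; line 5 starts with a hidden U+FEFF
lines <- c("9:00/ aaaaa", "9:01/ bbbbb", "9:04/ ccccc", "9:07/ ddddd",
           "\ufeff12:06/ eeeee", "12:13/ fffff", "3:25/ ggggg")
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(lines), path, useBytes = TRUE)

raw_lines <- read_lines(path)

# Flag lines that contain any code point above 127 (non-ASCII)
has_non_ascii <- vapply(raw_lines, function(x) any(utf8ToInt(x) > 127),
                        logical(1), USE.NAMES = FALSE)
which(has_non_ascii)

# Strip every U+FEFF and write a cleaned copy
cleaned <- gsub("\ufeff", "", raw_lines, fixed = TRUE)
clean_path <- tempfile(fileext = ".txt")
writeLines(cleaned, clean_path)

# The time column now parses for all rows
test1_fixed <- read_delim(clean_path, delim = "/", col_names = FALSE,
                          col_types = "tc")
test1_fixed
```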

Thank you very much!
