Data type changes on upload into R

Hi all,

I'm a new R user and just joined the community. I am a problem which I believe I can get help from here.

I am working on a set of data file, about 12 excel files having similar datasets and data types. Each data files contain columns, 5 of which has character data type, 1 integer, and the last, time (37:30:55) data type in excel.

I have to upload all of these files onto R, to further my analysis. That means, I have to convert these into CSV format. Once I upload the converted CSV files, 2 of the files have the time data type changed to character which does affect my analysis progression.

I have tried to resave my data.frames with the correct data format, I keep getting the same problem.

Or do I work around this?

you can read excel files; readxl package does it well

1 Like

Is that 37 hours, 50 minutes and 55 seconds? See the FAQ: How to do a minimal reproducible example reprex for beginners for how to post questions that provide sufficient information to reproduce the problem.

So, 2 files convert 37:30:55 to "37:30:55" What do the other 10 convert it to? What typeof() do you want it to be? If it's a duration, see the {lubridate} package and follow the link there to the explainer on dates and times in R.

Hi Technocrat, thanks for the response.

I guess that's a time format that aggregates date and time information across days because it indicates that the hour (duration) goes beyond 24hrs.

I have tried to read through the lubricate package and everything seems to much information.

But I'll try over and over again as I need to break that code.

Hi Nir,

That's correct, I've been carried away. I'll just put that to use and see what the outcome is.

Shifting focus from how to what may help with f(x) = y:

x is what is at hand
y is what is sought
f is the function or chain of functions to convert x to y

Here x is an Excel spreadsheet imported to a data.frame object, one variable of which is a vector typeof character consisting of strings in the form 37:30:55.

Assume y is the number of seconds represented by 37 hours, 30 minutes and 55 seconds, or 14455 seconds, which is typeof numeric, which is wanted in a vector.

Proceeding stepwise

# fake some data
set.seed(42)
h <- sample(1:100,1e3,replace=TRUE)
m <- sample(1:60,1e3,replace=TRUE)
s <- sample(1:60,1e3,replace = TRUE)
chars <- paste(h,m,s,sep = ":")
v_chars <- strsplit(chars,":")
head(v_chars)
#> [[1]]
#> [1] "49" "54" "14"
#> 
#> [[2]]
#> [1] "65" "22" "17"
#> 
#> [[3]]
#> [1] "25" "46" "45"
#> 
#> [[4]]
#> [1] "74" "7"  "59"
#> 
#> [[5]]
#> [1] "100" "47"  "29" 
#> 
#> [[6]]
#> [1] "18" "55" "40"
# convert each element of chars to a vector of numerics
nums <- lapply(v_chars,as.numeric)
head(nums)
#> [[1]]
#> [1] 49 54 14
#> 
#> [[2]]
#> [1] 65 22 17
#> 
#> [[3]]
#> [1] 25 46 45
#> 
#> [[4]]
#> [1] 74  7 59
#> 
#> [[5]]
#> [1] 100  47  29
#> 
#> [[6]]
#> [1] 18 55 40
# convert to seconds
make_seconds <- function(x) {
  x = unlist(x)
  x[1]*360 + x[2]*60 + x[3]
}
result <- sapply(nums,make_seconds)
head(result)
#> [1] 20894 24737 11805 27119 38849  9820

Created on 2022-12-05 by the reprex package (v2.0.1)

1 Like

This is Wow! I'll take time out to study the
Is.

Meanwhile, I had to go through a painful route to get my data.frame in the format I wanted. It was an easy one but painfully long.

I had to break the whole data into bits of excel_97-2003 and then save into CSV before R can read the format I wanted.

I'll dedicate time to study, practice and implement these steps on another practice with same data sets.

Thanks Technocrat

1 Like

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.