NA value using as.numeric function

I have a column in my table that returns NA for every value when I convert it with as.numeric. The code I am using is:

TFY2019$ride_length <- as.numeric(as.character(TFY2019$ride_length))
is.numeric(TFY2019$ride_length)

I really need the column to be numeric and to keep the data in the table so that I can run stats on it. Does anyone have any suggestions?

P.S. Merry Christmas and a Happy New Year to all!!

Could you post a small dput() of your dataset so I can understand your problem better?

structure(list(start_station_id = c(234, 296, 51, 66, 212), end_station_id = c(318,
117, 24, 212, 96), Membership = c("member", "member", "member",
"member", "member"), rideable_type = c("docked_bike", "docked_bike",
"docked_bike", "docked_bike", "docked_bike"), started_at = c("2020-01-30 14:22:39",
"2020-01-09 19:29:26", "2020-01-06 16:17:07", "2020-01-30 8:37:16",
"2020-01-10 12:33:05"), ended_at = c("2020-01-30 14:26:22", "2020-01-09 19:32:17",
"2020-01-06 16:25:56", "2020-01-30 8:42:48", "2020-01-10 12:37:54"
), ride_length = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), day = c("30", "09", "06", "30", "10"), month = c("01", "01",
"01", "01", "01"), year = c("2020", "2020", "2020", "2020", "2020"
), day_of_week = c("Thursday", "Thursday", "Monday", "Thursday",
"Friday")), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))

I had the ride_length column already calculated in Excel, but in R, when I convert the column with as.numeric, I lose all the values and every one is returned as NA.

ride_length is calculated as: started_at - ended_at.

The timestamps are being read as character strings, which is why you get NA as a result. Try converting them to a proper date-time format before calculating the difference.

library(tidyverse)
library(lubridate)

# Sample data, replace this with your own data frame
TFY2019 <- data.frame(
  stringsAsFactors = FALSE,
  start_station_id = c(234, 296, 51, 66, 212),
    end_station_id = c(318, 117, 24, 212, 96),
        Membership = c("member", "member", "member", "member", "member"),
     rideable_type = c("docked_bike","docked_bike",
                       "docked_bike","docked_bike","docked_bike"),
        started_at = c("2020-01-30 14:22:39",
                       "2020-01-09 19:29:26","2020-01-06 16:17:07",
                       "2020-01-30 8:37:16","2020-01-10 12:33:05"),
          ended_at = c("2020-01-30 14:26:22",
                       "2020-01-09 19:32:17","2020-01-06 16:25:56",
                       "2020-01-30 8:42:48","2020-01-10 12:37:54"),
       ride_length = c(NA, NA, NA, NA, NA),
               day = c("30", "09", "06", "30", "10"),
             month = c("01", "01", "01", "01", "01"),
              year = c("2020", "2020", "2020", "2020", "2020"),
       day_of_week = c("Thursday", "Thursday", "Monday", "Thursday", "Friday")
)

# Relevant code
TFY2019 %>% 
    mutate(across(ends_with("at"), ymd_hms),
           ride_length = started_at - ended_at)
#>   start_station_id end_station_id Membership rideable_type          started_at
#> 1              234            318     member   docked_bike 2020-01-30 14:22:39
#> 2              296            117     member   docked_bike 2020-01-09 19:29:26
#> 3               51             24     member   docked_bike 2020-01-06 16:17:07
#> 4               66            212     member   docked_bike 2020-01-30 08:37:16
#> 5              212             96     member   docked_bike 2020-01-10 12:33:05
#>              ended_at    ride_length day month year day_of_week
#> 1 2020-01-30 14:26:22 -3.716667 mins  30    01 2020    Thursday
#> 2 2020-01-09 19:32:17 -2.850000 mins  09    01 2020    Thursday
#> 3 2020-01-06 16:25:56 -8.816667 mins  06    01 2020      Monday
#> 4 2020-01-30 08:42:48 -5.533333 mins  30    01 2020    Thursday
#> 5 2020-01-10 12:37:54 -4.816667 mins  10    01 2020      Friday

Created on 2021-12-25 by the reprex package (v2.0.1)
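
As a side note on why as.numeric() can come back as NA in the first place: the thread never shows what the Excel column looked like once it was read into R, so the value below is purely hypothetical. Assuming it arrived as clock-style text, a minimal sketch of the behaviour would be:

```r
# Hypothetical value: the thread does not show what the Excel column contained
x <- "0:03:43"

# as.numeric() cannot parse clock-style text, so it returns NA
# (normally with a "NAs introduced by coercion" warning, suppressed here)
clock_result <- suppressWarnings(as.numeric(x))
is.na(clock_result)

# A string that holds a plain number converts without trouble
plain_result <- as.numeric("223")
plain_result
```

If that is what happened, the values were not really lost by R; they were just never in a form as.numeric() could read.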

Shouldn't ride_length be calculated as ended_at - started_at? If I do started_at - ended_at, I get a negative ride_length, which doesn't make any sense...

Here is what I have:

library(tidyverse)
library(lubridate)

# df is the dput you provided
df %>% 
  # Convert started_at and ended_at columns to date-time
  mutate(started_at = ymd_hms(started_at),
         ended_at = ymd_hms(ended_at)) %>% 
  # Calculate ride_length
  mutate(ride_length = ended_at - started_at,
         # Convert to duration class for more intuitive output (seconds and minutes)
         ride_length = as.duration(ride_length)) 

For more information, have a look at Chapter 16, "Dates and times", in R for Data Science.
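
Since the original goal was a plain numeric column to run stats on, here is a minimal base R sketch of one way to get there. The `rides` data frame is a hypothetical two-row sample built from timestamps in the dput above, and the choice of seconds as the unit and UTC as the time zone are assumptions, not something stated in the thread:

```r
# Hypothetical two-row sample using timestamps from the dput in this thread
rides <- data.frame(
  started_at = c("2020-01-30 14:22:39", "2020-01-30 8:37:16"),
  ended_at   = c("2020-01-30 14:26:22", "2020-01-30 8:42:48"),
  stringsAsFactors = FALSE
)

# Parse the character timestamps; a fixed time zone (assumed UTC) keeps
# the result reproducible across machines
fmt <- "%Y-%m-%d %H:%M:%S"
rides$started_at <- as.POSIXct(rides$started_at, format = fmt, tz = "UTC")
rides$ended_at   <- as.POSIXct(rides$ended_at,   format = fmt, tz = "UTC")

# difftime() with explicit units, then as.numeric(), gives a plain double
# column (ride length in seconds) that summary functions accept
rides$ride_length <- as.numeric(
  difftime(rides$ended_at, rides$started_at, units = "secs")
)

is.numeric(rides$ride_length)
mean(rides$ride_length)
```

Setting `units` explicitly matters here: without it, difftime() picks a unit based on the size of the differences, so the number you get after as.numeric() could mean seconds in one dataset and minutes in another.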

Yes, it is supposed to be ended_at - started_at. I'll give this a shot.

Thanks so much
