Understanding dropping rows (Newbie)

Hi,
I am currently working on a capstone project and ran into an issue along the way.


Some ride lengths were negative, and I wanted to remove the rows where the ride length was negative or above 1440.
To do that, I ran the following code:

tripdata <- tripdata[!tripdata$ride_length<0, ] ## Removing any negative ride duration
tripdata <- tripdata[!tripdata$ridelength>1440, ] ## Removing rides longer than 1 day for better scaling

On doing so, my data frame went blank.

Could you let me know where I have gone wrong?

Thank You!


Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

Here's the reprex:

## Packages
library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

## Database available at: https://divvy-tripdata.s3.amazonaws.com/index.html

trial <- tibble::tribble(
                     ~ride_id,  ~rideable_type,        ~started_at,          ~ended_at,              ~start_station_name, ~start_station_id,                ~end_station_name, ~end_station_id,  ~start_lat,   ~start_lng,    ~end_lat,     ~end_lng, ~member_casual,
           "F79335E3A77A57B5", "electric_bike", "29-03-2021 15:41", "29-03-2021 15:41", "Ashland Ave & Belle Plaine Ave",           "13249", "Ashland Ave & Belle Plaine Ave",         "13249",   41.956133,   -87.668981, 41.95614267, -87.66898483,       "member",
           "37261AB193F57280", "electric_bike", "29-03-2021 15:41", "29-03-2021 16:33", "Ashland Ave & Belle Plaine Ave",           "13249", "Ashland Ave & Belle Plaine Ave",         "13249", 41.95606717, -87.66889567,   41.956158, -87.66866283,       "member",
           "9A548550223D5776",  "classic_bike", "08-03-2021 12:12", "08-03-2021 12:44",     "Wilton Ave & Diversey Pkwy",    "TA1306000014",         "Clark St & Drummond Pl",  "TA1307000142",   41.932418,   -87.652705,   41.931248,   -87.644336,       "member",
           "5340C283D8D928D2",  "classic_bike", "26-03-2021 17:04", "26-03-2021 17:10",     "Wilton Ave & Diversey Pkwy",    "TA1306000014",         "Clark St & Drummond Pl",  "TA1307000142",   41.932418,   -87.652705,   41.931248,   -87.644336,       "member",
           "E0D5E06CBC889EE8",  "classic_bike", "18-03-2021 23:36", "18-03-2021 23:57",         "Clark St & Drummond Pl",    "TA1307000142",         "Clark St & Drummond Pl",  "TA1307000142",   41.931248,   -87.644336,   41.931248,   -87.644336,       "member"
           )

## Converting to ymd-hms

trial$started_at <- ymd_hms(trial$started_at)
trial$ended_at <- ymd_hms(trial$ended_at)

## Creating a column for ride time in minutes

trial$ridelength <- (as.double(difftime(trial$ended_at , trial$started_at)))/60 

## Removing any negative ride times

trial <- trial[!trial$ridelength<1, ]
head(trial)
#> # A tibble: 0 x 14
#> # ... with 14 variables: ride_id <chr>, rideable_type <chr>, started_at <dttm>,
#> #   ended_at <dttm>, start_station_name <chr>, start_station_id <chr>,
#> #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
#> #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>,
#> #   ridelength <dbl>

I see several problems here.

First, I don't think any of the trial$ridelength values are negative. For example, if I inspect the ridelength column before attempting to remove the negatives, I get this:

trial$ridelength
[1] 0.0000000 0.8666667 0.5333333 0.1000000 0.3500000

As you say, you are calculating ride lengths in minutes, and none of the computed values reaches one minute.
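
It may be worth checking how those values were computed. In the sample data, the second ride runs from 15:41 to 16:33, which is 52 minutes, yet the computed value is 0.87. The sketch below is a base R check under two assumptions that the thread does not confirm: that the timestamps are in day-month-year hour:minute order, and that the units should be requested explicitly, because difftime() chooses its units automatically when units is not given. The names fmt and ride_minutes are made up for illustration.

```r
# Start and end times copied from the reprex (day-month-year hour:minute).
started_at <- c("29-03-2021 15:41", "29-03-2021 15:41", "08-03-2021 12:12",
                "26-03-2021 17:04", "18-03-2021 23:36")
ended_at   <- c("29-03-2021 15:41", "29-03-2021 16:33", "08-03-2021 12:44",
                "26-03-2021 17:10", "18-03-2021 23:57")

# Assumption: the strings are day-month-year. tz = "UTC" is an arbitrary
# choice so the example does not depend on the local time zone.
fmt <- "%d-%m-%Y %H:%M"
started <- as.POSIXct(started_at, format = fmt, tz = "UTC")
ended   <- as.POSIXct(ended_at, format = fmt, tz = "UTC")

# Ask for minutes explicitly instead of letting difftime() pick the units.
ride_minutes <- as.numeric(difftime(ended, started, units = "mins"))
print(ride_minutes)
```

With this parsing the five rides come out as 0, 52, 32, 6 and 21 minutes. If you prefer to stay with lubridate, dmy_hm() is the parser that matches this field order.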

Second, your data frame is blank at the end because this line:

trial <- trial[!trial$ridelength<1, ]

keeps only the rows where trial$ridelength is not less than 1. Since none of your values reaches 1, you get a blank data frame.
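
For the original goal (dropping rides that are negative or longer than 1440 minutes), here is a minimal base R sketch on made-up data. The data frame rides and its values are hypothetical. It also shows a second way to end up with zero rows: on a plain data frame, a column name that does not match (for example ridelength when the column is ride_length) returns NULL, so the logical index is empty.

```r
# Hypothetical toy data; ride_length is in minutes.
rides <- data.frame(
  ride_id     = c("a", "b", "c", "d"),
  ride_length = c(-5, 12, 2000, 45)
)

# Keep rows whose length is between 0 and 1440 minutes (one day), inclusive.
# which() drops NA results, so missing lengths do not become all-NA rows.
keep <- which(rides$ride_length >= 0 & rides$ride_length <= 1440)
cleaned <- rides[keep, ]
print(cleaned)

# A column name that does not exist gives NULL, the comparison gives
# logical(0), and indexing with logical(0) returns zero rows.
misspelled <- rides[!rides$ridelength < 0, ]
print(nrow(misspelled))
```

Checking names(tripdata) and summary() of the ride length column before filtering is a quick way to catch both problems.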
