Subsetting vs dplyr filter - getting different results

Hi,

I have a very specific issue. For speed reasons, I'd like to understand why I'm getting different results from two pieces of code that seem equivalent (at least in my mind). I have a for loop like the one below:

for (i in WL_Admissions_subset$WL_admissions) {
  date_row <- date_row + 1
  end_num <- nrow(Waiting_subset)
  admit_date <- WL_Admissions_subset$Date[date_row]
  if (date_row == 1) suitable_patients <- which(Waiting_subset$Date <= admit_date)
  else suitable_patients <- which(Waiting_subset$Date < admit_date)
  admitted_patients <- suitable_patients[0:i]
  Waiting_subset <- Waiting_subset %>% filter(row_number() %notin% admitted_patients)
}

Converting this to base R filters out too many rows, so I end up with too small a result set:

for (i in WL_Admissions_subset$WL_admissions) {
  date_row <- date_row + 1
  end_num <- nrow(Waiting_subset)
  admit_date <- WL_Admissions_subset$Date[date_row]
  if (date_row == 1) suitable_patients <- which(Waiting_subset$Date <= admit_date)
  else suitable_patients <- which(Waiting_subset$Date < admit_date)
  admitted_patients <- suitable_patients[0:i]
  admitted_patients <- replace_na(admitted_patients, 0)
  Waiting_subset <- Waiting_subset[-admitted_patients,]
}

Does anyone have any idea where I'm going wrong? I've tried:

admitted_patients <- admitted_patients[-which(is.na(admitted_patients))]

as an intermediate step too, but it produces the same issue (filtering out too much on each loop).

Sometimes the loop tries to admit more patients than are waiting, hence the need to handle the NAs before the Waiting_subset data frame is subset using -admitted_patients.
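One possible cause, which can't be confirmed without the real data, is how base R treats negative indexing when the index vector is empty or contains only zeros. This could happen in the loop above when i is 0, or when there are no suitable patients and every NA is replaced by 0. The sketch below uses a made-up data frame called waiting as a stand-in for Waiting_subset; the names and values are assumptions for illustration only.

```r
# Toy data frame standing in for Waiting_subset (hypothetical values)
waiting <- data.frame(id = 1:5)

# No patients admitted this iteration: the index vector is empty
admitted <- integer(0)

# A negated empty index is still an empty index, so no rows are selected
n_empty <- nrow(waiting[-admitted, , drop = FALSE])   # 0 rows, not 5

# An index made only of zeros behaves the same way, since -0 is 0
n_zero <- nrow(waiting[-c(0L, 0L), , drop = FALSE])   # 0 rows, not 5

# -which(...) has the same problem when there are no NAs to remove
x <- c(2L, 4L)
n_which <- length(x[-which(is.na(x))])                # 0 elements, not 2

# A logical mask stays correct for empty or NA-padded index vectors
keep <- !(seq_len(nrow(waiting)) %in% admitted)
n_keep <- nrow(waiting[keep, , drop = FALSE])         # 5 rows

# Same mask with an NA-padded index: only row 1 is dropped
keep_na <- !(seq_len(nrow(waiting)) %in% c(1L, NA))
n_keep_na <- nrow(waiting[keep_na, , drop = FALSE])   # 4 rows
```

If this is what is happening, building a logical mask with %in% (as in the last two examples) would avoid the special cases while staying in base R.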

In brief testing, the base R version is significantly faster, hence my interest in getting it working (otherwise I'd stay within tidyverse syntax).

Hello.
Thanks for providing code, but you could take further steps to make it easier for other forum users to help you.

Share some representative data that will enable your code to run and show the problematic behaviour.

You might use tools such as the datapasta package, or the base function dput(), to share a portion of your data in code form, i.e. something that can be copied from the forum and pasted into an R session.
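As a rough sketch of the dput() route, assuming a small data frame with a Date column (the column names and values here are made up, not the poster's data):

```r
# Hypothetical stand-in for the real Waiting_subset data
Waiting_subset <- data.frame(
  Date = as.Date(c("2023-01-01", "2023-01-02", "2023-01-03")),
  patient_id = 1:3
)

# Print a copy-pasteable representation of the first few rows;
# the printed structure(...) call is what you would paste into the forum
dput(head(Waiting_subset, 3))

# Round trip: evaluating the dput() text rebuilds the data frame
txt <- paste(capture.output(dput(Waiting_subset)), collapse = "\n")
restored <- eval(parse(text = txt))
```

Anyone reading the thread can then assign the pasted structure(...) output to a variable and run the loop against it.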
