Duplicate column error while there is no duplicate

I used to run the simple filtering code below with no issues and all of a sudden it is throwing errors about duplicated names in the dataframe.

df <- df %>% dplyr::filter(as_of_date == '2019-03-31')

Error in dplyr::filter():
! Can't transform a data frame with duplicate names.
Run rlang::last_error() to see where the error occurred.

rlang::last_error()
<error/rlang_error>
Error in dplyr::filter():
! Can't transform a data frame with duplicate names.


Backtrace:

  1. df %>% ...
  2. dplyr:::filter.data.frame(., as_of_date == "2019-03-31")
    Run rlang::last_trace() to see the full context.

rlang::last_trace()
<error/rlang_error>
Error in dplyr::filter():
! Can't transform a data frame with duplicate names.


Backtrace:
 1. ├─OOT_binned_treat_XGB %>% ...
 2. ├─dplyr::filter(., as_of_date == "2019-03-31")
 3. └─dplyr:::filter.data.frame(., as_of_date == "2019-03-31")
 4.   └─dplyr:::filter_rows(.data, ..., caller_env = caller_env())
 5.     └─DataMask$new(.data, caller_env, "filter", error_call = error_call)
 6.       └─dplyr initialize(...)
 7.         └─rlang::abort(...)

I checked the column names for duplicates and got column 486, but when I preview column 486 together with the columns before and after it, I don't see any duplicated names. What is happening here?

which(duplicated(names(df)))
[1] 486

df[,484:487] %>% head()
  as_of_date cust_num Covid_Deferral_flag lease_remaining_woe
1 2019-01-31  2125922                   0  -0.179584056059164
2 2019-02-28  2125922                   0  -0.179584056059164
3 2019-03-31  2125922                   0  -0.179584056059164
4 2019-01-31  2125946                   0  -0.649132439071706
5 2019-02-28  2125946                   0  -0.649132439071706
6 2019-03-31  2125946                   0  -0.649132439071706

Your final head() output is showing only 3 column names while you asked for 4.
Can you paste the output of dput(head(df)), or just dput(names(df)), so we can inspect it?

The spacing is messed up. There are 4 columns:
as_of_date
cust_num
Covid_Deferral_flag
lease_remaining_woe

Here is the output for the same columns:

dput(names(df))[484:487]
[1] "as_of_date" "cust_num" "Covid_Deferral_flag" "lease_remaining_woe"

How about

which(names(df)=='Covid_Deferral_flag')

Found two columns!

which(names(OOT_binned_treat_XGB)=='Covid_Deferral_flag')
[1] 69 486

Thank you!
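
For anyone landing here with the same error: duplicated() only flags the second and later occurrences of a name, so which(duplicated(names(df))) pointed at column 486 but not at its first copy in column 69, which is why the neighbouring columns looked fine. Below is a minimal sketch of a check that reports every copy at once. The toy data frame and the helper variables (nm, dup_pos, dup_groups) are made up for illustration and are not the poster's data.

```r
# Toy data frame with a repeated column name (hypothetical data).
# check.names = FALSE stops data.frame() from renaming the duplicate.
df <- data.frame(a = 1:3, flag = 0, b = 4:6, flag = 1, check.names = FALSE)

# duplicated() marks only the later occurrence, so the first "flag" is missed.
which(duplicated(names(df)))
# [1] 4

# Scanning from both ends marks every copy, including the first one.
nm <- names(df)
dup_pos <- which(duplicated(nm) | duplicated(nm, fromLast = TRUE))
dup_pos
# [1] 2 4

# Group the positions by name to see which columns collide.
dup_groups <- split(dup_pos, nm[dup_pos])
dup_groups
```

Once the positions are known, you can compare the colliding columns and decide which one to drop or rename before calling filter().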
