Duplicate column error while there is no duplicate

amirsaman · April 29, 2022, 5:48pm

I used to run the simple filtering code below with no issues and all of a sudden it is throwing errors about duplicated names in the dataframe.

df <- df %>% dplyr::filter(as_of_date == '2019-03-31')

Error in dplyr::filter():
! Can't transform a data frame with duplicate names.
Run rlang::last_error() to see where the error occurred.

rlang::last_error()
<error/rlang_error>
Error in dplyr::filter():
! Can't transform a data frame with duplicate names.

Backtrace:

df %>% ...
dplyr:::filter.data.frame(., as_of_date == "2019-03-31")
Run rlang::last_trace() to see the full context.

rlang::last_trace()
<error/rlang_error>
Error in dplyr::filter():
! Can't transform a data frame with duplicate names.

Backtrace:
▆
├─OOT_binned_treat_XGB %>% ...
├─dplyr::filter(., as_of_date == "2019-03-31")
└─dplyr:::filter.data.frame(., as_of_date == "2019-03-31")
  └─dplyr:::filter_rows(.data, ..., caller_env = caller_env())
    └─DataMask$new(.data, caller_env, "filter", error_call = error_call)
      └─dplyr initialize(...)
        └─rlang::abort(...)

I checked the column names for duplicates and it returns column 486, but when I preview column 486 and the columns before and after it, I don't see any duplicates. What is happening here?

which(duplicated(names(df)))
[1] 486

df[,484:487] %>% head()
as_of_date cust_num Covid_Deferral_flag lease_remaining_woe
1 2019-01-31 2125922 0 -0.179584056059164
2 2019-02-28 2125922 0 -0.179584056059164
3 2019-03-31 2125922 0 -0.179584056059164
4 2019-01-31 2125946 0 -0.649132439071706
5 2019-02-28 2125946 0 -0.649132439071706
6 2019-03-31 2125946 0 -0.649132439071706

Sanjmeh · April 29, 2022, 6:17pm

Your final head() output is showing only 3 column names while you ask for 4.
Can you paste a dput(head(df)) or just dput(names(df)) so we can inspect it?

amirsaman · April 29, 2022, 6:21pm

The spacing is messed up. There are 4 columns:
as_of_date
cust_num
Covid_Deferral_flag
lease_remaining_woe

Here is the output for the same columns:

dput(names(df))[484:487]
[1] "as_of_date" "cust_num" "Covid_Deferral_flag" "lease_remaining_woe"

nirgrahamuk · April 29, 2022, 9:43pm

How about

which(names(df)=='Covid_Deferral_flag')
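
The reason this check is worth running: duplicated() marks only the second and later occurrences of a value, so which(duplicated(names(df))) reports position 486 but not the earlier column it collides with, which need not be adjacent. Here is a minimal sketch on a made-up data frame. The data frame df_toy, its column names, and the helper dup_positions() are hypothetical and only for illustration.

```r
# Toy data frame with a repeated column name.
# check.names = FALSE stops data.frame() from silently renaming the duplicate.
df_toy <- data.frame(a = 1:3, flag = 0, b = 4:6, flag = 1, check.names = FALSE)

# duplicated() flags only the later occurrence, here position 4
which(duplicated(names(df_toy)))

# Scanning from both ends flags every column involved, here positions 2 and 4
nms <- names(df_toy)
which(duplicated(nms) | duplicated(nms, fromLast = TRUE))

# Hypothetical helper: for each repeated name, list all positions where it occurs
dup_positions <- function(x) {
  nms <- names(x)
  dup_names <- unique(nms[duplicated(nms)])
  lapply(setNames(dup_names, dup_names), function(n) which(nms == n))
}

dup_positions(df_toy)
```

On a wide data frame this should show every column sharing a name in one step, instead of checking suspect names one at a time.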

amirsaman · April 29, 2022, 9:52pm

Found two columns!

which(names(OOT_binned_treat_XGB)=='Covid_Deferral_flag')
[1] 69 486

Thank you!
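
For anyone who lands here with the same error: once the duplicate is located, two common base R ways to get a data frame with unique names are to drop the later copies or to rename them. The sketch below uses a hypothetical toy data frame (df_toy and its columns are made up). Which option is appropriate depends on whether the two columns actually hold the same data, which this thread does not say.

```r
# Toy data frame with a repeated column name (hypothetical example data)
df_toy <- data.frame(a = 1:3, flag = 0, b = 4:6, flag = 1, check.names = FALSE)

# Option 1: keep only the first column for each name, dropping later copies
df_first <- df_toy[!duplicated(names(df_toy))]
names(df_first)

# Option 2: keep every column, but give repeated names a numeric suffix
df_unique <- df_toy
names(df_unique) <- make.unique(names(df_unique))
names(df_unique)
```

Option 2 is the safer starting point when it is unclear whether the columns differ, since no data is discarded and the two columns can then be compared directly.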
