Removing duplicate values

Hi,
I am using the distinctive() function to remove duplicate rows in my dataframe. However, when I use a similar function in Excel, it removes a different number of duplicates than in RStudio. Would anyone know why this is happening, or which program is more accurate in this scenario?

Welcome to the community!

If I want to remove duplicate rows from a dataframe in R, I use base::unique.
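As a minimal sketch of that approach, here is unique() applied to a small made-up data frame. The name df and its columns are purely illustrative, not taken from the original poster's data:

```r
# Hypothetical example data: rows 1 and 3 are identical
df <- data.frame(
  id    = c(1, 2, 1, 3),
  value = c("a", "b", "a", "c")
)

# unique() on a data frame keeps the first occurrence of each distinct row
df_unique <- unique(df)

nrow(df)         # 4
nrow(df_unique)  # 3

# duplicated() flags the rows that unique() drops
n_dropped <- sum(duplicated(df))
n_dropped        # 1
```

Comparing nrow() before and after is a quick way to see how many rows were treated as duplicates.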

I'm not aware of a function called distinctive in base R, and a quick Google search turned up nothing. Could you tell me which package this function comes from?

It would also be very helpful if you could share a small part of the data set (say df) and the different results you obtain (say df_excel and df_R) in a copy-paste friendly format.

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in some R object, the dput function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.
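For the first option, a rough sketch of how dput can be used, again with a made-up data frame (df is a placeholder for whatever object holds the real data):

```r
# Hypothetical data frame standing in for the real data set
df <- data.frame(
  id    = c(1, 2, 1, 3),
  value = c("a", "b", "a", "c")
)

# dput() prints R code that recreates the object;
# the printed text is what gets pasted into a forum post
dput(head(df, 3))

# Anyone reading the post can rebuild the object from that text
txt     <- capture.output(dput(df))
df_copy <- eval(parse(text = txt))
all.equal(df, df_copy)
```

Using head() keeps the pasted sample small, which helps when the full data set is large or cannot be shared in full.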

Thank you for your reply!

Distinctive comes from the tidyverse package. That said, I just used base::unique to remove duplicate rows and it gave the same result as distinctive.

Unfortunately, I cannot share a small part of the data set. If it helps, I am working with a very large data set of around 127000 rows and 44 columns. When I remove duplicates in Excel, I am left with 101275 rows, whereas in RStudio I am left with 101636 rows.
I cannot work out which one is correct. I understand if it is too difficult to help without the data set.

Thanks

Not quite: the function is called distinct(), not "distinctive", and it comes from the dplyr package, which is part of the tidyverse.
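For reference, a small sketch of distinct() next to base::unique, assuming dplyr is installed. The data frame is invented for illustration:

```r
library(dplyr)

# Hypothetical example data: rows 1 and 3 are identical
df <- data.frame(
  id    = c(1, 2, 1, 3),
  value = c("a", "b", "a", "c")
)

# With no columns named, distinct() compares entire rows
df_distinct <- distinct(df)
nrow(df_distinct)   # 3

# base::unique gives the same number of rows here
nrow(unique(df))    # 3

# With columns named, distinct() de-duplicates on those columns only;
# .keep_all = TRUE retains the remaining columns of the first match
df_by_id <- distinct(df, id, .keep_all = TRUE)
```

Both functions compare values exactly as stored, so any difference from another tool is likely to come from how that tool decides two rows are equal.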

About your issue: if I were you, I would import the de-duplicated values from Excel and perform an anti-join against the result of distinct() in R. That way I could look at exactly which rows differ.
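A rough sketch of that comparison, assuming dplyr is installed. The two data frames are made-up stand-ins, and the step of importing the Excel result into R is skipped:

```r
library(dplyr)

# Hypothetical stand-ins: df_R is the output of distinct() in R,
# df_excel is the de-duplicated sheet after importing it into R
df_R     <- data.frame(id = c(1, 2, 3, 4), value = c("a", "b", "c", "C"))
df_excel <- data.frame(id = c(1, 2, 3),    value = c("a", "b", "c"))

# Rows present in the R result but absent from the Excel result
only_in_R <- anti_join(df_R, df_excel, by = c("id", "value"))
nrow(only_in_R)      # 1

# The reverse direction: rows Excel kept that R did not
only_in_excel <- anti_join(df_excel, df_R, by = c("id", "value"))
nrow(only_in_excel)  # 0
```

Listing every column in by makes the join compare whole rows. Inspecting only_in_R then shows which rows one tool kept and the other removed.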

Sorry, I am new to R, so I am still trying to understand all the terms and everything!

Thanks so much for your help! Using anti-join worked perfectly and I worked out where I went wrong.

Thanks again!

