Converting the Following Code to DPLYR

I am working with the R programming language. I have the following table:

age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
gender=c("M","F","M","M","F","F","M","M","F","M","F","M")
testframe = data.frame(age=age,height=height,height2=height,gender=gender,gender2=gender)

head(testframe)

  age height height2 gender gender2
1  18   76.1    76.1      M       M
2  19   77.0    77.0      F       F
3  20   78.1    78.1      M       M
4  21   78.2    78.2      M       M
5  22   78.8    78.8      F       F
6  23   79.7    79.7      F       F

In the above table, I want to remove columns that have identical entries but different names. This can be done as follows (in Base R):

no_dup = testframe[!duplicated(as.list(testframe))]

head(no_dup)
  age height gender
1  18   76.1      M
2  19   77.0      F
3  20   78.1      M
4  21   78.2      M
5  22   78.8      F
6  23   79.7      F

My Question: Does anyone know how to convert the above code, testframe[!duplicated(as.list(testframe))], into "DPLYR" commands? Is this possible?

Thanks!

Hi,

That's a cool function, I had never heard of that one before. I did a little research but couldn't find a built-in dplyr function that does this (distinct() only works on rows).

You can however use the existing function within the dplyr verbs if you modify it a little. Here is what I came up with:

library(dplyr)

age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
gender=c("M","F","M","M","F","F","M","M","F","M","F","M")
testframe = data.frame(age=age,height=height,height2=height,gender=gender,gender2=gender)

testframe %>% select(!which(duplicated(as.list(.))))
#>    age height gender
#> 1   18   76.1      M
#> 2   19   77.0      F
#> 3   20   78.1      M
#> 4   21   78.2      M
#> 5   22   78.8      F
#> 6   23   79.7      F
#> 7   24   79.9      M
#> 8   25   81.1      M
#> 9   26   81.2      F
#> 10  27   81.8      M
#> 11  28   82.8      F
#> 12  29   83.5      M

Created on 2021-12-11 by the reprex package (v2.0.1)

Differences with base R:

  • The . inside a %>% pipeline refers to the whole data frame being piped in
  • select() cannot take a logical vector as a filter, so I used the which() function to convert the TRUE/FALSE values to column indices
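
To make the two points above more concrete, here is a minimal sketch of the same idea split into steps. It is only one possible variation, not the only way to do it: instead of negating indices, it keeps the non-duplicated columns by name with all_of(). The variable names is_dup, keep_names and no_dup_dplyr are just illustrative and not part of dplyr.

```r
library(dplyr)

age <- 18:29
height <- c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
gender <- c("M", "F", "M", "M", "F", "F", "M", "M", "F", "M", "F", "M")
testframe <- data.frame(age = age, height = height, height2 = height,
                        gender = gender, gender2 = gender)

# One TRUE/FALSE per column: TRUE if an earlier column has identical contents
is_dup <- duplicated(as.list(testframe))
is_dup
#> [1] FALSE FALSE  TRUE FALSE  TRUE

# select() will not accept the logical vector directly, so translate it
# into something it does accept: here, the names of the columns to keep
keep_names <- names(testframe)[!is_dup]

no_dup_dplyr <- testframe %>% select(all_of(keep_names))
names(no_dup_dplyr)
#> [1] "age"    "height" "gender"
```

Selecting by name should also behave sensibly when there are no duplicated columns at all, since keep_names then simply lists every column.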

Hope this helps,
PJ
