I am working with the R programming language. I have the following table:
age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
gender=c("M","F","M","M","F","F","M","M","F","M","F","M")
testframe = data.frame(age=age,height=height,height2=height,gender=gender,gender2=gender)
head(testframe)
age height height2 gender gender2
1 18 76.1 76.1 M M
2 19 77.0 77.0 F F
3 20 78.1 78.1 M M
4 21 78.2 78.2 M M
5 22 78.8 78.8 F F
6 23 79.7 79.7 F F
In the above table, I want to remove columns that have identical entries but different names. This can be done as follows (in base R):
no_dup = testframe[!duplicated(as.list(testframe))]
head(no_dup)
age height gender
1 18 76.1 M
2 19 77.0 F
3 20 78.1 M
4 21 78.2 M
5 22 78.8 F
6 23 79.7 F
My Question: Does anyone know how to convert the above code testframe[!duplicated(as.list(testframe))] into dplyr commands? Is this possible?
That's a cool function, I had never heard of that one before. I did a little research but couldn't find a built-in dplyr function that does this (distinct() only works on rows).
You can however use the existing function within the dplyr verbs if you modify it a little. Here is what I came up with:
library(dplyr)
age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
gender=c("M","F","M","M","F","F","M","M","F","M","F","M")
testframe = data.frame(age=age,height=height,height2=height,gender=gender,gender2=gender)
testframe %>% select(!which(duplicated(as.list(.))))
#> age height gender
#> 1 18 76.1 M
#> 2 19 77.0 F
#> 3 20 78.1 M
#> 4 21 78.2 M
#> 5 22 78.8 F
#> 6 23 79.7 F
#> 7 24 79.9 M
#> 8 25 81.1 M
#> 9 26 81.2 F
#> 10 27 81.8 M
#> 11 28 82.8 F
#> 12 29 83.5 M
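To see what the expression inside select() evaluates to, here is a small base R sketch that breaks it into steps. The intermediate names (dup_flags, dup_pos, kept) are my own and only there for illustration:

```r
# Rebuild the example data frame from the question.
age <- 18:29
height <- c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
gender <- c("M", "F", "M", "M", "F", "F", "M", "M", "F", "M", "F", "M")
testframe <- data.frame(age = age, height = height, height2 = height,
                        gender = gender, gender2 = gender)

# as.list() turns the data frame into a list of column vectors;
# duplicated() then flags every column identical to an earlier one.
dup_flags <- duplicated(as.list(testframe))
print(dup_flags)
#> [1] FALSE FALSE  TRUE FALSE  TRUE

# which() converts the logical flags into column positions.
dup_pos <- which(dup_flags)
print(dup_pos)
#> [1] 3 5

# Names of the columns that survive after dropping the duplicates.
kept <- names(testframe)[!dup_flags]
print(kept)
#> [1] "age"    "height" "gender"
```

So the select() call above is simply asked to drop columns 3 and 5 (height2 and gender2), and the first occurrence of each duplicated column is the one that is kept.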