Hello tidyverse community,
I was wondering what is the best way to find a set of unique string values among multiple columns. These columns may have identical strings so just want a summary of all unique strings among these columns. I am thinking that I may need to use the map function to get a list of vectors and then reduce to one big vector?
Thanks for the guidance!
tiernan
December 5, 2017, 10:28pm
2
How about passing the columns to flatten_chr() %>% unique()
?
library(tidyverse) # devtools::install_github('tidyverse/tidyverse')
strings <- starwars$name
head(strings)
#> [1] "Luke Skywalker" "C-3PO" "R2-D2" "Darth Vader"
#> [5] "Leia Organa" "Owen Lars"
set.seed(12345)
tbl <- tibble(COL_A = sample(strings, 10, replace = TRUE),
COL_B = sample(strings, 10, replace = TRUE),
COL_C = sample(strings, 10, replace = TRUE))
print(tbl)
#> # A tibble: 10 x 3
#> COL_A COL_B COL_C
#> <chr> <chr> <chr>
#> 1 Dormé Darth Vader Quarsh Panaka
#> 2 Grievous Han Solo Wicket Systri Warrick
#> 3 Zam Wesell Bail Prestor Organa Poe Dameron
#> 4 Tarfful Luke Skywalker Barriss Offee
#> 5 Quarsh Panaka Roos Tarpals Gregar Typho
#> 6 Greedo Shmi Skywalker Jar Jar Binks
#> 7 Wicket Systri Warrick Jar Jar Binks Luminara Unduli
#> 8 Dud Bolt Rugor Nass Mace Windu
#> 9 Dooku Jabba Desilijic Tiure Palpatine
#> 10 Padmé Amidala Rey Bib Fortuna
unique_strings <- tbl %>% flatten_chr() %>% unique()
unique_strings
#> [1] "Dormé" "Grievous"
#> [3] "Zam Wesell" "Tarfful"
#> [5] "Quarsh Panaka" "Greedo"
#> [7] "Wicket Systri Warrick" "Dud Bolt"
#> [9] "Dooku" "Padmé Amidala"
#> [11] "Darth Vader" "Han Solo"
#> [13] "Bail Prestor Organa" "Luke Skywalker"
#> [15] "Roos Tarpals" "Shmi Skywalker"
#> [17] "Jar Jar Binks" "Rugor Nass"
#> [19] "Jabba Desilijic Tiure" "Rey"
#> [21] "Poe Dameron" "Barriss Offee"
#> [23] "Gregar Typho" "Luminara Unduli"
#> [25] "Mace Windu" "Palpatine"
#> [27] "Bib Fortuna"
3 Likes
What do you mean with "among multiple columns"?
On more then one column on the same time (by column)? By row? Or overall? Other?
Anyway, here below my proposal for the first three possible interpretation (using the example, expanded to 100 row, by @tiernan )
library(tidyverse) # devtools::install_github('tidyverse/tidyverse')
#> -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1 v purrr 0.2.4
#> v tibble 1.3.4 v dplyr 0.7.4
#> v tidyr 0.7.2 v stringr 1.2.0
#> v readr 1.1.1 v forcats 0.2.0
#> -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
strings <- starwars$name
head(strings)
#> [1] "Luke Skywalker" "C-3PO" "R2-D2" "Darth Vader"
#> [5] "Leia Organa" "Owen Lars"
set.seed(12345)
tbl <- tibble(COL_A = sample(strings, 100, replace = TRUE),
COL_B = sample(strings, 100, replace = TRUE),
COL_C = sample(strings, 100, replace = TRUE))
tbl
#> # A tibble: 100 x 3
#> COL_A COL_B COL_C
#> <chr> <chr> <chr>
#> 1 Dormé Ackbar Adi Gallia
#> 2 Grievous Yarael Poof Tarfful
#> 3 Zam Wesell BB8 Anakin Skywalker
#> 4 Tarfful Yarael Poof Dud Bolt
#> 5 Quarsh Panaka Gasgano Cordé
#> 6 Greedo Raymus Antilles Zam Wesell
#> 7 Wicket Systri Warrick Mas Amedda Biggs Darklighter
#> 8 Dud Bolt Shaak Ti Owen Lars
#> 9 Dooku IG-88 Ki-Adi-Mundi
#> 10 Padmé Amidala Yoda Gregar Typho
#> # ... with 90 more rows
# By column
map(tbl, unique) # a list
#> $COL_A
#> [1] "Dormé" "Grievous"
#> [3] "Zam Wesell" "Tarfful"
#> [5] "Quarsh Panaka" "Greedo"
#> [7] "Wicket Systri Warrick" "Dud Bolt"
#> [9] "Dooku" "Padmé Amidala"
#> [11] "Darth Vader" "Han Solo"
#> [13] "Bail Prestor Organa" "Luke Skywalker"
#> [15] "Roos Tarpals" "Shmi Skywalker"
#> [17] "Jar Jar Binks" "Rugor Nass"
#> [19] "Jabba Desilijic Tiure" "Rey"
#> [21] "Poe Dameron" "Barriss Offee"
#> [23] "Gregar Typho" "Luminara Unduli"
#> [25] "Mace Windu" "Palpatine"
#> [27] "Bib Fortuna" "Lama Su"
#> [29] "Wedge Antilles" "Poggle the Lesser"
#> [31] "Finis Valorum" "Nute Gunray"
#> [33] "Shaak Ti" "Raymus Antilles"
#> [35] "Yarael Poof" "Wilhuff Tarkin"
#> [37] "Watto" "Tion Medon"
#> [39] "Dexter Jettster" "Bossk"
#> [41] "Arvel Crynyd" "Owen Lars"
#> [43] "Leia Organa" "Plo Koon"
#> [45] "Ratts Tyerell" "Yoda"
#> [47] "Ayla Secura" "Beru Whitesun lars"
#> [49] "Boba Fett" "Captain Phasma"
#> [51] "Jango Fett" "Saesee Tiin"
#> [53] "Nien Nunb" "C-3PO"
#> [55] "Chewbacca" "Mon Mothma"
#> [57] "Taun We" "Jocasta Nu"
#> [59] "Biggs Darklighter"
#>
#> $COL_B
#> [1] "Ackbar" "Yarael Poof"
#> [3] "BB8" "Gasgano"
#> [5] "Raymus Antilles" "Mas Amedda"
#> [7] "Shaak Ti" "IG-88"
#> [9] "Yoda" "Jar Jar Binks"
#> [11] "Jango Fett" "Taun We"
#> [13] "Captain Phasma" "Adi Gallia"
#> [15] "Luke Skywalker" "Arvel Crynyd"
#> [17] "Ki-Adi-Mundi" "Dud Bolt"
#> [19] "R5-D4" "Poggle the Lesser"
#> [21] "Barriss Offee" "Rey"
#> [23] "Wedge Antilles" "Lobot"
#> [25] "Dexter Jettster" "R2-D2"
#> [27] "Cliegg Lars" "Jocasta Nu"
#> [29] "Cordé" "Grievous"
#> [31] "Luminara Unduli" "Biggs Darklighter"
#> [33] "Plo Koon" "Chewbacca"
#> [35] "Poe Dameron" "Padmé Amidala"
#> [37] "Ayla Secura" "Tion Medon"
#> [39] "Finis Valorum" "Ratts Tyerell"
#> [41] "Qui-Gon Jinn" "Lama Su"
#> [43] "Watto" "Wat Tambor"
#> [45] "Saesee Tiin" "Bail Prestor Organa"
#> [47] "San Hill" "Bib Fortuna"
#> [49] "Jabba Desilijic Tiure" "R4-P17"
#> [51] "Roos Tarpals" "Mon Mothma"
#> [53] "Greedo" "Rugor Nass"
#> [55] "Jek Tono Porkins" "Bossk"
#> [57] "Finn" "Han Solo"
#> [59] "Tarfful" "Lando Calrissian"
#> [61] "Quarsh Panaka"
#>
#> $COL_C
#> [1] "Adi Gallia" "Tarfful"
#> [3] "Anakin Skywalker" "Dud Bolt"
#> [5] "Cordé" "Zam Wesell"
#> [7] "Biggs Darklighter" "Owen Lars"
#> [9] "Ki-Adi-Mundi" "Gregar Typho"
#> [11] "Ackbar" "Luminara Unduli"
#> [13] "Finis Valorum" "Bossk"
#> [15] "Finn" "Watto"
#> [17] "Wicket Systri Warrick" "Ben Quadinaros"
#> [19] "Plo Koon" "Greedo"
#> [21] "Obi-Wan Kenobi" "Barriss Offee"
#> [23] "Raymus Antilles" "Shmi Skywalker"
#> [25] "Saesee Tiin" "Mon Mothma"
#> [27] "Kit Fisto" "Darth Maul"
#> [29] "Dooku" "Captain Phasma"
#> [31] "Sly Moore" "Jar Jar Binks"
#> [33] "Rey" "Lama Su"
#> [35] "Cliegg Lars" "Nien Nunb"
#> [37] "Poe Dameron" "R2-D2"
#> [39] "Grievous" "Wat Tambor"
#> [41] "C-3PO" "Jabba Desilijic Tiure"
#> [43] "Ratts Tyerell" "Jocasta Nu"
#> [45] "R5-D4" "Taun We"
#> [47] "Padmé Amidala" "Chewbacca"
#> [49] "Arvel Crynyd" "Han Solo"
#> [51] "Jango Fett" "Roos Tarpals"
#> [53] "Mace Windu" "San Hill"
#> [55] "Wilhuff Tarkin" "Lobot"
#> [57] "Rugor Nass" "Gasgano"
#> [59] "R4-P17" "Yoda"
map(tbl, ~list(unique(.))) %>% as_data_frame # data frame
#> # A tibble: 1 x 3
#> COL_A COL_B COL_C
#> <list> <list> <list>
#> 1 <chr [59]> <chr [61]> <chr [60]>
# By row
tbl %>% transmute(unique = map(t(.) %>% as_tibble, unique)) # data frame
#> # A tibble: 100 x 1
#> unique
#> <list>
#> 1 <chr [3]>
#> 2 <chr [3]>
#> 3 <chr [3]>
#> 4 <chr [3]>
#> 5 <chr [3]>
#> 6 <chr [3]>
#> 7 <chr [3]>
#> 8 <chr [3]>
#> 9 <chr [3]>
#> 10 <chr [3]>
#> # ... with 90 more rows
tbl %>% # data frame (extended)
mutate(
unique = map(t(.) %>% as_tibble, unique),
n = map_int(unique, length),
pasted = map_chr(unique, paste, collapse = ' - ')
) %>%
arrange(n) # only one row has duplicates
#> # A tibble: 100 x 6
#> COL_A COL_B COL_C unique n
#> <chr> <chr> <chr> <list> <int>
#> 1 C-3PO Bossk C-3PO <chr [2]> 2
#> 2 Dormé Ackbar Adi Gallia <chr [3]> 3
#> 3 Grievous Yarael Poof Tarfful <chr [3]> 3
#> 4 Zam Wesell BB8 Anakin Skywalker <chr [3]> 3
#> 5 Tarfful Yarael Poof Dud Bolt <chr [3]> 3
#> 6 Quarsh Panaka Gasgano Cordé <chr [3]> 3
#> 7 Greedo Raymus Antilles Zam Wesell <chr [3]> 3
#> 8 Wicket Systri Warrick Mas Amedda Biggs Darklighter <chr [3]> 3
#> 9 Dud Bolt Shaak Ti Owen Lars <chr [3]> 3
#> 10 Dooku IG-88 Ki-Adi-Mundi <chr [3]> 3
#> # ... with 90 more rows, and 1 more variables: pasted <chr>
map(t(tbl) %>% as_tibble, unique) # list
#> $V1
#> [1] "Dormé" "Ackbar" "Adi Gallia"
#>
#> $V2
#> [1] "Grievous" "Yarael Poof" "Tarfful"
#>
#> $V3
#> [1] "Zam Wesell" "BB8" "Anakin Skywalker"
#>
#> $V4
#> [1] "Tarfful" "Yarael Poof" "Dud Bolt"
#>
#> $V5
#> [1] "Quarsh Panaka" "Gasgano" "Cordé"
#>
#> $V6
#> [1] "Greedo" "Raymus Antilles" "Zam Wesell"
#>
#> $V7
#> [1] "Wicket Systri Warrick" "Mas Amedda" "Biggs Darklighter"
#>
#> $V8
#> [1] "Dud Bolt" "Shaak Ti" "Owen Lars"
#>
#> $V9
#> [1] "Dooku" "IG-88" "Ki-Adi-Mundi"
#>
#> $V10
#> [1] "Padmé Amidala" "Yoda" "Gregar Typho"
#>
#> $V11
#> [1] "Darth Vader" "Yarael Poof" "Ackbar"
#>
#> $V12
#> [1] "Han Solo" "Jar Jar Binks" "Luminara Unduli"
#>
#> $V13
#> [1] "Bail Prestor Organa" "Jango Fett" "Finis Valorum"
#>
#> $V14
#> [1] "Luke Skywalker" "Jar Jar Binks" "Bossk"
#>
#> $V15
#> [1] "Roos Tarpals" "Taun We" "Finn"
#>
#> $V16
#> [1] "Shmi Skywalker" "Raymus Antilles" "Watto"
#>
#> $V17
#> [1] "Jar Jar Binks" "Captain Phasma" "Wicket Systri Warrick"
#>
#> $V18
#> [1] "Rugor Nass" "Adi Gallia" "Ben Quadinaros"
#>
#> $V19
#> [1] "Jabba Desilijic Tiure" "Luke Skywalker" "Plo Koon"
#>
#> $V20
#> [1] "Rey" "Arvel Crynyd" "Greedo"
#>
#> $V21
#> [1] "Quarsh Panaka" "Ki-Adi-Mundi" "Obi-Wan Kenobi"
#>
#> $V22
#> [1] "Wicket Systri Warrick" "Dud Bolt" "Obi-Wan Kenobi"
#>
#> $V23
#> [1] "Poe Dameron" "R5-D4" "Biggs Darklighter"
#>
#> $V24
#> [1] "Barriss Offee" "Poggle the Lesser" "Ben Quadinaros"
#>
#> $V25
#> [1] "Gregar Typho" "Barriss Offee" "Tarfful"
#>
#> $V26
#> [1] "Jar Jar Binks" "Taun We" "Barriss Offee"
#>
#> $V27
#> [1] "Luminara Unduli" "Rey" "Raymus Antilles"
#>
#> $V28
#> [1] "Mace Windu" "Wedge Antilles" "Watto"
#>
#> $V29
#> [1] "Palpatine" "Lobot" "Shmi Skywalker"
#>
#> $V30
#> [1] "Bib Fortuna" "Dexter Jettster" "Saesee Tiin"
#>
#> $V31
#> [1] "Lama Su" "R2-D2" "Mon Mothma"
#>
#> $V32
#> [1] "Luke Skywalker" "Cliegg Lars" "Kit Fisto"
#>
#> $V33
#> [1] "Wedge Antilles" "Jocasta Nu" "Darth Maul"
#>
#> $V34
#> [1] "Poggle the Lesser" "Cordé" "Luminara Unduli"
#>
#> $V35
#> [1] "Finis Valorum" "Rey" "Dooku"
#>
#> $V36
#> [1] "Nute Gunray" "Grievous" "Dud Bolt"
#>
#> $V37
#> [1] "Shaak Ti" "Luminara Unduli" "Captain Phasma"
#>
#> $V38
#> [1] "Raymus Antilles" "Wedge Antilles" "Sly Moore"
#>
#> $V39
#> [1] "Yarael Poof" "Biggs Darklighter" "Darth Maul"
#>
#> $V40
#> [1] "Wilhuff Tarkin" "Jocasta Nu" "Jar Jar Binks"
#>
#> $V41
#> [1] "Lama Su" "Gasgano" "Finn"
#>
#> $V42
#> [1] "Watto" "Luminara Unduli" "Rey"
#>
#> $V43
#> [1] "Tion Medon" "Yoda" "Raymus Antilles"
#>
#> $V44
#> [1] "Dexter Jettster" "Plo Koon" "Lama Su"
#>
#> $V45
#> [1] "Bossk" "Chewbacca" "Plo Koon"
#>
#> $V46
#> [1] "Arvel Crynyd" "Poe Dameron" "Cliegg Lars"
#>
#> $V47
#> [1] "Owen Lars" "Padmé Amidala" "Finn"
#>
#> $V48
#> [1] "Darth Vader" "Ayla Secura" "Ben Quadinaros"
#>
#> $V49
#> [1] "Leia Organa" "Tion Medon" "Nien Nunb"
#>
#> $V50
#> [1] "Plo Koon" "Finis Valorum" "Lama Su"
#>
#> $V51
#> [1] "Poe Dameron" "Ratts Tyerell" "Anakin Skywalker"
#>
#> $V52
#> [1] "Ratts Tyerell" "Ki-Adi-Mundi" "Poe Dameron"
#>
#> $V53
#> [1] "Arvel Crynyd" "Taun We" "R2-D2"
#>
#> $V54
#> [1] "Yoda" "Qui-Gon Jinn" "Bossk"
#>
#> $V55
#> [1] "Dooku" "Yoda" "Grievous"
#>
#> $V56
#> [1] "Ayla Secura" "Lama Su" "Wat Tambor"
#>
#> $V57
#> [1] "Dooku" "Poggle the Lesser" "C-3PO"
#>
#> $V58
#> [1] "Beru Whitesun lars" "Watto" "Jabba Desilijic Tiure"
#>
#> $V59
#> [1] "Watto" "Wat Tambor" "Anakin Skywalker"
#>
#> $V60
#> [1] "Boba Fett" "Saesee Tiin" "Darth Maul"
#>
#> $V61
#> [1] "Lama Su" "Bail Prestor Organa" "Rey"
#>
#> $V62
#> [1] "Bossk" "Adi Gallia" "Ratts Tyerell"
#>
#> $V63
#> [1] "Captain Phasma" "San Hill" "Jocasta Nu"
#>
#> $V64
#> [1] "Jango Fett" "Jocasta Nu" "Sly Moore"
#>
#> $V65
#> [1] "Captain Phasma" "Jar Jar Binks" "Luminara Unduli"
#>
#> $V66
#> [1] "Palpatine" "Bib Fortuna" "R5-D4"
#>
#> $V67
#> [1] "Rey" "Padmé Amidala" "Anakin Skywalker"
#>
#> $V68
#> [1] "Han Solo" "Jabba Desilijic Tiure" "Taun We"
#>
#> $V69
#> [1] "Saesee Tiin" "R4-P17" "Padmé Amidala"
#>
#> $V70
#> [1] "Rey" "Roos Tarpals" "Luminara Unduli"
#>
#> $V71
#> [1] "Poggle the Lesser" "BB8" "Chewbacca"
#>
#> $V72
#> [1] "Ayla Secura" "Wedge Antilles" "Arvel Crynyd"
#>
#> $V73
#> [1] "Finis Valorum" "Jango Fett" "Han Solo"
#>
#> $V74
#> [1] "Nien Nunb" "Mon Mothma" "Padmé Amidala"
#>
#> $V75
#> [1] "Leia Organa" "Finis Valorum" "Jango Fett"
#>
#> $V76
#> [1] "Yarael Poof" "Jango Fett" "Roos Tarpals"
#>
#> $V77
#> [1] "Poe Dameron" "Barriss Offee" "Mace Windu"
#>
#> $V78
#> [1] "Gregar Typho" "Tion Medon" "R2-D2"
#>
#> $V79
#> [1] "Dud Bolt" "Taun We" "Nien Nunb"
#>
#> $V80
#> [1] "Han Solo" "R4-P17" "San Hill"
#>
#> $V81
#> [1] "Shaak Ti" "Greedo" "Wilhuff Tarkin"
#>
#> $V82
#> [1] "Dud Bolt" "Rugor Nass" "Lama Su"
#>
#> $V83
#> [1] "Luke Skywalker" "Jek Tono Porkins" "Biggs Darklighter"
#>
#> $V84
#> [1] "C-3PO" "Bossk"
#>
#> $V85
#> [1] "Chewbacca" "BB8" "Lobot"
#>
#> $V86
#> [1] "Mon Mothma" "Jocasta Nu" "Rugor Nass"
#>
#> $V87
#> [1] "Ratts Tyerell" "Qui-Gon Jinn" "Owen Lars"
#>
#> $V88
#> [1] "Ayla Secura" "Finn" "Taun We"
#>
#> $V89
#> [1] "Taun We" "Greedo" "Dud Bolt"
#>
#> $V90
#> [1] "Owen Lars" "Han Solo" "Gasgano"
#>
#> $V91
#> [1] "Tion Medon" "Jango Fett" "Barriss Offee"
#>
#> $V92
#> [1] "Jocasta Nu" "Tarfful" "R4-P17"
#>
#> $V93
#> [1] "Beru Whitesun lars" "Mon Mothma" "Owen Lars"
#>
#> $V94
#> [1] "Saesee Tiin" "Arvel Crynyd" "Nien Nunb"
#>
#> $V95
#> [1] "Dormé" "Captain Phasma" "Taun We"
#>
#> $V96
#> [1] "Dud Bolt" "Luke Skywalker" "Lobot"
#>
#> $V97
#> [1] "Dormé" "Lando Calrissian" "Jocasta Nu"
#>
#> $V98
#> [1] "Jango Fett" "Taun We" "Yoda"
#>
#> $V99
#> [1] "Biggs Darklighter" "IG-88" "Ki-Adi-Mundi"
#>
#> $V100
#> [1] "Roos Tarpals" "Quarsh Panaka" "C-3PO"
# Overall (suggested by @tiernan <https://forum.posit.co/u/tiernan>)
tbl %>% flatten_chr %>% unique
#> [1] "Dormé" "Grievous"
#> [3] "Zam Wesell" "Tarfful"
#> [5] "Quarsh Panaka" "Greedo"
#> [7] "Wicket Systri Warrick" "Dud Bolt"
#> [9] "Dooku" "Padmé Amidala"
#> [11] "Darth Vader" "Han Solo"
#> [13] "Bail Prestor Organa" "Luke Skywalker"
#> [15] "Roos Tarpals" "Shmi Skywalker"
#> [17] "Jar Jar Binks" "Rugor Nass"
#> [19] "Jabba Desilijic Tiure" "Rey"
#> [21] "Poe Dameron" "Barriss Offee"
#> [23] "Gregar Typho" "Luminara Unduli"
#> [25] "Mace Windu" "Palpatine"
#> [27] "Bib Fortuna" "Lama Su"
#> [29] "Wedge Antilles" "Poggle the Lesser"
#> [31] "Finis Valorum" "Nute Gunray"
#> [33] "Shaak Ti" "Raymus Antilles"
#> [35] "Yarael Poof" "Wilhuff Tarkin"
#> [37] "Watto" "Tion Medon"
#> [39] "Dexter Jettster" "Bossk"
#> [41] "Arvel Crynyd" "Owen Lars"
#> [43] "Leia Organa" "Plo Koon"
#> [45] "Ratts Tyerell" "Yoda"
#> [47] "Ayla Secura" "Beru Whitesun lars"
#> [49] "Boba Fett" "Captain Phasma"
#> [51] "Jango Fett" "Saesee Tiin"
#> [53] "Nien Nunb" "C-3PO"
#> [55] "Chewbacca" "Mon Mothma"
#> [57] "Taun We" "Jocasta Nu"
#> [59] "Biggs Darklighter" "Ackbar"
#> [61] "BB8" "Gasgano"
#> [63] "Mas Amedda" "IG-88"
#> [65] "Adi Gallia" "Ki-Adi-Mundi"
#> [67] "R5-D4" "Lobot"
#> [69] "R2-D2" "Cliegg Lars"
#> [71] "Cordé" "Qui-Gon Jinn"
#> [73] "Wat Tambor" "San Hill"
#> [75] "R4-P17" "Jek Tono Porkins"
#> [77] "Finn" "Lando Calrissian"
#> [79] "Anakin Skywalker" "Ben Quadinaros"
#> [81] "Obi-Wan Kenobi" "Kit Fisto"
#> [83] "Darth Maul" "Sly Moore"
1 Like
Thank you, @tiernan for the guidance. This is exactly what I was looking for.
Thanks @CorradoLanera for the examples. I probably should have been more specific to clarify that I wanted to get unique strings on more than one column at the same time.
Glad to hear it!
Consider marking this question as "solved" by clicking the checkbox under my post:
This will help people with the same question find the answer quickly.
1 Like