Converting Names into Identification Codes in different columns

I am new with R and I am struggling with the following issue:

I have a dataset more or less like this:

NAME                       Collegue1                   Collegue 2
John Smith                 Bill Gates                   Brad Pitt
Adam Sandler               Mary Lynn                    John Smith
Bill Gates                 NA                           Adam Sandler
Brad Pitt                  John Smith                   Bob Morriss

I need to create an ID code and substitute names with the corresponding ID in the three columns, how can I do that? The result should be more or less like the following:

ID. NAME. Collegue1. Collegue2

  1.     John Smith                3                   4
  2.     Adam Sandler              5                   1
  3.     Bill Gates                NA                  2
  4.     Brad Pitt                 1                   6

Any idea? Thanks

I would do it like this.

library(dplyr, warn.conflicts = FALSE)

data <- tribble(~ Name, ~ Colleague1, ~ Colleague2,
                "John Smith", "Bill Gates", "Brad Pitt",
                "Adam Sandler", "Mary Lynn", "John Smith",
                "Bill Gates", NA, "Adam Sandler",
                "Brad Pitt", "John Smith", "Bob Morris")

ids <- data %>% 
  select(Name) %>% 

data %>% 
  left_join(ids, by = c("Colleague1" = "Name")) %>% 
  left_join(ids, by = c("Colleague2" = "Name")) %>% 
  select(Name, Colleague1_ID = rowname.x, Colleague2_ID = rowname.y)
#> # A tibble: 4 x 3
#>   Name         Colleague1_ID Colleague2_ID
#>   <chr>        <chr>         <chr>        
#> 1 John Smith   3             4            
#> 2 Adam Sandler <NA>          1            
#> 3 Bill Gates   <NA>          2            
#> 4 Brad Pitt    1             <NA>

Created on 2020-04-29 by the reprex package (v0.3.0)

df <- tibble::tribble(
  ~NAME, ~Collegue1, ~Collegue2,
  "John Smith"   , "Bill Gates", "Brad Pitt",
  "Adam Sandler" , "Mary Lynn" , "John Smith",
  "Bill Gates"   , NA          , "Adam Sandler",
  "Brad Pitt"    , "John Smith", "Bob Morriss"

all_names <- sort(unique(c(df$NAME,df$Collegue1,df$Collegue2)))

df$Collegue1 <- factor(df$Collegue1,levels=all_names)
df$Collegue2 <- factor(df$Collegue2,levels=all_names)

df2<- df %>% mutate_at(.vars = c("Collegue1","Collegue2"),
                       .funs = as.integer)

thanks. What if I have a dataset with 200+ rows and 100+ columns, so tibble:tribble is not the best for me. What would you do?

You asked how to transform your data.
This required me to generate your example data. The tribble is an example of how to communicate example data on the forum.

Here is a fully generic solution, again your own data would be in place of df.


df <- tibble::tribble(
  ~NAME, ~Collegue1, ~Collegue2,
  "John Smith"   , "Bill Gates", "Brad Pitt",
  "Adam Sandler" , "Mary Lynn" , "John Smith",
  "Bill Gates"   , NA          , "Adam Sandler",
  "Brad Pitt"    , "John Smith", "Bob Morriss"

contents <- map(names(df)[-1],~df[[.]])
all_names<- sort(unique(reduce(contents,c)))

df2<- df %>% mutate_at(.vars = names(df)[-1],
                       .funs = ~ as.integer(factor(.,levels=all_names)))

Superuseful, thanks a lot!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.