Ethnicity Prediction from package 'rethnicity'

YuJiang · April 28, 2023, 7:42pm

Hello, I'm trying to use first names and last names together to predict the ethnicity of users. My real dataset is large (more than 1M rows), so I'd like to call print(i) inside the loop to see which row is currently being processed. However, it does not work. Can you help me with it? I really appreciate it.

I have another question as well. Because many first name and last name combinations are duplicated, I'd like to extract the unique combinations first. After obtaining the ethnicity predictions, I will join them back to the original dataset. Can you also help me with that?

# Load the library
library(rethnicity)
library(tidyverse)

# Sample dataset
id <- 1:7
NAME <- c("A Katherine", "Aadar", "Xing", 
          "aaron", NA, "Alan", 
          "aaron")

LASTNAME <- c("Austin", "Gupta", "Zhao", 
              "darling", NA, NA,
              "darling")

df <- data.frame(id, NAME, LASTNAME)
df

# Get the unique first name and last name
unique_first_last_names <- df |> 
  distinct(NAME, LASTNAME, .keep_all = TRUE)

# Initialize an empty vector for storing ethnicity predictions
unique_first_last_names$Ethnicity <- vector("character", nrow(unique_first_last_names))

# Loop through each row, predict ethnicity, and print row number
for (i in 1:nrow(unique_first_last_names)) {
  print(i)
  
  if (is.na(unique_first_last_names$NAME[i]) || is.na(unique_first_last_names$LASTNAME[i])) {
    unique_first_last_names$Ethnicity[i] <- "unknown"
  } else {
    unique_first_last_names$Ethnicity[i] <- predict_ethnicity(
      firstnames = unique_first_last_names$NAME[i],
      lastnames = unique_first_last_names$LASTNAME[i],
      method = "fullname"
    )
  }
}

# Analyze the results
print(unique_first_last_names)

# Merge with the original dataset

However, this simple line works:

predict_ethnicity(firstnames = df$NAME[1], lastnames = df$LASTNAME[1], method = "fullname")

Really appreciate your help.
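
One possible direction, offered as a rough sketch rather than a confirmed fix: predict once per unique complete name pair, process the pairs in chunks so a progress message can be printed after each chunk, and then join the labels back to every original row. The helper names `predict_by_unique_name()` and `rethnicity_labels()` and the `chunk_size` argument are hypothetical. The sketch also assumes that `predict_ethnicity()` returns a data frame with one row per input name, in input order, with a `race` column holding the predicted label; running `str()` on the result of the single call above would confirm or correct that assumption.

```r
library(dplyr)

# Hypothetical helper: predicts once per unique, complete name pair (in chunks)
# and joins the labels back onto every row of the original data.
# `predict_fun` is an assumed interface: it takes two character vectors and
# returns one label per pair, in the same order.
predict_by_unique_name <- function(df, predict_fun, chunk_size = 10000) {
  # One row per distinct name pair; pairs with a missing name are not predicted
  unique_names <- df |>
    distinct(NAME, LASTNAME) |>
    filter(!is.na(NAME), !is.na(LASTNAME))

  n <- nrow(unique_names)
  unique_names$Ethnicity <- character(n)

  # Start index of each chunk (empty when there is nothing to predict)
  starts <- if (n > 0) seq(1, n, by = chunk_size) else integer(0)

  for (start in starts) {
    end <- min(start + chunk_size - 1, n)
    rows <- start:end
    unique_names$Ethnicity[rows] <- predict_fun(
      unique_names$NAME[rows],
      unique_names$LASTNAME[rows]
    )
    message("Processed ", end, " of ", n, " unique name pairs")
  }

  # Rows whose name pair was not predicted (a missing name) become "unknown"
  df |>
    left_join(unique_names, by = c("NAME", "LASTNAME")) |>
    mutate(Ethnicity = coalesce(Ethnicity, "unknown"))
}

# Assumption: the result of predict_ethnicity() is a data frame with a `race`
# column; adjust the column name if str() shows something different.
rethnicity_labels <- function(first, last) {
  rethnicity::predict_ethnicity(
    firstnames = first,
    lastnames = last,
    method = "fullname"
  )$race
}

# Same sample data as in the question
df <- data.frame(
  id = 1:7,
  NAME = c("A Katherine", "Aadar", "Xing", "aaron", NA, "Alan", "aaron"),
  LASTNAME = c("Austin", "Gupta", "Zhao", "darling", NA, NA, "darling")
)

# Only runs when the rethnicity package is installed
if (requireNamespace("rethnicity", quietly = TRUE)) {
  result <- predict_by_unique_name(df, rethnicity_labels, chunk_size = 2)
  print(result)
}
```

With this layout the duplicated pair ("aaron", "darling") is predicted only once, and the progress messages count chunks of unique pairs rather than individual rows, which should keep the console output manageable at 1M rows.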
