Hello, I'm trying to use first names and last names together to predict users' ethnicity. Since my real dataset is quite large (more than 1M rows), I'd like to call print(i) inside the loop to see which row is currently being processed. However, the code below does not work. Can you help me with it? I'd really appreciate it.
I have another question as well. Since many first and last name pairs are duplicated, I'd like to extract the unique pairs first. After obtaining the ethnicity predictions, I will join them back to the original dataset. Can you also help me with that?
# Load the library
library(rethnicity)
library(tidyverse)
# Sample dataset
id <- 1:7
NAME <- c("A Katherine", "Aadar", "Xing",
          "aaron", NA, "Alan",
          "aaron")
LASTNAME <- c("Austin", "Gupta", "Zhao",
              "darling", NA, NA,
              "darling")
df <- data.frame(id, NAME, LASTNAME)
df
# Get the unique first name and last name
unique_first_last_names <- df |>
  distinct(NAME, LASTNAME, .keep_all = TRUE)
# Initialize an empty vector for storing ethnicity predictions
unique_first_last_names$Ethnicity <- vector("character", nrow(unique_first_last_names))
# Loop through each row, predict ethnicity, and print row number
for (i in seq_len(nrow(unique_first_last_names))) {
  print(i)
  if (is.na(unique_first_last_names$NAME[i]) || is.na(unique_first_last_names$LASTNAME[i])) {
    unique_first_last_names$Ethnicity[i] <- "unknown"
  } else {
    unique_first_last_names$Ethnicity[i] <- predict_ethnicity(firstnames = unique_first_last_names$NAME[i],
                                                              lastnames = unique_first_last_names$LASTNAME[i],
                                                              method = "fullname")
  }
}
# Analyze the results
print(unique_first_last_names)
# Merge with the original dataset
However, this simple line works:
predict_ethnicity(firstnames = df$NAME[1], lastnames = df$LASTNAME[1], method = "fullname")
I'd really appreciate your help.
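
To make the intended workflow concrete, here is a minimal sketch of the three steps described above: deduplicate the name pairs, predict row by row with a progress message, and join the result back to the original data. It is a sketch under stated assumptions, not a tested solution. `predict_stub()` is a hypothetical stand-in so the example runs without rethnicity installed, and `report_every` is an illustrative name for the progress interval. As far as I can tell, `predict_ethnicity()` returns a data frame rather than a single string, so the real call would likely need one column extracted from its result; that should be checked against the package documentation.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  id = 1:7,
  NAME = c("A Katherine", "Aadar", "Xing", "aaron", NA, "Alan", "aaron"),
  LASTNAME = c("Austin", "Gupta", "Zhao", "darling", NA, NA, "darling")
)

# Hypothetical stand-in for the real predictor (an assumption, not the
# rethnicity API): returns one character label per name pair.
predict_stub <- function(first, last) {
  rep("placeholder", length(first))
}

# Keep only the name columns so the later join adds nothing but Ethnicity
unique_names <- df |>
  distinct(NAME, LASTNAME)

# Default label for pairs with a missing first or last name
unique_names$Ethnicity <- rep("unknown", nrow(unique_names))

# Assumed interval; something like 10000 would suit 1M rows
report_every <- 2

for (i in seq_len(nrow(unique_names))) {
  # Report progress only every `report_every` rows to keep the console readable
  if (i %% report_every == 0) {
    message("Processing row ", i, " of ", nrow(unique_names))
  }
  # Predict only when both name parts are present
  if (!is.na(unique_names$NAME[i]) && !is.na(unique_names$LASTNAME[i])) {
    unique_names$Ethnicity[i] <- predict_stub(
      unique_names$NAME[i],
      unique_names$LASTNAME[i]
    )
  }
}

# Join the predictions back; every original row is kept and duplicated
# name pairs share the same label
result <- df |>
  left_join(unique_names, by = c("NAME", "LASTNAME"))
```

Dropping `.keep_all = TRUE` in this sketch keeps `id` out of the lookup table, so the join does not produce duplicate `id` columns. By default `left_join()` treats `NA` keys as matching each other, so rows with missing names pick up the "unknown" label.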