I have the following code, which populates a vector by looking up values from one data frame in another data frame (like a VLOOKUP or XLOOKUP in Microsoft Excel). The data is, however, too large for a spreadsheet. The problem is that the resulting vector is one element longer than the number of rows in the input data, i.e. 2,000,000 rows (data) versus 2,000,001 elements (vector).
raw_data contains the values that I want to look up, and IntdUniGroup contains the table that raw_data will be looked up against (I hope that makes sense).
intd_uni_group <- c() # creates an empty vector
for (i in 1:nrow(raw_data)) {
  intd_uni_group <- c(intd_uni_group,
    if (is.na(raw_data$PHDuniv[i])) {
      NA
    } else if (tolower(raw_data$PHDuniv[i]) %in% tolower(IntdUniGroup$PHDuniv)) {
      IntdUniGroup$Group[which(tolower(raw_data$PHDuniv[i]) == tolower(IntdUniGroup$PHDuniv))]
    } else {
      NA
    }
  )
}
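For comparison, here is a minimal sketch of the same lookup written with match(), on toy data that is made up purely for illustration (it is not my real data). It assumes that taking the first matching row is acceptable when a name appears more than once in the lookup table; in this toy table "mit" and "MIT" are deliberately duplicated after lowercasing. Because match() returns exactly one position per input element, the result should always have one element per row:

```r
# Toy data, made up for illustration only (not the real data).
raw_data <- data.frame(PHDuniv = c("MIT", "oxford", NA, "Unknown U"),
                       stringsAsFactors = FALSE)
IntdUniGroup <- data.frame(PHDuniv = c("mit", "Oxford", "MIT"),
                           Group   = c("A", "B", "C"),
                           stringsAsFactors = FALSE)

# match() returns the position of the FIRST match for each element,
# or NA when there is no match. An NA input also gives NA here,
# because the lookup column itself contains no NA.
idx <- match(tolower(raw_data$PHDuniv), tolower(IntdUniGroup$PHDuniv))

# Indexing with NA yields NA, so unmatched rows become NA.
intd_uni_group <- IntdUniGroup$Group[idx]

# One result per input row, regardless of duplicates in the lookup table.
length(intd_uni_group) == nrow(raw_data)
```

This is only a sketch under the "first match wins" assumption; if duplicates in the lookup table carry different groups, they would need to be resolved separately.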
I also tried this using the apply function, below. The resulting list has a length equal to the number of rows in the input data frame, but when I unlist it I end up in the same situation as above: the vector is one element longer than the number of rows in the input data frame.
intd_uni_group2 <- apply(raw_data[, "PHDuniv"],
  1,
  function(x) if (is.na(x)) {
    NA
  } else if (tolower(x) %in% tolower(IntdUniGroup$PHDuniv)) {
    IntdUniGroup$Group[which(tolower(x) == tolower(IntdUniGroup$PHDuniv))]
  } else {
    NA
  }
)
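One way to check where the extra element comes from is to look at how many values each list element holds. The sketch below uses a small made-up list in place of the real apply() output, and assumes (this is only a guess) that the extra element comes from a single row whose lookup returned more than one value:

```r
# Toy list standing in for the apply() output (made up for illustration).
intd_uni_group2 <- list("A", c("A", "C"), NA, "B")

# lengths() gives the number of values held by each list element.
n_per_row <- lengths(intd_uni_group2)

# Rows whose lookup returned something other than exactly one value.
bad_rows <- which(n_per_row != 1)
bad_rows

# unlist() flattens every value, so extra values lengthen the result.
extra <- length(unlist(intd_uni_group2)) - length(intd_uni_group2)
extra
```

Running the same two checks on the real list should show which rows, if any, account for the difference in length.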