In the following reprex, I'd like to create Petal.Length and Petal.Width columns in the virginica_species data frame based on the following conditions: 1) each row in virginica_species finds the most similar value in setosa_versicolor according to their Septal.Length columns that is also in the same new_col group, and then 2) I'd like to extract the corresponding Petal.Length value in setosa_versicolor to create virginica_species's column Petal.Width. 3) Then I'd like to do the same but with regards to the Sepal.Width and Petal.Width columns. The idea is, in my actual data there could be an unspecified number of these columns that I'd like to 'match' between two data sets so I can't just manually input the column names beyond, say, setting a vector equal to the column name 'inputs' and running that vector into something like mutate_at(vars()).
library(dplyr)
# creates my two data frames, with `setosa_versicolor` being the data frame I'd like to map the onto `virginica_species` data frame
my_iris <- as.data.frame(iris) %>% mutate(new_col = (row_number() - 1) %% 2)
setosa_versicolor <- filter(my_iris, Species != "virginica")
virginica_species <- my_iris %>% filter(Species == "virginica") %>% select(-c(Petal.Length, Petal.Width))
# As an example of what I'd like the output to look like, this would be the first two rows of the final `virginica_species` data frame:
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_col
# 1 6.3 3.3 4.9 1.6 virginica 0 # The closest Sepal.Length value in `seotsa_versicolor` and also with new_col == 1 also happened to be 6.3, so we find that row in `seotsa_versicolor` and extract its `Petal.Length` val; closest Sepal.Width value with new_col == 1 was 3.3, so we extract its `Petal.Width`
# 2 5.8 2.7 4.1 1.6 virginica 1 # Same methodology in this and every subsequent line (note there are sometimes multiple matching values in `seotsa_versicolor` but in my actual data there aren't any matches so I don't particularly care in this repex how multiple matches are chosen between)
I gave it my best shot (many attempts not documented below) but was getting an error, wasn't sure whether group_by was achieving what I was intended, couldn't generalize the method to any number of unspecified columns, and didn't know how to extract Petal.Length or Petal.Width when I finally did determine row in setosa_versicolor with the closest value in Sepal.Length or Sepal.Width
library(DescTools)
sepal_cols <- c("Sepal.Length", "Sepal.Width")
virginica_species %>%
group_by(new_col) %>%
rowwise() %>%
mutate_at(vars(sepal_cols), list(petals = ~ Closest(pull(setosa_versicolor, quo_name(quo("Sepal.Width"))), .)))
# Error: Column `Sepal.Width_petals` must be length 1 (the group size), not 3
Many thanks!
