In the following reprex, I'd like to create Petal.Length
and Petal.Width
columns in the virginica_species
data frame based on the following conditions: 1) each row in virginica_species
finds the most similar value in setosa_versicolor
according to their Septal.Length
columns that is also in the same new_col
group, and then 2) I'd like to extract the corresponding Petal.Length
value in setosa_versicolor
to create virginica_species
's column Petal.Width
. 3) Then I'd like to do the same but with regards to the Sepal.Width
and Petal.Width
columns. The idea is, in my actual data there could be an unspecified number of these columns that I'd like to 'match' between two data sets so I can't just manually input the column names beyond, say, setting a vector equal to the column name 'inputs' and running that vector into something like mutate_at(vars())
.
library(dplyr)
# creates my two data frames, with `setosa_versicolor` being the data frame I'd like to map the onto `virginica_species` data frame
my_iris <- as.data.frame(iris) %>% mutate(new_col = (row_number() - 1) %% 2)
setosa_versicolor <- filter(my_iris, Species != "virginica")
virginica_species <- my_iris %>% filter(Species == "virginica") %>% select(-c(Petal.Length, Petal.Width))
# As an example of what I'd like the output to look like, this would be the first two rows of the final `virginica_species` data frame:
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_col
# 1 6.3 3.3 4.9 1.6 virginica 0 # The closest Sepal.Length value in `seotsa_versicolor` and also with new_col == 1 also happened to be 6.3, so we find that row in `seotsa_versicolor` and extract its `Petal.Length` val; closest Sepal.Width value with new_col == 1 was 3.3, so we extract its `Petal.Width`
# 2 5.8 2.7 4.1 1.6 virginica 1 # Same methodology in this and every subsequent line (note there are sometimes multiple matching values in `seotsa_versicolor` but in my actual data there aren't any matches so I don't particularly care in this repex how multiple matches are chosen between)
I gave it my best shot (many attempts not documented below) but was getting an error, wasn't sure whether group_by was achieving what I was intended, couldn't generalize the method to any number of unspecified columns, and didn't know how to extract Petal.Length
or Petal.Width
when I finally did determine row in setosa_versicolor
with the closest value in Sepal.Length
or Sepal.Width
library(DescTools)
sepal_cols <- c("Sepal.Length", "Sepal.Width")
virginica_species %>%
group_by(new_col) %>%
rowwise() %>%
mutate_at(vars(sepal_cols), list(petals = ~ Closest(pull(setosa_versicolor, quo_name(quo("Sepal.Width"))), .)))
# Error: Column `Sepal.Width_petals` must be length 1 (the group size), not 3
Many thanks!