I'm looking for tips on a data-management task in the tidyverse:
I have three datasets. Datasets 2a and 2b each contain a random half of the rows of dataset 1. Dataset 1 contains an extra variable that I want to add to the matching rows in datasets 2a and 2b.
The order of the rows differs between the datasets, and it must stay that way. Does anyone have a tip on how to do this? The dataset has 4000 observations.
I think there must be a way to identify identical rows across the datasets, but I'm not sure how to carry out the operation. (I have more variables than just first names and surnames, so every row is uniquely identifiable within each dataset.)
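One way to do this is a join on the columns that together identify a row, sketched here with dplyr's `left_join()`. The column names (`name`, `surname`, `birth_year`, `phone`) and the tiny tables are hypothetical placeholders; `key_cols` should list whichever columns uniquely identify a row in the real data. `left_join()` keeps every row of its left table in that table's original order, so datasets 2a and 2b keep their own ordering.

```r
library(dplyr)

# Hypothetical example data: name, surname, birth_year and phone are
# placeholder column names, not the actual variables in the question.
dataset1 <- tibble(
  name       = c("Ann", "Bob", "Cara", "Dan"),
  surname    = c("Lee", "Kim", "Ng", "Roe"),
  birth_year = c(1980, 1975, 1990, 1985),
  phone      = c("555-0101", "555-0102", "555-0103", "555-0104")
)

# The two halves, each in its own row order and without the phone column
dataset2a <- tibble(
  name       = c("Cara", "Ann"),
  surname    = c("Ng", "Lee"),
  birth_year = c(1990, 1980)
)
dataset2b <- tibble(
  name       = c("Dan", "Bob"),
  surname    = c("Roe", "Kim"),
  birth_year = c(1985, 1975)
)

# Columns that together identify a row uniquely
key_cols <- c("name", "surname", "birth_year")

# Lookup table: only the key columns plus the variable to copy over
lookup <- select(dataset1, all_of(key_cols), phone)

# left_join() keeps all rows of the left table, in their original order,
# and fills in phone from the matching row of dataset1
dataset2a_phone <- left_join(dataset2a, lookup, by = key_cols)
dataset2b_phone <- left_join(dataset2b, lookup, by = key_cols)

dataset2a_phone
```

If a key combination were duplicated in dataset1, the join would duplicate rows in the output, so it is worth confirming first that `anyDuplicated(dataset1[key_cols])` returns 0.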
I hope that was clear enough, and appreciate any advice!
Of course, you could add a unique row identifier to dataset1 and keep it, so that it propagates when you sample dataset1 to make datasets 2a and 2b. But if you are doing that, you may as well sample entire rows?
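The row-identifier suggestion can be sketched like this, assuming the split is still done in R. The data and the `row_id` column name are hypothetical. Because `slice_sample()` takes whole rows, the extra variable already travels with each sampled row; `row_id` only matters if columns are dropped and need to be looked up again later, in which case a join on `row_id` alone is enough.

```r
library(dplyr)

set.seed(1)  # make the random split reproducible

# Hypothetical stand-in for dataset1
dataset1 <- tibble(
  name    = c("Ann", "Bob", "Cara", "Dan"),
  surname = c("Lee", "Kim", "Ng", "Roe"),
  phone   = c("555-0101", "555-0102", "555-0103", "555-0104")
)

# Tag every row with a unique identifier before splitting
dataset1 <- mutate(dataset1, row_id = row_number())

# Randomly take half of the rows for 2a; the remaining rows form 2b
dataset2a <- slice_sample(dataset1, prop = 0.5)
dataset2b <- anti_join(dataset1, dataset2a, by = "row_id")

# If phone was dropped somewhere along the way, row_id recovers it
dataset2a_phone <- dataset2a %>%
  select(-phone) %>%
  left_join(select(dataset1, row_id, phone), by = "row_id")
```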
Update: When I try the solution on my actual dataset, the 'phone' variable contains only NAs in the dataset2ax dataset. The variable is there, but it has no values. Any ideas?
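One plausible cause (an assumption, since the real data isn't shown) is that the key values don't match exactly between the tables, for example because of trailing spaces, different capitalisation, or spelling differences. `left_join()` then finds no partner row and fills `phone` with `NA`. This hypothetical sketch uses `anti_join()` to list the rows that fail to match, then joins on trimmed, lower-cased helper key columns (`name_key`, `surname_key`, and the `add_keys()` helper are illustrative names) so the original values stay untouched.

```r
library(dplyr)

# Hypothetical data: "cara" differs in case and "Ann " has a trailing
# space, so neither row matches dataset1 exactly
dataset1 <- tibble(
  name    = c("Ann", "Bob", "Cara"),
  surname = c("Lee", "Kim", "Ng"),
  phone   = c("555-0101", "555-0102", "555-0103")
)
dataset2a <- tibble(
  name    = c("cara", "Ann "),
  surname = c("Ng", "Lee")
)

key_cols <- c("name", "surname")

# Rows of dataset2a with no exact partner in dataset1: these are the
# rows that receive NA for phone after a plain left_join()
unmatched <- anti_join(dataset2a, dataset1, by = key_cols)

# Helper key columns: whitespace trimmed and lower-cased
add_keys <- function(df) {
  mutate(df,
         name_key    = tolower(trimws(name)),
         surname_key = tolower(trimws(surname)))
}

lookup <- dataset1 %>%
  add_keys() %>%
  select(name_key, surname_key, phone)

# Join on the normalised keys, then drop the helper columns
dataset2a_phone <- dataset2a %>%
  add_keys() %>%
  left_join(lookup, by = c("name_key", "surname_key")) %>%
  select(-name_key, -surname_key)
```

If `unmatched` is empty but `phone` is still all `NA`, the cause lies elsewhere, for example in which columns the join actually uses.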