Hi, I am a recovering SAS addict. Please forgive me if my questions seem very basic, but I am trying to grasp the fundamentals of R.
One of my usual tasks is to extract data on individuals of interest from large datasets. Let us assume that I have two dataframes, one includes my population:
and the other is a dataset which includes the data I am interested in:
Note that df2 includes some of the individuals from df1 as well as other individuals, and each id in df1 may appear in df2 anywhere from zero to several times.
What I want is to create a new variable (df1$newvar) that takes the value 1 if the following conditions are met (and 0 if they are not):
There exists a record in df2 with
1) the same value on id, and
2) the value "a", "b", or "c" in any of the variables v1-v6.
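To make the condition concrete, here is a minimal sketch of one way it could be expressed in base R. The toy contents of df1 and df2 below are made up for illustration (they are assumptions, not the real data), and the helper names vars, targets and hit are hypothetical.

```r
# Hypothetical toy data: the real df1 and df2 are not reproduced here.
df1 <- data.frame(id = 1:5)
df2 <- data.frame(
  id = c(1, 1, 2, 4, 6),
  v1 = c("x", "a", "x", "y", "a"),
  v2 = c("y", "x", "z", "y", "b"),
  v3 = c("z", "x", "x", "c", "x"),
  v4 = c("x", "x", "x", "x", "x"),
  v5 = c("y", "y", "y", "y", "y"),
  v6 = c("z", "z", "z", "z", "z"),
  stringsAsFactors = FALSE
)

vars    <- paste0("v", 1:6)   # the columns to inspect: v1 ... v6
targets <- c("a", "b", "c")   # the values of interest

# For each column, flag the rows whose value is one of the targets,
# then combine the six logical vectors with OR: TRUE means that at
# least one of v1-v6 matches in that df2 record.
hit <- Reduce(`|`, lapply(df2[vars], `%in%`, targets))

# An id in df1 gets 1 if it appears among the ids of the matching
# df2 records, and 0 otherwise (including ids absent from df2).
df1$newvar <- as.integer(df1$id %in% df2$id[hit])

print(df1)
```

With this toy data, ids 1 and 4 get newvar = 1 and ids 2, 3 and 5 get 0; id 6 matches in df2 but is ignored because it is not in df1. The sketch uses only vectorised base R operations, so no explicit loop over individuals is needed.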
In reality df2 would obviously hold millions of records from thousands of individuals, but if R can do it on a tiny scale, it can also do it on a large scale.