Hi, I am a recovering SAS addict. Please forgive if my questions seem very basic, but I am trying to grasp the fundementals of R.
One of my usual tasks is to extract data on individuals of interest from large datasets. Let us assume that I have two dataframes, one includes my population:
df1<-data.frame(id=c(1,2,3,4,5))
and the other is a dataset which includes the data I am interested in:
id<-c(2,2,2,2,3,4,4,5,5,5,7,7,8,9,9)
v1<-c("w","h","y","d","o","i","h","a","v","e","t","o","l","e","a")
v2<-c("r","n","r","w","h","y","d","o","e","s","s","a","s","h","a")
v3<-c("v","e","t","o","b","e","s","o","c","o","n","s","u","m","e")
v4<-c("r","u","n","f","r","i","e","n","d","l","y","a","n","d","u")
v5<-c("s","e","a","r","o","t","t","e","n","b","u","s","i","n","e")
v6<-c("s","s","m","o","d","e","l","w","h","y","w","h","y","w","h")
df2<-data.frame(id,v1,v2,v3,v4,v5,v6)
Note that df2 includes some of the individuals from df1, it also includes other individuals, and each id in df1 may appear between zero and several times in df2.
What I want is to create a new variable (df1$newvar) which will assume the value 1 if the following conditions are met (and 0 if they are not):
There exists a record in df2 with
- the same value on id
and 2) any of the variables v1-v6 has the value "a", "b", or "c".
In reality df2 would obviously hold millions of records from thousands of individuals, but if R can do it in a tiny scale , it can also do it in a large scale.