I am trying to create a dummy variable using a for loop.
I am working with an online database called NIS.
I am trying to find a specific diagnosis code across several columns, say DX2 to DX25. I want to create a new variable, hypertension, that equals 1 if the code is present in any of these columns and 0 if it is not. I tried to use for loops, but it has not been working, so I must be doing something wrong. Does anyone know how to approach this?
Thank you
You can do this with the apply() function, which loops over the rows and checks whether the value you are interested in is present in each one. Here is an example:
# Generate some data
myData = data.frame(
  DX1 = LETTERS[1:5],
  DX2 = LETTERS[4:8],
  DX3 = LETTERS[c(3,5,1,9,11)]
)
myData
#>   DX1 DX2 DX3
#> 1   A   D   C
#> 2   B   E   E
#> 3   C   F   A
#> 4   D   G   I
#> 5   E   H   K
# Example: Hypertension = A
myData$hypertension = apply(myData, 1, function(row){
  ifelse("A" %in% row, 1, 0)
})
myData
#>   DX1 DX2 DX3 hypertension
#> 1   A   D   C            1
#> 2   B   E   E            0
#> 3   C   F   A            1
#> 4   D   G   I            0
#> 5   E   H   K            0
The apply() function loops over all rows (margin 1, which is what we used) or all columns (margin 2) of a data frame and calls the function you provide on each one (here, an anonymous function defined inline).
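To make the difference between the two margins concrete, here is a small sketch on a toy matrix. The matrix m and the use of sum() are only for illustration and are not part of the diagnosis example.

```r
# Toy 2 x 3 matrix, filled column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# Margin 1: the function is called once per row
rowTotals <- apply(m, 1, sum)

# Margin 2: the function is called once per column
colTotals <- apply(m, 2, sum)

rowTotals  # one value per row
colTotals  # one value per column
```

With margin 1 you get one result per row, which is why it fits the case of one hypertension flag per patient record.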
Thank you so much. It has been very helpful. I am working with a very large database with many columns. Is there a way to apply the function to specific columns only? In this case I want to apply it to columns DX2 to DX25.
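One way to do this is to subset the data frame to the columns you want before passing it to apply(). Below is a minimal sketch, assuming the columns are literally named DX2, DX3, and so on, and that the code of interest is "A". The toy data and the helper name dxCols are made up for illustration; with the real data you would build the names with paste0("DX", 2:25).

```r
# Toy data standing in for the real database (hypothetical values)
myData <- data.frame(
  DX1 = c("A", "B", "C"),
  DX2 = c("D", "A", "F"),
  DX3 = c("C", "E", "G"),
  stringsAsFactors = FALSE
)

# Names of the columns to search; for the real data: paste0("DX", 2:25)
dxCols <- paste0("DX", 2:3)

# Only the selected columns are passed to apply(), so DX1 is ignored
myData$hypertension <- apply(myData[, dxCols], 1, function(row) {
  as.integer("A" %in% row)
})

myData
```

Selecting by name rather than by position should keep the code working even if the column order changes, but indexing by position, for example myData[, 2:3], is an alternative.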
Thank you so much. That helped! I used the [col1:19] and that worked as well. Now I am trying to find several diagnosis codes for the same disease using OR, e.g. A or B. I used c("A", "B) and it didn't work. I also tried "A" |"B", but it gives me an error. Do you know what the correct code would be?
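A likely reason both attempts fail: the | operator only works on logical (or numeric) values, so "A" | "B" on two strings raises an error, and c("A", "B") %in% row returns two TRUE/FALSE values (one per code) instead of a single one. Wrapping the %in% test in any() collapses them into one answer per row. A minimal sketch, where the data and the vector name codes are made up for illustration:

```r
# Toy data (hypothetical values)
myData <- data.frame(
  DX1 = c("A", "B", "C", "D", "E"),
  DX2 = c("D", "E", "F", "G", "H"),
  DX3 = c("C", "E", "A", "I", "K"),
  stringsAsFactors = FALSE
)

# All codes that should count as the same disease
codes <- c("A", "B")

# any() is TRUE if at least one of the codes appears in the row
myData$hypertension <- apply(myData[, c("DX1", "DX2", "DX3")], 1, function(row) {
  as.integer(any(codes %in% row))
})

myData
```

Adding more codes for the same disease then only requires extending the codes vector.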