For loops and dummy variable

Hi all

I am trying to create a dummy variable using a for loop.
I am working with the online database called NIS.
I am trying to find a specific diagnosis code across several columns, say DX2 to DX25. I want to create a new variable, hypertension, that is 1 if the code is present in any of these columns and 0 if it is not. I tried to use for loops, but it has not been working, so I must be doing something wrong. Does anyone know how to approach this?
Thank you

Hi,

Welcome to the RStudio community!

You can do this with the apply() function, looping over all the rows and checking whether the value you are interested in is present. Here is an example:

#Generate some data
myData = data.frame(
  DX1 = LETTERS[1:5],
  DX2 = LETTERS[4:8],
  DX3 = LETTERS[c(3,5,1,9,11)]
)
myData
#>   DX1 DX2 DX3
#> 1   A   D   C
#> 2   B   E   E
#> 3   C   F   A
#> 4   D   G   I
#> 5   E   H   K

# Example: Hypertension = A
myData$hypertension = apply(myData, 1, function(row){
  ifelse("A" %in% row, 1, 0)
})

myData
#>   DX1 DX2 DX3 hypertension
#> 1   A   D   C            1
#> 2   B   E   E            0
#> 3   C   F   A            1
#> 4   D   G   I            0
#> 5   E   H   K            0

The apply() function loops over all rows (MARGIN = 1, as used here) or all columns (MARGIN = 2) of a data frame and calls the function you provide on each one (here I wrote a small anonymous function).
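
To make the two margins concrete, and since the original question asked about for loops, here is a minimal sketch on the same toy data. The variable names rowHasA, colHasA and colName are only illustrative, and the for loop is one possible alternative, not necessarily the best one for the real NIS data.

```r
# Same toy data as in the example above
myData <- data.frame(
  DX1 = LETTERS[1:5],
  DX2 = LETTERS[4:8],
  DX3 = LETTERS[c(3, 5, 1, 9, 11)]
)

# MARGIN = 1: the function is called once per row
rowHasA <- apply(myData, 1, function(row) "A" %in% row)

# MARGIN = 2: the function is called once per column
colHasA <- apply(myData, 2, function(col) "A" %in% col)

# The same dummy variable built with an explicit for loop over the columns:
# start with all zeros, then flag every row where a column holds the code
hypertension <- rep(0, nrow(myData))
for (colName in names(myData)) {
  hypertension[myData[[colName]] %in% "A"] <- 1
}
```

Looping over the columns rather than the rows means the loop body runs once per DX column and each step is a vectorized comparison, which may be worth trying on a table with many rows.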

Created on 2022-06-01 by the reprex package (v2.0.1)

Hope this helps,
PJ

Thank you so much. It has been very helpful. I am working with a very large database with many columns. Is there a way to apply the function to specific columns only? In this case I want to apply it to columns DX2 to DX25.

Thanks again
Zafar

Hi,

You can simply limit the columns over which to run the apply function like this:

#Only evaluate column DX1 and DX3
myData$hypertension = apply(myData[, c("DX1", "DX3")], 1, function(row){
  ifelse("A" %in% row, 1, 0)
})
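
For a longer run of columns such as DX2 to DX25, typing every name gets tedious. A sketch of one way to build the names with paste0() follows; the toy table, its ID column and the dxCols name are made up for illustration, and it only goes up to DX4 to stay short.

```r
# Hypothetical toy data: an ID column plus DX1..DX4 standing in for DX1..DX25
myData <- data.frame(
  ID  = 1:4,
  DX1 = c("A", "B", "C", "D"),
  DX2 = c("B", "A", "D", "E"),
  DX3 = c("C", "D", "E", "F"),
  DX4 = c("D", "E", "A", "G")
)

# Build the column names programmatically; for the real data this would be 2:25
dxCols <- paste0("DX", 2:4)

# Only the selected columns are passed to apply(), so DX1 and ID are ignored
myData$hypertension <- apply(myData[, dxCols], 1, function(row) {
  ifelse("A" %in% row, 1, 0)
})
```

Row 1 has "A" in DX1 only, so it gets 0 here because DX1 is outside the selected range.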

PJ

Thank you so much. That helped! I used the [col1:19] and that worked as well. Now I am trying to match several diagnosis codes for the same disease using OR, e.g. A or B. I used c("A", "B") and it didn't work. I also tried "A" | "B", but it gives me an error. Do you know what the correct code would be?

Thank you

Hi,

Just replace this:

ifelse("A" %in% row, 1, 0)

with this:

ifelse(any(c("A", "B") %in% row), 1, 0)

You can also use all() instead of any() if you want A AND B.
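
To show why the extra any() is needed, here is a small sketch using the toy data from the first example. The names exampleRow, perCode and htnCodes are hypothetical and only there for illustration.

```r
# One example row with three diagnosis codes
exampleRow <- c("A", "D", "C")

# %in% returns one result per code on the left-hand side,
# so without any() or all() there is no single TRUE/FALSE per row
perCode <- c("A", "B") %in% exampleRow

hasAny <- any(c("A", "B") %in% exampleRow)  # A OR B present
hasAll <- all(c("A", "B") %in% exampleRow)  # A AND B present

# Applied to the toy data, with the codes kept in one vector
myData <- data.frame(
  DX1 = LETTERS[1:5],
  DX2 = LETTERS[4:8],
  DX3 = LETTERS[c(3, 5, 1, 9, 11)]
)
htnCodes <- c("A", "B")
myData$hypertension <- apply(myData, 1, function(row) {
  ifelse(any(htnCodes %in% row), 1, 0)
})
```

Keeping the codes in a vector like htnCodes means more codes can be added later without touching the function.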

PJ
