Creating new variables like in arrays from SAS

Conce · December 5, 2023, 12:27am

I use arrays in SAS, as in the example below, but I don't know how to replicate this in R. I have many (like 180) variables describing measurements of tooth surfaces, and I need to create one variable that tells me how many of these 180 variables have a measurement of more than 4 mm. The example below is how I do it in SAS. In the example I am posting, each variable indicates bleeding on that surface, and I need to know, for each individual, the number of sites with bleeding. Does anybody know how to replicate this code in R?

"
nsangsitiosv = 0;
nsangv = 0;
array sangv (84)

rcSS47dv rcSS47v rcSS47mv rcSS46dv rcSS46v rcSS46mv rcSS45dv rcSS45v rcSS45mv rcSS44dv rcSS44v rcSS44mv rcss43dv rcss43v rcss43mv rcss42dv rcss42v rcss42mv rcss41dv rcss41v rcss41mv
rcss31mv rcss31v rcss31dv rcss32mv rcss32v rcss32dv rcss33mv rcss33v rcss33dv rcSS34mv rcSS34v rcSS34dv rcSS35mv rcSS35v rcSS35dv rcSS36mv rcSS36v rcSS36dv rcSS37mv rcSS37v rcSS37dv
rcSS17dv rcSS17v rcSS17mv rcSS16dv rcSS16v rcSS16mv rcSS15dv rcSS15v rcSS15mv rcSS14dv rcSS14v rcSS14mv rcss13dv rcss13v rcss13mv rcss12dv rcss12v rcss12mv rcss11dv rcss11v rcss11mv
rcss21mv rcss21v rcss21dv rcss22mv rcss22v rcss22dv rcss23mv rcss23v rcss23dv rcss24mv rcss24v rcss24dv rcss25mv rcss25v rcss25dv rcss26mv rcss26v rcss26dv rcss27mv rcss27v rcss27dv;

do i = 1 to 84;

if sangv (i) >= 8 then sangv (i) = .;
if sangv (i) > . and sangv (i) < 8 then nsangsitiosv = nsangsitiosv + 1;
if sangv (i) = 1 then nsangv = nsangv + 1;

if ppc3 = . then nsangsitiosv = .;
if ppc3 = . then nsangv = .;

psangv = nsangv/nsangsitiosv;
end;
"

I haven't tried anything in R other than creating thousands of new variables, one for each condition, and then summarizing the ones I want, but that is a lot of coding for each variable. I would like something in R equivalent to the SAS approach.
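For orientation, the SAS step above could be sketched in base R roughly as follows. This is only a sketch under assumptions: the toy data frame `dat` and its values are invented for illustration, only three of the 84 site columns are listed, and codes of 8 or more are treated as missing because that is what the SAS step appears to intend.

```r
# Toy data (hypothetical values): three of the bleeding-site columns plus ppc3
dat <- data.frame(
  id       = 1:4,
  ppc3     = c(2, 3, NA, 1),
  rcSS47dv = c(1, 0, 1, 9),
  rcSS47v  = c(0, 8, 1, 9),
  rcSS47mv = c(1, 1, 0, 9)
)

# Plays the role of the SAS array: a character vector of column names
sangv <- c("rcSS47dv", "rcSS47v", "rcSS47mv")

# Codes >= 8 become missing in every listed column
dat[sangv] <- lapply(dat[sangv], function(x) replace(x, !is.na(x) & x >= 8, NA))

# Number of sites with a valid value, and number of sites coded 1 (bleeding)
dat$nsangsitiosv <- rowSums(!is.na(dat[sangv]))
dat$nsangv       <- rowSums(dat[sangv] == 1, na.rm = TRUE)

# Both counts are set to missing when ppc3 is missing
missing_ppc3 <- is.na(dat$ppc3)
dat$nsangsitiosv[missing_ppc3] <- NA
dat$nsangv[missing_ppc3]       <- NA

# Proportion of valid sites that bleed
dat$psangv <- dat$nsangv / dat$nsangsitiosv
```

The new columns are added to `dat` itself, so all other variables are kept. One difference from SAS: a row with no valid sites gives 0/0, which R reports as NaN rather than a missing value.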

jrkrideau · December 5, 2023, 1:29am

I have not used SAS in so long that I don't understand the terminology.

Are you saying that you have an N x 180 dataset and need to know how many times, across all 180 columns, you are getting a hit?

What kind of output is expected?

We probably need to see some sample data.
A handy way to supply sample data is the dput() function. Just run dput(mydata), where mydata is your data; for a large dataset, something like dput(head(mydata, 100)) should supply the data we need. Copy the output and paste it here between
```

```

Conce · December 5, 2023, 10:35am

I will try to explain better. We measured, in millimeters, 6 surfaces of each tooth in the mouth, recording a total of 186 variables (not data sets). So I have 186 measurements for each patient. I want to create a new variable that tells me how many of these 186 variables are greater than 4 mm.

In SAS I usually create a variable containing zero, e.g. totalinformation = 0. Then I prepare what SAS calls an array containing the 186 variables, so I can tell SAS to count all the variables greater than 4 mm. In SAS I do it like this:

totalinformation = 0;
array inf (186) variable1 variable2 variable3 .... variable186;
do i = 1 to 186;
if inf(i) >= 4 then totalinformation = totalinformation + 1;
end;

nirgrahamuk · December 5, 2023, 4:44pm

In R you won't need to construct an array from your data.frame, as the frame directly lends itself to such work.
I use the tidyverse package.

```
library(tidyverse)

(your_data <- data.frame(measure_1 = 1:3,
                         measure_2 = rep(1, 3),
                         measure_3 = 3:1))

mutate(rowwise(your_data),
       is_over_1 = sum(c_across(where(is.numeric)) > 1))
```
In the above, each row has 3 measurements. For each row, the code counts how many of them are over a threshold (1) and puts the result in another column (is_over_1).
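As a hedged sketch of how the same row-wise count could be stored back in the original data frame: `mutate()` returns a new data frame rather than changing `your_data`, so the result has to be assigned. The `sex` column, the toy values, the 4 mm threshold, and the name `n_over_4` are assumptions made for illustration. The native pipe `|>` needs R 4.1 or later.

```r
library(dplyr)

# Toy data (hypothetical): one non-measurement column plus three measurements
your_data <- data.frame(
  sex       = c("F", "M", "F"),
  measure_1 = c(5, 2, NA),
  measure_2 = c(6, 4, 7),
  measure_3 = c(3, 5, 1)
)

# Only these columns take part in the count
measure_cols <- c("measure_1", "measure_2", "measure_3")

your_data <- your_data |>
  rowwise() |>
  # na.rm = TRUE skips missing measurements instead of returning NA
  mutate(n_over_4 = sum(c_across(all_of(measure_cols)) > 4, na.rm = TRUE)) |>
  # drop the row-wise grouping so later steps behave as usual
  ungroup()
```

After this, `your_data` still contains `sex` and the three measurement columns, with `n_over_4` added as one more column.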

Conce · December 5, 2023, 5:05pm

But I need to specify each variable because my data set has thousands of other variables, such as sex, socioeconomic level, and psychological measures, and I have to keep them. If I have to break out a separate data set for each computation, it will be insane.

Maria

nirgrahamuk · December 5, 2023, 5:09pm

You can replace where(is.numeric) with anything that picks out the variables to use, i.e. if you can list them out, then you can list them out here. You can also exploit patterns at the start and end of their names, etc. It's very flexible.

If the variables are in your data set, you can put

my_hand_crafted_list_of_names <- c("rcSS47dv","rcSS47v","rcSS47mv","rcSS46dv","rcSS46v","rcSS46mv","rcSS45dv","rcSS45v","rcSS45mv","rcSS44dv","rcSS44v","rcSS44mv","rcss43dv","rcss43v","rcss43mv","rcss42dv","rcss42v","rcss42mv","rcss41dv","rcss41v","rcss41mv",
"rcss31mv","rcss31v","rcss31dv","rcss32mv","rcss32v","rcss32dv","rcss33mv","rcss33v","rcss33dv","rcSS34mv","rcSS34v","rcSS34dv","rcSS35mv","rcSS35v","rcSS35dv","rcSS36mv","rcSS36v","rcSS36dv","rcSS37mv","rcSS37v","rcSS37dv",
"rcSS17dv","rcSS17v","rcSS17mv","rcSS16dv","rcSS16v","rcSS16mv","rcSS15dv","rcSS15v","rcSS15mv","rcSS14dv","rcSS14v","rcSS14mv","rcss13dv","rcss13v","rcss13mv","rcss12dv","rcss12v","rcss12mv","rcss11dv","rcss11v","rcss11mv",
"rcss21mv","rcss21v","rcss21dv","rcss22mv","rcss22v","rcss22dv","rcss23mv","rcss23v","rcss23dv","rcss24mv","rcss24v","rcss24dv","rcss25mv","rcss25v","rcss25dv","rcss26mv","rcss26v","rcss26dv","rcss27mv","rcss27v","rcss27dv")

mutate(rowwise(your_data),
       is_over_1 = sum(c_across(all_of(my_hand_crafted_list_of_names ))>1))
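The name-pattern route mentioned above might look like the following sketch. The regular expression is a guess based on the variable names posted in this thread (`rcss`, two digits, an optional `d` or `m`, then `v`), and the toy data, the 4 mm threshold, and `n_over_4` are assumptions. `matches()` ignores case by default, so it picks up both the `rcSS` and `rcss` spellings. This version also uses `rowSums()` over `across()` instead of `rowwise()`, which tends to be faster on wide data.

```r
library(dplyr)

# Toy data (hypothetical): id and sex are kept but not counted
dat <- data.frame(
  id       = 1:3,
  sex      = c("F", "M", "F"),
  rcSS47dv = c(5, 2, 6),
  rcSS47v  = c(3, 4, NA),
  rcss43mv = c(7, 1, 5)
)

dat <- dat |>
  mutate(n_over_4 = rowSums(
    # select columns by name pattern; the comparison yields a logical matrix
    across(matches("^rcss[0-9]{2}[dm]?v$")) > 4,
    na.rm = TRUE
  ))
```

If the pattern risks catching unrelated columns, `all_of()` with an explicit vector of names is the safer choice.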

jrkrideau · December 5, 2023, 5:18pm

I think I may understand. Does this do something close to what you want? You will, probably, need to install {data.table}

install.packages("data.table")

library(data.table)
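# Toy data: an id plus four measurement columns
dat1 <- data.table(id = sample(letters[1:3], 10, replace = TRUE),
                   aa = sample(1:8, 10, replace = TRUE),
                   bb = sample(1:8, 10, replace = TRUE),
                   cc = sample(1:8, 10, replace = TRUE),
                   dd = sample(1:8, 10, replace = TRUE))
dat1

# Reshape to long format: one row per id/variable pair
DT <- melt(dat1, id.vars = "id")
# Count the values >= 5 for each variable
DT1 <- DT[value >= 5, .N, by = variable]
# or for each id and variable
DT2 <- DT[value >= 5, .N, by = c("id", "variable")]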
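If the count is wanted per row, as a new column in the same table rather than as a separate summary, a data.table sketch could look like this. The toy values, the `sex` column, and the name `n_ge_5` are assumptions for illustration; the `>= 5` threshold follows the code above.

```r
library(data.table)

# Toy data (hypothetical): id and sex stay untouched
dat <- data.table(
  id  = c("a", "b", "c"),
  sex = c("F", "M", "F"),
  aa  = c(1, 5, 8),
  bb  = c(2, 2, NA),
  cc  = c(3, 5, 7)
)

# Columns that take part in the count
measure_cols <- c("aa", "bb", "cc")

# := adds the column by reference, so dat itself is modified;
# .SD holds only the columns named in .SDcols
dat[, n_ge_5 := rowSums(.SD >= 5, na.rm = TRUE), .SDcols = measure_cols]
```

No reshaping is needed here, and every other column of `dat` is left as it was.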

Conce · December 5, 2023, 6:37pm

I need to create the new variable inside my primary data set.

I am reaching the conclusion that this is impossible in R, so that is a limitation of R compared with SAS or Stata.
I have already asked several people and no one knows how to do this.

Thanks, going back to SAS.

nirgrahamuk · December 5, 2023, 6:53pm

I'm sorry, it seems I have failed to communicate that my solution is indeed a solution (I firmly believe it is). Best of luck to you.

jrkrideau · December 6, 2023, 3:44am

It might help to explain what you want.

"I need to create the new variable inside my primary data set."

Why?
We still have no idea what your SAS program does.

Can you give us some examples of the dataset before and after the SAS run?
