Coding in SAS vs. R

Hello All,

As I mentioned before, I am new to R, and I am trying to mimic in R the tasks that I can easily do in SAS.
Here is another example:
In SAS I wrote these lines:

data viscdaib_;
merge viscdaib(where=(paramcd="T_CDAI") rename=(aval=cdaib) keep= usubjid paramcd aval trtn)
viscdaib(where=(paramcd="I_NLVSS") rename=(aval=sfb) keep= usubjid paramcd aval trtn)
viscdaib(where=(paramcd="I_APSS") rename=(aval=apb) keep= usubjid paramcd aval trtn);
by usubjid;
run;

Here I am creating a dataset, viscdaib_, from viscdaib by merging three subsets of it, each with a distinct paramcd, renaming the aval variable to a relevant name in each, keeping only the needed variables, and calculating sfb and apb. A simple task accomplished in one data step.

In R, this is what I can do with the filter(), rename(), and select() functions:

library(dplyr)

# Rename and calculate sfb and apb
# (rename() only renames columns; computing aval / 7 needs mutate())
cdaib <- visdatab |> filter(paramcd == "T_CDAI") |> rename(cdaib = aval) |> select(usubjid, paramcd, trtn, cdaib)
sfb <- visdatab |> filter(paramcd == "I_NLVSS") |> mutate(sfb = aval / 7) |> select(usubjid, paramcd, trtn, sfb)
apb <- visdatab |> filter(paramcd == "I_APSS") |> mutate(apb = aval / 7) |> select(usubjid, paramcd, trtn, apb)

This creates three additional datasets, and I used piping to chain the steps for each one.

Questions to the Gurus:

  1. Is there a simpler way to accomplish this by combining all these functions, without creating three different datasets?

  2. If not, I also need to merge these three datasets together by usubjid. That will create yet another dataset, and I do not yet know how to merge three datasets.

I can always write my own functions, but I am asking whether there is a simpler way to do this with existing functions.

Thanks in advance.


Hello there,

Great that you're taking a stab at R. I promise you, if you hang in there, you won't regret it!

Next, I would highly recommend setting aside some time to go through this book:

Lastly, the whole "Here is SAS code, convert it to R" approach works pretty well in ChatGPT, so give that a try. It's not perfect, and it will sometimes fail (with great confidence), but it's a place to get started. Remember, though, that there is a stark difference between having "code which runs" and actually understanding the finer details of what is going on. It's the classic "If you give a man a fish, you feed him for a day. If you teach a man to fish, you feed him for a lifetime."

Happy learning!

Thanks for the suggestions, Leon.


You might also be interested in this cheatsheet.

If I understand correctly what you're doing, that can be done simply by merging two datasets at a time, for example:

cdaib |>
  left_join(sfb, by = "usubjid") |>
  left_join(apb, by = "usubjid")
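To see what the chained joins produce, here is a minimal, self-contained sketch on a hypothetical toy version of visdatab (the subject IDs and values are made up). It follows the question's steps, with mutate() doing the division by 7, and then joins two at a time. Here paramcd is dropped and trtn is kept only in the first data frame, so the joined result has no duplicated columns.

```r
library(dplyr)

# Hypothetical toy data: one row per subject per parameter (long format)
visdatab <- data.frame(
  usubjid = rep(c("S01", "S02"), each = 3),
  paramcd = rep(c("T_CDAI", "I_NLVSS", "I_APSS"), times = 2),
  aval    = c(250, 35, 14, 310, 42, 21),
  trtn    = rep(c(1, 2), each = 3)
)

# One small data frame per parameter; select() can rename while selecting
cdaib <- visdatab |> filter(paramcd == "T_CDAI") |> select(usubjid, trtn, cdaib = aval)
sfb   <- visdatab |> filter(paramcd == "I_NLVSS") |> mutate(sfb = aval / 7) |> select(usubjid, sfb)
apb   <- visdatab |> filter(paramcd == "I_APSS") |> mutate(apb = aval / 7) |> select(usubjid, apb)

# Merge two at a time by subject
viscdaib_ <- cdaib |>
  left_join(sfb, by = "usubjid") |>
  left_join(apb, by = "usubjid")

print(viscdaib_)
```

With this toy data the result has one row per subject and the columns usubjid, trtn, cdaib, sfb, and apb. left_join() keeps every row of cdaib; full_join() would also keep subjects that are missing a T_CDAI record.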

I'm not sure I fully understand what your data looks like in the first place (and I'm not familiar with SAS), but I suspect that what you're doing is probably an advanced use of pivot_wider().

That would also depend on whether you expect each usubjid to correspond to exactly one row in each of the datasets, etc.
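For reference, the pivot_wider() idea could look like the sketch below. It assumes visdatab is in long format with one row per usubjid and paramcd; the toy data is hypothetical, and the division by 7 mirrors the question's code. It builds the wide dataset in a single pipeline, without intermediate data frames.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in long format: one row per subject per parameter
visdatab <- data.frame(
  usubjid = rep(c("S01", "S02"), each = 3),
  paramcd = rep(c("T_CDAI", "I_NLVSS", "I_APSS"), times = 2),
  aval    = c(250, 35, 14, 310, 42, 21),
  trtn    = rep(c(1, 2), each = 3)
)

viscdaib_ <- visdatab |>
  # Keep only the three parameters of interest
  filter(paramcd %in% c("T_CDAI", "I_NLVSS", "I_APSS")) |>
  mutate(
    # Divide the two subscores by 7, as in the question; T_CDAI is unchanged
    aval = if_else(paramcd == "T_CDAI", aval, aval / 7),
    # Map each paramcd to the target column name
    name = case_when(
      paramcd == "T_CDAI"  ~ "cdaib",
      paramcd == "I_NLVSS" ~ "sfb",
      paramcd == "I_APSS"  ~ "apb"
    )
  ) |>
  select(usubjid, trtn, name, aval) |>
  # One row per subject (and trtn), one column per parameter
  pivot_wider(names_from = name, values_from = aval)

print(viscdaib_)
```

If a subject has more than one row for the same paramcd, pivot_wider() warns and returns list-columns instead of plain numbers, which is why it matters whether each usubjid has exactly one row per parameter.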

As both an R and SAS user, this is how I would accomplish this task.

Two notes on differences between R and SAS in this example.

  1. Since paramcd is in all 3 datasets, it will not behave like it does in SAS. In SAS, it would be overwritten by the next dataset if there is a matching record. In R, you'll end up with 3 variables, paramcd.x, paramcd.y, and paramcd, which come from the first, second, and third dataset, respectively (the same happens to trtn). If you no longer need paramcd, remove it from the select() after filtering.

  2. The rounding function is different: round(x, 0.1) in SAS rounds to the nearest multiple of 0.1. To accomplish the same in R, use round(x, 1), where the second argument is the number of decimal places.
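A quick check of the rounding difference in R: the second argument of round() is a number of decimal places, whereas in SAS it is a rounding unit. The helper round_to_unit() below is hypothetical (it is not part of base R) and only mimics the SAS-style unit argument for positive units. One further difference worth knowing: R's round() sends exact halves to the even digit, while SAS's ROUND rounds exact halves away from zero (SAS offers ROUNDE for round-half-to-even), so results can differ at ties.

```r
# R: the second argument of round() is the number of decimal places
x <- c(3.14159, 2.71828)
round(x, 1)          # 3.1 2.7

# Hypothetical helper mimicking SAS's ROUND(x, unit):
# round to the nearest multiple of `unit` (inherits R's tie-breaking rule)
round_to_unit <- function(x, unit) round(x / unit) * unit

round_to_unit(x, 0.1)           # 3.1 2.7 (up to floating-point representation)
round_to_unit(c(12, 17), 5)     # 10 15

# Ties: R goes to the even digit
round(2.5)           # 2
```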


library(dplyr)
library(purrr)  # for reduce()

cdaib <- visdatab |> filter(paramcd == "T_CDAI") |> rename(cdaib = aval) |> select(usubjid, paramcd, trtn, cdaib)
sfb <- visdatab |> filter(paramcd == "I_NLVSS") |> rename(sfb = aval) |> select(usubjid, paramcd, trtn, sfb)
apb <- visdatab |> filter(paramcd == "I_APSS") |> rename(apb = aval) |> select(usubjid, paramcd, trtn, apb)

# Full-join the three data frames one after another by usubjid
viscdaib_ <- list(cdaib, sfb, apb) |>
  reduce(full_join, by = "usubjid")
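To see the suffix behaviour described in note 1, here is a small sketch with hypothetical toy data. Because paramcd and trtn are kept in all three data frames and only usubjid is the key, full_join() adds .x/.y suffixes to the clashing columns. The second part shows one way to avoid that: drop paramcd and also join by trtn, which assumes trtn is constant within a subject.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data in long format
visdatab <- data.frame(
  usubjid = rep(c("S01", "S02"), each = 3),
  paramcd = rep(c("T_CDAI", "I_NLVSS", "I_APSS"), times = 2),
  aval    = c(250, 35, 14, 310, 42, 21),
  trtn    = rep(c(1, 2), each = 3)
)

cdaib <- visdatab |> filter(paramcd == "T_CDAI") |> rename(cdaib = aval) |> select(usubjid, paramcd, trtn, cdaib)
sfb   <- visdatab |> filter(paramcd == "I_NLVSS") |> rename(sfb = aval) |> select(usubjid, paramcd, trtn, sfb)
apb   <- visdatab |> filter(paramcd == "I_APSS") |> rename(apb = aval) |> select(usubjid, paramcd, trtn, apb)

# Joining only by usubjid: the shared paramcd and trtn columns get suffixed
viscdaib_ <- list(cdaib, sfb, apb) |> reduce(full_join, by = "usubjid")
names(viscdaib_)
# usubjid, paramcd.x, trtn.x, cdaib, paramcd.y, trtn.y, sfb, paramcd, trtn, apb

# Dropping paramcd and joining by usubjid and trtn avoids the suffixes
viscdaib_clean <- list(cdaib, sfb, apb) |>
  map(\(d) select(d, -paramcd)) |>
  reduce(full_join, by = c("usubjid", "trtn"))
names(viscdaib_clean)
# usubjid, trtn, cdaib, sfb, apb
```

The third data frame's paramcd and trtn keep their plain names only because, after the first join, no column with exactly that name is left to clash with.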