I've been trying to find a solution for this for over a year already and decided to write a post about it. Any help would be appreciated. Here is the pseudocode that I can do easily in Stata and SAS but I don't know how to do in R. {} is the glue-like operator that was introduced into dplyr this year.
In Stata, I can easily do with local or global macros like this:
local df_name "mtcars"
then reference it as `df_name'
In SAS I can do it with global macros like this:
%LET df_name=mtcars;
then reference it like &df_name.
Please note how visually easy to parse these approaches are when the macro is references.
Both approaches allow to use them in dataset names, functions, variables, etc. Simplifies my code tremendously and saves me tons of time. How to do this with visual simplicity in R? My code should be readable for people familiar with Stata/SAS (dplyr is awesome in this regard!) and too many evals, wrapping everything in functions, assigns with parentheses will just make them give up on the project or force me to change back to SAS/Stata.
Would really appreciate any help in this matter! I had this problem back in 2009 with R and gave up on R until I had to come back in 2019 and still can't find an easy way to approach this.
Short of writing a domain specific language to translate SAS/STRATA statements into well-formed R statements, there is no satisfactory solution to making R obvious to thinkers in those languages.
Instead, consider f(x) = y, the foundational paradigm of R. x is what is at hand, y is what is desired and f is the first-class object to transform one to the other. While a deeply composed fβi(h(g(f(x))) can cause eyes to glaze over, R does not insist on wrapping it all up in a single statement, and %>% is there to make the compositional steps easier to follow, as well as intermediate object names.
Better than either is to use a basket of functions to abstract away the syntax-related barriers.
# the input, an arbitrary data frame
names(mtcars)
#> [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
#> [11] "carb"
# the output desired, count
nrow(mtcars['mpg'])
#> [1] 32
# another output desired, result of division
mtcars['mpg']/2
#> mpg
#> Mazda RX4 10.50
#> Mazda RX4 Wag 10.50
#> Datsun 710 11.40
#> Hornet 4 Drive 10.70
#> Hornet Sportabout 9.35
#> Valiant 9.05
#> Duster 360 7.15
#> Merc 240D 12.20
#> Merc 230 11.40
#> Merc 280 9.60
#> Merc 280C 8.90
#> Merc 450SE 8.20
#> Merc 450SL 8.65
#> Merc 450SLC 7.60
#> Cadillac Fleetwood 5.20
#> Lincoln Continental 5.20
#> Chrysler Imperial 7.35
#> Fiat 128 16.20
#> Honda Civic 15.20
#> Toyota Corolla 16.95
#> Toyota Corona 10.75
#> Dodge Challenger 7.75
#> AMC Javelin 7.60
#> Camaro Z28 6.65
#> Pontiac Firebird 9.60
#> Fiat X1-9 13.65
#> Porsche 914-2 13.00
#> Lotus Europa 15.20
#> Ford Pantera L 7.90
#> Ferrari Dino 9.85
#> Maserati Bora 7.50
#> Volvo 142E 10.70
# function to transform input to output
find_count <- function(x,y) nrow(x[y])
find_count(mtcars,"mpg")
#> [1] 32
# function to divide input
divide_it <- function(x,y,z) x[y]/z
divide_it(mtcars,"mpg",2)
#> mpg
#> Mazda RX4 10.50
#> Mazda RX4 Wag 10.50
#> Datsun 710 11.40
#> Hornet 4 Drive 10.70
#> Hornet Sportabout 9.35
#> Valiant 9.05
#> Duster 360 7.15
#> Merc 240D 12.20
#> Merc 230 11.40
#> Merc 280 9.60
#> Merc 280C 8.90
#> Merc 450SE 8.20
#> Merc 450SL 8.65
#> Merc 450SLC 7.60
#> Cadillac Fleetwood 5.20
#> Lincoln Continental 5.20
#> Chrysler Imperial 7.35
#> Fiat 128 16.20
#> Honda Civic 15.20
#> Toyota Corolla 16.95
#> Toyota Corona 10.75
#> Dodge Challenger 7.75
#> AMC Javelin 7.60
#> Camaro Z28 6.65
#> Pontiac Firebird 9.60
#> Fiat X1-9 13.65
#> Porsche 914-2 13.00
#> Lotus Europa 15.20
#> Ford Pantera L 7.90
#> Ferrari Dino 9.85
#> Maserati Bora 7.50
#> Volvo 142E 10.70
Created on 2020-10-09 by the reprex package (v0.3.0.9001)
Thank you for the reply. I don't need to translate SAS/State statements to R, though. I am looking for a visually simply way to execute the pseudocode above in R. Quosures, !!, curly curly, NSE -- that's I've been digging at for the last year and still have no idea how to use them to execute the pseudocode above in its entirety. I can do the {var} part somewhat in the dplyr pipeline but i can't do anything with LHS of the <- and/or base R.
Sounds like you want to treat names as strings so you can modify them , add suffixes to them. In R id think to use glue or paste to construct the symbols and use assign/get to make objects with those names and retrieve them from the environment respectively
This sounds very interesting! Can you show me an example using the pseudocode above? I read on SO that "get" is not recommended anymore, and too much verbiage (assign and etc) in the code makes the code hard to read. I'm in academic research so a lot of new data, with extensive data checks and dead ends in the code usually several thousands of lines long and any visual simplification would be very welcome.
I'm surprised this post hasn't produced 10s of replies.
I use both SAS and R regularly. Sometimes, it's best not to mimic the behavior of another language. To be able to do things multiple times on the different inputs, use functions in R. Using the idea of SAS macros isn't quite the same because all that SAS macros do is that they replace text before compiling which isn't how R operates. I don't have a great answer for your question other than using get and assign which do get verbose because it's not natural in R to do what you're suggesting. I'm sorry this isn't a great answer and not what you're looking for.
I wonder if storing your set of data frames as a list rather than as separate objects in your environment might make this a little easier? Not totally clear if the following is what you're after, but I think it works for your pseudocode example. It does not read as idiomatic dplyr code to my eye, but maybe it's closer to what your colleagues are used to in SAS or Stata?
How would you solve this via functions in a visually easy-to-understand way? It seems with {{ and !!, tidyverse is finally moving in the right direction even without functions just in straight dplyr pipelines. Not sure if this was an intended consequence though.
Thank you for the reply!! I thought of packing everything in lists but it would make everything unfortunately more convoluted. I'm not that good in R and lists just add a layer of complexity in time-constrained situations and make the code less readable to colleagues. If i could do this:
The whole macro approach, though, strikes still strikes me as profoundly a-functional except in the case that there is the same hardwired object name to be passed to several different functions. Better, I assert, is to compose a piped function designed to operate in a purrr::map context to take an arbitrary object signifier.
Sean Connery in The Untouchables: βThatβs the Chicago way.β
"same hardwired object name to be passed to several different functions" -- I agree this is exactly the process here. As I mentioned earlier it's not for production it is for academic research with a lot of pipes, dead ends, intermediate checks, intermediate summary tables where the initial dataset is checked in an "as needed" fashion against posterior versions of the dataset with different transformations. I tried to wrap things in functions but the overhead is too much to handle and code gets too cluttered. If the goal was production-ready code, I would of course aim for the function-oriented approach. This is a valuable discussion.