I've been using tidymodels for about two years now, and it has been a joy. tidymodels 1) makes collaborative coding a breeze thanks to its Tidy coding flow, reducing much of the need for comments and documentation, and 2) brings everything into one place. It sure beats using any Python package, at least for now.
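For anyone unfamiliar, here is a minimal sketch of what "everything in one place" looks like: preprocessing, model specification, and fitting all composed through one workflow object. This is just an illustrative toy (linear regression on the built-in mtcars data), not anything from my actual projects.

```r
library(tidymodels)

# Preprocessing lives in a recipe
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# The model spec is engine-agnostic; swap "lm" for another engine later
spec <- linear_reg() |>
  set_engine("lm")

# A workflow bundles recipe + model, so fit() is one call
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(spec)

fitted <- fit(wf, data = mtcars)
```

The point is that the same pipe-based grammar carries from dplyr-style data wrangling straight into modelling, which is exactly what keeps teammates from needing to learn a new API per package.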
My current dilemma is where it goes next and how it compares to the currently trendy Python DS packages. Personally, I'd like to stick with R: using one language is leaner (in both computing overhead and mental capacity) and therefore easier to specialise in, and the Tidy philosophy is saturated throughout R. And compared to juggling separate libraries, each with its own syntax and philosophy, as we did years ago, tidymodels makes R such an attractive language. However, as much as I'd like to use R exclusively, I also enjoy the tidypolars package in Python, so perhaps the more accurate way to describe my preference is that I'd like to stick with the Tidy philosophy. This is where tidymodels comes in. (I'm definitely not an expert, but let's say I got tired of learning new languages/syntaxes after a dozen or so, though I'm keen on Rust these days.)
When looking at high-performance Tidy libraries/packages, tidytable replaces dplyr/data.table for me, and tidypolars replaces pandas/polars. Python lacks anything like tidymodels, which makes sticking with R an easy decision simply for the sake of collaborative coding: working in a team of different specialists is a must these days, and that makes the Tidy philosophy a must for me, so that learning advanced R/Python isn't such a steep barrier to entry.
However, I'm concerned about the steps beyond that. How could R stay the only language I need? For me, the lack of an all-in-one Tidy solution for GPU and out-of-memory computing is the biggest hindrance. Sure, there are solutions, but they aren't part of tidymodels (meaning no Tidy philosophy and no all-in-one solution), and one reason is that CRAN doesn't support GPU builds. It reminds me of the old days of R, when decentralised chaos was the norm, yet that seems to be the only path for GPU/out-of-memory solutions.
So out of curiosity, how do you feel about the trendy phrase "you should know both Python and R" in academia and industry? Are the concerns I described above similar to what you're currently facing? What are your biggest reasons for relying on Python (besides market trends/popularity) instead of using R exclusively? I'd really like to hear your thoughts.