I love tidymodels. What does it need beyond?

I've been using tidymodels for a couple of years now, and it has been a joy. tidymodels 1) makes collaborative coding a breeze thanks to its Tidy coding flow, which reduces the need for commenting and documentation, and 2) brings everything into one place. It sure beats using any Python package, at least for now.

The dilemma I'm facing is about its next steps and how it compares to the currently trendy Python DS packages. I personally would like to stick with R, both because sticking with one language is leaner (in terms of computing overhead and mental capacity) and therefore easier to specialise in, and because the Tidy philosophy is so pervasive in R. And of course, compared to juggling separate libraries years ago, each with its own syntax and philosophy, tidymodels makes R such an attractive language. However, as much as I would like to use R exclusively, I also enjoy using the tidypolars package in Python, so perhaps the more accurate way to describe my preference is that I would like to stick with the Tidy philosophy. This is where tidymodels comes in. (I'm definitely not an expert, but let's say I got tired of learning new languages/syntax after a dozen or so, though I'm keen on Rust these days.)

When looking at high-performance Tidy libraries/packages, there is tidytable, which replaces dplyr/data.table for me, and tidypolars, which replaces pandas/polars. Python lacks something like tidymodels, and this makes it an easy decision to stick with R simply because of the demands of collaborative coding. Coding together in a team of different specialists is a must these days, and that makes the Tidy philosophy a must for me, so that learning advanced R/Python isn't such a barrier to entry.

However, I'm concerned about the steps beyond. How could R stay the only language I need? Personally, the lack of an all-in-one Tidy solution for GPU and out-of-memory computing is the biggest hindrance. Sure, there are solutions, but they are not part of tidymodels (meaning no Tidy philosophy and no all-in-one solution), and one of the reasons is that CRAN does not support GPU. Going outside tidymodels reminds me of the old days of R, when decentralised chaos was the norm, yet it seems to be the only path for GPU/out-of-memory solutions.

So out of curiosity, how do you feel about the trendy phrase 'you should know both Python and R' in academia and industry? Do you feel that the concerns I stated above are similar to what you're currently facing? What are your biggest reasons for relying on Python (besides market trends/popularity) instead of solely using R? I would really like to learn about your thoughts.

For me, the data wrangling, pipe, statistics and notebook features of R and the tidyverse outweigh its deficiencies in pure ML. In my experience, the success of a project relies more on clarity of approach, data preparation, domain knowledge, and feature engineering than pure ML power, so my preferences run accordingly.

pandas is extremely clunky compared to dplyr. Working in dplyr and purrr is soooo much nicer.

I hate how many data types exist in Python with a disorganized mix of OO and functional programming. In R, once you learn how to work with data frames, vectors, and lists, you're golden.

The one thing I'm jealous of is scikit-learn. I'm not crazy about tidymodels. I would love a simpler interface to its tuning and CV capabilities (without recipes and data preparation), one that doesn't require learning all the tidymodels functions and how to pipe them in sequence. I'd also like more support for newer methods, and possibly a direct link to scikit-learn via reticulate.
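
To show the kind of interface I mean, here is a minimal sketch of tuning with cross-validation in scikit-learn, with no separate preprocessing step. It assumes scikit-learn is installed; the dataset, model, and parameter grid are arbitrary choices for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Hypothetical grid: every combination is evaluated with cross-validation.
param_grid = {"max_depth": [2, 4], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)         # tuning and CV happen in this single call

best_params = search.best_params_
best_score = search.best_score_  # mean CV accuracy of the best combination
print(best_params, round(best_score, 3))
```

The model, the grid, and the resampling scheme all go into one object, which is the part I'd like to have in R without wiring several functions together.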

In my experience, the success of a project relies more on clarity

I 100% agree with you there. This necessitates Tidy, I think.

pandas is extremely clunky compared to dplyr

Then you might want to look into tidypolars in Python! It's a great package -- same author as tidytable.

Python with a disorganized mix of OO and functional programming

I also agree with this, though perhaps from a different angle. Simply put, people with different backgrounds prefer different data types, and that makes things a bit more clutter-y in a collaborative coding environment, not to mention lambda (i.e. anonymous) functions, which make things unreadable.
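
A small made-up illustration of the lambda point, using only the standard library. The prices, the threshold, and the `discounted` helper are all hypothetical; both versions compute the same total, but the second is much easier for a teammate to read.

```python
from functools import reduce

prices = [12.5, 40.0, 7.25, 99.9]  # hypothetical data

# Dense version: a lambda nested inside another lambda.
total_dense = reduce(
    lambda acc, p: acc + (lambda x: x * 0.9 if x > 20 else x)(p),
    prices,
    0.0,
)

# Named version: the rule gets a name and readable parameters.
def discounted(price, threshold=20, rate=0.9):
    """Apply the discount rate only to prices above the threshold."""
    return price * rate if price > threshold else price

total_named = sum(discounted(p) for p in prices)

print(total_dense, total_named)
```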