How to use Python in R?

Hi,
I would like to use Python in R. Does anybody here know how to use Python on Windows 10?

Mainly I want to do it for the reasons discussed here:
https://community.rstudio.com/t/please-share-some-use-cases-why-we-would-use-reticulate-python-and-r/94575

I started using Google Colab but would prefer to do this in RStudio.
In Colab, keyboard shortcuts don't work, there is no Environment pane, etc.
I would be grateful for any hints on how to use it in RStudio.

You can install just Python from here

Or you can install the Anaconda distribution, which includes other software as well, like JupyterLab, Jupyter Notebooks, etc.

Once you have Python installed on your system, you can use it from RStudio in several ways:

  • Import Python modules and/or scripts from within R code with the reticulate package.
  • Write Python scripts and execute the code in a Python console (REPL).
  • Include Python code chunks in Rmd or qmd files.
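For the third option, a minimal Rmd sketch could look like this (the file contents below are just an illustration; the python chunk is executed by reticulate under the hood):

````markdown
---
title: "Mixing R and Python"
output: html_document
---

```{r setup}
library(reticulate)
```

```{python}
# An ordinary Python chunk
x = [1, 2, 3]
print(sum(x))
```
````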

Hi, thank you for the reply.

Just a few questions:

  1. I have got an R data frame. Do I need to convert it to a Python data frame somehow when I want to do something with it in Python? Does the same apply to vectors?
  2. If I have an R object saved as RDS or RData, will Python recognize it?
  3. What does reticulate actually do? Does it help connect R and Python?
  4. Is there an R-to-Python "translator", e.g. if I know how to do something in R, can I have it translated into Python code? Is that possible?
  5. Does the tidyverse's pipe exist in Python?
  6. About making plots: is it like in R, i.e. ggplot(iris, "rest of the code"), or is a Python plot built step by step by adding and running additional lines of code?

Thank you in advance for your reply.

No, objects are stored in separate environments, but you can access R objects from within Python code using the special r object (e.g. r.x would access, from Python, the x variable created in R).

You would usually read it in R and access the content from Python as explained above. Still, it's possible that a Python library I'm unaware of exists that can read these formats directly.

From the documentation:

Reticulate embeds a Python session within your R session, enabling seamless, high-performance interoperability.

I'm not aware of such a tool but that doesn't mean it doesn't exist.

There is no tidyverse in Python, since it is an R package, but very likely there are Python libraries implementing similar functionality.
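There is no literal %>% operator in Python, but here is a sketch of what "pipe-like" code looks like with pandas method chaining (pandas is assumed to be installed; the data frame below is made up for the example):

```python
import pandas as pd

# A small made-up data frame, standing in for an R tibble
df = pd.DataFrame({"species": ["a", "a", "b"], "size": [10, 20, 30]})

# tidyverse: df %>% filter(size > 10) %>% mutate(double = size * 2)
# pandas method chaining plays a similar role:
result = (
    df.query("size > 10")
      .assign(double=lambda d: d["size"] * 2)
)
print(result["double"].tolist())  # [40, 60]
```

pandas also has a .pipe() method for inserting an arbitrary function into such a chain, which is the closest analogue to piping into your own function in R.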

Python, like R, is open source with a lot of independent contributors, so functionality depends on the specific library you are using. There are plotting libraries based on ggplot2, like this one for example:

https://plotnine.readthedocs.io/en/stable/

For this, I can say that ChatGPT is a good tool, but it is not perfect. I have tried it on occasion, and it is not 100% accurate. It is a tool for getting started with the basics.

Put this into ChatGPT and see the results: "make a histogram in R and Python".

Can you please show an example of how to do it?

Can you provide a reproducible example for a specific use case? The exact way depends on how you are using Python in RStudio.

Otherwise, there are plenty of examples in the documentation.

This is not code translation per se; it is freely creating code in both languages to accomplish the same task, not translating one into the other.
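To make the comparison concrete, here is a sketch of the histogram task on the Python side (NumPy assumed, data made up). In R, hist(x) computes and draws the histogram in one call; in Python the computation and the drawing are usually separate steps, with matplotlib's plt.hist doing both at once:

```python
import numpy as np

# Made-up data; in R this whole task would be hist(x)
x = np.array([10, 12, 15, 22, 25, 31, 44, 47, 51, 60])

# Compute the binned counts without drawing anything
counts, edges = np.histogram(x, bins=5)
print(counts.tolist())  # [3, 2, 1, 2, 2] observations per bin
print(edges.tolist())   # bin boundaries from 10 to 60
# matplotlib's plt.hist(x, bins=5) would compute and draw in one call.
```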

Yes, I can.
I want to find the best theoretical distribution for the "serving" data:

library(fitdistrplus)
data(groundbeef)
serving <- groundbeef$serving

using the approach described here (Python package distfit):
https://towardsdatascience.com/how-to-find-the-best-theoretical-distribution-for-your-data-a26e5673b4bd

There is a "serving" vector in R. There is no R package called distfit, but there is a distfit package in Python. What should I do next? What would be the best way to do this in RStudio?

I don't want to install unnecessary packages on my Windows setup, so I have tested this on Posit Cloud, but it should be very similar for you:

library(reticulate)
library(fitdistrplus)
#> Loading required package: MASS
#> Loading required package: survival

data(groundbeef)
serving <- groundbeef$serving

# You only need to do this once
#conda_create("my_env")
#conda_install("my_env", "distfit")


use_condaenv("my_env")

np <- import("numpy")
distfit <- import("distfit")

X <- np_array(serving, dtype='int64')

dfit <- distfit$distfit(method='parametric', todf=TRUE)
dfit$fit_transform(X)
#> $model
#> $model$name
#> [1] "gamma"
#> 
#> $model$score
#> [1] 0.0003573497
#> 
#> $model$loc
#> [1] -2.20644
#> 
#> $model$scale
#> [1] 17.63285
#> 
#> $model$arg
#> $model$arg[[1]]
#> [1] 4.301304
#> 
#> 
#> $model$params
#> $model$params[[1]]
#> [1] 4.301304
#> 
#> $model$params[[2]]
#> [1] -2.20644
#> 
#> $model$params[[3]]
#> [1] 17.63285
#> 
#> 
#> $model$model
#> <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x7f726d698eb0>
#> 
#> $model$bootstrap_score
#> [1] 0
#> 
#> $model$bootstrap_pass
#> NULL
#> 
#> $model$color
#> [1] "#e41a1c"
#> 
#> $model$CII_min_alpha
#> [1] 25.00746
#> 
#> $model$CII_max_alpha
#> [1] 142.0396
#> 
#> 
#> $summary
#>          name        score          loc       scale                 arg
#> 1       gamma 0.0003573497     -2.20644    17.63285            4.301304
#> 2        beta 0.0003606702     5.400327    336.2081 2.670049, 10.487716
#> 3  genextreme 0.0003626564     57.57065    29.49865          0.03931055
#> 4    dweibull 0.0003783728     67.92109    32.61583            1.487238
#> 5     lognorm  0.000392554     9.963838    52.78125           0.8290418
#> 6           t 0.0004097055     72.87649    35.60593            37.17807
#> 7        norm 0.0004128576      73.6378    35.82408                NULL
#> 8    loggamma 0.0004144338     -8538.13    1221.852            1151.153
#> 9      pareto   0.00055225 -34359738358 34359738368           539926600
#> 10      expon 0.0005522501           10     63.6378                NULL
#> 11    uniform 0.0005753646           10         190                NULL
#>                                       params                         model
#> 1             4.301304, -2.206440, 17.632847 <environment: 0x55d9afd77a00>
#> 2  2.670049, 10.487716, 5.400327, 336.208112 <environment: 0x55d9afd7ad78>
#> 3       0.03931055, 57.57065027, 29.49864839 <environment: 0x55d9afd7a2c0>
#> 4             1.487238, 67.921091, 32.615829 <environment: 0x55d9afd7d638>
#> 5           0.8290418, 9.9638375, 52.7812468 <environment: 0x55d9afd7cb80>
#> 6               37.17807, 72.87649, 35.60593 <environment: 0x55d9afd7c0c8>
#> 7                         73.63780, 35.82408 <environment: 0x55d9afd7f440>
#> 8              1151.153, -8538.130, 1221.852 <environment: 0x55d9afd7e988>
#> 9       539926600, -34359738358, 34359738368 <environment: 0x55d9afd7ded0>
#> 10                          10.0000, 63.6378 <environment: 0x55d9afd81248>
#> 11                                   10, 190 <environment: 0x55d9afd80790>
#>    bootstrap_score bootstrap_pass   color
#> 1                0           NULL #e41a1c
#> 2                0           NULL #e41a1c
#> 3                0           NULL #377eb8
#> 4                0           NULL #4daf4a
#> 5                0           NULL #984ea3
#> 6                0           NULL #ff7f00
#> 7                0           NULL #ffff33
#> 8                0           NULL #a65628
#> 9                0           NULL #f781bf
#> 10               0           NULL #999999
#> 11               0           NULL #999999
#> 
#> $histdata
#> $histdata[[1]]
#>  [1] 0.0035018649 0.0045793618 0.0215499378 0.0021549938 0.0175093245
#>  [6] 0.0002693742 0.0086199751 0.0013468711 0.0070037298 0.0008081227
#> [11] 0.0002693742 0.0000000000 0.0008081227
#> 
#> $histdata[[2]]
#>  [1]  17.30769  31.92308  46.53846  61.15385  75.76923  90.38462 105.00000
#>  [8] 119.61538 134.23077 148.84615 163.46154 178.07692 192.69231
#> 
#> 
#> $size
#> [1] 254
#> 
#> $alpha
#> [1] 0.05
#> 
#> $stats
#> [1] "RSS"
#> 
#> $bins
#> [1] "auto"
#> 
#> $bound
#> [1] "both"
#> 
#> $name
#> [1] "popular"
#> 
#> $method
#> [1] "parametric"
#> 
#> $multtest
#> [1] "fdr_bh"
#> 
#> $n_perm
#> [1] 10000
#> 
#> $smooth
#> NULL
#> 
#> $weighted
#> [1] TRUE
#> 
#> $f
#> [1] 1.5
#> 
#> $n_boots
#> NULL
#> 
#> $random_state
#> NULL

Created on 2023-02-12 with reprex v2.0.2

Yes, it is, thank you. This output looks a bit messy; is it possible to put it through something like broom::tidy to get the output in a more orderly way?
I understand from your code and output that while writing Python in R, we change R's library() to Python's import, and Python's dot to R's $? Is that correct?
One more question: does Python recognize R objects? I mean, if I want mtcars in Python, how do I convert it to a Python data frame?

I suppose it is, but I have never worked with that specific Python library and its output, so me finding out how would be pretty much the same as you finding out on your own. Keep in mind that I'm importing Python modules into R, so the output is just an R object.
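I can offer a generic sketch, though: if you pulled the fitted-model fields into Python as a nested dict (the values below are made up, mimicking the shape of the output above), pandas can flatten it into a one-row table, similar in spirit to broom::tidy:

```python
import pandas as pd

# Hypothetical nested structure, mimicking the shape of distfit's output
model = {"name": "gamma", "score": 0.000357,
         "params": {"loc": -2.21, "scale": 17.63}}

# json_normalize flattens nested keys into dotted column names
tidy = pd.json_normalize(model)
print(tidy.columns.tolist())  # ['name', 'score', 'params.loc', 'params.scale']
```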

I don't think I understand what you mean. If you write code in a .py file, you are writing Python code; ergo, you have to use Python syntax, which is completely independent of R.

reticulate makes interoperability possible. When you are importing Python modules into R (like I did in my example), the data sharing is transparent to the user, but if you are writing Python code you can access R objects through the special r object (as long as you are running Python via reticulate), so it would be something like r.mtcars, given that mtcars is already loaded into the global environment of the R session.
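Purely as an illustration (pandas assumed; the numbers below are just the first rows of mtcars typed by hand): once an R data.frame crosses over, the Python side sees a pandas DataFrame, so ordinary pandas operations apply. In a reticulate Python session you would start from r.mtcars instead of building the frame yourself:

```python
import pandas as pd

# Stand-in for what `r.mtcars` would give you inside a reticulate
# Python session; reticulate does this conversion automatically.
mtcars_py = pd.DataFrame({
    "mpg": [21.0, 21.0, 22.8, 21.4],
    "cyl": [6, 6, 4, 6],
})

# Ordinary pandas operations now apply:
print(mtcars_py.loc[mtcars_py["cyl"] == 6, "mpg"].mean())  # ~21.13
```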

I must have mixed something up, but what I meant was, for example, that in R there is library(tidyverse), while in Python we have import pandas; and in R it was written in your post above:

dfit$fit_transform(X)

but in Python it is:

dfit.fit_transform(X)

This is what I meant; apologies for the confusion, as this is all new to me.
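So, if I understand correctly, the mapping is just between the two access syntaxes. A minimal illustration, using only the Python standard library:

```python
# R's `$` on an imported module corresponds to Python's `.`:
#   reticulate R code:  math <- import("math"); math$sqrt(16)
#   plain Python:
import math
print(math.sqrt(16))  # 4.0
```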
Thank you for your kind replies.

Not sure if this is helpful, but when you use Posit Cloud there's a built-in function that allows you to code with Python. It's free and on the cloud.

Can you please explain it more? I have never used Posit Cloud.

Posit Cloud is a service that provides an RStudio or Jupyter Notebooks session running in the cloud; I believe @rtoppenh is referring to the latter.

Jupyter Notebooks is similar to using R Markdown or Quarto, but it doesn't have as much interoperability among languages.

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.