Python equivalent to R package

When I write R functions, I have the habit of putting them in packages, which is super smooth with helper packages such as devtools and usethis.

Recently I have been writing some Python code, and I was wondering if anyone knows what the equivalent of an R package is in Python. I asked some software engineers at my office, but they didn't really get the question, since they don't know much about R. I get that you can save Python functions in modules and import them, but that seems pretty unstructured.

My impression is that modules are basically it, but they can get quite complex: take a look at the source of pandas, for example. Ultimately a package is Python scripts in nested directories, each defining its own namespace. There's no need for a devtools equivalent to build, since you don't need to make a package binary, and a lot of usethis features are less needed because you don't have to conform to a particular directory structure. You can write documentation with docstrings, but it seems that most projects don't rely on in-place documentation and point you to other locations instead.
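To make the "nested directories defining namespaces" point concrete, here is a minimal sketch that writes a tiny package to a temporary folder and imports a submodule from it. The names demo_pkg, stats, summary and mean are hypothetical. Each __init__.py marks a directory as a regular package, the dotted import path mirrors the folder layout, and the docstrings play roughly the role roxygen comments play in R (help(summary.mean) displays them).

```python
# Throwaway demo: build a tiny nested package on disk, then import it.
# All package, module and function names here are hypothetical.
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
(pkg / "stats").mkdir(parents=True)

# An __init__.py file marks a directory as a regular package.
(pkg / "__init__.py").write_text('"""Top-level docs for demo_pkg."""\n')
(pkg / "stats" / "__init__.py").write_text("")

# The module docstring and function docstring are the in-place documentation.
(pkg / "stats" / "summary.py").write_text(textwrap.dedent('''
    """Summary helpers."""

    def mean(xs):
        """Return the arithmetic mean of a non-empty sequence."""
        return sum(xs) / len(xs)
'''))

sys.path.insert(0, str(root))  # make the temporary folder importable
importlib.invalidate_caches()  # files were created after startup

# The dotted path demo_pkg.stats.summary follows the directory structure.
summary = importlib.import_module("demo_pkg.stats.summary")

print(summary.mean([1, 2, 3, 4]))
print(summary.mean.__doc__)
```

Installing with pip (as described further down the thread) replaces the sys.path manipulation; the layout itself stays the same.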

The standards for structuring Python packages are much less strict than in R. One way is to use Setuptools.

@filipwastberg You are right that it is possible to distribute Python functions in a Python package similar to how it works with R packages.

Let's say you have a Python module named mypandas.py that contains some custom data analysis functions. You want to make it easier to use them in different projects and also to share with collaborators. To do this, create a new directory (the name doesn't matter, but I name it mypandas for consistency), move the file there, and create a file called setup.py.

mypandas/
├── mypandas.py
└── setup.py

Then add the following to setup.py:

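from setuptools import setup  # distutils ignores install_requires and is gone in Python 3.12

setup(name='mypandas',
      version='0.1.0',
      py_modules=['mypandas'],      # a single-module package: mypandas.py
      install_requires=['pandas']   # pip installs pandas if it is missing
      )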

This specifies the following:

  1. The package name is "mypandas"
  2. The version is 0.1.0
  3. The package consists of the Python module mypandas.py
  4. The package depends on pandas

Then, from inside the mypandas directory, run pip install . to install the package (pip will also install pandas if it isn't already installed). Now you can run import mypandas from anywhere on your machine, without having to worry about the current working directory or setting PYTHONPATH.
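A quick way to confirm the installation from Python is sketched below using only the standard library; it is roughly what packageVersion() and find.package() do in R. The helper names installed_version and module_location are made up for this example.

```python
# Check whether a distribution is installed and where its module lives.
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_location(module_name: str) -> Optional[str]:
    """Return the file a top-level import would load, without importing it."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # After `pip install .`, expect 0.1.0 and (typically) a site-packages path.
    print(installed_version("mypandas"))
    print(module_location("mypandas"))
```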

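To share the package with your colleagues, run python setup.py sdist, which writes mypandas-0.1.0.tar.gz to the dist/ folder, and send them that file. They can run pip install mypandas-0.1.0.tar.gz directly, without extracting it first, which will install mypandas and pandas. (Recent setuptools versions discourage invoking setup.py directly; python -m build --sdist, from the build package, produces the same kind of archive.)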
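Before sending the archive, you may want to check what went into it. The sketch below uses the standard-library tarfile module; the function name sdist_contents is hypothetical, and the example path assumes setup.py sdist wrote the archive to its default dist/ folder.

```python
# List the regular files packed into a .tar.gz source distribution.
import tarfile
from typing import List


def sdist_contents(archive: str) -> List[str]:
    """Return the sorted names of the files inside a .tar.gz sdist."""
    with tarfile.open(archive, mode="r:gz") as tar:
        # Skip directory entries; keep only regular files.
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

For example, sdist_contents("dist/mypandas-0.1.0.tar.gz") should list mypandas.py and setup.py under a mypandas-0.1.0/ prefix, alongside metadata files such as PKG-INFO that setuptools generates.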

This is the simplest case, with only one module. As with R packages, the organization of Python packages can get complex. To get started with a more complex setup, you can use the cookiecutter-pypackage template (this works similarly to usethis::create_package()):

pip install cookiecutter
cookiecutter https://github.com/audreyr/cookiecutter-pypackage.git

Here are some resources I found useful:

Thanks. Those links are very useful. I'm used to being able to create simple R packages that I can distribute internally via GitHub, and I'm trying to do the same for Python.

I also found this link, which looks useful: https://uoftcoders.github.io/studyGroup/lessons/python/packages/lesson/

Maybe I'll write a blog post about this to investigate the differences.

For more resources on Python packaging, see this Twitter thread started by @robinsones

I wrote a blog post about my experience building a company-internal module in Python: http://dataland.rbind.io/2019/06/25/python-packages-for-r-users/

Hope this can be helpful for anyone trying to do the same.
