Dependency Management in R and Python

Flavio Hafner

2025-01-30

Overview

URL: https://edu.nl/4xx6w

Why track dependencies

  • You want your code to still run in two years
    • Package functionality changes; bugs get fixed
  • A project might depend on a specific version of a package
    • Other projects could use newer versions
  • Make your tools portable
    • For yourself
    • For your co-authors

The high-level idea

DIY project | R/Python project
🏪 Hardware store | Repository (CRAN, PyPI)
🗄 The storage shelf in your garage | The folder with Python/R libraries
🧰 Toolboxes (the drill box, the box of screwdrivers) | Packages (ggplot2, etc.)
🪑 Workbench | Environment: renv, venv

The environment is an isolated workspace with all tools

  • An environment is defined by a file with metadata on
    • Which packages, which versions
    • From where (repository, github)
  • Metadata are stored in requirements and lock files
    • ⇒ The dependencies can be recreated in the same way
  • This metadata should be under version control
    • So that you can track changes and share with others

R: Using renv

Create a new project in RStudio

  1. Install renv globally with
install.packages("renv")
  2. Create a new project with version control (git)
    (File -> New Project -> Version Control -> Git)
  3. Open the project in a new session

Install and track packages with renv

  1. Attach renv to your R session with
library(renv)
  2. Initialize renv for the project
renv::init(bare = TRUE)

This creates

  • renv/ – A new folder that serves as the library of packages for your project.
  • .Rprofile – This file makes sure that once renv is turned on for a project, it stays on.

Create lock file to log project state

renv::snapshot()

yields something like

{
  "R": {
    "Version": "4.4.2",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cloud.r-project.org"
      }
    ]
  },
  "Packages": {
    "rlang": {
      "Package": "rlang",
      "Version": "1.1.4",
      "Source": "Repository",
      "Repository": "CRAN",
      "Requirements": [
        "R",
        "utils"
      ],
      "Hash": "3eec01f8b1dee337674b2e34ab1f9bc1"
    }
  }
}
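Because the lock file is plain JSON, any tool can read it. The following is a minimal sketch in Python, assuming the structure shown above (an "R" entry and a "Packages" entry); the helper name locked_versions is made up for illustration and is not part of renv.

```python
import json

# Abbreviated lock file content, copied from the example above.
LOCKFILE_TEXT = """
{
  "R": {"Version": "4.4.2"},
  "Packages": {
    "rlang": {
      "Package": "rlang",
      "Version": "1.1.4",
      "Source": "Repository",
      "Repository": "CRAN"
    }
  }
}
"""


def locked_versions(lockfile_text):
    """Return the R version and a {package: version} dict from lock file JSON."""
    lock = json.loads(lockfile_text)
    packages = {name: entry["Version"] for name, entry in lock["Packages"].items()}
    return lock["R"]["Version"], packages


r_version, packages = locked_versions(LOCKFILE_TEXT)
print(f"R {r_version}")
for name, version in sorted(packages.items()):
    print(f"{name}=={version}")
```

For a real project, the text would come from the renv.lock file in the project folder rather than from a string.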

The lock file records dependencies only when used in scripts

  1. Install a new package with
renv::install("dplyr")
  2. Use it in a script
library(dplyr) # or require(dplyr)
  3. Update lock file
renv::snapshot()
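To build intuition for why a package only shows up in the lock file once a script uses it, here is a rough Python sketch of scanning R code for package usage. It is an illustration only and not how renv is implemented: the regular expressions and the function name used_packages are assumptions, and the comment stripping is naive (a # inside a string would confuse it).

```python
import re

# Matches library(pkg) and require(pkg), with optional quotes around the name.
ATTACH_CALL = re.compile(
    r"""\b(?:library|require)\(\s*["']?([A-Za-z][A-Za-z0-9.]*)["']?\s*[,)]"""
)
# Matches namespace-qualified calls such as dplyr::filter(...).
NAMESPACE_CALL = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?[A-Za-z_.]")


def used_packages(r_source):
    """Return the set of package names that the R source appears to use."""
    found = set()
    for line in r_source.splitlines():
        code = line.split("#", 1)[0]  # naive: drop everything after a '#'
        found.update(ATTACH_CALL.findall(code))
        found.update(NAMESPACE_CALL.findall(code))
    return found


SCRIPT = """
library(dplyr)  # or require(ggplot2)
status <- renv::status()
require("rlang")
"""

print(sorted(used_packages(SCRIPT)))  # ['dplyr', 'renv', 'rlang']
```

Note that ggplot2 is not reported, because it only appears in a comment.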

Restoring and checking

[Figure omitted. Source: rstudio.github.io]

Python: Using venv and pip

Python Environment Hell from XKCD (Creative Commons Attribution-NonCommercial 2.5 License)

Overview

  • PyPI is the Python Package Index, the analogue to R’s CRAN
    • But CRAN reviews submissions while PyPI does not
  • pip is the Python package manager and interacts with PyPI
    • Other tools do the same job
  • venv ships with Python 3.3+; pip is bundled with Python 3.4+

Getting started



Use the command line, either in a terminal or from an IDE (e.g. VS Code)



Make sure you can invoke Python

$ python3 --version # on Mac/Linux
$ python --version # on Windows, where the installer provides python.exe rather than python3.exe
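The same check can be done from inside Python, which also shows which interpreter is actually running. A small sketch using only the standard library; the function names and the MINIMUM constant are illustrative.

```python
import sys

# venv has been part of the standard library since Python 3.3.
MINIMUM = (3, 3)


def describe_interpreter():
    """Return the version and location of the running interpreter."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {version} at {sys.executable}"


def supports_venv(version_info=sys.version_info):
    """Return True if this Python version includes the venv module."""
    return tuple(version_info[:2]) >= MINIMUM


print(describe_interpreter())
print("venv available:", supports_venv())
```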

Creating the virtual environment

Run

$ python3 -m venv venv

What does this do?

  • Creates a folder venv to which packages are installed
  • (The -m flag runs the venv module as a script; the second venv is the name of the folder)
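The command above can also be reproduced from Python with the standard-library venv module. This is a sketch, with create_environment as a made-up wrapper name. One difference to keep in mind: venv.create() does not install pip unless with_pip=True is passed, whereas the command line installs it by default. The demo below skips pip and uses a temporary folder so that it runs quickly and leaves nothing behind.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target, with_pip=True):
    """Create a virtual environment at `target`, similar to `python3 -m venv target`."""
    venv.create(target, with_pip=with_pip)
    return Path(target)


# Demo in a throwaway directory; with_pip=False keeps it fast.
with tempfile.TemporaryDirectory() as workdir:
    env_dir = create_environment(Path(workdir) / "venv", with_pip=False)
    # The folder contains the interpreter, a site-packages directory and pyvenv.cfg.
    print(sorted(path.name for path in env_dir.iterdir()))
```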

Activate the virtual environment

$ source venv/Scripts/activate # on Windows (e.g. in Git Bash)
(venv) $
$ source venv/bin/activate # on Mac/Linux
(venv) $
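Besides the (venv) prefix in the prompt, a script can check for itself whether it runs inside a virtual environment. A minimal sketch: inside a venv, sys.prefix points at the environment folder while sys.base_prefix points at the Python installation it was created from. The function name is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("inside a virtual environment:", in_virtual_environment())
print("packages are installed under:", sys.prefix)
```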

Install packages with pip

Run

(venv) $ python3 -m pip install numpy
(venv) $ python3 -m pip install matplotlib


To display information about a specific installed package

(venv) $ python3 -m pip show numpy

To display information about all installed packages

(venv) $ python3 -m pip list
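The same information is available from within Python through importlib.metadata (Python 3.8+). The sketch below is a rough counterpart to pip list and pip show, not a replacement; the two function names are made up here.

```python
from importlib import metadata


def installed_packages():
    """Return a {name: version} dict of the distributions in the current environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            packages[name] = dist.version
    return packages


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


for name, version in sorted(installed_packages().items()):
    print(f"{name} {version}")
print("numpy:", installed_version("numpy"))
```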

Creating requirements.txt files

(venv) $ python3 -m pip freeze > requirements.txt

We can inspect the file

(venv) $ cat requirements.txt
contourpy==1.2.0
cycler==0.12.1
fonttools==4.45.0
kiwisolver==1.4.5
matplotlib==3.8.2
numpy==1.26.2
packaging==23.2
Pillow==10.1.0
pyparsing==3.1.1
python-dateutil==2.8.2
six==1.16.0
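Each line of this file has the form name==version, which makes it easy to process. As an illustration, here is a small Python sketch that turns such a file into a dictionary. It only handles exact pins as written by pip freeze; the full requirements format allows more (ranges, URLs, options), which this hypothetical parse_pins helper deliberately rejects.

```python
def parse_pins(requirements_text):
    """Parse `name==version` lines into a {name: version} dict."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, separator, version = line.partition("==")
        if not separator:
            raise ValueError(f"not an exact pin: {line!r}")
        # Package names are case-insensitive, so store them in lower case.
        pins[name.strip().lower()] = version.strip()
    return pins


SAMPLE = """
matplotlib==3.8.2
numpy==1.26.2
Pillow==10.1.0
"""

for name, version in parse_pins(SAMPLE).items():
    print(name, version)
```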

Restore an environment from requirements.txt

(venv) $ python3 -m pip install -r requirements.txt
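After restoring, it can be useful to verify that the environment matches the pinned versions. A sketch of such a check, assuming the pins are already available as a dictionary; check_pins and installed_version are illustrative names. Versions are compared as plain strings, which is enough for files written by pip freeze but would not treat 1.0 and 1.0.0 as equal.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {name: (wanted, found)} for every pin the environment does not satisfy."""
    problems = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


# Two pins taken from the requirements.txt example; the result depends on
# what is installed in the environment where this runs.
pins = {"numpy": "1.26.2", "matplotlib": "3.8.2"}
for name, (wanted, found) in check_pins(pins).items():
    print(f"{name}: wanted {wanted}, found {found}")
```

An empty result means every pinned package is installed in exactly the recorded version.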

PyPI vs conda

PyPI | conda
Easy to use & create packages | Not only Python, can use external libraries
Does not track Python version | Tracks Python version
Complicated with external libraries | Heavy – use miniconda/mamba; packaging is harder

… and try to keep the two apart whenever possible

Conclusion

Good practices

  • Have the environment inside your project directory
  • In Python, keep your base environment clean
  • When adding new dependencies, do not pin versions:
# prefer
(venv) $ python3 -m pip install numpy
# over
(venv) $ python3 -m pip install numpy==1.26.2
  • But keep track of exact versions in a lock file and commit often
    • Share with your co-authors

Other considerations

  • Adding dependencies is always a make-or-buy decision
    • Don’t reinvent the wheel?
  • Before adding dependencies,
    • Make sure they work as intended
    • Check if they are maintained and the size of the user base

Hands-on: ideas

  • Start using an environment for an existing project
  • Play around in a dummy project
    • If you’re using pip or conda, try the other, or try uv