@johnwason
Last active September 13, 2023 09:23
Software Packaging for Robotics

Python Packaging Best Practices

Python is a versatile and popular programming language. According to the TIOBE index, it is the world's most popular programming language at the time of writing. It has found extensive use in engineering and scientific applications, including robotics. Python is popular because it has a clear, flexible syntax, a "batteries included" standard library, easy interfacing to native code, and a massive ecosystem of packages providing additional functionality. This document provides an overview of best practices for packaging Python robotics software for distribution.

There are many tutorials, blog posts, and whitepapers on Python packaging. This guide is intended to provide a quick overview of the most important aspects of packaging, with references to more detailed guides.

Python Package Formats

Python packages typically bundle the Python *.py files, resource files, and any additional native modules/libraries used by the package. Native modules and libraries are files containing machine code that can only operate on specific systems. These modules are typically developed using C/C++, Rust, Fortran, or another compiled language. The native code can be interfaced with Python using a number of methods, including ctypes, cffi, manually written extension code, or generative tools like SWIG.

Python packages can either be "source" or "binary".

  • Source packages: Contain Python script files, resource files, and the source code for any native modules. Source packages typically compile the native code during installation on the user's computer.
  • Binary packages: Contain Python script files, resource files, and native modules/libraries. These packages are tied to specific operating systems and computer architectures.

For "pure python" packages, the distinction between source and binary is less strict, and typically a single package can run on all systems with a high enough Python version that supports all the required dependencies.
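The difference between pure-Python and platform-specific binary packages is visible in the file name of a "wheel", the standard binary package format. The following is a minimal sketch that assumes the standard wheel naming layout; the file names and helper functions are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    """Fields encoded in a wheel file name."""
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its tag fields.

    Assumes the layout
    {distribution}-{version}[-{build}]-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    # The last three fields are always the Python, ABI, and platform tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


def is_pure_python(filename: str) -> bool:
    """Return True if the wheel is not tied to an ABI or a platform."""
    tags = parse_wheel_filename(filename)
    return tags.abi_tag == "none" and tags.platform_tag == "any"


# Hypothetical file names for illustration.
print(is_pure_python("my_example_package-0.1.0-py3-none-any.whl"))
print(is_pure_python("my_example_package-0.1.0-cp311-cp311-manylinux_2_17_x86_64.whl"))
```

The first name describes a wheel that any Python 3 interpreter can use, while the second is tied to one CPython version and one Linux architecture.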

Python Module Design

Python packages should be designed carefully so that they are easily reused. Avoid unpredictable interactions between packages by being very deliberate in the design of the API and how dependencies are used.

Pure-Python Packaging Best Practices

This document focuses on packaging pure-Python packages. These packages contain only Python script files (*.py) and resource files. Because there are no native modules, the package should run on any operating system and system architecture with a recent enough Python version that supports all the required dependencies.

See Also

Example Packages

The following packages will be used as examples:

PyPI and pip

PyPI (https://pypi.org) is the central repository for Python packages. The related tool pip is used to install packages, either from local files or from the PyPI repository. pip is included with the standard Python installer on Windows, but the python3-pip package may need to be installed on Linux.

Package File Structure

A Python package will typically have the following structure. In this example, the package name is my-example-package and it contains the module my_example_module.

  • src/
    • my_example_module/
      • __init__.py
      • my_submodule.py
      • my_other_submodule.py
  • docs/
    • conf.py
    • index.rst
    • requirements.txt
    • example_module/
      • sphinx documentation files
  • tests/
    • test_my_example_module.py
  • examples/
    • examples using the module
  • scripts/
    • scripts used during development
  • .gitignore
  • .readthedocs.yaml
  • LICENSE
  • README.md
  • setup.py (and/or) pyproject.toml

This file structure represents one possible directory structure, but it is far from the only one possible. See the many other packaging guides for details on other possible options.
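As a rough illustration, the layout above can be scaffolded with a short script. This is a minimal sketch, not a definitive template: the `scaffold` helper is hypothetical, and the pyproject.toml contents (the setuptools build backend and every metadata value) are assumptions chosen for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical minimal pyproject.toml. The setuptools backend and all
# metadata values are assumptions for this sketch.
PYPROJECT_TOML = """\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "my-example-package"
version = "0.1.0"
description = "Example package"
readme = "README.md"
requires-python = ">=3.7"
dependencies = []
"""

# Relative path -> initial file contents.
FILES = {
    "src/my_example_module/__init__.py": "",
    "src/my_example_module/my_submodule.py": "",
    "src/my_example_module/my_other_submodule.py": "",
    "tests/test_my_example_module.py": "",
    "docs/conf.py": "",
    "docs/index.rst": "",
    "README.md": "# my-example-package\n",
    "pyproject.toml": PYPROJECT_TOML,
}
# Directories that start out empty.
DIRECTORIES = ["examples", "scripts"]


def scaffold(root: Path) -> None:
    """Create the example layout under root without overwriting existing files."""
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for relative_path, contents in FILES.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text(contents, encoding="utf-8")


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    scaffold(Path(tmp))
    for created in sorted(Path(tmp).rglob("*")):
        print(created.relative_to(tmp).as_posix())
```

Files such as LICENSE, .gitignore, and .readthedocs.yaml are left out of the sketch because their contents depend on the project.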

This package can be managed using pip and uploaded to PyPI if desired. Once installed, the package modules can be imported:

from my_example_module import my_submodule, my_other_submodule

Both submodules are now imported. Anything defined or imported in the __init__.py file is available directly on the my_example_module root package.

src/

The src/ directory contains the Python module and scripts. In this case, the my_example_module folder will be packaged. Some developers omit the src/ directory and place the module directory in the root of the repository. However, experience has shown that using a src/ directory is better practice, since it prevents the uninstalled working copy from being imported by accident when Python is run from the repository root.

README.md

The readme file may be the most important file in the repository. It should give the user an overview of what the package does and a quick guide on how to use it.

Common sections of a readme file:

  • Introduction
  • Documentation
  • Installation
  • Getting Started / Quick Start / Usage
  • Examples
  • Building (if applicable)
  • License

Even if there is no additional documentation, always include a good readme!

examples/

The examples directory should contain simple examples demonstrating how to use the module.

scripts/

When developing a module for a scientific project, there will often be scripts that use the module to accomplish some task. Place these general-purpose scripts in the scripts/ directory, or make the module executable.

A better alternative to a scripts/ directory is to make modules executable. Running:

python -m my_example_module

will run the __main__.py module located next to __init__.py in the package directory. For a package that is meant to be called from the command line, the -m option is the preferred way to expose it. See also "entry points" in pyproject.toml or setup.py, which can create callable command-line programs from modules.
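The behavior of the -m option can be tried without installing anything. The following sketch builds a throwaway package containing a __main__.py in a temporary directory and runs it with -m; the `run_module_demo` helper and the `--demo` argument are hypothetical and exist only for this illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_module_demo() -> str:
    """Create a tiny package with a __main__.py and run it with `python -m`."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "my_example_module"
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("", encoding="utf-8")
        # __main__.py is the file that `python -m my_example_module` executes.
        (package_dir / "__main__.py").write_text(
            "import sys\n"
            "print('my_example_module called with', sys.argv[1:])\n",
            encoding="utf-8",
        )
        # Run from the directory that contains the package so it can be
        # found without being installed.
        result = subprocess.run(
            [sys.executable, "-m", "my_example_module", "--demo"],
            cwd=tmp,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


print(run_module_demo())
```

In a real project the package would be installed with pip instead of being run from its parent directory.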

tests/

pytest is the most common testing framework for Python. It scans the tests/ directory, finds the functions and classes whose names start with "test", and executes each one as a unit test. See Effective Python Testing With Pytest.
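A test file for pytest might look like the following sketch. The `wrap_angle` function is hypothetical and is defined inline only to keep the example self-contained; in a real package it would be imported from the module under test.

```python
# Sketch of tests/test_my_example_module.py
import math


def wrap_angle(angle: float) -> float:
    """Hypothetical function under test: wrap an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


# pytest collects plain functions whose names start with "test".
def test_wrap_angle_leaves_small_angles_unchanged():
    assert math.isclose(wrap_angle(0.5), 0.5)


def test_wrap_angle_wraps_full_turn():
    assert math.isclose(wrap_angle(2.0 * math.pi + 0.5), 0.5)


# Classes whose names start with "Test" are collected as well.
class TestWrapAngleRange:
    def test_result_is_within_range(self):
        for k in range(-10, 11):
            assert -math.pi <= wrap_angle(0.7 * k) < math.pi
```

Running `python -m pytest` from the repository root would discover and execute all three tests.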

Python tests can be executed automatically by GitHub on every push using GitHub Actions. See Building and testing Python for more information on Python testing on GitHub. Also see the example modules' GitHub workflow files.

.gitignore

All Python packages should use Git, and regularly push to a Git repository for backup and sharing. The .gitignore file specifies files that should not be committed to git. A template gitignore can be found on GitHub: https://github.com/github/gitignore/blob/main/Python.gitignore

LICENSE

A standard license should be used. For open-source robotics packages, the BSD 3-clause or Apache 2.0 are the most common:

setup.py and pyproject.toml

setup.py and pyproject.toml contain the metadata and build instructions for the package. For pure-Python packages running on Python 3.6 or greater, only a pyproject.toml file is required. See the example and links above for more information on creating a pyproject.toml.

setup.py can be used to compile or package native modules. This is an advanced use case and the specific solutions used by different developers can vary widely.

If non-Python resource files are used, it is necessary to specify they should be included.

See A Practical Guide to Setuptools and Pyproject.toml.

.readthedocs.yaml and docs/

Read the Docs is a service that automatically builds and serves documentation. The documentation is built using Sphinx, a popular tool for documenting Python and other programming languages. Sphinx uses Python docstrings to document classes and methods. Docstrings are string literals in the source code that describe a class or method and its parameters, return value, and exceptions. Sphinx can use an extension called autodoc to read these docstrings and generate the appropriate files.
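As an illustration, a docstring written with Sphinx field lists and type annotations might look like the following sketch. The `joint_limits_ok` function is hypothetical, and other docstring styles are also in common use.

```python
from typing import Sequence


def joint_limits_ok(
    joint_angles: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> bool:
    """Check whether every joint angle lies within its limits.

    :param joint_angles: Joint angles in radians, one per joint.
    :param lower: Lower limit for each joint in radians.
    :param upper: Upper limit for each joint in radians.
    :return: ``True`` if every angle satisfies ``lower <= angle <= upper``.
    :raises ValueError: If the three sequences differ in length.
    """
    if not (len(joint_angles) == len(lower) == len(upper)):
        raise ValueError("joint_angles, lower, and upper must have the same length")
    return all(lo <= q <= hi for q, lo, hi in zip(joint_angles, lower, upper))


# The docstring is available at runtime, which is what documentation tools read.
print(joint_limits_ok.__doc__.splitlines()[0])
print(joint_limits_ok([0.0, 1.0], [-1.0, -1.0], [1.0, 1.0]))
```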

The docs/ folder contains the files required to build the documentation. conf.py contains the instructions to build the documentation. The documentation can contain both API documentation and general instructions on how to use the package. See the Read the Docs tutorial and the example modules for instructions on using Read the Docs.

The example modules have additional code in conf.py to include the README.md file. This is recommended.

Developing the Module

Modern Python development uses "virtual environments". Virtual environments create an isolated Python installation where packages can be installed and removed without affecting the system installation. All installed packages are local to the virtual environment, and different modules can have different virtual environments with different packages installed.

Create a virtual environment in the root of the repository:

python -m venv venv

This directory is ignored by the default .gitignore file so it will not be committed. Linux users may need to install python3-venv.

The venv can now be activated:

Windows:

venv\Scripts\activate

Linux:

source venv/bin/activate

Activating the environment modifies the PATH of the current shell so that running python invokes the interpreter in the local virtual environment.
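To confirm which interpreter is in use, a script can compare sys.prefix with sys.base_prefix. This is a small sketch; the `in_virtual_environment` helper name is an assumption made for illustration.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtual_environment())
```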

Now, the module under development can be installed in "editable" mode:

python -m pip install -e .

This means that the Python virtual environment will point to the files in src/ rather than copying them into the site-packages directory of the virtual environment, so any changes to the files in src/ take effect immediately.

Any scripts executed while the virtual environment is active will use the source files in src/ for the module.
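One way to check that an editable install points at the working copy is to ask Python where it would load the module from. The following is a minimal sketch for top-level module names; the `module_location` helper is hypothetical.

```python
import importlib.util
from typing import Optional


def module_location(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# A standard library module, to show the form of the result.
print(module_location("json"))
# With an editable install active, this path is expected to lie under src/.
# It prints None if the example package is not installed.
print(module_location("my_example_module"))
```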

See also Python Virtual Environments: A Primer

Distributing the Package

There are several popular Python distribution options:

This section focuses on distribution and installation using pip.

The simplest installation option is to use the unpacked repository. If the repository is local, it can be simply installed:

python -m pip install .

This will copy the files to site-packages of the Python installation.

Another option is to install directly from the GitHub repository. For instance, to directly install the general_robotics_toolbox run:

python -m pip install git+https://github.com/rpiRobotics/rpi_general_robotics_toolbox_py.git

Keep in mind that in these cases pip cannot automatically resolve dependencies that are not on pypi.org. Dependencies that are only available from Git must be installed manually, in dependency order, before the package that needs them.

Uploading to pypi.org requires building distribution archives of the package, typically including a "wheel". This can be accomplished using the "build" package:

Install the "build" and "twine" packages (one time only per virtual env)

python -m pip install build twine

Build the package:

python -m build

Upload the built archives to pypi.org:

python -m twine upload dist/*

See Packaging Python Projects for more information.

Once uploaded, the package can be installed using pip from any machine:

python -m pip install my-example-package
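After installation, the installed version can be queried from Python through importlib.metadata, which is part of the standard library from Python 3.8 onward. This is a small sketch; the `installed_version` helper is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the example package is installed, otherwise None.
print(installed_version("my-example-package"))
```

Note that the lookup uses the distribution name given to pip (my-example-package), not the import name (my_example_module).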

Distributing Python packages for ROS is typically accomplished by adding a link to the package's PyPI project. See https://github.com/ros/rosdistro/blob/master/rosdep/python.yaml

Python Packaging Best Practices

  • Always include a README.md
  • Carefully design the public API (classes and methods)
  • Use the formatting specified in PEP 8 - Style Guide for Python Code
  • Use a src/ directory
  • Use a unique and descriptive name for packages
  • If packages are not widely used, host on GitHub or a private package repository
  • Always use docstrings and type annotations where possible
  • Use a pyproject.toml or setup.py to define dependencies. If not relevant, use a requirements.txt file
  • Do not copy and paste code! Create reusable functions!
  • Support Python 3.7 or greater
  • For newer projects, use a pyproject.toml instead of setup.py if possible
  • Use a documentation system like Sphinx and/or Read the Docs to keep documentation up to date
  • Include additional instructions beyond the API documentation
  • Include the readme in the full documentation
  • Always use GitHub Actions to test your code on the supported platforms
  • Use file objects to read and write files rather than passing filenames
  • Use "relative" imports
  • Test in all configurations in which the software will run
  • Do not modify sys.path! Use the packaging system to place the packages in the correct locations
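The recommendation to use file objects rather than filenames can be illustrated with a short sketch. The `read_waypoints` function and the CSV layout are hypothetical; the point is that the same function works with an in-memory buffer in a test and with a real file in production.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_waypoints(stream: TextIO) -> List[Tuple[float, float]]:
    """Read x,y waypoints from an open text file object, one pair per row."""
    return [(float(row[0]), float(row[1])) for row in csv.reader(stream) if row]


# In a test, an in-memory buffer stands in for the file.
print(read_waypoints(io.StringIO("0.0,0.0\n1.0,0.5\n")))

# With a real file, the caller decides how the file is opened:
#     with open("waypoints.csv", newline="") as f:
#         waypoints = read_waypoints(f)
```

Because the function never opens a path itself, callers can also pass data from a network stream or a package resource without any change to the function.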

Software Packaging for Robotics

"Software packaging" is the process of creating reusable software components that can be shared with other developers or end users. Software development will typically focus on compiling and executing software on a single computer or a small set of development computers with a specific configuration. Software packaging needs to consider sharing software among many different computer systems, often with varying software and hardware configurations. Different computer systems often have different software installed, which may have different dependency versions or conflict with the software being distributed. Each computer architecture, computer language, operating system, and software ecosystem has different executable file formats, packaging file formats, and installer file formats. Operating systems have different application programming interfaces (APIs), and favor/support different programming languages. This complexity makes creating reusable packages an arduous and time-consuming process. Creating a cross-platform project like Robot Raconteur with support for multiple operating systems and numerous programming languages has been a difficult and time-consuming process. This document provides an overview of some of the lessons learned, with a focus on creating Python packages.

Packaging Software for Developers and End Users

Software packaging focuses on distributing software for "developers" or for "end users". Software developers typically consume "software libraries" or "utilities". Software libraries contain reusable code that is integrated into an application to provide some additional functionality. For instance, the Robot Raconteur Core library is integrated into a robotics application to provide communication functionality. Utilities are executables or other files that are not directly integrated into a software application, but are used for software development. For example, the C++ compiler is considered a utility. Development tools can be distributed as "source" or "binary". With "source" distributions, the developer is expected to compile the source code before use. "Binary" files are expected to be able to run without compilation, but are tied to a specific operating system. (Scripting languages like Python or JIT compiled languages like C# blur this distinction slightly, since the binaries or scripts are generally not tied to one specific type of computer.) These libraries and utilities are typically distributed to developers as "archives", "packages", or "installers".

  • Archives: Plain archives like Zip or Tar files containing the utilities/libraries. Plain archives typically contain source code rather than compiled binaries. Linux uses "tarballs" as a minimal method to distribute source code. See https://www.thegeekstuff.com/2012/06/install-from-source/ . Archives can also contain binaries.
  • Packages: Packages are archives with a specific structure that allows a "package manager" to download, unpack, and install the contents of the archive without significant user intervention. See Package Managers for more details on package managers. Some package managers will use installers instead of archives, but to the end user this distinction is minor. End-user software for modern mobile devices is typically distributed through package managers such as Google Play or the Apple App Store.
  • Installers: Installers are typically executables or special packages that provide the user an interactive experience when installing software. Typically the user will receive the software by downloading from the internet or on a disc. The user will run an executable, and a wizard will appear to guide the user through the installation process, asking the user to answer questions about how the software should be installed.

The exact method of distribution can vary significantly between operating systems, software ecosystems, and vendors. Because of this wide variation, significant problems can occur if conflicting files are installed, or if files are omitted from the package/installer.

Examples of Package Managers

The following section contains a non-exhaustive list of package managers and their use cases.

| Package Manager | Source/Binary | Operating System | Programming Languages | Command Line Tool | Notes |
| --- | --- | --- | --- | --- | --- |
| Debian/Ubuntu APT | Both | Linux | All | apt-get | Package repository for deb-based Linux operating systems |
| PyPI | Both | Cross-platform | Python | pip | Package manager for the Python ecosystem |
| NuGet | Binary | Cross-platform | C#, F#, .NET | nuget | Package manager for the .NET ecosystem |
| npm | Source | Cross-platform | JavaScript, TypeScript | npm | Package manager for Node.js ecosystem |
| ROS | Both | Linux/Windows | C++, Python | rosdep | ROS meta-operating system package manager, typically using deb or nuget packages |
| conda | Binary | Windows, Linux, Mac OS | C/C++, Python, Rust, Java, C# | conda | Platform-independent packaging and distribution system for data analysis and robotics |
| vcpkg | Compile-on-install | Windows, Linux, MacOS, iOS, Android | C/C++ | vcpkg | Compile-on-demand tool for managing C/C++ dependencies |
| homebrew | Binary or compile-on-demand | MacOS | All | brew | Third-party package manager for MacOS with a focus on open-source command line tools and libraries |

The previous list focused on package managers typically used by developers for open-source software. There are also many proprietary package managers used for different purposes:

| Package Manager | Source/Binary | Operating System | Programming Languages | Command Line Tool | Notes |
| --- | --- | --- | --- | --- | --- |
| Chocolatey | Binary | Windows | All | choco | Third party package manager for Windows |
| Google Play | Binary | Android | All | N/A | Package manager for Android ecosystem |
| Apple App Store | Binary | iOS/macOS | All | N/A | Package manager for Apple ecosystem |
| Mathworks File Exchange | Binary | Windows, Linux, MacOS | | | Third-party add-on manager for Matlab |

The large number of operating systems, package managers, programming languages, software frameworks, and software versions makes reliably packaging software extremely difficult.

Preferred Programming Language by Operating System

The following is a list of "preferred" programming languages by operating system. There is no hard rule that a given programming language cannot be used on a given platform, since there are numerous projects providing workarounds. However, support is often third-party and can vary in quality.

  • Windows: C/C++, C# (.NET), Powershell
  • Linux: C/C++, Python, Perl, Bash
  • Mac OS (X): Swift, Objective-C, C/C++ (limited support for GUI applications with C/C++)
  • Android: Kotlin, Java, C/C++ (limited support, library only)
  • iOS: Swift, Objective-C, C/C++ (limited support for GUI applications with C/C++)
  • Arduino: C/C++, MicroPython

Projects like Python, .NET, and Java have excellent support on multiple platforms, so while they are not "preferred", in practice they are generally well supported on Windows, Linux, and Mac OS.

Library API and ABI Versioning Issues

Software libraries contain computer code, typically routines (functions) and/or classes. An application loads these libraries when started, and calls the routines as needed. The interaction between the library and the software application can vary greatly between different programming languages and compilation schemes. For C/C++, the binary interface of the library loaded at run time must match, byte for byte, the interface the application was compiled against. Even slight changes can cause unpredictable behavior. Worse than crashes, an incompatible library can cause subtle instability or corruption in the running program that can be nearly impossible to detect. For a scripting language like Python or a JIT language like C#, there may be more flexibility, but errors may still occur if there is a version mismatch.

The binary interface to a compiled library is called the Application Binary Interface (ABI). The programming interface consisting of functions and classes is called an Application Programming Interface (API). For a compiled language, the ABI must match exactly, meaning that even recompiling the library with an unchanged API could potentially change the ABI and cause compatibility problems. Typically, different versions of the same library will have incompatible ABIs unless extraordinary care is taken to keep them compatible. For non-compiled languages like Python, as long as the API and the behavior of the library don't change, in most cases there will not be a problem.
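The difference between an unchanged API and a changed ABI can be made concrete with Python's ctypes module, which lays out structures the way a C compiler would. In the following sketch, the two `Pose` structures are hypothetical versions of a library data type: code that uses the fields `x` and `y` by name would compile against either version, but the binary layout differs.

```python
import ctypes


class PoseV1(ctypes.Structure):
    """Layout that a hypothetical application was compiled against."""
    _fields_ = [
        ("x", ctypes.c_int32),
        ("y", ctypes.c_int32),
    ]


class PoseV2(ctypes.Structure):
    """Layout after a hypothetical library update inserted a field."""
    _fields_ = [
        ("x", ctypes.c_int32),
        ("valid", ctypes.c_uint8),
        ("y", ctypes.c_int32),
    ]


# The field names x and y are unchanged, but the size of the structure
# and the byte offset of y are not the same in the two versions.
print("V1 size:", ctypes.sizeof(PoseV1), "offset of y:", PoseV1.y.offset)
print("V2 size:", ctypes.sizeof(PoseV2), "offset of y:", PoseV2.y.offset)
```

An application compiled against the first layout would read `y` from the wrong bytes if it were handed a structure produced by the second version, which is the kind of silent corruption described above.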

Managing ABI compatibility can be extraordinarily difficult. Applications will often be compiled against different versions of libraries. An individual library may depend on other libraries that then depend on different versions of a third set of libraries. This situation is colloquially called "DLL Hell", where DLL stands for "Dynamic-Link Library", the name used on Windows for shared libraries. DLL Hell was one of the major reasons early versions of Windows were so unreliable.

Many strategies have been attempted to maintain shared library ABI compatibility:

  • Epochal Package Versioning (Debian, Ubuntu, ROS): Epochal (sic) versioning refers to selecting an epoch of all library versions for a release. For example, Ubuntu 22.04 or ROS Humble will have a strict set of library versions for that release. Any changes to the libraries must maintain the exact ABI to remain compatible. This solution has the benefit of maintaining ABI compatibility, but also means that packages are typically far out of date since they cannot be easily updated to newer versions. Patches to libraries are typically only for severe bugs or security issues.
  • Side-by-side Assemblies (Windows): Windows contains an esoteric system called "Side-by-Side Assemblies" that can manage multiple versions of the same library and load the correct version when requested.
  • Bundled Dependencies (Docker, Snap, Mobile Apps, Windows Apps): Bundled dependencies include all of the required shared libraries. Some of the more advanced systems like Docker create a fully isolated execution environment that is separate from the operating system. Windows also uses this strategy for distributing applications, but there can be a lot of problems with the search path for DLLs, causing DLL Hell.
  • Rolling Versioning (Conda-Forge): Conda Forge has a sophisticated system of "pinning" to keep tens of thousands of packages up-to-date while still maintaining ABI compatibility. Unfortunately so far this has been unreliable due to the extreme complexity of this approach.
  • Compile-on-demand (vcpkg, Homebrew, Gentoo): Developer tools like vcpkg and Homebrew can compile libraries on demand alongside the application being developed. These libraries are only mutually compatible and cannot be used by other applications. Gentoo is an example of an esoteric Linux distribution that compiles all packages upon install and does not distribute any binary packages.
  • Static-linking (Rust, Go, C/C++ optionally): Static linking embeds all library code directly into the main application so there is no need to load external libraries.
  • Just-In-Time (JIT) compilation (C#, Java): The binary code distributed is in an "intermediate" form, and is compiled to machine code when it is executed. This allows for more flexibility in the version of the loaded libraries.
  • Script Distribution (Python, JavaScript, Ruby): The packages distributed contain plain text script code and are interpreted at runtime.

Packaging Burden

Compiling, testing, packaging, and distributing software is a heavy burden on software development. The complexity is made worse by the poor quality of the tools available for packaging. These tools tend to be complex, underdocumented, unreliable, and unpolished. The learning curve is often quite high as well. Even once the software is in a package, it still may not work on all the systems expected due to minor configuration problems with compilers or version mismatches. Extensive testing on computer systems with different configurations is required.

Because of how critical packaging of software libraries and applications is, it is an important skill that institutions need to take seriously and allocate enough resources. The best case scenario for poor software packaging is an annoyed customer or lost opportunity because the software is not available. The worst case possibilities include major software security events, data corruption, system outages, or even physical destruction when a controlled machine crashes! The overall point is that institutions need to take these issues seriously and not leave them as an afterthought.

ROS Packages

ROS is a "meta-operating system". It overlays existing operating systems like Linux and Windows, and provides additional libraries and services that are helpful for robotics. It is primarily geared towards Linux, with some support for Windows and macOS.

Part of ROS is an extensive package manager. This package manager accepts third-party library contributions, but is less open compared to community package managers like PyPI or NuGet.

ROS has its own internal package manager that uses Apt on Linux and Chocolatey on Windows. ROS operates its own repository servers and uses Jenkins to compile binary packages. The Apt packages use *.deb files, which are standard for Ubuntu and Debian.

There is another project called Robostack which uses conda-forge to distribute ROS instead of using operating-system specific package managers. This has some advantages compared to the ROS repositories, but has the disadvantage of observed decreased reliability and smaller package selection.

Creating Packages

See these other pages for discussions on creating packages:
