@MoffKalast
Forked from peci1/pain.md
Last active April 29, 2025 11:11
ROS 2 migration pain points

Launch

  • Where the hell do I find complete docs for XML launch files?

  • use_sim_time hell (i.e. no support for global parameters)

  • $(dirname) is broken and unreliable (ros2/launch#618)

  • ugly testing for nonempty XML launch args ($(eval 'bool(\'$(var var_name)\')'))

  • in Python launch, launch_arguments of IncludeLaunchDescription ingest booleans as 'true'/'false', i.e. strings!

    • launch_arguments=[('use_sim_time', 'true'),]
    • and the error you get if you pass an actual boolean is super cumbersome
  • cannot use parameters which are lists of dicts or other non-trivial types (ros2/ros2#1380)

    • the error you get is difficult to understand
  • I haven't found any docs on writing custom launch actions (and it took me quite some time to get a little into it)

    • at minimum, what's missing is a doc listing all the steps needed to register the action
      • setup.cfg, make and install as a Python package even if it's a C++ one otherwise, add the decorator...
  • No XSD schema for launch_ros with all its actions, only for launch base:

  • Setting <arg value> in an <include> will also overwrite the arg in the rest of the launch file (ros2/launch#620 (comment))

    • Workaround: wrap each <include> within its own <group>.
    • It is described in the migration guide, but its meaning is not really obvious to people who want to dive into ROS 2 without prior knowledge:
      • Available in ROS 1, included content was scoped. In ROS 2, it’s not. Nest includes in group tags to scope them.

  • newly created launch files are not detected until the package is rebuilt, even when using symlinked directories (detecting them without a rebuild was a very convenient ROS 1 behaviour)
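
To illustrate the launch_arguments point above: a minimal sketch of a helper that stringifies Python values before they are handed to IncludeLaunchDescription. The helper name to_launch_arguments is made up for this example and is not part of launch; it only encodes the 'true'/'false' convention shown in the bullet.

```python
def to_launch_arguments(values):
    """Turn a dict of Python values into (name, str) pairs.

    Hypothetical helper, not part of the launch API. Booleans become
    'true'/'false' as in the example above; everything else goes
    through str().
    """
    pairs = []
    for name, value in values.items():
        if isinstance(value, bool):
            text = 'true' if value else 'false'
        else:
            text = str(value)
        pairs.append((name, text))
    return pairs


# Intended use (assumption): pass the result as
# IncludeLaunchDescription(..., launch_arguments=to_launch_arguments({...}))
args = to_launch_arguments({'use_sim_time': True, 'rate': 10})
```

Keeping the conversion in one place means the rest of the launch file can work with real booleans and numbers.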

rclpy

QoS

  • I haven't found any complete documentation about the qos_overrides parameter. There are some bits and pieces, but e.g. no docs page explains to you that deadline is in nanoseconds.

  • tutorials and most tools hardcode a bare "10" as the QoS for subscribers, i.e. a history depth of 10 on top of the default profile, which is reliable. Best-effort publishers are not compatible with reliable subscribers, so they cannot send data to them. Subscribers should always be set to best effort if not specified otherwise

    • this is an annoyance where the QoS is redefinable on the fly (e.g. Rviz2), but a major problem in cases of 3rd party packages where connecting unreliable topics just isn't possible without forking the codebase
  • having a volatile and transient local publisher on the same topic makes them impossible to receive properly by the same subscriber, because those two durability policies aren't compatible

    • conceptually, one would want a subscriber to connect to a specified topic without any extra information given, since that information can be inferred from the publisher, and if there are multiple publishers, the broadest possible compatibility config should be chosen
  • subscribers should not declare topic QoS unless strictly specified, but adapt to the first publisher instead
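
The compatibility complaints above can be sketched in plain Python. This is only a toy model of the request-vs-offered rule for reliability and durability (a subscriber may not ask for more than the publisher offers), plus the "adapt to the publishers" idea from the bullets. None of these names exist in rclpy; Qos, compatible and broadest_subscription are invented for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1


class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1


@dataclass(frozen=True)
class Qos:
    reliability: Reliability
    durability: Durability


def compatible(pub, sub):
    """A publisher matches if it offers at least what the subscriber requests."""
    return (pub.reliability >= sub.reliability
            and pub.durability >= sub.durability)


def broadest_subscription(pubs):
    """Strongest subscription profile that still matches every publisher."""
    return Qos(min(p.reliability for p in pubs),
               min(p.durability for p in pubs))


sensor = Qos(Reliability.BEST_EFFORT, Durability.VOLATILE)
latched = Qos(Reliability.RELIABLE, Durability.TRANSIENT_LOCAL)
default_sub = Qos(Reliability.RELIABLE, Durability.VOLATILE)

# The hardcoded "10" case: a reliable subscriber cannot hear a best-effort publisher.
sensor_matches_default = compatible(sensor, default_sub)
# Adapting to the publishers instead yields a profile that matches both.
adapted = broadest_subscription([sensor, latched])
```

With one volatile and one transient local publisher, the adapted profile falls back to volatile, so both connect but the latched history is lost, which is the trade-off the bullet above complains about.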

CLI tools

  • ros2 topic hz is unusable under FastRTPS because it is in Python and it deserializes all messages (https://discourse.ros.org/t/rmw-zenoh-binaries-for-rolling-jazzy-and-humble/41395/18)

  • tab completion is slower and overly verbose in selection; the pregenerated message content is incorrect and cannot be used as-is

  • CLI tools silently ignore the RMW_IMPLEMENTATION variable and only talk to the running daemon. So if you first launch something with one RMW and then want to run something else with another RMW, all CLI tools still silently use the first RMW. To solve that, call ros2 daemon stop.

  • ros2 node kill does not exist because of endless bickering around killing composite nodes

    • the bar for doing better than ps aux | grep nodename | awk '{print $2}' | xargs kill -9, which we need to resort to like cavemen, is extremely low; at the very least, ros2 should track node PIDs at least as reliably as that pipeline and pack something roughly equivalent into a CLI command for convenience
    • idea for a more proper approach: every node could implicitly (i.e. without ANY extra declarations) be a lifecycle node in the active state by default and implement only the deactivate trigger, which would allow it to be called by node kill. That trigger could simply stop rclpy/rclcpp and call the shutdown hook to perform basic cleanup; proper lifecycle nodes could then override this default behaviour if needed
      • this would allow killing nodes cleanly while maintaining backwards compatibility
      • it would let you kill nodes on different machines
      • it could stop individual composite nodes while leaving the process intact for the rest
      • containers could be killed themselves (while calling all nodes contained within recursively) as a way to just remove the entire set
  • ros2cd packagename has been added as a 3rd party tool but has yet to be rolled into colcon tools
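
For reference, the caveman pipeline above translates to roughly this in Python. It is a sketch, not a proposal for the actual ros2 CLI: pids_matching and kill_nodes are invented names, it assumes the usual 11-column ps aux layout, and it sends SIGINT by default instead of kill -9 so that the node gets a chance to shut down.

```python
import os
import signal
import subprocess


def pids_matching(ps_output, needle):
    """Return PIDs from `ps aux` output whose command line contains needle."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # 11th field is the full command line
        if len(fields) == 11 and needle in fields[10]:
            pids.append(int(fields[1]))
    return pids


def kill_nodes(needle, sig=signal.SIGINT):
    """Signal every process whose command line mentions needle."""
    out = subprocess.run(['ps', 'aux'], capture_output=True,
                         text=True, check=True).stdout
    for pid in pids_matching(out, needle):
        if pid != os.getpid():  # do not signal ourselves
            os.kill(pid, sig)


# Made-up sample output, only to show the parsing step.
sample = (
    'USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n'
    'robot 4242 0.5 0.1 1000 200 pts/0 Sl 10:00 0:01 python3 talker --ros-args\n'
    'robot 4300 0.0 0.0 900 100 pts/1 S 10:01 0:00 bash\n'
)
found = pids_matching(sample, 'talker')
```

Like the shell version, this matches on the command line only, so it cannot tell apart composable nodes living in the same container process.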

package types

ros_gz

  • server can be launched as a composable node, but the client can't (need to use Executable)
  • topic bridges cannot be launched as composable nodes with the bridge specification passed as arg/parameter (only through YAML config)
    • there is no way to parametrize the YAML config of a bridge by world and robot name, which makes these configs practically useless (gazebosim/ros_gz#717)
  • topic bridges do not allow easy setting of QoS profiles (but that can be overcome by setting qos_overrides manually)
    • quite important for /clock bridge!
  • it is quite difficult to understand how changing a sensor frame_id works, because you have to use gz_frame_id, but that is an undocumented SDF element.
  • <export>ing Gazebo resource paths in package.xml no longer works
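
Setting qos_overrides manually, as mentioned above for the /clock bridge, boils down to building parameter names by hand. A minimal sketch, assuming the usual qos_overrides.<topic>.<publisher|subscription>.<policy> naming; the helper qos_override_params is made up for this example, and the policy values shown are placeholders rather than a recommendation for /clock.

```python
def qos_override_params(topic, entity, **policies):
    """Build a parameter dict for QoS overrides of one topic endpoint.

    Hypothetical helper. Assumes the parameter naming pattern
    qos_overrides.<topic>.<publisher|subscription>.<policy>.
    """
    if entity not in ('publisher', 'subscription'):
        raise ValueError("entity must be 'publisher' or 'subscription'")
    prefix = f'qos_overrides.{topic}.{entity}'
    return {f'{prefix}.{name}': value for name, value in policies.items()}


# Placeholder values; pick whatever the consumers of /clock actually need.
clock_overrides = qos_override_params(
    '/clock', 'publisher', reliability='best_effort', depth=1)
```

The resulting dict can then be passed as node parameters in a launch file, assuming the node in question honours QoS overrides for that topic.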

Components

  • component_container is single-threaded (it even runs its own load/unload callbacks on the single thread)! Use component_container_mt or component_container_isolated.

Colcon

  • no support for explicit --extend as in catkin_tools: colcon/colcon-core#393

  • the default behaviour requires recompiling for every file change since directories are deep copied. It seems like most people despair and suffer until they learn of --symlink-install and then end up using it anyway, so it would make sense to swap the two: make linking the default and add e.g. --no-symlink to switch back to the current default
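
Why copying forces a rebuild while symlinking does not can be shown with a few lines of plain Python. This only mimics the file-level difference between the two install modes in a temp directory and does not call colcon; the file names are invented.

```python
import os
import shutil
import tempfile


def demo_install_modes():
    """Edit a source file after 'installing' it by copy and by symlink."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, 'demo.launch.xml')
        copied = os.path.join(root, 'installed_copy.launch.xml')
        linked = os.path.join(root, 'installed_link.launch.xml')

        with open(src, 'w') as f:
            f.write('v1')
        shutil.copy(src, copied)   # default-style install: deep copy
        os.symlink(src, linked)    # symlink-style install

        with open(src, 'w') as f:  # edit the source afterwards
            f.write('v2')

        with open(copied) as f:
            copy_content = f.read()
        with open(linked) as f:
            link_content = f.read()
        return copy_content, link_content


copy_content, link_content = demo_install_modes()
```

The copy still holds the old content until the next build, while the link sees the edit right away.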

micro-ROS
