@lwang
Created October 25, 2023 05:25

Using Python libraries with shared library dependencies on AWS Glue and Lambda

Some Python libraries, such as ctds, depend on external libraries written in C/C++, such as FreeTDS. Normally the dependency is installed through the system's package manager, which places its shared libraries in a standard location such as /usr/lib64 where the Python library can find them on import. However, AWS Glue and Lambda do not allow the installation of system packages. In some cases, we can copy the shared object files to another directory and point LD_LIBRARY_PATH at it; however, Glue and Lambda also do not allow developers to configure the run command.
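A quick way to see this problem from inside a job is to ask ctypes whether the dynamic loader can find FreeTDS's DB-Library (libsybdb). The sketch below is illustrative only: locate_shared_library and can_load are hypothetical helper names, and on a stock Glue or Lambda runtime both checks would be expected to report the library as missing. Note that glibc's loader reads LD_LIBRARY_PATH when the process starts, so changing os.environ['LD_LIBRARY_PATH'] from inside a running script does not affect libraries loaded later in that process.

```python
import ctypes
import ctypes.util
from typing import Optional


def locate_shared_library(name: str) -> Optional[str]:
    """Return the file name ctypes reports for lib<name> (e.g. 'libsybdb.so.5'),
    or None if the standard search locations do not contain it."""
    return ctypes.util.find_library(name)


def can_load(soname: str) -> bool:
    """Try to dlopen a library by file name; False if the loader cannot find it."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # "sybdb" is FreeTDS's DB-Library, which ctds links against.
    print("find_library('sybdb') ->", locate_shared_library("sybdb"))
    print("can load libsybdb.so.5:", can_load("libsybdb.so.5"))
```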

Package the Python library and dependencies into a wheel/layer using Docker

  • Adding the desired Python library to an AWS Glue Python Shell job through the --additional-python-modules option fails: pip tries to build a wheel for the library but does not have the dependencies needed to do so. For ctds, the error was "fatal error: sybdb.h: No such file or directory", because the header files provided by FreeTDS are not available.
  • Instead, we will build the wheel for the library in a Docker environment that is similar to the runtime environment of the Glue job:
    # Adapted from https://randywestergren.com/building-pymssql-freetds-for-lambda/
    FROM amazonlinux
    
    ENV INSTALLDIR='/tmp/freetds'
    
    RUN yum update -y
    RUN yum install wget tar gzip zip gcc gcc-c++ make python39-devel unixODBC-devel -y
    RUN mkdir $INSTALLDIR build
    
    RUN wget 'https://www.freetds.org/files/stable/freetds-patched.tar.gz' && \
        tar -xzf freetds-patched.tar.gz && \
        cd freetds-* && \
        ./configure --prefix=${INSTALLDIR} --with-openssl=$(openssl version -d | sed  -r 's/OPENSSLDIR: "([^"]*)"/\1/') && \
        make && make install
    
    ENV CPPFLAGS="-I/usr/include/python3.9"
    RUN pip install --upgrade pip
    RUN mkdir /opt/python/
    
    RUN yum install -y perl
    RUN wget https://www.openssl.org/source/openssl-1.1.1w.tar.gz && \
        tar -zxf openssl-1.1.1w.tar.gz && \
        cd openssl-1.1.1w && \
        ./config && make && \
        cp libcrypto.so.1.1 libssl.so.1.1 /opt/python/ && \
        cd ..
    
    RUN pip download --no-deps --no-binary :all: ctds==1.14.0 && \
        tar -xzf ctds-1.14.0.tar.gz
    
    WORKDIR ctds-1.14.0
    
    RUN cp ${INSTALLDIR}/lib/libct.so.4 ${INSTALLDIR}/lib/libsybdb.a ${INSTALLDIR}/lib/libsybdb.so.5 .
        
    RUN CTDS_INCLUDE_DIRS=${INSTALLDIR}/include \
        CTDS_LIBRARY_DIRS=${INSTALLDIR}/lib \
        CTDS_RUNTIME_LIBRARY_DIRS=${INSTALLDIR}/lib \
        python setup.py build_ext --rpath=/glue/lib/installation && \
        python setup.py bdist_wheel
    
    RUN cp dist/*.whl /opt/python/
    
  • Run docker build -t ctds . to build the image
  • Run docker run --rm --entrypoint bash -v "$PWD":/local ctds -c "cp -R /opt /local" to copy the build output in /opt to your host machine
  • For a Lambda layer: move the shared libraries from opt/python/*.so* to opt/lib/, then zip the python and lib directories to create the layer
  • For a Glue wheel: zip everything in opt/python/ into ctds-1.14.0-cp39-cp39-linux_x86_64.whl
    • Upload the .whl file to S3 and set the Glue job's --additional-python-modules parameter to its S3 path
  • Running python setup.py build_ext --rpath=/glue/lib/installation embeds /glue/lib/installation in the compiled extension's library search path (its RPATH), so the dynamic loader automatically looks in that directory for the shared libraries ctds needs. If you skip the --rpath option, the Glue script must load the dependency libraries itself before importing ctds.
    # With `build_ext --rpath`: the loader finds the libraries via the embedded RPATH
    import ctds
    print(ctds.__version__)
    
    # Without `build_ext --rpath`: load each dependency explicitly first.
    # Load order matters: a library's dependencies must already be loaded.
    import ctypes
    ctypes.cdll.LoadLibrary('/glue/lib/installation/libcrypto.so.1.1')
    ctypes.cdll.LoadLibrary('/glue/lib/installation/libssl.so.1.1')
    ctypes.cdll.LoadLibrary('/glue/lib/installation/libsybdb.so.5')
    ctypes.cdll.LoadLibrary('/glue/lib/installation/libct.so.4')
    import ctds
    print(ctds.__version__)
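The manual-loading variant above can be wrapped in a small helper so the load order lives in one list and a missing file fails with a clear message. This is a sketch, not part of ctds: preload_shared_libraries, GLUE_LIB_DIR and DEFAULT_LOAD_ORDER are hypothetical names, and the file names simply mirror the ones used above. Passing ctypes.RTLD_GLOBAL additionally exposes each library's symbols to libraries loaded after it; that is a defensive choice, not something the original example relies on.

```python
import ctypes
import os
from typing import List, Sequence

# Hypothetical constant: the directory the example above loads from.
GLUE_LIB_DIR = "/glue/lib/installation"

# Dependencies first: libssl needs libcrypto, and ctds needs libsybdb.
DEFAULT_LOAD_ORDER = (
    "libcrypto.so.1.1",
    "libssl.so.1.1",
    "libsybdb.so.5",
    "libct.so.4",
)


def preload_shared_libraries(
    directory: str, filenames: Sequence[str] = DEFAULT_LOAD_ORDER
) -> List[ctypes.CDLL]:
    """dlopen each file in order so a later `import ctds` finds them already loaded.

    Raises FileNotFoundError naming the first missing file, instead of the
    less specific OSError that ctypes would raise.
    """
    handles = []
    for name in filenames:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            raise FileNotFoundError(f"shared library not found: {path}")
        # RTLD_GLOBAL makes this library's symbols visible to later loads.
        handles.append(ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL))
    return handles
```

In a Glue script this would be called as preload_shared_libraries(GLUE_LIB_DIR) immediately before import ctds.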
    