Some Python libraries, such as `ctds`, depend on external libraries written in C/C++, like FreeTDS. In a typical scenario where the dependency is installed through the system's package manager, its shared libraries are placed in a location such as `/usr/lib64`, where the Python library can find them upon import. However, AWS Glue/Lambda does not allow the installation of system packages. In some cases, we could copy the shared object files to another location and use `LD_LIBRARY_PATH` to point to the new library directory; however, Glue/Lambda also does not allow developers to configure the run command.
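To check whether a given environment can locate these shared libraries, one option is to ask `ctypes.util.find_library` about each one. This is a minimal diagnostic sketch, not part of `ctds`: the function name `missing_libraries` is hypothetical, the short names `sybdb`, `ct`, `ssl` and `crypto` are assumed from the files copied in the Dockerfile below, and `find_library` only consults the standard search mechanisms (such as the `ldconfig` cache), not arbitrary directories.

```python
import ctypes.util

# Short names (no "lib" prefix, no ".so" suffix) of the shared libraries that
# ctds needs at import time; assumed from the files copied in the Dockerfile.
REQUIRED_LIBRARIES = ("sybdb", "ct", "ssl", "crypto")


def missing_libraries(names=REQUIRED_LIBRARIES):
    """Return the names that find_library() cannot locate on this system."""
    # find_library() returns a file name such as "libssl.so.3", or None when
    # the library is not found in the standard search locations.
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Not found in the standard library search paths:", ", ".join(missing))
    else:
        print("All required shared libraries were found.")
```

On a system where FreeTDS was installed through the package manager, the list is typically empty; inside Glue or Lambda, `sybdb` and `ct` would be expected to show up as missing.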
- Attempting to add the desired Python library to an AWS Glue Python Shell job through the `--additional-python-modules` option will cause an error, as pip will try to build a wheel for the library but will not have the dependencies necessary to do so. In the case of `ctds`, the error was `fatal error: sybdb.h: No such file or directory`, since the files provided by FreeTDS are not available.
- Instead, we will build the wheel for the library in a Docker environment that is similar to the runtime environment of the Glue job:
```dockerfile
# Adapted from https://randywestergren.com/building-pymssql-freetds-for-lambda/
FROM amazonlinux

ENV INSTALLDIR='/tmp/freetds'

RUN yum update -y
RUN yum install wget tar gzip zip gcc make gcc-c++ python39-devel unixODBC-devel -y
RUN mkdir $INSTALLDIR build

# Build FreeTDS, which provides sybdb.h and libsybdb
RUN wget 'https://www.freetds.org/files/stable/freetds-patched.tar.gz' && \
    tar -xzf freetds-patched.tar.gz && \
    cd freetds-* && \
    ./configure --prefix=${INSTALLDIR} --with-openssl=$(openssl version -d | sed -r 's/OPENSSLDIR: "([^"]*)"/\1/') && \
    make && make install

# Python 3.8+ no longer uses the "m" ABI suffix in its include directory
ENV CPPFLAGS="-I/usr/include/python3.9"
RUN pip install --upgrade pip
RUN mkdir /opt/python/

# Build OpenSSL 1.1.1 and keep its shared libraries next to the wheel
RUN yum install -y perl
RUN wget https://www.openssl.org/source/openssl-1.1.1w.tar.gz && \
    tar -zxf openssl-1.1.1w.tar.gz && \
    cd openssl-1.1.1w && \
    ./config && make && \
    cp libcrypto.so.1.1 libssl.so.1.1 /opt/python/ && \
    cd ..

# Fetch the ctds source distribution (pinned to match the directory name below)
RUN pip download --no-deps --no-binary ctds ctds==1.14.0 && \
    tar -xzf ctds-1.14.0.tar.gz

# Each RUN starts a new shell, so `RUN cd` would not persist; use WORKDIR instead
WORKDIR ctds-1.14.0
RUN cp ${INSTALLDIR}/lib/libct.so.4 ${INSTALLDIR}/lib/libsybdb.a ${INSTALLDIR}/lib/libsybdb.so.5 .
RUN CTDS_INCLUDE_DIRS=${INSTALLDIR}/include \
    CTDS_LIBRARY_DIRS=${INSTALLDIR}/lib \
    CTDS_RUNTIME_LIBRARY_DIRS=${INSTALLDIR}/lib \
    python setup.py build_ext --rpath=/glue/lib/installation && \
    python setup.py bdist_wheel --universal
# bdist_wheel writes the wheel into dist/
RUN cp dist/*.whl /opt/python/
```

- Run `docker build -t ctds .` to build everything.
- Run `docker run --rm --entrypoint bash -v $PWD:/local ctds -c "cp -R /opt /local"` to copy the built Python library to your host machine.
- For a Lambda layer: move the shared libraries from `opt/python/*.so*` to `opt/lib/`, then zip the `python` and `lib` directories to create the Lambda layer.
- For a Glue wheel: zip everything from `opt/python/*` into `ctds-1.14.0-cp39-cp39-linux_x86_64.whl`.
  - Upload the `.whl` file to S3 and point the `--additional-python-modules` property of the Glue job to the S3 path.
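The two packaging steps only move and zip files, so they can also be scripted. The following is a minimal sketch using only the standard library, assuming the `opt` directory copied out of the container is on the local disk; the function names are hypothetical. `build_lambda_layer` moves top-level shared objects such as `libssl.so.1.1` from `python/` to `lib/` (Lambda extracts layers to `/opt`, adds `/opt/lib` to the library search path and `/opt/python` to the import path), and `build_glue_wheel` simply zips the contents of `opt/python/` as described, without regenerating wheel metadata such as `RECORD`.

```python
from pathlib import Path
import zipfile


def _zip_files(base: Path, files, zip_path: Path) -> Path:
    """Write each regular file in `files` to `zip_path`, named relative to `base`."""
    # Build the member list before opening the archive so the archive being
    # written is never picked up as one of its own inputs.
    target = zip_path.resolve()
    members = sorted(p for p in files if p.is_file() and p.resolve() != target)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            zf.write(path, path.relative_to(base).as_posix())
    return zip_path


def _is_shared_object(path: Path) -> bool:
    """True for names such as libssl.so.1.1 or libct.so.4 (".so" plus optional version)."""
    return path.is_file() and "so" in path.name.split(".")[1:]


def build_glue_wheel(python_dir: Path, wheel_path: Path) -> Path:
    """Zip everything under opt/python into a file named like a wheel."""
    return _zip_files(python_dir, python_dir.rglob("*"), wheel_path)


def build_lambda_layer(opt_dir: Path, zip_path: Path) -> Path:
    """Move top-level shared objects from python/ to lib/, then zip both directories."""
    python_dir, lib_dir = opt_dir / "python", opt_dir / "lib"
    lib_dir.mkdir(exist_ok=True)
    # list() first, because files are moved out of the directory while iterating.
    for so_file in list(python_dir.iterdir()):
        if _is_shared_object(so_file):
            so_file.replace(lib_dir / so_file.name)
    files = list(python_dir.rglob("*")) + list(lib_dir.rglob("*"))
    # Archive names are relative to opt/, so the zip root holds python/ and lib/.
    return _zip_files(opt_dir, files, zip_path)
```

Because `build_lambda_layer` moves files out of `opt/python`, build the wheel first, for example `build_glue_wheel(Path("opt/python"), Path("ctds-1.14.0-cp39-cp39-linux_x86_64.whl"))`, and the layer afterwards with `build_lambda_layer(Path("opt"), Path("ctds-layer.zip"))` (the layer file name is only an example).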
- Running `python setup.py build_ext --rpath /glue/lib/installation` configures the Python library to look in this directory automatically for the shared libraries it needs. Skipping this step means the Glue script must load the dependency libraries manually before importing the Python library:

```python
# With `build_ext --rpath`
import ctds

print(ctds.__version__)
```

```python
# Without `build_ext --rpath`
# Load order matters: each library must be loaded after the libraries it depends on
import ctypes

ctypes.cdll.LoadLibrary('/glue/lib/installation/libcrypto.so.1.1')
ctypes.cdll.LoadLibrary('/glue/lib/installation/libssl.so.1.1')
ctypes.cdll.LoadLibrary('/glue/lib/installation/libsybdb.so.5')
ctypes.cdll.LoadLibrary('/glue/lib/installation/libct.so.4')

import ctds

print(ctds.__version__)
```
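If several libraries have to be preloaded, the manual approach can be wrapped in a small helper. This is a sketch, not part of `ctds`: `preload_shared_libraries` is a hypothetical name, and passing `ctypes.RTLD_GLOBAL` is a choice made here so that each library's symbols are also visible to libraries loaded after it. Setting `os.environ["LD_LIBRARY_PATH"]` inside the script would not have the same effect, because the dynamic linker reads that variable only when the process starts.

```python
import ctypes
import os

# Directory where the rpath above assumes Glue places the wheel's contents.
GLUE_LIBRARY_DIR = "/glue/lib/installation"

# Same order as the manual example: dependencies first.
CTDS_DEPENDENCIES = (
    "libcrypto.so.1.1",
    "libssl.so.1.1",
    "libsybdb.so.5",
    "libct.so.4",
)


def preload_shared_libraries(names, directory=GLUE_LIBRARY_DIR):
    """Load each library in order and return the ctypes handles.

    If `directory` is falsy, each name is passed to the loader unchanged.
    Raises OSError (from ctypes) if a library is missing or cannot be loaded.
    """
    handles = []
    for name in names:
        path = os.path.join(directory, name) if directory else name
        # RTLD_GLOBAL exposes this library's symbols to libraries loaded later.
        handles.append(ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL))
    return handles
```

In the Glue script, `preload_shared_libraries(CTDS_DEPENDENCIES)` would then be called before `import ctds`; keeping the handles in a variable is optional, since a loaded library stays mapped for the lifetime of the process.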