AWS Glue does not provide an easy way of adding the Microsoft ODBC Driver to your Python Shell Glue job. To allow pyodbc to recognize our driver, we need to upload the driver's shared library files to a location the Glue job can access.
- Compile the Microsoft ODBC Driver for SQL Server. I am using the Docker image for AWS Lambda since the environment for Glue is likely also Amazon Linux 2 based.
FROM public.ecr.aws/lambda/python:3.9 as builder RUN yum update -y RUN yum install wget tar gzip zip gcc make gcc gcc-c++ unixODBC-devel -y RUN curl https://packages.microsoft.com/config/rhel/7/prod.repo | tee /etc/yum.repos.d/mssql-release.repo RUN ACCEPT_EULA=Y yum install -y msodbcsql17 RUN ACCEPT_EULA=Y yum install -y mssql-tools
docker build -t msodbcsql17 .docker run --rm --entrypoint bash -v $PWD:/local msodbcsql17 -c "cp -R /opt/microsoft/msodbcsql17 /local"
- Create a
odbcinst.inifile containing the following information. /tmp/ is the location we will extract the driver to on the AWS Glue runtime.[ODBC Driver 17 for SQL Server] Description=Microsoft ODBC Driver 17 for SQL Server Driver=/tmp/msodbcsql17/lib64/libmsodbcsql-17.10.so.5.1 UsageCount=1
- Zip
odbcinst.iniand themsodbcsql17directory zip -r msodbcsql17.zip odbcinst.ini msodbcsql17
- Upload the zip to an accessible location on S3. In my case, I added the zip to the script's GitLab repo which puts it into the same S3 location as the script file on deployment.
- In the advanced options for your Glue Job, point the
Referenced files pathto the S3 location of your zip file.
- Within your actual script, unzip the driver to /tmp/ at runtime. PyODBC will now be able to see the SQL Server driver but will not be able to see the default MySQL and PostgreSQL drivers. If you also need access to them, edit the
odbcinst.inito include the default values in/etc/odbcinst.ini.import os, zipfile for d in [d for d in os.listdir('/tmp') if d.startswith('glue-python-libs-')]: path = os.path.join('/tmp', d) file = 'msodbcsql17.zip' if os.path.isfile(os.path.join(path, file)): with zipfile.ZipFile(os.path.join(path, file), 'r') as z: z.extractall('/tmp') os.environ['ODBCSYSINI'] = '/tmp' import pyodbc print(pyodbc.drivers())