Skip to content

Instantly share code, notes, and snippets.

@lwang
Created October 23, 2023 01:34
Show Gist options
  • Select an option

  • Save lwang/ad699fddabeea7e43a717c1433816672 to your computer and use it in GitHub Desktop.

Select an option

Save lwang/ad699fddabeea7e43a717c1433816672 to your computer and use it in GitHub Desktop.
Adding the Microsoft ODBC Driver for SQL Server to an AWS Glue Python Shell Job for use with pyodbc

Adding the Microsoft ODBC Driver for SQL Server to an AWS Glue Python Shell Job for use with pyodbc

AWS Glue does not provide an easy way of adding the Microsoft ODBC Driver to your Python Shell Glue job. To allow pyodbc to recognize our driver, we need to upload the driver's shared library files to a location the Glue job can access.

Step 1

  • Compile the Microsoft ODBC Driver for SQL Server. I am using the Docker image for AWS Lambda since the environment for Glue is likely also Amazon Linux 2 based.
    FROM public.ecr.aws/lambda/python:3.9 as builder
    
    RUN yum update -y
    RUN yum install wget tar gzip zip gcc make gcc gcc-c++ unixODBC-devel -y
    RUN curl https://packages.microsoft.com/config/rhel/7/prod.repo | tee /etc/yum.repos.d/mssql-release.repo
    RUN ACCEPT_EULA=Y yum install -y msodbcsql17
    RUN ACCEPT_EULA=Y yum install -y mssql-tools
  • docker build -t msodbcsql17 .
  • docker run --rm --entrypoint bash -v $PWD:/local msodbcsql17 -c "cp -R /opt/microsoft/msodbcsql17 /local"

Step 2

  • Create a odbcinst.ini file containing the following information. /tmp/ is the location we will extract the driver to on the AWS Glue runtime.
    [ODBC Driver 17 for SQL Server]
    Description=Microsoft ODBC Driver 17 for SQL Server
    Driver=/tmp/msodbcsql17/lib64/libmsodbcsql-17.10.so.5.1
    UsageCount=1

Step 3

  • Zip odbcinst.ini and the msodbcsql17 directory
  • zip -r msodbcsql17.zip odbcinst.ini msodbcsql17

Step 4

  • Upload the zip to an accessible location on S3. In my case, I added the zip to the script's GitLab repo which puts it into the same S3 location as the script file on deployment.
  • In the advanced options for your Glue Job, point the Referenced files path to the S3 location of your zip file.

Step 5

  • Within your actual script, unzip the driver to /tmp/ at runtime. PyODBC will now be able to see the SQL Server driver but will not be able to see the default MySQL and PostgreSQL drivers. If you also need access to them, edit the odbcinst.ini to include the default values in /etc/odbcinst.ini.
    import os, zipfile
    for d in [d for d in os.listdir('/tmp') if d.startswith('glue-python-libs-')]:
        path = os.path.join('/tmp', d)
        file = 'msodbcsql17.zip'
        if os.path.isfile(os.path.join(path, file)):
            with zipfile.ZipFile(os.path.join(path, file), 'r') as z:
                z.extractall('/tmp')
    os.environ['ODBCSYSINI'] = '/tmp'
    import pyodbc
    print(pyodbc.drivers())
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment