SSL For RDS With Glue Python Job and AWS SDK For Pandas

This blog post is the result of a recent interaction with AWS Support. As always they were very helpful in resolving the issue.

AWS SDK For Pandas

Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual ETL tasks like load/unload data from Data Lakes, Data Warehouses and Databases.

I was looking to use the integration with AWS Glue to use a glue connection within some Python ETL code. The connection in my case was to an Amazon RDS PostgreSQL database.

For example:

import sys
import awswrangler as wr
import pandas as pd
con_postgresql = wr.postgresql.connect(connection=”My-RDS-PostgreSQL-Connection”)
con_postgresql.close()

The theory was that the connection could be defined in Glue once and used by multiple AWS Glue ETL

Amazon RDS Ready — Encryption Requirements

This program requires that products follow AWS security, availability, reliability, performance and other architecture best practices while integrating with Amazon RDS.

At CDL our software has been accredited as Amazon RDS Ready and we apply these standards when developing new solutions. Specifically on Data encryption the Amazon RDS Ready states:

DBCONN-004 — Data Encryption:
For business applications where data encryption is a requirement for security compliance, the product must support encryption of data at rest and in transit for Amazon RDS.

At CDL we ensure that data to RDS is encrypted in transit by setting the rds.force_ssl parameter to 1. See Using SSL with a PostgreSQL DB instance — Amazon Relational Database Service

Attempting an SSL Connection From Glue To RDS

This is done via the legacy glue connection screen in the console as this allows us to test the connection.

Glue Connection

As you can see running the test works.

Glue Connection OK

The Problem

import sys
import awswrangler as wr
import pandas as pd
con_postgresql = wr.postgresql.connect(connection=”My-RDS-PostgreSQL-Connection”)con_postgresql.close()

Running the job will return errors about SSL. I got a couple of different errors when trying to debug different versions of the code.

SSL Error
SSL Error

After a bit of back and forth with AWS Support trying to debug the issue the service team identified the following.

Currently, awswrangler loads and uses default SSL configuration for creating boto3 session clients.

It was clear from the errors we receive that this default did not include the Amazon RDS Root CA.

To overwrite a default configuration, it’s possible to use the connect() function in awswrangler that allows to pass an SSL context.

We need to download the RDS root certificate and point to it.

import sys
import awswrangler as wr
import pandas as pd
import ssl
import os
import urllib.request
def download_rds_root_ca(filename: str):
print(“Downloading RDS CA root cert…”)
urllib.request.urlretrieve(‘https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem', filename=filename)
print(“Downloaded RDS CA root cert.”)
def create_rds_ssl_context():
cafile = ‘/tmp/rds-ca-2019-root.pem’
download_rds_root_ca(cafile)
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS);
ssl_context.verify_mode = ssl.CERT_REQUIRED;
ssl_context.load_verify_locations(cafile=cafile, capath=None, cadata=None)
return ssl_context
print(“Connecting to RDS database…”)
rds_ssl_context = create_rds_ssl_context()
con_postgresql = wr.postgresql.connect(connection=”My-RDS-PostgreSQL-Connection”, ssl_context=rds_ssl_context)
print(“Successfully connected to RDS database.”)

Run With SSL

Job Run OK
Job Run Logs

--

--

Data Architect @CDL_Software , AWS Community Builder, 13 x AWS Certified. Qlik Global Luminary 50.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Matt Houghton

Data Architect @CDL_Software , AWS Community Builder, 13 x AWS Certified. Qlik Global Luminary 50.