Using Athena Views As A Source In Glue

Whilst working with AWS Glue recently I noticed that I was unable to use a view created in Athena as a source for an ETL job in the same way that I could use a table that had been cataloged.

The error I received was this.

Rather than try and recreate the view using a new PySpark job I used the Athena JDBC drivers as a custom JAR in a glue job to be able to query the view I wanted to use.

This blog are my notes on how this works.


Create or reuse an existing S3 bucket to store the Athena JDBC drivers JAR file. The JAR files are available to download from AWS. I used the latest version which at the time of writing was JDBC Driver with AWS SDK AthenaJDBC42_2.0.27.1000.jar (compatible with JDBC 4.2 and requires JDK 8.0 or later).


The Glue job will need not only Glue Service privileges but also IAM privileges to access the S3 Buckets and also the AWS Athena Service.

For Athena this would provide Glue will full permissions.

Create Glue ETL Job

My use case for the Glue job was to query the view I had and save the results into Parquet format to speed up future queries against the same data.

The following code allows you to query an Athena view as a source for a data frame.

The key things in this code snippet to be aware of are.

We are telling Glue which class within the JDBC driver to use.

This uses the IAM role assigned to the Glue job to authenticate to Athena. You can use other authentication method like AWS_ACCESS_KEY or federated authentication but using IAM I think makes most sense for an ETL job that will most likely run on a schedule or event.

I am using Athena in Ireland (EU-WEST-1) if you are using a different region update this accordingly.

The fully qualified name of view in your Athena catalog. It’s in the format of ‘AwsDataCatalog.Database.View’. For example this query run in Athena.

You would set the dbtable option to this

The last option tells Glue which S3 location to use as temporary storage to store the data returned from Athena.

At this point you can test it works. When running the job you need to tell Glue about the location for the Athena JDBC drivers JAR file that was uploaded to S3.

If you are working in the AWS Glue Console the parameter to set can be found under Job Details → Advanced → Dependent JARs path.

The parameter needs to be set to the full path and filename of the JAR file. For example s3://yours3bucket/jdbc-drivers/AthenaJDBC42_2.0.27.1000.jar

By setting this in the console it ensures that the correct argument is passed into the Glue job.

The final code including the conversion to Parquet format looked like this.

Originally published at on February 16, 2022.



Data Architect @CDL_Software , AWS Community Builder, 13 x AWS Certified. Qlik Global Luminary 50.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Matt Houghton

Data Architect @CDL_Software , AWS Community Builder, 13 x AWS Certified. Qlik Global Luminary 50.