Skip to main content

Athena workgroup upgrade

Background

On 28th April Athena workgroups will be upgraded to Athena engine version 3 due to a requirement from Amazon Web Services (AWS). The default workgroup that all Analytical Platform (AP) users currently use is set to version 2. Version 3 comes with a number of improvements but also some breaking changes that could impact analytical projects that run SQL queries using Athena.

How do I prepare for the new version?

We have created a new testing workgroup (test-athena-v3) that uses Athena engine version 3 which is available to all AP users that have standard database access. You can and should use this new workgroup to test any impact upgrading to the new version may have on your existing work. You can find guidance below on how to use this new workgroup in your projects.

How do I use the testing workgroup?

AWS Console

In the AP control panel navigate to the bottom and click on the AWS resources link:

In the AWS console search for Athena and click on the result. In the query editor, find the workgroup dropdown in the top right corner of the page and select test-athena-v3.

Queries you run in the query editor will now use the upgraded engine.

Python

For python based projects that make use of pydbtools or awswrangler to query Athena you can set an environment variable to switch which workgroup your queries use by default. To implement this environment variable you would run the following in your terminal:

export WR_WORKGROUP=test-athena-v3

You can also set this environment variable in your python script but you must do so before you import pydbtools and / or awswrangler e.g.

import os

os.environ["WR_WORKGROUP"] = "test-athena-v3"

import awswrangler
import pydbtools as pydb

If for any reason you need to set the workgroup once either of these modules have been imported then you can either:

  • set the workgroup using the awswrangler config
import awswrangler as wr

wr.config.workgroup = "test-athena-v3"
  • explicitly set the workgroup in the function call

import pydbtools as pydb

df = pydb.read_sql_query( "SELECT * FROM database.table", workgroup="test-athena-v3", )

R

If you are using dbtools to query Athena then you must set an environment variable to use the new workgroup. This environment variable must be set before the dbtools library is first called. One way of making sure this happens is by setting the environment variable in your .Rprofile file in your directory as follows:

Sys.setenv(WR_WORKGROUP="test-athena-v3")

If you are using Rdbtools then the environment variable you need to set is slightly different:

Sys.setenv(AWS_ATHENA_WORK_GROUP="test-athena-v3")

Airflow

If your code is running as part of an airflow pipeline then you can set the appropriate environment variable (see R / Python sections above) in the KubernetesPodOperator for your task:

tasks["my-task"] = KubernetesPodOperator(
    ...,
    env_vars={
        ...,
        "WR_WORKGROUP": "test-athena-v3",  # for python
        "AWS_ATHENA_WORK_GROUP": "test-athena-v3",  # for R
        ...,
        },
)

How can I check if my queries are running on the correct version?

One way of checking that you are using the updated workgroup is to run this simple SQL statement:

SELECT contains_sequence(ARRAY [1,2,3,4,5,6], ARRAY[1,2]) AS test;

contains_sequence is new and the query will fail on version 2 but will return a result when using version 3.

What if I’m using create-a-derived-table for my work?

Please check the #ask-data-modelling slack channel for announcements related to work on upgrading to version 3. Further comms will be sent out at a later date.

What should I do if I get stuck?

Please post a question on the #ask-data-engineering slack channel and hopefully someone can help you with your questions.

Where can I find out more about Athena engine version 3?

Please have a look at the official AWS documentation which can be found here.

This page was last reviewed on 9 March 2023. It needs to be reviewed again on 9 May 2023 by the page owner #ask-data-engineering .