Unable To Read Table With Data Catalog Using dplyr::tbl(con, in_catalog())

I am trying to read a table in AWS Athena. I am using RAthena to establish a connection.

conn <- RAthena::dbConnect(
    drv            = RAthena::athena()
    schema_name    = "Info-Gateway-Prod-Catalog",
    s3_staging_dir =  "s3://mgic-pipelines-aws-analytics-qa/mgic_analytics/gse_mgic_matching_algorithm/athena/query_output/rathena",

I am able to read from the table using dbGetQuery()

# message("\n", ">> gse_tbl_string        = ", gse_tbl_string)
query_string <- 'SELECT * FROM "Info-Gateway-Prod-Catalog"."processed_data"."l3_gse_acquisition_combined" limit 1;'
results <- RAthena::dbGetQuery(conn, query_string)

I am able to read from a table in the AwsDefaultCatalog using tbl(conn, in_schema())

gse_tbl_string <- in_schema(
gse_rtbl_in_schema <- tbl(conn, gse_tbl_string)  %>% head(1)

When I try to read the same table and explicitly define the catalog as

gse_tbl_string <- in_catalog(
gse_rtbl_in_catalog <- tbl(conn, gse_tbl_string) %>% head(1)

The following error message is returned

Error in py_call_impl(callable, dots$args, dots$keywords) :
  EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the GetTable operation: Database awsdatacatalog not found.
Detailed traceback:

  File "/home/ec2-user/anaconda3/envs/gse_mgic_matching_algorithm/lib/python3.9/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)

  File "/home/ec2-user/anaconda3/envs/gse_mgic_matching_algorithm/lib/python3.9/site-packages/botocore/client.py", line 676, in _make_api_call

There are two things that are interesting about the error message.

  1. The Data Catalog is reported as a Database.
  2. The Data Catalog is reported in all lowercase letters. I have verified that the case is not an issue when reading the table with dbGetQuery()

I have verified that I can access the tables with Python using boto3 calls with the same IAM role. I am running these tests in a Docker container with the following setup

# The following test script is run in a Docker container
#  >>> uname -a > /opt/ml/processing/output/uname_22-11-17-22-24.log
#     Linux 6242217335f5 4.14.296-222.539.amzn2.x86_64 #1
#     SMP Wed Oct 26 20:36:53 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

#  "Package": "tidyverse", "Version": "1.3.1",
#  "Package": "RAthena",  "Version": "2.6.0",
#  "Package": "DBI", "Version": "1.1.1",
#  "Package": "dplyr","Version": "1.0.10",

What am I doing wrong?
Is there a bug in the in_catalog() function?
Is there something I can do differently?

My total R experience is debugging this issue, which is about 24 hours. Any help you can provide is greatly appreciated.

Thank you,


Hi @bkraft4257,

I am the author of RAthena, thanks for using the package :D. I believe you have found a bug within RAthena sorry about that. Please raise an issue on Issues · DyfanJones/RAthena · GitHub.

Many thanks,


1 Like