I am trying to read a table in AWS Athena. I am using RAthena to establish a connection.
conn <- RAthena::dbConnect(
  drv = RAthena::athena(),
  schema_name = "Info-Gateway-Prod-Catalog",
  s3_staging_dir = "s3://mgic-pipelines-aws-analytics-qa/mgic_analytics/gse_mgic_matching_algorithm/athena/query_output/rathena"
)
I am able to read from the table using dbGetQuery():
# message("\n", ">> gse_tbl_string = ", gse_tbl_string)
query_string <- 'SELECT * FROM "Info-Gateway-Prod-Catalog"."processed_data"."l3_gse_acquisition_combined" limit 1;'
results <- RAthena::dbGetQuery(conn, query_string)
I am also able to read from a table in the AwsDefaultCatalog using tbl(conn, in_schema()):
gse_tbl_string <- in_schema(
  "mgic_analytics",
  "dw_cert_curr"
)
gse_rtbl_in_schema <- tbl(conn, gse_tbl_string) %>% head(1)
When I try to read the same table and explicitly define the catalog as
gse_tbl_string <- in_catalog(
  "AwsDataCatalog",
  "processed_data",
  "l3_gse_acquisition_combined"
)
gse_rtbl_in_catalog <- tbl(conn, gse_tbl_string) %>% head(1)
The following error message is returned:
Error in py_call_impl(callable, dots$args, dots$keywords) :
EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the GetTable operation: Database awsdatacatalog not found.
Detailed traceback:
File "/home/ec2-user/anaconda3/envs/gse_mgic_matching_algorithm/lib/python3.9/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/ec2-user/anaconda3/envs/gse_mgic_matching_algorithm/lib/python3.9/site-packages/botocore/client.py", line 676, in _make_api_call
There are two things that are interesting about the error message.
- The Data Catalog is reported as a Database.
- The Data Catalog is reported in all lowercase letters. I have verified that case is not an issue when reading the table with dbGetQuery().
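To make those two observations concrete, here is a small sketch of what I suspect might be happening. It assumes the three-part identifier ends up as a dotted string that is lowercased and then read as "database.table". The helper `split_two_part()` is hypothetical and written only for illustration; it is not RAthena's actual code.

```r
# Hypothetical helper, NOT taken from RAthena: it mimics a parser that
# lowercases a dotted identifier and reads it as "database.table".
split_two_part <- function(identifier) {
  parts <- strsplit(tolower(identifier), ".", fixed = TRUE)[[1]]
  # Only the first two components are used; a third one is silently ignored.
  list(database = parts[1], table = parts[2])
}

guess <- split_two_part("AwsDataCatalog.processed_data.l3_gse_acquisition_combined")
guess$database  # "awsdatacatalog"
guess$table     # "processed_data"
```

If something like this is going on, the GetTable call would look for a database named awsdatacatalog, which matches the error text above.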
I have verified that I can access the tables from Python using boto3 calls with the same IAM role. I am running these tests in a Docker container with the following setup:
# The following test script is run in a Docker container
#
# >>> uname -a > /opt/ml/processing/output/uname_22-11-17-22-24.log
#
# Linux 6242217335f5 4.14.296-222.539.amzn2.x86_64 #1
# SMP Wed Oct 26 20:36:53 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# "Package": "tidyverse", "Version": "1.3.1",
# "Package": "RAthena", "Version": "2.6.0",
# "Package": "DBI", "Version": "1.1.1",
# "Package": "dplyr", "Version": "1.0.10",
What am I doing wrong?
Is there a bug in the in_catalog() function?
Is there something I can do differently?
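On that last question, one alternative that might sidestep the identifier handling is to give tbl() a raw SQL string through dplyr::sql() instead of an in_catalog() object. This is only a sketch: the wrapper name `lazy_tbl_from_sql()` is made up for illustration, and I have not confirmed that it behaves correctly with RAthena.

```r
library(dplyr)

# Hypothetical wrapper: build a lazy table from a raw SQL string rather than
# from a table identifier. The query must not end in a semicolon, because
# dbplyr may wrap it in a subquery.
lazy_tbl_from_sql <- function(conn, query) {
  dplyr::tbl(conn, dplyr::sql(query))
}

# Usage (requires a live connection such as `conn` above, so it is left commented out):
# gse_rtbl_sql <- lazy_tbl_from_sql(
#   conn,
#   'SELECT * FROM "Info-Gateway-Prod-Catalog"."processed_data"."l3_gse_acquisition_combined"'
# ) %>% head(1)
```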
My total R experience is debugging this issue, which is about 24 hours. Any help you can provide is greatly appreciated.
Thank you,
Bob