When using dbWriteTable with noctua, is it possible to not update the metastore?

I am using the noctua package to send data to AWS S3 via the DBI interface, e.g.:

dbWriteTable(conn = con_s3,
             name = paste0("revenue_predictions.", game_name),
             value = cohort_data,
             append = TRUE,        # add to the existing table
             overwrite = FALSE,
             file.type = "json",
             partition = c(year = yr, month = mt, day = d),
             s3.location = "s3://ourco-emr/tables/revenue_predictions.db", # not our real name :)
             max.batch = 100000)   # split the upload into batches of this many rows

This works: my data frame cohort_data is sent to S3.

Noctua appears to update the Athena metastore with .txt files, so when I log in to S3 I see these metadata files sitting alongside the data.

Since we use the Hive metastore rather than the Athena metastore, these files are not needed; I add partitions separately via a Hive connection (hive_con).
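
For context, registering the partition on the Hive side looks roughly like this. This is a minimal sketch assuming hive_con is a DBI connection to Hive and that the partition layout under s3.location matches the year/month/day values passed to dbWriteTable (table and bucket names as in the example above):

# Illustrative sketch only: register the newly written partition with the
# Hive metastore. hive_con is assumed to be a DBI connection to Hive, and
# the LOCATION layout below is an assumption about how the files land in S3.
partition_sql <- sprintf(
  "ALTER TABLE revenue_predictions.%s ADD IF NOT EXISTS
   PARTITION (year=%s, month=%s, day=%s)
   LOCATION 's3://ourco-emr/tables/revenue_predictions.db/%s/year=%s/month=%s/day=%s'",
  game_name, yr, mt, d, game_name, yr, mt, d)
DBI::dbExecute(hive_con, partition_sql)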

These files sit in the same directory as my table (paste0("revenue_predictions.", game_name)). I tried deleting all of the .txt metadata files, leaving just the directory with the JSON data, and my queries still run, so as far as I can tell these metastore .txt files are not needed.
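
For anyone wanting to inspect or remove the stray .txt files by hand, here is a rough sketch using the paws SDK (which noctua uses under the hood); the bucket name and key prefix below are taken from the example above, and the exact key layout is an assumption:

# Rough sketch: list and delete the stray .txt metadata files under the table
# location. Bucket/prefix follow the s3.location used above; adjust as needed.
# Note: list_objects_v2() returns at most 1000 keys per call, so paginate for
# larger tables.
s3 <- paws::s3()
objs <- s3$list_objects_v2(Bucket = "ourco-emr",
                           Prefix = "tables/revenue_predictions.db/")
keys <- vapply(objs$Contents, function(x) x$Key, character(1))
txt_keys <- keys[grepl("\\.txt$", keys)]
for (key in txt_keys) s3$delete_object(Bucket = "ourco-emr", Key = key)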

Assuming it is noctua updating the Athena metastore that creates all of these files, is there a way to prevent noctua::dbWriteTable() from creating them?

Hi @dougfir,

Currently noctua only integrates with the default AWS Glue for its metadata catalog. As I don't have a Hive metastore, it is a bit difficult to develop this possible new feature.

For this feature to be possible at all, AWS Athena would need to be able to register the table with Hive instead of AWS Glue. I am not currently aware that this is possible; if you know of a way, I am happy to start developing a solution.

There is some promising development in the AWS Athena preview that could possibly enable this. However, I don't want to develop against the Athena preview, as it can change dramatically.

For the metadata files, I can look over the code to see if any S3 resource is escaping the clean-up.

I was able to identify some metadata not getting captured in the AWS S3 clean-up. I have now updated this in the dev version of the package (thanks for bringing this to my attention):

remotes::install_github("dyfanjones/noctua")

Thank you @larefly. I installed the updated noctua version above, then sent several dbWriteTable() calls with noctua today, and those metadata .txt files seem to have been cleaned up.
