Export a table from DuckDB to Parquet via arrow package

Hello everyone

I would like to ask a question about exporting DuckDB tables to Apache Parquet format from R. As shown in the Arrow for R cheat sheet, the write_dataset function seems very handy in this regard, in particular thanks to its partitioning argument, which lets me specify the set of keys (that is, specific columns in my table) used to split the data when it is written in Parquet format. While checking the online documentation of DuckDB, I found in the Parquet chapter the following command:
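For reference, here is a minimal sketch of the write_dataset approach, assuming a hypothetical data frame with a "year" column used as the partition key (the names df, out_dir, and year are illustrative, not from the original post):

```r
library(arrow)

# Hypothetical example data; in practice this would be the table to export.
df <- data.frame(
  year  = c(2021, 2021, 2022),
  value = c(1.5, 2.3, 4.1)
)

# write_dataset() creates one Hive-style subdirectory per distinct key value,
# e.g. out_dir/year=2021/part-0.parquet
write_dataset(df, "out_dir", format = "parquet", partitioning = "year")
```

Note that this approach requires the data to already be in memory as a data frame (or an Arrow dataset), which is exactly the constraint described below.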

COPY (SELECT * FROM tbl) TO 'result-snappy.parquet' (FORMAT 'parquet');

This seems quite interesting, because in the environment where I am working the amount of RAM available to my program will never be enough to first load the entire table into a data.frame and then build the Parquet file from that. What I need is to write the data from the database directly to disk in Parquet format, and if I understand correctly, this is what the above COPY command does when called via DBI::dbExecute. However, I didn't find any way to specify Hive-style partitioning keys for the Parquet output before exporting the data. Is there any way to do that?
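For completeness, the COPY approach via DBI looks roughly like this; the export is streamed by DuckDB itself, so the table never has to fit in an R data.frame (the table name tbl and the sample data are illustrative):

```r
library(DBI)
library(duckdb)

# In-memory database for illustration; a file-backed one works the same way.
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "CREATE TABLE tbl AS SELECT 2021 AS year, 42 AS value")

# DuckDB writes the Parquet file directly to disk, outside of R's memory.
dbExecute(con, "COPY (SELECT * FROM tbl) TO 'result-snappy.parquet' (FORMAT 'parquet')")

dbDisconnect(con, shutdown = TRUE)
```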

Thanks in advance

Finally, I found the answer here:

Apparently this functionality will be in the next release of DuckDB.
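In case it helps future readers: DuckDB's COPY statement later gained a PARTITION_BY option for Hive-style partitioned Parquet writes. A sketch, assuming a table tbl with a year column (names are illustrative) and a DuckDB version that supports this option:

```r
library(DBI)
library(duckdb)

con <- dbConnect(duckdb::duckdb())
dbExecute(con, "CREATE TABLE tbl AS SELECT 2021 AS year, 42 AS value")

# PARTITION_BY writes one subdirectory per key value, e.g. part_dir/year=2021/,
# mirroring what arrow::write_dataset() produces with its partitioning argument.
dbExecute(con, "COPY (SELECT * FROM tbl) TO 'part_dir' (FORMAT PARQUET, PARTITION_BY (year))")

dbDisconnect(con, shutdown = TRUE)
```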

