archive 1.1.2

This is a companion discussion topic for the original entry at https://www.tidyverse.org/blog/2021/11/archive-1-1-2


archive 1.1.2 is now on CRAN. archive lets you work with file archives, such as ZIP, tar, 7-Zip and RAR, and with compression formats like gzip, bzip2, XZ and Zstandard. It does this by building on top of the libarchive C library.

You can install it from CRAN with:

install.packages("archive")

This blog post explains the main functions of archive and shows how to use them to read from and write to archives.

You can see a full list of changes in the release notes.

library(archive)
# Run the examples in a fresh temporary directory (knitr setup used to render this post)
my_dir <- fs::file_temp() |> fs::dir_create()
knitr::opts_knit$set(root.dir = my_dir)

Displaying archive contents

Use archive() to return a tibble of the files contained in a given archive.

archive("nycflights13.zip")
#> # A tibble: 5 × 3
#>   path                       size date               
#>                                      
#> 1 nycflights13/airlines.csv   386 2021-11-04 15:09:55
#> 2 nycflights13/airports.csv 71209 2021-11-04 15:09:55
#> 3 nycflights13/flights.csv  90886 2021-11-04 15:09:56
#> 4 nycflights13/planes.csv   72927 2021-11-04 15:09:56
#> 5 nycflights13/weather.csv  86753 2021-11-04 15:09:56
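Because archive() returns an ordinary tibble, you can filter its output like any other data frame. The following is a minimal sketch that builds a small tar.gz archive in a temporary directory (the file names cars.csv, notes.txt and example.tar.gz are hypothetical) and then keeps only the CSV entries.

```r
library(archive)

# Work in a scratch directory so nothing is left in the working directory
old_wd <- setwd(tempdir())

# Write two small files to disk (illustrative names)
write.csv(head(mtcars), "cars.csv", row.names = FALSE)
writeLines("Example notes", "notes.txt")

# Bundle them into a gzip-compressed tar archive; the format is guessed from ".tar.gz"
archive_write_files("example.tar.gz", c("cars.csv", "notes.txt"))

# archive() lists the contents as a tibble with path, size and date columns
contents <- archive("example.tar.gz")

# Keep only the entries whose path ends in ".csv"
csv_entries <- contents[grepl("\\.csv$", contents$path), ]
print(csv_entries)

setwd(old_wd)
```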

Reading single files from an archive

archive_read() reads a single file from an archive. It returns an R connection, which can be passed to any R function that accepts a connection as input. Most base R functions for reading files accept connections, as do some packages such as readr.

The file argument accepts either a file's numeric position in the archive or its filename.

con1 <- archive_read("nycflights13.zip", file = 2)
readLines(con1, n = 5)
#> [1] "faa,name,lat,lon,alt,tz,dst,tzone"                                                
#> [2] "04G,Lansdowne Airport,41.1304722,-80.6195833,1044,-5,A,America/New_York"          
#> [3] "06A,Moton Field Municipal Airport,32.4605722,-85.6800278,264,-6,A,America/Chicago"
#> [4] "06C,Schaumburg Regional,41.9893408,-88.1012428,801,-6,A,America/Chicago"          
#> [5] "06N,Randall Airport,41.431912,-74.3915611,523,-5,A,America/New_York"
close(con1)

con2 <- archive_read("nycflights13.zip", file = "nycflights13/planes.csv")
readLines(con2, n = 5)
#> [1] "tailnum,year,type,manufacturer,model,engines,seats,speed,engine"
#> [2] "N10156,2004,Fixed wing multi engine,EMBRAER,EMB-145XR,2,55,NA,Turbo-fan"
#> [3] "N102UW,1998,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,NA,Turbo-fan"
#> [4] "N103US,1999,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,NA,Turbo-fan"
#> [5] "N104UW,1999,Fixed wing multi engine,AIRBUS INDUSTRIE,A320-214,2,182,NA,Turbo-fan"
close(con2)
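Instead of readLines(), the connection can be handed straight to a parser. The sketch below uses base R's read.csv(), which opens and closes the connection itself; it first creates a small ZIP archive so it runs on its own (cars.zip is a hypothetical name).

```r
library(archive)

old_wd <- setwd(tempdir())

# Create a small ZIP archive holding one CSV file
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
archive_write_files("cars.zip", "mtcars.csv")

# Select the member by name and parse it directly from the connection
cars_df <- read.csv(archive_read("cars.zip", file = "mtcars.csv"))
str(cars_df[, 1:3])

setwd(old_wd)
```

The same pattern works with readr::read_csv() if you prefer a tibble.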

Writing single files to an archive

Similarly, archive_write() writes a single file to an archive, returning a writable R connection. As with reading, many base R functions work with writable connections, as do some packages such as readr.

The archive and compression formats are guessed automatically from the output file's extension. However, you can also specify them explicitly with the format and filter arguments.

Here we create a new ZIP archive containing the file mtcars.csv.

readr::write_csv(mtcars, archive_write("my-cars.zip", "mtcars.csv"))

archive("my-cars.zip")
#> # A tibble: 1 × 3
#>   path        size date
#>
#> 1 mtcars.csv  1281 1980-01-01 00:00:00
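When the file name has no recognizable extension, the format and filter can be given explicitly. This is a sketch with a deliberately uninformative, hypothetical name (bundle.dat); it assumes "tar" and "gzip" are accepted values for the format and filter arguments. It writes through a base R connection and then reads the entry back, relying on libarchive to detect the format from the file contents.

```r
library(archive)

old_wd <- setwd(tempdir())

# ".dat" says nothing about the format, so state format and filter explicitly
con <- archive_write("bundle.dat", "greeting.txt", format = "tar", filter = "gzip")
writeLines(c("hello", "world"), con)
close(con)

# When reading, the format is detected from the archive itself
con_in <- archive_read("bundle.dat", file = "greeting.txt")
lines <- readLines(con_in)
close(con_in)
print(lines)

setwd(old_wd)
```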

Writing multiple files to an archive

archive_write_files() writes multiple files to a new archive. The files to be added must already exist on disk.

archive_write_dir() is a helper to archive all the files in a given directory.

library(readr)

# Write a few files to the temp directory
write_csv(iris, "iris.csv")
write_csv(mtcars, "mtcars.csv")
write_csv(airquality, "airquality.csv")

# Add them to a new XZ-compressed tar archive
archive_write_files("data.tar.xz",
                    c("iris.csv", "mtcars.csv", "airquality.csv"))

# View archive contents
archive("data.tar.xz")
#> # A tibble: 3 × 3
#>   path            size date
#>
#> 1 iris.csv        3716 2021-11-04 15:09:57
#> 2 mtcars.csv      1281 2021-11-04 15:09:57
#> 3 airquality.csv  2890 2021-11-04 15:09:57
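archive_write_dir() saves you from listing files by hand. The sketch below builds a small, hypothetical directory tree (report-data, with a raw subdirectory) and archives everything in it; the stored paths are expected to be relative to that directory.

```r
library(archive)

old_wd <- setwd(tempdir())

# Build a small directory tree to archive (illustrative names)
src <- "report-data"
dir.create(file.path(src, "raw"), recursive = TRUE, showWarnings = FALSE)
write.csv(iris, file.path(src, "iris.csv"), row.names = FALSE)
write.csv(mtcars, file.path(src, "raw", "mtcars.csv"), row.names = FALSE)

# Archive every file under `src`; the format is guessed from ".zip"
archive_write_dir("report-data.zip", src)

# List the stored paths before leaving the scratch directory
entries <- archive("report-data.zip")$path
print(entries)

setwd(old_wd)
```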

Extracting multiple files from an archive

archive_extract() allows you to extract one or more files to disk from an archive.

Note that the archive and compression formats are detected automatically.

# Create a new directory
my_dir <- fs::file_temp() |> fs::dir_create()

# Extract two of the files in the archive to that directory
archive_extract("data.tar.xz", dir = my_dir, files = c("iris.csv", "mtcars.csv"))

# Show the extracted files
fs::dir_ls(my_dir) |> fs::path_file()
#> [1] "iris.csv" "mtcars.csv"
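If files is left at its default, archive_extract() extracts every entry in the archive. A minimal sketch, using a freshly created tar.gz with hypothetical names (all-data.tar.gz, extracted-all):

```r
library(archive)

old_wd <- setwd(tempdir())

# Create a small archive to extract from
write.csv(iris, "iris.csv", row.names = FALSE)
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
archive_write_files("all-data.tar.gz", c("iris.csv", "mtcars.csv"))

# Extract everything into a new directory (files = NULL is the default)
out_dir <- file.path(tempdir(), "extracted-all")
dir.create(out_dir, showWarnings = FALSE)
archive_extract("all-data.tar.gz", dir = out_dir)

extracted <- sort(list.files(out_dir))
print(extracted)

setwd(old_wd)
```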

Acknowledgements

Thanks to the following users who tried out development versions of archive and opened issues and made feature suggestions to improve it! @cboettig, @jennybc, @jeroen, and @JMcrocs.
