Download a gzipped file and decompress it


#1

I'm trying to write a Shiny app that'll work with station observations from the HadISD dataset. These obs are served as compressed NetCDF files (.nc.gz)—NetCDFs are a binary file type. If I were working with them locally, I'd download a station file, decompress it, and then open it with a package like ncdf4 or tidync.

Unfortunately, AFAIK these packages don't accept compressed files (at least, not NetCDF files that have been gzipped; internal compression is another matter). They also appear to require a filename, so I don't think I can use a connection object like gzfile('test.nc.gz') %>% open().

My thinking is that I need to:

  1. download.file() to a tempdir() (I understand this'll be wiped at the end of a session),
  2. decompress this temporary file (.nc.gz) to another temporary file (.nc), and
  3. load that temp file in with nc_open() or tidync().

I'm not sure about the best way to do step 2. Should I just use system2() to do this with gzip? That's not very portable. Is there a way to do this with connections (stream the compressed file into an uncompressed one or something)?

This is essentially what I'm planning:

library(tidync)
library(magrittr)
library(stringr)

ncdf_temp = tempfile(pattern = 'compass-', fileext = '.nc.gz')

# download the .nc.gz file
download.file(
  url = paste0(
    'https://www.metoffice.gov.uk/hadobs/hadisd/v202_2017f/data/',
    'hadisd.2.0.2.2017f_19310101-20171231_948680-99999.nc.gz'),
  destfile = ncdf_temp,
  mode = 'wb')  # binary mode, so the file isn't corrupted on Windows

# step 2: decompress?

# step 3: open
# escape the dot and anchor to the end so only the '.gz' suffix is removed
obs = tidync(ncdf_temp %>% str_replace('\\.gz$', ''))

Created on 2018-10-06 by the reprex package (v0.2.0).


#2

There is a base R function, unzip(). From its implementation, it looks like it handles Windows as well as *nix systems, so it should be fine for your use case.


#3

That's good to know about, @mishabalyasin! Unfortunately, it looks like unzip() only handles ZIP archives, and the similar untar() only handles TAR archives (not single files that have been compressed). It doesn't look like there's an equivalent function for uncompressing gzipped files :confused:


#4

Ahh, got there eventually! The R.utils package offers gzip() and gunzip() functions :smiley:


#5

Ah, you beat me to it and found your own solution! I was working on a reprex, so I'll post it anyway:

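# temporary path for the compressed download
temp_nc <- fs::file_temp(ext = ".nc.gz")
url <- "https://www.metoffice.gov.uk/hadobs/hadisd/v202_2017f/data/hadisd.2.0.2.2017f_19310101-20171231_010014-99999.nc.gz"
# mode = "wb" keeps the binary file intact on Windows
download.file(url, destfile = temp_nc, mode = "wb")
suppressMessages(library(R.utils))
isGzipped(temp_nc)
#> [1] TRUE
# gunzip() writes the decompressed file next to the original (dropping ".gz")
# and returns its path
nc_path <- gunzip(temp_nc)
res <- ncdf4::nc_open(nc_path)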
#> [1] ">>>> WARNING <<<  attribute missing_value is an 8-byte value, but R"
#> [1] "does not support this data type. I am returning a double precision"
#> [1] "floating point, but you must be aware that this could lose precision!"

Created on 2018-10-06 by the reprex package (v0.2.1)

As for the internals: gunzip() is basically a smart wrapper around base R's gzfile(), which opens the file as a connection. It then uses readBin() to read the binary data piece by piece and writeBin() to write it to an uncompressed file. With this in mind, you could always read an unusual file format if a connection function exists for it.
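
For anyone who wants to avoid the extra dependency, here is a minimal sketch of that same gzfile() / readBin() / writeBin() pattern in base R. It is not R.utils' actual code: gunzip_base() is a hypothetical helper name, and the chunk size is an arbitrary choice. The round trip uses a small locally generated file, so nothing needs to be downloaded.

```r
# Decompress a gzip file chunk by chunk using only base R connections.
# `gunzip_base` is a hypothetical helper, not a function from any package.
gunzip_base <- function(src, dest = sub("\\.gz$", "", src), chunk_size = 1e6) {
  stopifnot(file.exists(src), !identical(src, dest))
  con_in <- gzfile(src, open = "rb")   # reading yields decompressed bytes
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")   # plain binary output file
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break     # nothing left to read
    writeBin(chunk, con_out)
  }
  invisible(dest)
}

# Round trip: write some random bytes through a gzfile() connection...
payload <- as.raw(sample(0:255, 5000, replace = TRUE))
gz_path <- tempfile(fileext = ".bin.gz")
con <- gzfile(gz_path, open = "wb")
writeBin(payload, con)
close(con)

# ...then decompress them and check that nothing changed
out_path <- gunzip_base(gz_path)
restored <- readBin(out_path, what = "raw", n = file.size(out_path))
identical(restored, payload)
```

Unlike gunzip()'s default behaviour, this sketch leaves the compressed file in place, and it silently overwrites dest if it already exists.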


#6

Still very much appreciated, @cderv :blush: It's useful to know about the relationship with gzfile; I'd been thinking it might be possible to use the connection functions this way!