Webscraping using a Shiny App

Hello,

I am trying to convert a web scraping R script that I wrote into a Shiny app that others can use. Basically, I am using the vanilla "download.file" function to download a batch of files from specified URLs. When I run the app locally, everything works great. However, after publishing the app on shinyapps.io, I realized that it downloads the files to the server, not to the client's computer. I know that Shiny offers download functionality through the "downloadHandler" function, but I don't see how to use it for my web scraping script: I need the app to download a set of files from the web, not a dataset or a plot generated within Shiny.

I would appreciate any ideas on how to make Shiny download the files to the client's computer.

Here is a simplified version of the app that I am trying to write. I am web scraping the bibliography files from a journal's website. When you click on the download button, the files will be downloaded into your home directory.

library(shiny)
library(fs)
library(stringr)  # for str_remove_all()

ui <- fluidPage(
  actionButton(
    "download_bibs"
    , "Download citations"
  )
)

server <- function(input, output, session) {

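  # fs::path_home() is evaluated wherever the R process runs:
  # your own machine when run locally, the server once deployed
  home_dir <- c(
    Home = fs::path_home()
  )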

  observeEvent(input$download_bibs, {

    articles <- c("aer.20200451", "aer.20190586")

    issue_id <-  "10.1257"

    for (i in articles) {

      name_download <- paste0(str_remove_all(i, "\\."), ".", "bib")
      path <- paste0(
        home_dir
        , .Platform$file.sep
        , name_download
      )
      url_cit <- paste0(
        "https://www.aeaweb.org/articles/citation-export?args%5Bformat%5D="
        , "bib"
        , "&args%5Bdoi%5D="
        , issue_id
        , "%2F"
        , i
        , "&args%5Btype%5D="
        , "bib"
      )

      download.file(
        url = url_cit
        , destfile = path
      )

    }
  }
  )
}

shinyApp(ui = ui, server = server)

The issue is that when you run this app locally, it does what it is supposed to do. But if you publish it on shinyapps.io, the files are downloaded to the server. The question is how to get them from the server onto the client.

Best regards,
Alex

I think you glossed over your issue a bit.

Why doesn't downloadHandler work for you?
Can you provide a minimal example app illustrating the problem?

A web app cannot write files directly to the user's hard drive; it can only offer files through the browser's own download mechanism (anything else would be a severe security risk).

You could:
1. download the files to a temporary folder on the server,
2. zip the files (e.g. with zip::zipr()),
3. serve the zip file through the downloadHandler.
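Steps 1 and 2 could be wrapped in a plain R function, sketched below. It assumes the zip package is installed; download_and_zip() is a hypothetical name. Inside a downloadHandler, the zipfile argument would be the file path that Shiny passes to the content function.

```r
# Hypothetical helper (not from the thread): download each URL into a
# fresh temporary folder, then bundle the results into one zip file.
# Assumes the zip package is installed.
download_and_zip <- function(urls, filenames, zipfile) {
  work_dir <- tempfile("bibs_")   # unique folder for this call only
  dir.create(work_dir)
  # Remove the downloaded files when the function exits, even on error
  on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)

  paths <- file.path(work_dir, filenames)
  for (k in seq_along(urls)) {
    download.file(urls[k], destfile = paths[k], mode = "wb", quiet = TRUE)
  }

  # zipr() stores each file under its own name, without the folder path
  zip::zipr(zipfile, files = paths)
  invisible(zipfile)
}
```

Because the working folder is deleted on exit, only the zip file remains on the server, and Shiny can hand that file to the browser.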

I have added the simplified version of the app to my question.

Thanks for the suggestion. This does seem like the way to go. But then I run into the issue that all of the downloaded files remain on the server: each time I run the app, new files are added to the temp folder, so eventually every download contains all of the files collected so far. Also, I am not sure whether someone else running the app will add files to the same temp folder. Is there a way to clear the temp folder after the files have been downloaded?

You mentioned downloadHandler, so I was surprised your example does not use it. I'm sure that if you switched to the downloadButton/downloadHandler paradigm, you could make it work.
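A minimal sketch of the example app rewritten around downloadButton()/downloadHandler() might look like the following. It assumes the zip package is installed; the output name citations.zip and the per-download tempfile() folder are illustrative choices, not something from the thread, and the AEA export URLs are only the ones from the original post.

```r
library(shiny)

# Assumes the zip package is installed (zip::zipr() is used below)
articles <- c("aer.20200451", "aer.20190586")

ui <- fluidPage(
  # downloadButton() replaces the actionButton() of the original app
  downloadButton("download_bibs", "Download citations")
)

server <- function(input, output, session) {
  output$download_bibs <- downloadHandler(
    filename = function() "citations.zip",
    # Shiny calls content() with a temporary path on the server; whatever
    # is written there is streamed to the browser, which saves it locally.
    content = function(file) {
      urls <- paste0(
        "https://www.aeaweb.org/articles/citation-export?args%5Bformat%5D=bib",
        "&args%5Bdoi%5D=10.1257%2F", articles,
        "&args%5Btype%5D=bib"
      )
      bib_names <- paste0(gsub(".", "", articles, fixed = TRUE), ".bib")

      # Scratch folder for this download only, removed when content() exits
      work_dir <- tempfile("bibs_")
      dir.create(work_dir)
      on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)

      paths <- file.path(work_dir, bib_names)
      for (k in seq_along(urls)) {
        download.file(urls[k], destfile = paths[k], mode = "wb", quiet = TRUE)
      }
      zip::zipr(file, files = paths)
    },
    contentType = "application/zip"
  )
}

# In app.R the last expression would usually be shinyApp(ui = ui, server = server);
# here the app object is stored so it can be launched with runApp(app).
app <- shinyApp(ui = ui, server = server)
```

The scraping now happens only when the user clicks the button, and the result travels to the client through the browser's normal download mechanism.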

Again, this might be difficult. For example, what happens if someone clicks the download button but then cancels at the step of selecting the output folder?
At the end of the for loop you can call the zip function, then delete all files except the zip file, and pass the zip file to the download handler.
The zip package can either add files to an existing archive (zip_append()) or write a new one, and writing a new one overwrites the old archive. To be on the safe side, you can delete all files in the temp folder before adding the new ones.
Note, however, that tempdir() is created once per R process, not per Shiny session, so several users served by the same process share it. Creating a separate subfolder for each download with tempfile() keeps them apart.
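The cleanup question can be handled by giving every download its own subfolder and deleting it as soon as the work is done. A small sketch using only base R (tempfile(), on.exit(), unlink()); with_scratch_dir() is a hypothetical name, not a Shiny or base R function.

```r
# Hypothetical helper: run `fun` with a scratch folder of its own under
# tempdir(), and delete that folder when `fun` returns or fails.
with_scratch_dir <- function(fun) {
  scratch <- tempfile("download_")  # unique path inside tempdir()
  dir.create(scratch)
  on.exit(unlink(scratch, recursive = TRUE), add = TRUE)
  fun(scratch)
}

# Each call gets a different folder, and nothing is left behind afterwards
first  <- with_scratch_dir(function(d) d)
second <- with_scratch_dir(function(d) d)
```

Inside a downloadHandler's content function, the downloads and the zipping would happen inside fun, so concurrent users never see each other's files and the server keeps nothing between requests.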


Got it, thank you so much for the advice!

I am currently trying to do this, following @Matthias's suggestion.

If I'm not wrong, the line home_dir <- c(Home = fs::path_home()) takes the home path of the server, not the client, because the R script is executed on the server.

Yes, this is exactly the problem. As @Matthias pointed out, the only way to download something on the client is to use the downloadHandler.
