Can't access content of pdf document inside iframe from parent (but it works for html)

Hi RStudio Community,

I've already checked your rules on cross-posting and I think it's OK to do so here; if not, I apologize. I've already tried to get an answer to this on Stack Overflow, but didn't get any reply for over a month. Here's the original post.

I'm currently working on a little project to create a shiny app for systematic literature reviews, and as part of that I added functionality to download PDF files and then show them inside an iframe (the PDF shows inside the iframe as expected).
As a next step I want to be able to select text in that PDF and use JavaScript to automatically copy it to a textAreaInput in the parent frame.
The problem now is that my browser console gives me this error:

SecurityError: Permission denied to access property "document" on cross-origin object

So I originally thought it was only a matter of storing the pdf file in the wrong place (see this discussion), but I have myPdf.pdf already stored in the www folder (See Folder Structure below).
What makes me doubt that explanation is that the error only occurs when I embed myPdf.pdf in the iframe. When I embed myHtml.html, I don't get the error and the function works as expected (apart from the minor issue that I have to click outside the iframe to trigger the function, but that's an issue for another day).

Any help to solve this issue is highly appreciated (hints about that odd behaviour are also welcome).

Steps to reproduce:

  1. set up the directory as shown under Folder Structure below
  2. Open RStudio and run the app by executing all the code
  3. Open the browser console
  4. Inside the web browser, select some text inside the iframe
  5. Focus the parent frame by clicking outside the iframe
  6. Press and release Enter to trigger the JavaScript function (you should now see the selected text inside the textAreaInput in the parent element)
  7. Stop the shiny app, and change the source of the tags$iframe() in app.R by commenting out the current one and uncommenting the one above
  8. Repeat steps 2-6 to see the console error

Folder Structure

ShinyAppFolder
|──app.R
|──www/
|   |──js/
|   |   |──getTextFromIframe.js
|   |──myHtml.html
|   |──myPdf.pdf
|   |──cachedFiles

Shiny App.R

library(shiny)
wwwPath <- paste0(getwd(),"/www")
addResourcePath("localfiles", wwwPath)
# fulltext::cache_options_set(full_path = paste0(wwwPath, "/cachedFiles"))
fileName <- "myPdf.pdf"

ui <- fluidPage(
  headerPanel(h3("systematicReviewR")),
  sidebarLayout(
    sidebarPanel(
      width = 4,
      textAreaInput("markedTextFromPdf", label = "Test, if I can copy text from iframed local pdf"),
      includeScript("www/js/getTextFromIframe.js")
      ),
    mainPanel(
      htmlOutput("pdfviewer")
    )
  )
)

server <- function(input, output, session) {
  
  output$pdfviewer <- renderUI({
    tags$iframe(id = "localFile",
                style = "height: 600px; width: 100%;",
                scrolling = "yes", # scrolling is an iframe attribute, not a CSS property
                # src = paste0("localfiles/", fileName)
                src = "localfiles/myHtml.html"
    )
  })  
}

shinyApp(ui = ui, server = server)

Javascript getTextFromIframe.js

Here's the code for my javascript file:

function myFunction() {
  var myIframe = document.getElementById('localFile');
  console.log(myIframe);
  var idoc = myIframe.contentDocument || myIframe.contentWindow.document; // ie compatibility
  var text = idoc.getSelection().toString();
  return text;
}

onkeyup = function(e) {
  if (e.which == 13) { // 13 is the key code for Enter
    var selection = myFunction();
    var parentDocument = window.parent.document;
    var parentTextbox = parentDocument.getElementById("markedTextFromPdf");
    parentTextbox.textContent = parentTextbox.textContent + selection;
    // based on: https://www.youtube.com/watch?v=6NNe6GWO8us
    //Shiny.setInputValue("getTextFromIframe", selection);
    console.log(parentTextbox);
  }
}
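
For debugging, the iframe access above can be wrapped in a try/catch so that the key handler reports a blocked document instead of throwing. This is only a minimal sketch: getIframeSelection is a hypothetical name that is not part of the original script, and the wrapper does not lift the cross-origin restriction, it only makes the failure visible.

```javascript
// Hypothetical helper (not part of the original script): return the text
// selected inside an iframe, or null when its document cannot be accessed.
function getIframeSelection(iframe) {
  try {
    var idoc = iframe.contentDocument || iframe.contentWindow.document;
    if (!idoc) {
      return null; // no accessible document
    }
    var selection = idoc.getSelection();
    return selection ? selection.toString() : "";
  } catch (err) {
    // Browsers may throw here when the iframe content is treated as cross-origin.
    console.warn("Cannot read iframe document: " + err.message);
    return null;
  }
}
```

With this in place, myFunction() could return getIframeSelection(document.getElementById('localFile')), and the handler could skip the update when the result is null.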

Selecting content using contentDocument and contentWindow is mainly meant for web-based (HTML) documents. Extracting text from PDFs is a bit more challenging, as browsers use different applications for rendering and displaying PDFs. However, there are tools such as pdf.js that help with this. I haven't used it before, but it looks like you will need to build pdf.js using gulp and possibly an application bundler, as well as write a bit of JS.

There are two SO posts that provide examples using pdf.js, but I haven't found a step-by-step guide starting from scratch (will update if I find something).
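
As a rough idea of what the pdf.js route could look like, here is a minimal, untested sketch that reads the text of one page. It assumes pdf.js is already loaded and passed in as pdfjsLib; getPageText and joinTextItems are hypothetical helper names, and the sketch returns the full page text rather than only the user's selection. Since pdf.js renders the PDF into the page's own DOM instead of the browser's built-in viewer, this approach may avoid reaching into the iframe's document at all.

```javascript
// Join the text items returned by pdf.js's page.getTextContent() into one string.
// Each item is expected to carry its text in a `str` property.
function joinTextItems(items) {
  return items
    .map(function (item) { return item.str; })
    .filter(function (s) { return typeof s === "string" && s.length > 0; })
    .join(" ");
}

// Hypothetical helper: load a PDF with pdf.js and return the text of one page.
// `pdfjsLib` must be supplied by the caller; it is not defined in this snippet.
async function getPageText(pdfjsLib, url, pageNumber) {
  var pdf = await pdfjsLib.getDocument(url).promise; // loading task -> document
  var page = await pdf.getPage(pageNumber);          // pages are 1-based
  var content = await page.getTextContent();         // { items: [{ str, ... }] }
  return joinTextItems(content.items);
}
```

In the app from the question, this might be called as getPageText(pdfjsLib, "localfiles/myPdf.pdf", 1).then(console.log), assuming the resource path set up in app.R.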

Hope that helps!

Thanks for the suggestion and the links. I couldn't figure out though how to make this work, but I hope to be able to take another look at it at some point.
Regardless of the fact that I would have to mess around with JavaScript quite a bit to get this done, do you think this would solve the cross-origin problem? I ask because I don't see that topic mentioned in either of the two links you posted.
I also still don't understand why the PDF has a different origin when the HTML file saved in the same folder does not.
I'll also post updates here if I make any progress on this issue.
