Reading PDF tables into R - failing to find table

Hi all,

I'm attempting to read a table into R using the packages pdftools and tabulizer. On the whole, I've been successful in this, other than with the first page.

The first page of the PDF is split into two halves, the top being a text box and the bottom being the beginning of the table I want to read in. When I use the function extract_tables, only the text box at the top is extracted and the table below it is ignored. The rest of the table on subsequent pages is read in successfully.

The code I've been using:

library(tidyverse)
library(here)
library(pdftools)
library(tabulizer)
library(plyr) #Loaded after tidyverse, so it masks some dplyr functions; not used in the code below

#Read in file location
pdf_file <- here::here("pdf_location.pdf")

#Examine the recognised text - here the whole of the first page is read in, both the text box and the table
text <- pdf_text(pdf_file)

#Extract table
tables <- tabulizer::extract_tables(pdf_file,
pages = c(1))

#Run to examine results. Here the bottom half of the first page is missing.
tables

I've attempted using guess = FALSE and area = ..., but I'm failing to get results from either. Has anyone solved a similar issue? Thank you!
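
For reference, here is a minimal sketch of how the two arguments might be combined for the first page. It assumes the table starts roughly halfway down the page, which is a guess based on the description above and would need adjusting for the real document. The helper name lower_page_area and its top_fraction parameter are hypothetical, made up for illustration. As far as I understand the tabulizer documentation, area takes a list with one numeric vector per page in the order top, left, bottom, right, measured in points from the top-left corner of the page.

```r
# Hypothetical helper: build an area vector c(top, left, bottom, right), in
# points, covering everything below `top_fraction` of the page height.
lower_page_area <- function(page_width, page_height, top_fraction = 0.5) {
  stopifnot(top_fraction >= 0, top_fraction < 1)
  c(page_height * top_fraction,  # top edge of the region
    0,                           # left edge
    page_height,                 # bottom edge
    page_width)                  # right edge
}

pdf_file <- "pdf_location.pdf"  # placeholder path, as in the post

# Only run the extraction if the file and both packages are available.
if (file.exists(pdf_file) &&
    requireNamespace("pdftools", quietly = TRUE) &&
    requireNamespace("tabulizer", quietly = TRUE)) {

  # Page size of the first page, in points
  size <- pdftools::pdf_pagesize(pdf_file)[1, ]

  # Assumption: the table occupies the bottom half of page 1
  first_page_area <- lower_page_area(size$width, size$height, top_fraction = 0.5)

  # Turn off table detection so the supplied area is used
  tables <- tabulizer::extract_tables(pdf_file,
                                      pages = 1,
                                      area = list(first_page_area),
                                      guess = FALSE)
  print(tables)
}
```

If the halfway split is not accurate, tabulizer::locate_areas(pdf_file, pages = 1) may help: it opens an interactive view for drawing the region and returns coordinates in the same format.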
