You can try guess = TRUE in extract_tables(), but depending on the structure of the PDF, it may or may not find the table automatically. When possible, specifying the area is much more reliable, but if the layout differs slightly from one PDF to the next, that could be tough.
Since you've already done a lot of the hard regex work extracting rows from the table, here's an approach to turn that text into rows of a data frame. I'm sure there are more elegant ways to do this, but it seems to work!
library(tidyverse)
library(pdftools)

# Download the bulletin to a temporary file
pdf_path <- tempfile()
download.file(
  "https://www.occitanie.ars.sante.fr/system/files/2020-05/%40ARSOC_%23COVID-19_BulletinInfo54_20200501.pdf",
  destfile = pdf_path, mode = "wb"
)

text <- pdf_text(pdf_path)

# Department names as regex patterns: "." stands in for accented letters and hyphens
departement <- c("Ari.ge", "Aude", "Aveyron", "Gard", "Gers", "Haute.Garonne",
                 "Hautes.Pyr.n.es", "H.rault", "Lot", "Loz.re", "Pyr.n.es-Orientales",
                 "Tarn", "Tarn.et.Garonne")

# Find the line for department pattern `y` in text `x` and split it into six fields
extract_text <- function(x, y) {
  strsplit(x, "\r?\n")[[1]] %>%  # line endings differ between platforms
    str_subset(paste0(y, "[:blank:]*\\([:digit:]{2}\\)")) %>%
    str_extract(paste0(y, "[:blank:]*\\([:digit:]{2}\\)([:blank:]*[:digit:]{1,5}){4}")) %>%
    str_split("\\s+") %>%
    as_vector() %>%
    set_names(paste0("X", 1:6))
}

map2_dfr(text, departement, extract_text)
#> # A tibble: 13 x 6
#>    X1                  X2    X3    X4    X5    X6
#>  * <chr>               <chr> <chr> <chr> <chr> <chr>
#>  1 Ariège              (09)  7     1     29    2
#>  2 Aude                (11)  40    3     171   51
#>  3 Aveyron             (12)  30    2     113   22
#>  4 Gard                (30)  132   28    186   61
#>  5 Gers                (32)  26    4     52    17
#>  6 Haute-Garonne       (31)  120   38    481   50
#>  7 Hautes-Pyrénées     (65)  65    5     96    20
#>  8 Hérault             (34)  110   33    563   104
#>  9 Lot                 (46)  16    1     50    10
#> 10 Lozère              (48)  1     0     18    1
#> 11 Pyrénées-Orientales (66)  15    8     260   33
#> 12 Tarn                (81)  38    10    79    19
#> 13 Tarn-et-Garonne     (82)  12    6     33    4
Created on 2020-05-02 by the reprex package (v0.3.0)
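If it helps to see what the extract and split steps do on their own, here is a minimal sketch on a single made-up line. The sample string is my own placeholder rather than text copied from the bulletin, and the integer conversion at the end is an optional extra step, since the result above leaves every column as character.

```r
library(stringr)

# Made-up line imitating one table row (placeholder text, not from the PDF)
line <- "   Aude (11)     40      3     171     51   other text"

# Department name, two-digit code in parentheses, then four counts of 1 to 5 digits
pattern <- "Aude[:blank:]*\\([:digit:]{2}\\)([:blank:]*[:digit:]{1,5}){4}"

# Keep only the part of the line that matches, dropping the surrounding text
row <- str_extract(line, pattern)

# Split on runs of whitespace: name, code, then the four counts
fields <- str_split(row, "\\s+")[[1]]

# The counts are still character at this point; convert them if you need numbers
counts <- as.integer(fields[3:6])
```

Note that splitting on whitespace only gives six fields because none of the department names contain a space, so a name with a space in it would need a different split.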