Tesseract 4.0 not working on shinyapps.io

Hi,

I believe there is a problem when trying to use tesseract 4.0 on shinyapps.io. I uploaded the following flexdashboard document to shinyapps.io:

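````markdown
---
title: "Tesseract"
runtime: shiny
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: scroll
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(flexdashboard)
library(shiny)
library(tesseract)
```

Check
=====================================

Column
-------------------------------------

### Session Info

```{r}
renderPrint({
  sessionInfo()
})
```

Column
-------------------------------------

### Tesseract info

```{r}
renderPrint({
  tesseract_info()
})
```
````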

The output on shinyapps.io is the following:

```
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tesseract_4.0 flexdashboard_0.5.1.1 rmarkdown_1.13
[4] shiny_1.3.2

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 digest_0.6.20 later_0.8.0 rappdirs_0.3.1
[5] mime_0.5 R6_2.4.0 jsonlite_1.6 xtable_1.8-4
[9] magrittr_1.5 evaluate_0.14 rlang_0.4.0 stringi_1.3.1
[13] promises_1.0.1 tools_3.5.0 stringr_1.4.0 httpuv_1.5.1
[17] xfun_0.8 yaml_2.2.0 compiler_3.5.0 htmltools_0.3.6
[21] knitr_1.23
```

and in the Tesseract info panel:

```
$datapath
[1] "/usr/share/tesseract-ocr/tessdata/"

$available
[1] "eng"

$version
[1] "3.04.01"

$configs
[1] "ambigs.train" "api_config" "bigram" "box.train"
[5] "box.train.stderr" "digits" "hocr" "inter"
[9] "kannada" "linebox" "logfile" "makebox"
[13] "pdf" "quiet" "rebox" "strokewidth"
[17] "txt" "unlv"
```

Does anybody know what is happening here? It looks like `library(tesseract)` loads version 4.0 of the R package, but shinyapps.io actually uses version 3.04.01 of the Tesseract engine, so documents are read with the old technology rather than the new one. Needless to say, I do not encounter this problem locally. When running locally, I get the following output:

```
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tesseract_4.0 devtools_2.1.0 usethis_1.5.1
[4] flexdashboard_0.5.1.1 shiny_1.3.2

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 compiler_3.5.0 later_0.8.0 remotes_2.1.0
[5] prettyunits_1.0.2 tools_3.5.0 testthat_2.1.1 digest_0.6.20
[9] pkgbuild_1.0.3 pkgload_1.0.2 jsonlite_1.6 evaluate_0.14
[13] memoise_1.1.0 rlang_0.4.0 cli_1.1.0 yaml_2.2.0
[17] xfun_0.8 withr_2.1.2 stringr_1.4.0 knitr_1.23
[21] rappdirs_0.3.1 desc_1.2.0 fs_1.3.1 rprojroot_1.3-2
[25] glue_1.3.1 R6_2.4.0 processx_3.4.0 rmarkdown_1.13
[29] sessioninfo_1.1.1 callr_3.3.0 magrittr_1.5 backports_1.1.2
[33] promises_1.0.1 ps_1.3.0 htmltools_0.3.6 assertthat_0.2.0
[37] mime_0.5 xtable_1.8-4 httpuv_1.5.1 stringi_1.3.1
[41] crayon_1.3.4
```

and in the Tesseract info panel:

```
$datapath
[1] "C:\Users\Marius\AppData\Local\tesseract4\tesseract4\tessdata/"

$available
[1] "eng" "fra" "osd" "ron"

$version
[1] "4.0.0"

$configs
[1] "ambigs.train" "api_config" "bigram" "box.train"
[5] "box.train.stderr" "digits" "hocr" "inter"
[9] "kannada" "linebox" "logfile" "lstm.train"
[13] "lstmdebug" "makebox" "pdf" "quiet"
[17] "rebox" "strokewidth" "tsv" "txt"
[21] "unlv"
```
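One way to make the mismatch explicit is to print the R package version next to the version of the Tesseract engine it is linked against. This is a minimal sketch that relies only on `utils::packageVersion()` and `tesseract::tesseract_info()`; the function name `report_tesseract_versions()` is hypothetical. Going by the outputs above, one would expect `package` to be 4.0 in both environments, and `engine` to be 3.04.01 on shinyapps.io versus 4.0.0 locally.

```r
# Hypothetical helper: report the tesseract R package version alongside the
# version of the underlying Tesseract engine that the package is linked to.
report_tesseract_versions <- function() {
  if (!requireNamespace("tesseract", quietly = TRUE)) {
    # Package not installed: nothing to report
    return(c(package = NA_character_, engine = NA_character_))
  }
  c(
    package = as.character(utils::packageVersion("tesseract")),
    engine  = tesseract::tesseract_info()$version
  )
}

print(report_tesseract_versions())
```

Wrapping the call in `renderPrint({ report_tesseract_versions() })` inside the dashboard would show both numbers on the deployed app.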


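The execution environment on shinyapps.io is currently Ubuntu 16.04, which ships Tesseract 3.04. The tesseract R package (version 4.0) is built against this system library, which is why `tesseract_info()` reports 3.04.01.

It is on our roadmap to transition to Ubuntu 18.04, at which point Tesseract 4.0 will be available.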
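Until then, an app that depends on the 4.x engine could check the engine version at startup and warn instead of silently running OCR with 3.x. A minimal sketch, assuming that `tesseract_info()$version` returns a string such as "3.04.01" or "4.0.0" (as in the outputs above); `engine_at_least()` is a hypothetical helper, not part of the tesseract package.

```r
# Hypothetical helper (not part of the tesseract package): TRUE when a
# Tesseract version string such as "3.04.01" or "4.0.0" is at least `min`.
engine_at_least <- function(version, min = "4.0.0") {
  # Keep only the leading dotted-number part, dropping suffixes like "alpha"
  numeric_part <- regmatches(version, regexpr("^[0-9]+(\\.[0-9]+)+", version))
  if (length(numeric_part) == 0) return(NA)  # unparseable version string
  package_version(numeric_part) >= package_version(min)
}

# In the app's setup chunk: warn early if the linked engine is older than 4.0
if (requireNamespace("tesseract", quietly = TRUE)) {
  engine <- tesseract::tesseract_info()$version
  if (isFALSE(engine_at_least(engine))) {
    warning("Tesseract engine ", engine, " is older than 4.0; ",
            "OCR will use the legacy (pre-LSTM) engine.")
  }
}
```

`package_version()` compares the components numerically, so "3.04.01" is treated as 3.4.1 and correctly sorts below 4.0.0, which a plain string comparison would not guarantee in general.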

