error using TreeTagger package on RStudio Workbench 2022.07.0+548.pro5 (Spotted Wakerobin) on Ubuntu 20.04 (64-bit)

I am trying to use the following TreeTagger package on RStudio Workbench:

https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

I have followed the instructions as per the above and tested with the following code in RStudio Workbench:

library(textstem)

x <- c(
'the dirtier dog has eaten the pies',
'that shameful pooch is tricky and sneaky',
"He opened and then reopened the food bag",
'There are skies of blue and red roses too!',
NA,
"The doggies, well they aren't joyfully running.",
"The daddies are coming over...",
"This is 34.546 above"

)

# Default lexicon::hash_lemmas dictionary

lemmatize_strings(x)

# Dictionary built with the TreeTagger engine

lemma_dictionary <- make_lemma_dictionary(x, engine = 'treetagger')

The last call then prompts me with the following:

TreeTagger does not appear to be installed.
Would you like me to open a download browser?

1: Yes
2: No

At this stage I stop and don't proceed further.

If I provide the path to the treetagger package location, which houses all the R packages, with the following:

lemma_dictionary <- make_lemma_dictionary(x, engine = 'treetagger', path = '/opt/repo/CRAN/treetagger')

I get the following error:

Error: None of the following files were found, please check your TreeTagger installation!
/opt/repo/CRAN/treetagger/cmd/utf8-tokenize.perl
/opt/repo/CRAN/treetagger/cmd/tokenize.perl
In addition: Warning message:
NA is replaced by empty string

We have physically checked the above path and the files do exist, which is why the error message is strange. In an attempt to get it working, we tried the same steps on Windows, where it works fine with no issues. On Ubuntu, however, we get the errors above.
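
One thing worth ruling out here is a permissions or user mismatch: the files can exist on disk yet be unreadable by the account that runs the R session on Workbench. The sketch below is a hypothetical helper (the name `check_tokenizers` is illustrative, not part of TreeTagger or textstem) that tests the two scripts named in the error message for readability. It assumes it is run as the same user as the R session, for example from the RStudio terminal pane.

```shell
#!/bin/sh
# Hypothetical helper: report whether the tokenizer scripts named in the
# error message are readable by the user running this shell.
# Returns 0 if at least one of them is readable, 1 otherwise.
check_tokenizers() {
    tt_dir="$1"
    found=1
    for f in cmd/utf8-tokenize.perl cmd/tokenize.perl; do
        if [ -r "$tt_dir/$f" ]; then
            echo "readable: $tt_dir/$f"
            found=0
        else
            echo "missing or unreadable: $tt_dir/$f"
        fi
    done
    return "$found"
}

# Example call with the path from the post:
# check_tokenizers /opt/repo/CRAN/treetagger
```

If both paths come back as missing or unreadable for that user while another account can see them, the problem is more likely access rights on the directory than the TreeTagger installation itself.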

Any help, support and advice on next steps would be greatly appreciated.

Hi Abs! I have had some success doing parts-of-speech tagging with the UDPipe package; see "NLP with R and UDPipe · Tokenization, Parts of Speech Tagging, Lemmatization, Dependency Parsing and NLP flows". While I ask around internally about your issue with TreeTagger, feel free to see if UDPipe can do what you need. I'll let you know what I find out.

The TreeTagger page has download and install instructions for Linux. Have you followed those steps on your Workbench server? What version of Linux is on your Workbench server?

The following steps are necessary to install the TreeTagger (see below for the Windows version). Download the files by right-clicking on the link. Then select "save file as". All files should be stored in the same directory.

  1. Download the tagger package for your system (PC-Linux, Mac OS-X (Intel), Mac OS-X (M1), ARM64, ARMHF, ARM-Android, PPC64le-Linux).
    If you have problems with your Linux kernel version, download this older Linux version and rename it to tree-tagger-linux-3.2.2.tar.gz.
  2. Download the tagging scripts into the same directory.
  3. Download the installation script install-tagger.sh.
  4. Download the parameter files for the languages you want to process.
  5. Open a terminal window and run the installation script in the directory where you have downloaded the files:
    sh install-tagger.sh
  6. Make a test, e.g.
    echo 'Hello world!' | cmd/tree-tagger-english*
    or
    echo 'Das ist ein Test.' | cmd/tagger-chunker-german
  7. You also might want to have a look at my new part-of-speech tagger RNNTagger

Hi Jemery,

Great to hear from you. We have had luck installing it on Windows 11 and Windows 8, but when we tried to install it on Ubuntu 20.04, we got the errors I mentioned. I followed the same instructions as I did on Windows, but it's not working for us on Ubuntu.

Kind Regards

Abs

TreeTagger requires a license for commercial use. If you have that license, I'd try reaching out to their developer to ask for installation help. Otherwise, I'd give UDPipe a try, as its license includes commercial use.

I installed the TreeTagger executable in my home directory on an Ubuntu 20.04 machine and confirmed that it works when run from Bash.

# Downloaded relevant files
$ wget https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.4.tar.gz
$ wget https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tagger-scripts.tar.gz
$ wget https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/install-tagger.sh
$ wget https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/english.par.gz

# Ran Install script
$ sh install-tagger.sh

TreeTagger version for PC-Linux installed.
Tagging scripts installed.
English parameter file installed.
Path variables modified in tagging scripts.

# Executed Test
$ echo 'Hello world!' | cmd/tree-tagger-english
	reading parameters ...
	tagging ...
	 finished.
Hello	UH	hello
world	NN	world
!	SENT	!
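
The install script expects all downloads to sit in the same directory, so it can help to confirm that before running it. The following is a minimal sketch, assuming the four file names from the `wget` commands above; the function name `check_downloads` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list which of the four files downloaded above are
# missing from a directory (default: the current one).
# Returns 0 if all are present, 1 if any is missing.
check_downloads() {
    dir="${1:-.}"
    missing=0
    for f in tree-tagger-linux-3.2.4.tar.gz tagger-scripts.tar.gz \
             install-tagger.sh english.par.gz; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example: only run the installer when everything is in place.
# check_downloads . && sh install-tagger.sh
```

For a language other than English, the parameter file name in the list would need to be changed to match what was downloaded.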

I then attempted to run the R code listed in the OP's original message, modifying it to match my install location.

> library(textstem)
> 
> x <- c(
+ 'the dirtier dog has eaten the pies',
+ 'that shameful pooch is tricky and sneaky',
+ "He opened and then reopened the food bag",
+ 'There are skies of blue and red roses too!',
+ NA,
+ "The doggies, well they aren't joyfully running.",
+ "The daddies are coming over...",
+ "This is 34.546 above")
> 
> lemma_dictionary <- make_lemma_dictionary(x, engine = 'treetagger', path="/home/randre/treetagger")

Error in dplyr::filter(tagged.results@TT.res[c("token", "lemma")], !lemma %in%  : 
  no slot of name "TT.res" for this object of class "kRp.text"
In addition: Warning message:
NA is replaced by empty string 

This seems to imply that TreeTagger is being found, and that I am now hitting a different error related to the koRpus package. That is plausible, since I never installed koRpus myself.

Have you tried running the "Hello world!" example in a terminal to see if it works? (Note that in the example below you will need to replace the language with one you have installed.)

$ echo 'Hello world!' | /opt/repo/CRAN/treetagger/cmd/tree-tagger-english
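
To make that terminal test a pass/fail check, the output can be validated against the three-column layout (token, tag, lemma) shown in the earlier run. This is only a sketch: the function name `looks_tagged` is hypothetical, and it assumes the progress messages ("reading parameters ...") go to stderr rather than stdout, which is worth confirming on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin is non-empty and every line has
# exactly three tab-separated fields (token, tag, lemma).
looks_tagged() {
    awk -F '\t' 'NF != 3 { bad = 1 } END { exit (NR == 0 || bad) }'
}

# Example with the path from the post (assumes the English parameter file):
# echo 'Hello world!' \
#   | /opt/repo/CRAN/treetagger/cmd/tree-tagger-english 2>/dev/null \
#   | looks_tagged && echo "tagger OK"
```

If the check fails, running the same pipeline without `2>/dev/null` and without `looks_tagged` shows the raw output and any error messages.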

Note: I also verified that at least one of the missing files was present in my TreeTagger install after following the default installation instructions:

$ find ~/treetagger/ -name "utf8-tokenize.perl"
/home/randre/treetagger/cmd/utf8-tokenize.perl
