different results between chunk and knitted markdown / Cyrillic letters

I encountered surprising behavior when knitting an .Rmd document.
The code is below.

When running only the chunk, the result of str_count is 1.
When I knit the document, the result is 0.

When I change the search pattern to a term without Cyrillic letters, e.g. "decision", there is no difference.

Any idea what's going on? Is this a bug, or am I missing something? Many thanks.


---
title: "test"
author: ""
date: "1/3/2022"
output: html_document
---

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
txt <- "30.  In a decision of 18 April 2006 (реш. № 4 от 18 април 2006 г. по конституционно дело № 11 от 2005 г., обн., ДВ, бр. 36 от 2 май 2006 г.) the Constitutional Court declared unconstitutional section 132d(3) of the ESA, which had almost identical wording as the one of section 33(1)(c) but concerned accused detainees. Since the subject-matter of the case was limited to the former provision, section 33(1)(c) was not reviewed for constitutionality."
str_count(txt, regex("обн\\."))

Try this:


---
title: "test"
author: ""
date: "1/3/2022"
output: html_document
---

```{r}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)

txt <- "30.  In a decision of 18 April 2006 (реш. № 4 от 18 април 2006 г. по конституционно дело № 11 от 2005 г., обн., ДВ, бр. 36 от 2 май 2006 г.) the Constitutional Court declared unconstitutional section 132d(3) of the ESA, which had almost identical wording as the one of section 33(1)(c) but concerned accused detainees. Since the subject-matter of the case was limited to the former provision, section 33(1)(c) was not reviewed for constitutionality."

str_count(txt, regex("обн\\."))
```

Hi, many thanks for your reply.

The results still differ: when I run the chunk it's 1, when I knit it's 0. I strongly suspect it has something to do with the Cyrillic letters. When I change the search pattern to a term written in the Latin alphabet, it's fine.

I ran the code on my RStudio Cloud account and there the results are fine. It's the same R version (4.0.3) and the same knitr (1.37), stringr (1.4.0) and rmarkdown (2.11) versions.

The output is below. Any idea?

test

1/3/2022

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.5
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'readr' was built under R version 4.0.5
## Warning: package 'dplyr' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
txt <- "30.  In a decision of 18 April 2006 (реш. № 4 от 18 април 2006 г. по конституционно дело № 11 от 2005 г., обн., ДВ, бр. 36 от 2 май 2006 г.) the Constitutional Court declared unconstitutional section 132d(3) of the ESA, which had almost identical wording as the one of section 33(1)(c) but concerned accused detainees. Since the subject-matter of the case was limited to the former provision, section 33(1)(c) was not reviewed for constitutionality."

stringr::str_count(txt, regex("обн\\."))
## [1] 0

Just to add: I updated to R version 4.1.2 and the riddle remains.
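Since the same package versions give different results on the two machines, one thing worth comparing is how each R session handles encodings. The following is only a rough sketch: `encoding_report()` and `pattern_codepoints()` are hypothetical helper names, not part of any package, and the idea that the locale is involved is an assumption, not something confirmed in this thread. Running it inside the knitted document and again in the console (or on RStudio Cloud) should show whether the sessions differ and whether the pattern still consists of the expected Cyrillic code points.

```r
# Hypothetical helper: collect encoding-related facts about the current session.
encoding_report <- function() {
  list(
    r_version   = R.version.string,
    platform    = R.version$platform,
    lc_ctype    = Sys.getlocale("LC_CTYPE"),
    utf8_locale = isTRUE(l10n_info()[["UTF-8"]])
  )
}

# Hypothetical helper: code points of a single string as R sees it at run time.
# If the string was mangled on the way in, the numbers will not be the
# Cyrillic code points you expect.
pattern_codepoints <- function(x) utf8ToInt(enc2utf8(x))

str(encoding_report())

# U+043E, U+0431, U+043D are the three Cyrillic letters of the search pattern.
pattern_codepoints("\u043e\u0431\u043d")
```

If the code points printed while knitting differ from those printed in the console, the pattern is being altered before `str_count()` ever sees it.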

Try File | Reopen with Encoding | UTF-8.

Sort of a long shot, though.
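To see whether the file on disk is actually stored as UTF-8 before reopening it, a quick check along these lines might help. This is a minimal sketch: `is_utf8_file()` is a made-up helper name and `"test.Rmd"` is a placeholder path. It only tests whether the bytes form valid UTF-8, so it cannot tell you which other encoding a file is in.

```r
# Hypothetical helper: TRUE if every line of the file is valid UTF-8.
is_utf8_file <- function(path) {
  # Read the lines without re-encoding them; validUTF8() then inspects the bytes.
  lines <- readLines(path, warn = FALSE)
  all(validUTF8(lines))
}

# Placeholder path: replace with the document you are knitting.
rmd_path <- "test.Rmd"
if (file.exists(rmd_path)) {
  print(is_utf8_file(rmd_path))
}
```

A result of `FALSE` would suggest the file was saved in some other encoding and that re-saving it as UTF-8 is worth trying.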

Thanks again. Unfortunately, the problem remains. Maybe it's best to file an issue at the knitr (?) repository.

I got this as an answer, so you were on to something with UTF-8. Thanks again.
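For anyone landing here with the same symptom, one possible workaround (my own suggestion, not taken from the answer mentioned above) is to write the non-ASCII pattern with `\u` escapes, so it no longer depends on how the source file is decoded. A small sketch, using a shortened sample string in place of the full text:

```r
library(stringr)

# The search pattern spelled with Unicode escapes:
# U+043E, U+0431, U+043D, followed by an escaped literal dot.
pattern <- "\u043e\u0431\u043d\\."

# Shortened sample, also written with escapes so the example itself
# does not depend on the file encoding.
txt_sample <- "2005 \u0433., \u043e\u0431\u043d., \u0414\u0412"

str_count(txt_sample, regex(pattern))
#> [1] 1
```

This only protects the pattern. Text that arrives as a literal in the .Rmd, as `txt` does in the original example, would still be subject to whatever happens to the file's encoding.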
