different results between chunk and knitted markdown / Cyrillic letters

zoowalk · January 3, 2022, 10:30pm

I encountered surprising behavior when knitting an .Rmd document.
The code is below.

When I run only the chunk, the result of str_count is 1.
When I knit the document, the result is 0.

When I change the search pattern to a term without Cyrillic letters, e.g., "decision", there is no difference.

Any idea what's going on? Is this a bug, or am I missing something? Many thanks.

---
title: "test"
author: ""
date: "1/3/2022"
output: html_document
---

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)

txt <- "30.  In a decision of 18 April 2006 (реш. № 4 от 18 април 2006 г. по конституционно дело № 11 от 2005 г., обн., ДВ, бр. 36 от 2 май 2006 г.) the Constitutional Court declared unconstitutional section 132d(3) of the ESA, which had almost identical wording as the one of section 33(1)(c) but concerned accused detainees. Since the subject-matter of the case was limited to the former provision, section 33(1)(c) was not reviewed for constitutionality."

str_count(txt, regex("обн\\."))

technocrat · January 3, 2022, 10:56pm

Try this:

---
title: "test"
author: ""
date: "1/3/2022"
output: html_document
---

```{r}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)

txt <- "30.  In a decision of 18 April 2006 (реш. № 4 от 18 април 2006 г. по конституционно дело № 11 от 2005 г., обн., ДВ, бр. 36 от 2 май 2006 г.) the Constitutional Court declared unconstitutional section 132d(3) of the ESA, which had almost identical wording as the one of section 33(1)(c) but concerned accused detainees. Since the subject-matter of the case was limited to the former provision, section 33(1)(c) was not reviewed for constitutionality."

str_count(txt, regex("обн\\."))
```

zoowalk · January 4, 2022, 9:15am

Hi, many thanks for your reply.

The results still differ: when I run the chunk it's 1, when I knit it's 0. I strongly suspect it has something to do with the Cyrillic letters. When I change the search pattern to a term written in the Latin alphabet, it's fine.

I ran the code on my RStudio Cloud account and there the results are fine. It's the same R version (4.0.3) and the same versions of knitr (1.37), stringr (1.4.0), and rmarkdown (2.11).

The output is below. Any idea?

test

1/3/2022

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.0.5

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1

## Warning: package 'ggplot2' was built under R version 4.0.5

## Warning: package 'tidyr' was built under R version 4.0.5

## Warning: package 'readr' was built under R version 4.0.5

## Warning: package 'dplyr' was built under R version 4.0.5

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

txt <- "30.  In a decision of 18 April 2006 (реш. № 4 от 18 април 2006 г. по конституционно дело № 11 от 2005 г., обн., ДВ, бр. 36 от 2 май 2006 г.) the Constitutional Court declared unconstitutional section 132d(3) of the ESA, which had almost identical wording as the one of section 33(1)(c) but concerned accused detainees. Since the subject-matter of the case was limited to the former provision, section 33(1)(c) was not reviewed for constitutionality."

stringr::str_count(txt, regex("обн\\."))

## [1] 0

zoowalk · January 4, 2022, 10:08am

Just to add: I updated to R version 4.1.2 and the problem remains.

technocrat · January 4, 2022, 10:13am

Try File | Reopen with Encoding | UTF-8.

Sort of a long shot, though.
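
To check whether encoding is really the culprit, it may help to compare what the console and the knit process each see. This is only a minimal diagnostic sketch using base R, not a confirmed explanation of the behavior above; the variable name `pattern` is just for illustration.

```r
# Diagnostic sketch (base R only). Run it once in the console and once in a
# knitted chunk, then compare the printed values.
pattern <- "обн\\."

# Locale of the current session and whether its native encoding is UTF-8
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])

# Declared encoding of the pattern and the bytes actually stored for it
print(Encoding(pattern))
print(charToRaw(pattern))

# A pattern that arrived intact has 5 characters: three Cyrillic letters,
# a backslash, and a dot. A different count suggests the literal was altered.
print(nchar(pattern, type = "chars"))
```

If the bytes or the character count differ between the two runs, the string literal is probably being converted somewhere between the .Rmd file and the R session that evaluates it.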

zoowalk · January 4, 2022, 10:42am

Thanks again. Unfortunately, the problem remains. Maybe it's best to file an issue at the knitr (?) repository.

zoowalk · January 9, 2022, 9:37pm

I got this as an answer, so you were on to something with UTF-8. Thanks again.
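
The answer referred to above is not reproduced in this thread. Independently of it, one way to test whether the file's encoding matters is to write the Cyrillic characters as `\u` escapes, so the source contains only ASCII. The sketch below is an assumption on my part, not that answer: `txt_sample` is a hypothetical shortened stand-in for the full text, and in a real document the text itself would also have to reach R intact (for example, by being read from a file with an explicit UTF-8 encoding).

```r
library(stringr)

# Hypothetical shortened sample. "\u043e\u0431\u043d" spells the three
# Cyrillic letters of the search term, so this source file is pure ASCII.
txt_sample <- "(\u043e\u0431\u043d., 2006)"
pattern <- "\u043e\u0431\u043d\\."

# Count occurrences of the term followed by a literal dot
n_matches <- str_count(txt_sample, regex(pattern))
print(n_matches)
```

If this version returns the same count in the console and when knitting while the literal version does not, that points toward how the literal characters are read rather than toward stringr itself.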
