Regular expression in RStudio

I would like to use the regex option in RStudio to replace a sequence such as \texttt{.doc} with **.doc**. These sequences are in an .Rnw file, which I want to convert to .Rmd. I looked for an RStudio manual but could not find anything that answers my question. I don't have an example because I have no idea how to use the regex option in RStudio.

Hi Dan,

I can highly recommend chapter 14 of the R4DS book:

14.1 Introduction

This chapter introduces you to string manipulation in R. You’ll learn the basics of how strings work and how to create them by hand, but the focus of this chapter will be on regular expressions, or regexps for short. Regular expressions are useful because strings usually contain unstructured or semi-structured data, and regexps are a concise language for describing patterns in strings. When you first look at a regexp, you’ll think a cat walked across your keyboard, but as your understanding improves they will soon start to make sense.

I hope this helps you get started on this topic; otherwise, let us know.
lars

Hi Lars, thank you for your answer. I don't have problems working with strings in R (or with regexps), but I want to use the regex option in RStudio itself. When I do a replacement, I can tick: In selection, Match case, Whole word, Regex and Wrap. I am interested in the Regex option: how do I work with it?

Hi Dan,

I'm sorry, but what exactly are you referring to? I suppose you're not referring to the RegExplain addin?

Ah, I found it: you're using the Find & Replace function 😄

Have you tried ?regex in the console?
I think the help topic "Regular Expressions as used in R" is the one you're looking for.

Yes, that's correct: I want to use the Find & Replace function. I looked for the manual but could not find anything about the options I mentioned. Am I right that there is none? I want to use the RStudio menu, not the console.

So, you're looking for instructions on what the regex pattern should look like in your case?

  • Well, you can find the help pages by entering ?regex in the RStudio console.

  • If these help pages are not sufficient, you may be interested in the RegExplain addin, which contains a nice overview of the many available resources:

For more info on this addin:

[Screenshot from 2021-02-12 14-30-35]
I am referring to this part; I don't know whether it uses the RegExplain addin or not.

When the Regex option is selected, Find & Replace accepts any well-constructed regex pattern.

The RegExplain addin can help you construct the right pattern, which you can then use in Find & Replace.

If you wish, provide a small sample text and specify which part of it needs to be replaced, and with what.

That would be awesome! Thank you! Here is the text; it's very simple, but it's exactly what I need:
Let us consider the following functions, \texttt{head()} and \texttt{tail()}.
I want to transform it into:
Let us consider the following functions, **head()** and **tail()**.
In the Rmd file, the text between the double asterisks will be rendered in bold.

With the Regex option enabled, I think this is the pattern you're looking for: \\texttt{|}, together with ** in the Replace box.

Where,

  • the \ in the text is a special regex character and therefore needs to be escaped with another \;
  • the OR operator | lets a single pattern find both text elements, \texttt{ and }.
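If you want to check this pattern before running Find & Replace on a whole file, here is a minimal sketch in R using base gsub() with perl = TRUE. It assumes R 4.0 or later for the raw strings r"[...]": the text between r"[ and ]" is written exactly as you would type it in the Find box. I escaped the braces (\{ and \}) as a precaution, the variable names are only for illustration, and I'm assuming RStudio's Find box treats these simple escapes and alternations the same way PCRE does. The second call shows a catch of this approach: the } alternative matches every closing brace, not only the one that closes \texttt{.

```r
# Sample text from the question; in a normal R string, "\\" is one literal backslash
x <- "Let us consider the following functions, \\texttt{head()} and \\texttt{tail()}."

# Raw string (R >= 4.0): the content is the regex as typed in the Find box.
# It matches either the literal "\texttt{" or a literal "}".
alternation <- r"[\\texttt\{|\}]"

# Replace every match with "**"
gsub(alternation, "**", x, perl = TRUE)
# gives: Let us consider the following functions, **head()** and **tail()**.

# Caveat: closing braces of other LaTeX commands are replaced as well
gsub(alternation, "**", "\\section{Intro}", perl = TRUE)
# gives: \section{Intro**
```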

HTH

In the Find box, put:

(\\texttt\{)(.*?)(\})

and in the Replace box, put:

**\2**

Explanation:
The Find box contains three sets of parentheses, (\\texttt\{), (.*?) and (\}), which are capture groups: the pattern looks for text matching these three groups in a row. The backslashes in the first group are escapes: we're looking for the literal \texttt{, so we escape both the backslash and the curly brace by putting a backslash in front of each. The same goes for the third group, which matches a literal }. The middle group is the most interesting: (.*?) matches any character because of ., any number of times because of *, and it does so lazily, thanks to ?. Laziness means the group takes as few characters as possible, so the match stops at the first closing brace that lets the third group match. This prevents a string like "\texttt{foo} bar. Hello \texttt{World}" from being matched as a single instance, which is what the greedy .* would do. With the lazy version, this string matches the pattern twice, as you'd expect.
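A small sketch in R that makes the lazy vs. greedy difference visible on the "\texttt{foo} bar. Hello \texttt{World}" example. It lists every match with regmatches() and gregexpr(), and assumes R 4.0 or later for raw strings (their content is what you would type in the Find box); the names lazy and greedy are only illustrative, and perl = TRUE is used so that the lazy quantifier behaves as in PCRE.

```r
# The example string (one literal backslash before each texttt)
y <- "\\texttt{foo} bar. Hello \\texttt{World}"

lazy   <- r"[(\\texttt\{)(.*?)(\})]"  # pattern from the answer
greedy <- r"[(\\texttt\{)(.*)(\})]"   # same pattern without the ?

# List all matches found by each pattern
regmatches(y, gregexpr(lazy, y, perl = TRUE))[[1]]
# two matches: \texttt{foo} and \texttt{World}
regmatches(y, gregexpr(greedy, y, perl = TRUE))[[1]]
# one match: the whole string, up to the last closing brace

# The greedy version therefore produces a broken replacement
gsub(greedy, r"[**\2**]", y, perl = TRUE)
# gives: **foo} bar. Hello \texttt{World**
```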

The replacement, **\2**, says: whenever the Find pattern matches, replace the match with **, followed by the second capture group (\2), followed by another **.
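Putting the two boxes together, here is a sketch that reproduces the replacement in R, for example to try it on a few lines before changing the whole .Rnw file. It assumes R 4.0 or later for raw strings; the second sample line is a made-up sentence based on the \texttt{.doc} example from the question, and find_box and replace_box are just illustrative names.

```r
# A few .Rnw-style lines (the second one is a hypothetical example)
rnw <- c(
  "Let us consider the following functions, \\texttt{head()} and \\texttt{tail()}.",
  "Convert \\texttt{.doc} files."
)

find_box    <- r"[(\\texttt\{)(.*?)(\})]"  # as typed in the Find box
replace_box <- r"[**\2**]"                 # as typed in the Replace box; \2 = second group

rmd <- gsub(find_box, replace_box, rnw, perl = TRUE)
writeLines(rmd)
# Let us consider the following functions, **head()** and **tail()**.
# Convert **.doc** files.
```

Without raw strings, every backslash has to be doubled once more in R code, so the same pattern becomes "(\\\\texttt\\{)(.*?)(\\})". In the Find box itself you type the single-escaped version.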

Thank you, it was helpful to get there!


Thank you! This actually solved my problem. My main issue was how to use the RStudio search and replace function. Now it is solved!
