How to use capture() in rebus package?

bobby · February 13, 2020, 3:55pm

Hi,

I need to use the capture() function from rebus package, but for the life of me, I cannot seem to find decent references or examples online. Makes me wonder if this function is even in common use anymore.

Does anyone know of a good reference for this function? I am a newbie in R.

The code that I am trying to understand is:

pattern = capture(optional(DGT) %R% DGT) %R%
capture(or('A', 'B','C','-', " "))

Thanks for any help!

technocrat · February 13, 2020, 7:07pm

library(rebus)
require(stringi)
#> Loading required package: stringi
# use help(capture) to get the function signature and example
# Usage
# capture is good with match functions
(rx_price <- capture(digit(1, Inf) %R% DOT %R% digit(2)))
#> <regex> ([[:digit:]]+\.[[:digit:]]{2})
(rx_quantity <- capture(digit(1, Inf)))
#> <regex> ([[:digit:]]+)
(rx_all <- DOLLAR %R% rx_price %R% " for " %R% rx_quantity)
#> <regex> \$([[:digit:]]+\.[[:digit:]]{2}) for ([[:digit:]]+)
stringi::stri_match_first_regex("The price was $123.99 for 12.", rx_all)
#>      [,1]             [,2]     [,3]
#> [1,] "$123.99 for 12" "123.99" "12"

Created on 2020-02-13 by the reprex package (v0.3.0)

First, look for any vignettes and, if there are none, check the Description in the package's index. Here we find:

Description: Build regular expressions piece by piece using human readable code. This package contains core functionality, and is primarily intended to be used by package developers.

(If I were in your position, that would make me wonder if rebus and its capture is the right tool. See the task view for possible alternatives.)

One of the hardest hurdles I had to overcome in learning R was learning how to decipher the help pages and the function signature.

Think of the user-facing portion of R as school algebra writ large:

f(x) = y

The help page describes the arguments x and the result(s) y of the function.

The usage is complicated by the sprinkling of non-standard operators like %R%.

Why not come back with a description of the problem you're trying to solve and the reasons that you chose this approach?

bobby · February 14, 2020, 2:47am

Hi, thanks very much for that. I too find the help pages not that helpful for beginners. So, I have been relying on online samples, YouTube videos, and tutorials. The thing with rebus and capture() is that, for some reason, these online resources are very much lacking.

I have to use rebus and capture() for a piece of work that I am submitting, so I can't use another function.

In my assignment, the following has been given:

pattern = capture(optional(DGT) %R%DGT) %R% capture(or('Y', 'YO','M','-', ""))
wanted_portion = str_match(sample_string, pattern)

My problem is that I don't know what this line means:

capture(optional(DGT) %R%DGT) %R% capture(or('Y', 'YO','M','-', ""))

technocrat · February 14, 2020, 5:00am

Also see, if applicable, FAQ: Homework Policy

capture takes a character vector (i.e., a string) as an argument and returns

[a] character vector representing part or all of a regular expression.

So, in your example, the argument to the first capture is

optional(DGT) %R% DGT

optional is another function that takes a character vector and returns a regex:

library(rebus)
optional(DGT)
#> <regex> [\d]?

Created on 2020-02-13 by the reprex package (v0.3.0)

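The result is a regex that matches zero or one digit. The ? quantifier is greedy, so it takes the digit when one is available. (optional does not capture anything on its own; the enclosing capture does that.) Buried in the documentation is that DGT is the rebus constant for the digit class, \d.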
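To see what optional(DGT) matches in practice, here is a small sketch. It assumes the rebus and stringi packages are installed; the test strings "12" and "7" are made up for illustration.

```r
library(rebus)
library(stringi)

# optional(DGT) on its own: zero or one digit, taking the digit if it can
one_or_none <- stri_extract_first_regex("12", optional(DGT))

# optional(DGT) %R% DGT: an optional digit followed by a required digit,
# i.e. one or two digits in total
two_digits <- stri_extract_first_regex("12", optional(DGT) %R% DGT)
one_digit  <- stri_extract_first_regex("7", optional(DGT) %R% DGT)

print(c(one_or_none, two_digits, one_digit))
#> [1] "1"  "12" "7"
```

With a single digit such as "7", the optional part ends up matching nothing so that the required DGT can still match.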

%R% is a concatenation operator, named because %c% was considered too hard to type. (You can't make this up.)

Next comes another capture enclosing

or('Y', 'YO','M','-', "")

The or operator

or takes multiple character vector inputs and returns a character vector of the inputs separated by pipes
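One consequence worth checking is that the alternatives are tried from left to right, so their order matters. The sketch below assumes rebus and stringi are installed, and the input "YO" is a made-up test string.

```r
library(rebus)
library(stringi)

# Same alternatives, in the same order, as in the assignment
alternatives <- or("Y", "YO", "M", "-", "")

# "Y" is listed before "YO", so it matches first and "YO" is never tried
y_first <- stri_extract_first_regex("YO", alternatives)

# Listing the longer alternative first lets it win
yo_first <- stri_extract_first_regex("YO", or("YO", "Y"))

print(c(y_first, yo_first))
#> [1] "Y"  "YO"
```

The empty string at the end of the list means the group is also allowed to match nothing at all.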

Putting it all together in loose terms:

Gimme a regular expression for something that may or may not have a digit before another digit, but let there be at least one digit, followed by one of those other strings (or, because of the empty string, by nothing).
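As a rough illustration of the whole pattern, the sketch below runs it over a few made-up inputs (the contents of sample_string are hypothetical, not the assignment's data). It uses stringi::stri_match_first_regex, the function that stringr's str_match calls underneath, and assumes rebus and stringi are installed.

```r
library(rebus)
library(stringi)

pattern <- capture(optional(DGT) %R% DGT) %R%
  capture(or("Y", "YO", "M", "-", ""))

# Hypothetical inputs, chosen only for illustration
sample_string <- c("12Y", "5M", "3-")

# Column 1 is the whole match; columns 2 and 3 hold the two capture() groups
wanted_portion <- stri_match_first_regex(sample_string, pattern)
print(wanted_portion)
#>      [,1]  [,2] [,3]
#> [1,] "12Y" "12" "Y"
#> [2,] "5M"  "5"  "M"
#> [3,] "3-"  "3"  "-"
```

Each capture() adds one column to the result, which is what makes it convenient with the match functions.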

Regular expressions are totally great. They are hard but powerful. I can't for the life of me understand why someone would interpose a barrier to learning them effectively with this meta-regex tool.

bobby · February 15, 2020, 4:51pm

Thanks, I think I am slowly getting the hang of it.

system · March 7, 2020, 4:51pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.