The problem wasn't that I wanted to split at every plus or minus; I wanted to wrap at a certain maximum width. This is what str_wrap() is designed for, but it doesn't let you specify which characters a break is allowed to happen at. It is designed not to break up words, so it optimally chooses the cut-points itself.
So my plan was to replace all the spaces in the string with a dummy character, "¬", then replace the newly formed "¬+¬" and "¬-¬" with " +¬" and " -¬", and then pass the result through str_wrap(), which should only be able to choose these remaining spaces as points to wrap at. Then at the end, sub out the dummy "¬" for " " again.
Very convoluted, I know. But I couldn't find a better way to induce a word-wrap and specify which characters I wanted to word-wrap at.
This process didn't work, I think because str_wrap() is allowed to wrap at non-space characters anyway (such as the + and - symbols). Making this kind of algorithm work would involve a lot of dummy characters (and I'd have to ensure they're non-line-breaking characters too).
In the end, I just wrote my own (semi-pseudocode here):
- Match the pattern:
pattern <- "( \\+ )|( - )"
- Find the pattern matches with
matches <- gregexpr(pattern, str)[[1]]
- If we want the split after the matched string, add the match.length attribute to the match positions
- Find the last match position that still fits within the specified width (or force a split at width + 1)
position <- min(matches[which(matches > width)[1L] - 1L], width + 1L, na.rm = TRUE)
- Save the string before that position
res <- c(res,substring(str,1,position-1))
- Repeat the process with the remaining string
str <- substring(str,position) until we're done: while(nchar(str) > width)
- Also, throw a warning if a split had to be forced because the first substring is longer than the permitted width
(My actual code is more complicated, vectorised, and wrapped in a function that lets you specify the regex pattern.)
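For reference, the steps above can be put together into a minimal, non-vectorised sketch in base R. The function name wrap_at_pattern and its arguments are made up for illustration and this is not my actual function. It always splits before the matched string, and it picks the last match starting at or before width + 1 directly, so a match is still used when none lies beyond the width.

```r
# Hypothetical helper (name and arguments are illustrative only):
# wrap `str` so that each piece has at most `width` characters,
# breaking only immediately before a match of `pattern`.
wrap_at_pattern <- function(str, width, pattern = "( \\+ )|( - )") {
  stopifnot(length(str) == 1L, width >= 1L)
  res <- character(0)
  while (nchar(str) > width) {
    # Start positions of all matches (-1 if there are none)
    matches <- gregexpr(pattern, str)[[1]]
    # A break before position p leaves p - 1 characters on the line,
    # so p may be at most width + 1. A match at position 1 is skipped
    # because splitting there would make no progress.
    candidates <- matches[matches > 1L & matches <= width + 1L]
    if (length(candidates) > 0L) {
      position <- max(candidates)
    } else {
      # No usable break point: force a split at the width
      position <- width + 1L
      warning("no break point within width; forcing a split")
    }
    # Save the piece before the break and carry on with the rest
    res <- c(res, substring(str, 1L, position - 1L))
    str <- substring(str, position)
  }
  c(res, str)
}

wrap_at_pattern("a + b - c + d", width = 6)
# returns c("a + b", " - c", " + d")
```

Every piece after the first keeps the leading space of its separator; trimws() on the result would drop it if that matters. Join the pieces with paste(..., collapse = "\n") to get a single wrapped string.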