load_all() and the "¬" character

Ran into a strange error while loading my package today. One of my functions stores the ¬ character in a string, with the intention of using it as a dummy character, since it's unlikely to appear anywhere in a string provided by the user.

The culprit is the following line:

dummy_chr <- "¬"

And the error output by load_all() is

Error in parse(text = lines, n = -1, srcfile = srcfile) : 
  C:/Users/mbrxsmbc/Documents/R/My Packages/mscpm/R/assorted.R:208:16: unexpected INCOMPLETE_STRING
207:   
208:   dummy_chr <- "
                    ^
In addition: Warning message:
In readLines(con, warn = FALSE, n = n, ok = ok, skipNul = skipNul) :
  invalid input found on input connection 'C:/Users/mbrxsmbc/Documents/R/My Packages/mscpm/R/assorted.R'

I've switched to using a different character instead and it works fine.

But I'm curious if anybody knows the background to this error. I know this character is used in mathematical logic, but is it some sort of undocumented escape character?

Hi,

Could you explain the need to use this special character? The issue might be with encoding, as not all special characters are recognised in every system or encoding format.

PJ

My need isn't specific to this special character, and I got around the problem by using a different character anyway. I was just curious whether anybody knew why this character in particular fails to parse. You're probably right that it's an encoding issue.
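For later readers, here is a minimal sketch of one way to sidestep the problem, assuming the cause really is an encoding mismatch (for example, a source file saved in a single-byte Windows encoding but read as UTF-8; the thread does not confirm this). Writing the character as a Unicode escape keeps the source file pure ASCII, so the parser never sees a raw non-ASCII byte. The variable name latin1_byte is only for illustration.

```r
# Unicode escape for U+00AC (the "not" sign); the source file stays pure ASCII
dummy_chr <- "\u00ac"
utf8ToInt(dummy_chr)   # code point of the character

# In Latin-1 / Windows-1252 the same character is the single byte 0xAC,
# which is not a valid UTF-8 sequence on its own
latin1_byte <- rawToChar(as.raw(0xac))
validUTF8(latin1_byte)

# The UTF-8 encoding of the character is the two bytes C2 AC
charToRaw(dummy_chr)
validUTF8(dummy_chr)
```

So ¬ is not an escape character; under this assumption, the parser simply stops at a byte it cannot decode, which would explain why the string looks incomplete in the error message.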

Hi,

I'm just curious why you need a dummy character in a string. Is it because you need to mark or replace something?

PJ

My intention was to replace certain spaces with a special character and then use strwrap() (or str_wrap()) so that lines only break at certain places rather than at every space. I'm trying to wrap equations written as strings, but I only want a break at a plus or a minus, not at a times, and there are spaces between all symbols and numbers, e.g. "(1 + 3 * x) * t - (1 - 2 * y) * t^2"

Hi,

I was thinking that was what you wanted to do, hence my question :slight_smile:
If I may suggest a likely more elegant solution, I'd recommend using regex to solve this, together with the str_split function from the stringr package.

library(stringr)

myString = "(1 + 3 * x) * t - (1 - 2 * y) * t^2"

#Split before the plus or minus - positive lookahead
str_split(myString, "(?=\\+|\\-)")
#> [[1]]
#> [1] "(1 "            "+ 3 * x) * t "  "- (1 "          "- 2 * y) * t^2"

#Split after the plus or minus - positive lookbehind
str_split(myString, "(?<=\\+|\\-)")
#> [[1]]
#> [1] "(1 +"          " 3 * x) * t -" " (1 -"         " 2 * y) * t^2"

#Split and remove the plus or minus
str_split(myString, "\\+|\\-")
#> [[1]]
#> [1] "(1 "           " 3 * x) * t "  " (1 "          " 2 * y) * t^2"

Created on 2020-08-13 by the reprex package (v0.3.0)

There's a lot more that regex can do, so you can build more complex patterns to get more detailed breaking behaviour. Feel free to provide more details if you'd like me to help out. Regex takes a while to grasp, but it's so powerful once you get to know it!

Hope this helps,
PJ

The problem wasn't that I wanted to split at every plus or minus; I wanted to wrap to a certain maximum width. This is what str_wrap() is designed for, but it doesn't let you specify which characters a break is allowed to happen at. It doesn't break up words, since that's what it's designed for, so it chooses the cut-points optimally.

So my plan was to replace all the spaces in the string with the dummy character, "¬", then replace the newly formed "¬+¬" and "¬-¬" with " +¬" and " -¬", then pass the result through str_wrap(), which should only be able to choose these spaces as points to wrap at. Then, at the end, sub out the dummy "¬" for " " again.

Very convoluted, I know. But I couldn't find a better way to induce a word-wrap and specify which characters I wanted to word-wrap at.

In the end, this process didn't work, because I think str_wrap() is allowed to wrap at non-space characters anyway (such as the + and - symbols), so this kind of algorithm would involve a lot of dummy characters (and I'd have to ensure they're non-line-breaking characters too).
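For comparison, a rough sketch of the same substitution plan using base R's strwrap(), which only breaks at whitespace, rather than str_wrap(). This is an illustration under that assumption, not the code from my package; the variable names (dummy, eq, protected, breakable) are made up for the example, and the dummy is written as a Unicode escape to keep the source ASCII.

```r
dummy <- "\u00ac"   # the dummy character, written as an escape
eq <- "(1 + 3 * x) * t - (1 - 2 * y) * t^2"

# 1. Hide every space behind the dummy character
protected <- gsub(" ", dummy, eq, fixed = TRUE)

# 2. Restore a real space in front of each " + " / " - " operator only
breakable <- gsub(paste0(dummy, "([+-])", dummy),
                  paste0(" \\1", dummy),
                  protected)

# 3. Wrap; the only whitespace left sits before a plus or a minus
wrapped <- strwrap(breakable, width = 20)

# 4. Turn the remaining dummies back into spaces
lines <- gsub(dummy, " ", wrapped, fixed = TRUE)
lines
```

Note that strwrap() never splits a "word", so a chunk with no plus or minus can still run past the width.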

In the end, I just wrote my own (semi-pseudo here):

  • Define the pattern: pattern <- "( \\+ )|( - )"
  • Find the pattern matches with matches <- gregexpr(pattern, str)[[1]]
  • If we want the split after the matched string, add the match.length attribute
  • Find the last match position that falls within the specified width: position <- min(matches[which(matches > width)[1L] - 1L], width + 1L, na.rm = TRUE)
  • Save the string before that position: res <- c(res, substring(str, 1, position - 1))
  • Repeat the process with the remaining string, str <- substring(str, position), until we're done: while (nchar(str) > width)
  • Also, throw a warning if a split had to be forced because the first substring is longer than the permitted width

(My actual code is more complicated, vectorised and is in a function that allows to specify the regex pattern)
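A simplified, runnable sketch of the steps above, for a single string only. The function name wrap_at() and its arguments are hypothetical, and this version differs from the outline in two ways that are assumptions of the sketch: it always breaks before the operator, and it drops the space at the break point.

```r
# Hypothetical helper: wrap `str` to at most `width` characters per line,
# breaking only where `pattern` matches (by default " + " or " - ")
wrap_at <- function(str, width, pattern = " [+-] ") {
  stopifnot(length(str) == 1L, width >= 1L)
  res <- character(0)
  while (nchar(str) > width) {
    starts <- gregexpr(pattern, str)[[1]]
    # gregexpr() returns -1 when nothing matches, so keep real matches only;
    # breaking at `starts` leaves `starts - 1` characters on the line
    ok <- starts[starts > 1L & starts - 1L <= width]
    if (length(ok) > 0L) {
      pos <- max(ok)                       # last break point that still fits
      res <- c(res, substring(str, 1L, pos - 1L))
      str <- substring(str, pos + 1L)      # skip the space before the operator
    } else {
      warning("no break point within width; forcing a break")
      res <- c(res, substring(str, 1L, width))
      str <- substring(str, width + 1L)
    }
  }
  c(res, str)
}

wrap_at("(1 + 3 * x) * t - (1 - 2 * y) * t^2", width = 20)
#> [1] "(1 + 3 * x) * t - (1" "- 2 * y) * t^2"
```

Because the width is treated as a hard maximum, a string with no matching operator is simply cut every `width` characters, with a warning for each forced break.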

Hi,

I understand now :slight_smile:
I once tried a similar thing, and it can be a headache. The issue with custom implementations is that sometimes you don't think of certain scenarios and end up with a weird result. For example, if you don't have + or - in the string, but it's longer than the max, where should it break?

The stringi package (which underlies stringr) has more details on how splits are generated, and it seems that, theoretically, you can write your own rules using stri_opts_brkiter(). They say it's for advanced users, and it seems it is indeed, but maybe it's worth looking into?

Good luck!
PJ

Yeah, it's a complicated thing to figure out. I just had a quick look at those bits of stringi and that looks wayy too deep for what I need. Although it may constitute a future rabbit-hole (and then inevitably re-reading my old code and crying at how terribly it's written once I know how to define my own characters/words in an ICU context).

My function seems to work okay. Since I'm wanting my text to be presented within a box in the console, I've made sure that the width is a maximum width and is strictly adhered to, so if any do overflow, it forces a break at the width (even if it's in the middle of a word). This works when there is no matching string (although I hadn't considered that and had to change an x into x[x>0] just in case, so thanks for the heads-up). Hopefully I've covered enough edge-cases.
