Bug or feature? Data viewer renders repeated whitespace as a single space

rstudio
#1

I just figured out why some strings were not matched by a regular expression: the data viewer renders repeated whitespace characters as a single one. (From now on I will be using str_squish().)

example:

View(rbind("foo bar", "foo  bar"))

I imagine there might be data types where this is intended? Probably related: https://datatables.net/forums/discussion/43122/space-in-fields
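
For anyone who wants to check the underlying values without relying on the viewer, here is a minimal sketch in base R. It reuses the two strings from the View() call above; the gsub() call is only a rough stand-in for stringr::str_squish() (it collapses runs of whitespace but does not trim the ends), and the variable names are just for illustration.

```r
# The same two strings as in the View() example.
x <- c("foo bar", "foo  bar")

# nchar() counts every character, so the extra space shows up here
# even when a viewer collapses it on screen.
nchar(x)
#> [1] 7 8

# A pattern with exactly one space between the words matches only the first string.
grepl("^foo bar$", x)
#> [1]  TRUE FALSE

# Collapsing runs of whitespace into a single space makes both strings identical.
squished <- gsub("\\s+", " ", x)
identical(squished[1], squished[2])
#> [1] TRUE
```

Printing the vector in the console, or calling nchar() on it, is a quick way to see the real contents when the viewer output looks suspicious.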

#2

Hi marco, I noticed this as well, but I actually used it to my advantage when importing messy data that were originally .txt files. To read them in, I had to work around a lot of arbitrary, inconsistent whitespace. I can see how it becomes an obstacle when you want to match an exact number of whitespace characters.

My stringr cheat sheet also says that for regular expressions, \s means "any whitespace", while [:space:] means "space characters" and [:blank:] means "space and tab (but not new line)". I wonder if you can match a [:space:]{1} pattern where {n} quantifies "exactly n times."

Not sure if [:space:] is effectively the same as \s.
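
One way to look into this is to test each class against a space, a tab, and a newline. The sketch below uses base R's grepl() rather than stringr, so the results are for base R's regex engines and stringr (which uses ICU) may differ in edge cases. Note that in base R the POSIX classes have to be written inside a bracket expression, i.e. [[:space:]] rather than [:space:]. The vector name ws is just for illustration.

```r
# One space, one tab, one newline.
ws <- c(" ", "\t", "\n")

# [[:space:]] matches all three.
grepl("[[:space:]]", ws)
#> [1] TRUE TRUE TRUE

# \s (escaped as "\\s" in an R string) also matches all three.
grepl("\\s", ws, perl = TRUE)
#> [1] TRUE TRUE TRUE

# [[:blank:]] matches the space and the tab, but not the newline.
grepl("[[:blank:]]", ws)
#> [1]  TRUE  TRUE FALSE
```

For these three characters, [[:space:]] and \s behave the same; this does not prove they agree on every Unicode whitespace character.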

#3

This is a bug in old versions of the data viewer.

It's fixed in the current release of RStudio (1.2).

#4

I'm pretty sure [[:space:]] is equivalent to \s.

And for the quantifier: that would be what the {n} is for, right? e.g.

string <- c("foo bar", "foo  bar")
stringr::str_detect(string, "[[:space:]]{2}")
#> [1] FALSE  TRUE

Created on 2019-04-18 by the reprex package (v0.2.1)
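
One caveat worth adding: since the pattern is not anchored, {2} also finds a match inside a longer run of whitespace, so it behaves like "at least two" when used for detection. A minimal sketch in base R of one way to require exactly two, by demanding a non-whitespace character on each side (the third test string and the variable names are made up for illustration):

```r
string <- c("foo bar", "foo  bar", "foo   bar")

# Unanchored {2}: any run of two or more whitespace characters contains a match.
at_least_two <- grepl("[[:space:]]{2}", string)
at_least_two
#> [1] FALSE  TRUE  TRUE

# Non-whitespace on both sides: only a run of exactly two matches.
exactly_two <- grepl("[^[:space:]][[:space:]]{2}[^[:space:]]", string)
exactly_two
#> [1] FALSE  TRUE FALSE
```

This version assumes the whitespace run sits between two other characters; a run at the very start or end of a string would need anchors or lookarounds instead.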

#5

Ah, great. Thank you.
