Comparing String Version Numbers

Does someone have an explanation of how logical operators work with strings?

I don't understand why the following seems to work:

a <- "0.0.1"
b <- "0.0.5"
c <- "0.1.0"
            
a < b       
#> [1] TRUE
a < c       
#> [1] TRUE
b < c       
#> [1] TRUE
            
a > b       
#> [1] FALSE
a > c       
#> [1] FALSE
b > c       
#> [1] FALSE

I'm not really 100% clear, but from the help file for comparison (e.g. ?"<") I found this:

Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g. Some platforms may not respect the locale and always sort in numerical order of the bytes in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and may not sort in the same order for the same language in different character sets). Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic.

Character strings can be compared with different marked encodings (see Encoding): they are translated to UTF-8 before comparison.
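To make the locale dependence described in the help text concrete, here is a minimal sketch. The vector `x` is made up for illustration. The default `sort()` follows the collating sequence of the current locale, so its output is not shown because it can differ between systems, whereas `method = "radix"` always uses C-locale ordering.

```r
# Locale currently used for collation (the value varies by system)
Sys.getlocale("LC_COLLATE")

# Hypothetical example vector, not from the original question
x <- c("b", "A", "a", "B")

# Default sort: follows the locale's collating sequence,
# so the result may differ from one system to another
sort(x)

# method = "radix" ignores the locale and uses C-locale order,
# where uppercase ASCII letters come before lowercase ones
sort(x, method = "radix")
#> [1] "A" "B" "a" "b"
```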

So my assumption is that, whatever locale your system is set to, "0.0.1" evaluates as less than "0.0.5" (as I would expect it to in English), and so on.

Sorry I can't be more concrete, but I hope that helps.
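One caveat worth illustrating: the comparisons in the question come out as expected only because every component is a single digit. The sketch below uses two made-up version strings (`older` and `newer` are hypothetical names) and assumes the C locale or a typical locale such as en_US, where digits collate in the order 0 to 9.

```r
# Hypothetical version strings with a two-digit component
older <- "0.9.0"
newer <- "0.10.0"

# Strings are compared character by character: at the third character
# "1" sorts before "9", so the newer version looks smaller
newer < older
#> [1] TRUE

# The same effect on a single component
"10" < "9"
#> [1] TRUE
```

So lexicographic comparison agrees with version order only as long as the components have the same number of digits.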

Just adding a link to this answer re. comparison functions

https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/Comparison

It's not clear to me which part of the results you show seems hard to understand. As @jim89 said, you are comparing strings, and strings are compared using lexical (sometimes called dictionary) sort order.

Pointing out specifically what you don't understand about the results, or at least what you expected them to be, would help us a lot in understanding the issue you are running into.

You can sort the strings you are using to see what the lexical order is by using stringr::str_sort.

suppressPackageStartupMessages(library(tidyverse))
a <- "0.0.1"
b <- "0.0.5"
c <- "0.1.0"

v <- c(a, b, c)
v
#> [1] "0.0.1" "0.0.5" "0.1.0"
# sorted using current locale
str_sort(v)
#> [1] "0.0.1" "0.0.5" "0.1.0"
# sorted using French
str_sort(v, locale = "fr_FR")
#> [1] "0.0.1" "0.0.5" "0.1.0"

Created on 2018-03-09 by the reprex package (v0.2.0).

Both of these sorts show that, in my locale (en_US) and in the French locale, a is lexically less than b and b is less than c. That is the same as what the less-than comparisons in your examples show:

suppressPackageStartupMessages(library(tidyverse))
a <- "0.0.1"
b <- "0.0.5"
c <- "0.1.0"
a < b  # a less than b      
#> [1] TRUE
a < c  # a less than c
#> [1] TRUE
b < c  # b less than c
#> [1] TRUE

Created on 2018-03-09 by the reprex package (v0.2.0).

If your goal is to compare strings that represent version numbers, you might be interested in the function compareVersion from the utils package, which ships with R: https://www.rdocumentation.org/packages/utils/versions/3.4.3/topics/compareVersion
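As a minimal sketch of how this could look with the values from the question: `compareVersion()` returns -1, 0, or 1 depending on whether the first version is lower than, equal to, or higher than the second. The example also uses base R's `numeric_version()`, which was not mentioned in the thread, as one way to get working `<`, `>`, and `sort()` on versions; the strings "0.10.0" and "0.9.0" are made up for illustration.

```r
a <- "0.0.1"
b <- "0.0.5"
c <- "0.1.0"

# compareVersion() compares component by component, as numbers
compareVersion(a, b)
#> [1] -1
compareVersion(c, b)
#> [1] 1
compareVersion(a, a)
#> [1] 0

# numeric_version() objects support the usual comparison operators
# and sort(), even when a component has more than one digit
v <- numeric_version(c("0.10.0", "0.9.0", b))
v[1] > v[2]
#> [1] TRUE
as.character(sort(v))
#> [1] "0.0.5"  "0.9.0"  "0.10.0"
```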

Thanks everyone.

I could have phrased my question better, but I didn't have the knowledge to do so (I should have thought to check the ">" documentation, but I didn't :pensive:). "How does R determine the equality of characters?" would have been much clearer.

Jim's answer provided the much-needed help in understanding how R determines whether a character string is >, <, or == another character string. As an aside, comparing version strings this way does seem unreliable, as the documentation itself warns.

Ian's answer should win too, it was right on the :moneybag:! For some reason my subconscious didn't think base would have this implemented.

I am continually surprised by the gems that turn up in base R :slight_smile:

And compareVersion() simply works and is super easy to use! Thanks for the enlightenment.
