For example, I have a string of text:
Hello, I have three apples and fourteen pears.
And I want the output to be:
Hello, I have 3 apples and 14 pears.
I found this function, but I was hoping there is a more elegant way out there.
I'm not sure this qualifies as more elegant, but I do think it generalizes nicely.
Because the digits 1 through 9 repeat cyclically, we really only need to know how to translate the text ranging from "one" to "ninety nine". Anything larger is just a matter of multiplying that translation by a magnitude (hundred, thousand, and so on) and adding the pieces together. For example:
55 = 50 + 5
155 = (1 * 100) + 50 + 5
1155 = (1 * 1000) + (1 * 100) + 50 + 5
and for larger numbers:
155,155 = (100 + 50 + 5) * 1000 + (100 + 50 + 5)
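The decomposition above can be checked in the other direction with a small sketch. The helper name decompose_groups is hypothetical and is not part of the code in this answer; it only illustrates that a number is a sum of three-digit groups, each scaled by a power of 1000.

```r
# Hypothetical helper (not part of the answer's code): split a positive
# whole number into three-digit groups, least significant group first.
decompose_groups <- function(n) {
  groups <- c()
  while (n > 0) {
    groups <- c(groups, n %% 1000)  # lowest three digits
    n <- n %/% 1000                 # drop them and continue
  }
  groups
}

groups <- decompose_groups(155155)

# Rebuild the number: each group is scaled by 1000^(position - 1)
rebuilt <- sum(groups * 1000^(seq_along(groups) - 1))
rebuilt
```

Translating text works the same way in reverse: each group is translated on its own as a value below one thousand, then multiplied by its magnitude.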
The code below seems to behave reasonably well for translating text numbers to their numeric representations and is configured to work up to the quadrillions. By adding magnitudes to magnitude_reference, it could be extended further. But I suspect it will lose precision somewhere around 9 quadrillion, since R stores these values as doubles and doubles represent every integer exactly only up to 2^53.
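As a rough check on where precision runs out, assuming the usual IEEE 754 doubles that R uses for numeric values, the comparisons below show that integers stop being distinguishable at 2^53 (a little over 9 quadrillion). The variable name largest_exact is only for illustration.

```r
# Doubles carry a 53-bit significand, so 2^53 is the last point at which
# adding 1 still produces a different value.
largest_exact <- 2^53

largest_exact == largest_exact + 1    # the added 1 is lost
(largest_exact - 1) == largest_exact  # still distinguishable just below 2^53
4.5e15 + 1 == 4.5e15                  # values in the low quadrillions are still exact
```

Anything translated into the low quadrillions should therefore still be exact, and the trouble starts only as results approach 2^53.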
I've added a fully vectorized version of this to my playground package at https://github.com/nutterb/Bluegrass/blob/devel-main/R/word_to_number.R
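The key step in the function that follows is splitting the translated words into groups that each end at a magnitude word. Here is that step in isolation, as a minimal sketch using a hand-built vector for "one hundred fifty five thousand one hundred fifty five"; the values are typed in directly rather than looked up.

```r
# Words already translated to numbers, names kept for reference
num <- c(one = 1, hundred = 100, fifty = 50, five = 5, thousand = 1000,
         one = 1, hundred = 100, fifty = 50, five = 5)

# Position of each magnitude word (only "thousand" in this example)
magnitude_at <- which(names(num) %in% "thousand")

# cut() labels every position with the interval it falls in, so each
# group runs up to and including its magnitude word
magnitude_index <-
  cut(seq_along(num),
      breaks = unique(c(0, magnitude_at, length(num))))

groups <- split(names(num), magnitude_index)
groups

# Each group is then a value below one thousand times its magnitude
sum((1 * 100 + 50 + 5) * 1000, 1 * 100 + 50 + 5)
```

The first group holds the five words up to "thousand" and the second holds the remaining four, which matches the 155,155 decomposition shown earlier.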
word_to_number <- function(x){
  # Remove punctuation and 'and' (lower-case first so "And" is caught too)
  x <- gsub("([[:punct:]]| and )", " ", tolower(x))
  # Separate into distinct words, dropping any empty strings
  x <- trimws(unlist(strsplit(x, "\\s+")))
  x <- x[nzchar(x)]
  # Verify that all words are found in the reference vectors
  if (!(all(x %in% names(c(word_to_number_reference, magnitude_reference)))))
    stop("Text found that is not compatible with conversion. Check your spelling?")
  # Translate words to the numeric reference
  num <- c(word_to_number_reference, magnitude_reference)[x]
  # Identify positions with a magnitude indicator
  magnitude_at <- which(names(num) %in% names(magnitude_reference))
  # Create an indexing vector for each magnitude class of the number
  magnitude_index <-
    cut(seq_along(num),
        breaks = unique(c(0, magnitude_at, length(num))))
  # Make a list with each magnitude
  num_component <-
    lapply(unique(magnitude_index),
           FUN = function(i) num[magnitude_index == i])
  # Translate each component
  num_component <-
    vapply(num_component,
           FUN = word_to_number_translate_hundred,
           FUN.VALUE = numeric(1))
  # Add the components together
  num <- sum(num_component)
  if (is.na(num))
    warning(sprintf("Unable to translate %s", paste(x, collapse = " ")))
  num
}
word_to_number_translate_hundred <- function(n){
  # Set a magnitude multiplier for thousands and greater
  if (tail(names(n), 1) %in% names(magnitude_reference)){
    magnitude <- tail(n, 1)
    n <- head(n, -1)
  } else {
    magnitude <- 1
  }
  # If hundred appears anywhere but the second position, or if the
  # value preceding hundred is greater than 1, handle with care
  # (for instance, "twelve hundred" = 1200)
  if ( ("hundred" %in% names(n) && which(names(n) == "hundred") != 2) ||
       ("hundred" %in% names(n) && n[1] > 1) )
  {
    # Everything before "hundred" is multiplied by 100,
    # everything after it is added
    which_hundred <- which(names(n) == "hundred")
    (sum(n[seq_along(n) < which_hundred]) * 100 +
       sum(n[seq_along(n) > which_hundred])) * magnitude
  } else {
    # Build an arithmetic expression such as "1 * 100 + 50 + 5"
    op <- rep("+", length(n) - 1)
    op[names(n)[-1] == "hundred"] <- "*"
    op <- c(op, "")
    eval(parse(text = paste(paste(n, op), collapse = " "))) * magnitude
  }
}
word_to_number_reference <-
  c("zero" = 0,
    "one" = 1,
    "two" = 2,
    "three" = 3,
    "four" = 4,
    "five" = 5,
    "six" = 6,
    "seven" = 7,
    "eight" = 8,
    "nine" = 9,
    "ten" = 10,
    "eleven" = 11,
    "twelve" = 12,
    "thirteen" = 13,
    "fourteen" = 14,
    "fifteen" = 15,
    "sixteen" = 16,
    "seventeen" = 17,
    "eighteen" = 18,
    "nineteen" = 19,
    "twenty" = 20,
    "thirty" = 30,
    "forty" = 40,
    "fifty" = 50,
    "sixty" = 60,
    "seventy" = 70,
    "eighty" = 80,
    "ninety" = 90,
    "hundred" = 100)
magnitude_reference <-
  c("thousand" = 1000,
    "million" = 1e6,
    "billion" = 1e9,
    "trillion" = 1e12,
    "quadrillion" = 1e15)
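To tie this back to the sentence in the question, here is one possible sketch of swapping number words inside a longer string. The function replace_number_words and the two-entry small_reference lookup are hypothetical stand-ins chosen to keep the example short. It only handles single-word numbers; a run of words such as "twenty five" would first have to be grouped and then passed to word_to_number.

```r
# Assumption: a tiny lookup standing in for the full reference vector above
small_reference <- c(three = 3, fourteen = 14)

# Hypothetical helper: replace each known number word with its digits
replace_number_words <- function(text, reference) {
  # Match any known number word as a whole word, ignoring case
  pattern <- paste0("\\b(", paste(names(reference), collapse = "|"), ")\\b")
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
  words <- regmatches(text, hits)
  # Look up each matched word and write the digits back in place
  regmatches(text, hits) <-
    lapply(words, function(w) as.character(reference[tolower(w)]))
  text
}

result <- replace_number_words(
  "Hello, I have three apples and fourteen pears.",
  small_reference
)
result
```

Working with regmatches keeps the rest of the sentence untouched, including its punctuation, which word_to_number on its own would strip.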