Assign Letters with Values

Hello!

My question might be very simple and a bit stupid but I cant find the answer as I probably dont really know what I am looking for.

Easy explanation: I have two characters, for example "AAB" and "ABC" and now I want to assign a certain "value" to it but every position it self since the values change for each positon.
first position: if it is a A we get 3 points, if it would be a B we get 2 points, if it i a C we get 1 point
second position: if it is a A we get 2 points, if it would be a B we get 3 points, if it i a C we get 1 point
third position: if it is a A we get 2 points, if it would be a B we get 1 points, if it i a C we get 3 point

outcome: AAB = 3 + 2 + 1 = 6
ABC= 3 + 3 + 3 = 9

Maybe someone can help me how to calculate that in R studio. That would really really help :slight_smile:

Let's break it down, you want to:

  • separate each "word" into a set of letters (i.e. get a vector of letters rather than a long string)
  • give a value to each letter
  • apply operations on these values (here, a sum)

So that can be done this way:

x <- "ABC"

strsplit(x, split = "")[[1]] |>
  dplyr::recode(A = 2,
                B = 1,
                C = 3) |>
  sum()

If you want to avoid {dplyr}, you can also do the same with a lookup table (a named vector for example):

letter_values <- setNames(c(2,1,3),
                          LETTERS[1:3])

sum_letters <- function(x){
  characters <- strsplit(x, split = "")[[1]]
  values <- letter_values[characters]
  sum(values)
}

sum_letters("ABC")
2 Likes

I would argue that, since you demand different values for different position of the same letters, you need to have a dictionary (e.g. a Look-up-Table). My solution is working and can be expanded, by simply expanding the dictionary you use. The downside is, that it relies on a for-loop and could be somewhat slow if applied to thousands of rows.

# a Look up table for the differing values
# rows are letters, columns are values
LuT <- matrix(
  data = c(3,2,2,
           2,3,1,
           1,1,3),
  ncol = 3, byrow = TRUE, dimnames = list(LETTERS[1:3],paste0('pos',1:3))
)

chr_to_num <- function(chr_vec, LuT){
  # get the positions and corresponding letters
  chr_mat <- strsplit(chr_vec, split = "")[[1]] |> as.matrix() |> `rownames<-`(paste0('pos',1:3))
  # apply the Look up Table with the wanted numeric values
  res <- vector(length = 1L)
  for (i in seq.default(1,ncol(LuT))){
    res <- res + LuT[[which(chr_mat[[i]] == rownames(LuT)),i]]
  }
  return(res)
}
chr_to_num('AAB', LuT = LuT)
#> [1] 6
chr_to_num('ABC', LuT = LuT)
#> [1] 9

Created on 2022-08-24 by the reprex package (v2.0.1)

Kind regards

1 Like

Thank you so much for your support. That was awesome :slight_smile:

However... I still have another problem. If I know put a list of characters into the function, it only returns the value for the first character.


LuT <- matrix(
  data = c(0,2,0,0,0,2,2,0,0,2,0,
           3,0,0,0,0,0,0,0,0,0,0,
           0,1,3,3,0,1,1,3,0,1,0,
           1,0,0,0,2,0,0,0,2,0,2),
  ncol = 11, byrow = TRUE, dimnames = list(LETTERS[1:4],paste0('pos',1:11))
)


chr_to_num <- function(chr_vec, LuT){
  chr_mat <- strsplit(chr_vec, split = "")[[1]] |> as.matrix() |> `rownames<-`(paste0('pos',1:11))
  res <- vector(length = 1)
  for (i in seq.default(1,ncol(LuT))){
    res <- res + LuT[[which(chr_mat[[i]] == rownames(LuT)),i]]
  }
  return(res)
}

lapply(ListWords, chr_to_num, LuT = LuT)

That gives me the following error:
Error in LuT[[which(chr_mat[[i]] == rownames(LuT)), i]] :
attempt to select less than one element in get1index

This error indicates that there is no value to get from the Look up table. This might be due to letters which are not defined in your LuT. I revisited the function and changed it, so that you have meaningful messages on return without an error:

# expanded LuT
LuT <- matrix(
  data = c(0,2,0,0,0,2,2,0,0,2,0,
           3,0,0,0,0,0,0,0,0,0,0,
           0,1,3,3,0,1,1,3,0,1,0,
           1,0,0,0,2,0,0,0,2,0,2),
  ncol = 11, byrow = TRUE, dimnames = list(LETTERS[1:4],paste0('pos',1:11))
)

chr_to_num_flex <- function(chr_vec, LuT){
  # determine if the number of cols of LuT is sufficient for the length of chr_vec
  if (nchar(chr_vec) > ncol(LuT)) return('LuT is not sufficiently large')
  # determine if all letters of chr_vec are present inside LuT
  present_letters <- unique(unlist(strsplit(chr_vec, '')))
  if (length(union(rownames(LuT),present_letters)) > length(rownames(LuT))) return('You have undefined letters in your string')
  
  # the relevant part
  chr_len <- nchar(chr_vec)
  LuT_adj <- LuT[,1:chr_len]
  # for variable length
  LuT_len <- ncol(LuT_adj)
  # get the positions and corresponding letters
  chr_mat <- strsplit(chr_vec, split = "")[[1]] |> as.matrix() |> `rownames<-`(paste0('pos',1:LuT_len))
  # apply the Look up Table with the wanted numeric values
  res <- vector(length = 1L)
  
  for (i in seq.default(1,ncol(LuT_adj))){
    # for every entry in chr_mat, check which letter it is by comparing with the rownames of LuT and assign corresponding value
    res <- res + LuT[[which(chr_mat[[i]] == rownames(LuT_adj)),i]]
  }
  
  return(res)
}

ListWords <- c(
  # good "words"
  'AAAAAAAAAAA','ABABABABABB','ACBCADAD',
  # too long words
  'AAAAAAAAAAAA','ABABABABABABAB',
  # out of range words
  'AAAAAAAAAE','ABABBBV')
lapply(ListWords, chr_to_num_flex, LuT)
#> [[1]]
#> [1] 8
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 6
#> 
#> [[4]]
#> [1] "LuT is not sufficiently large"
#> 
#> [[5]]
#> [1] "LuT is not sufficiently large"
#> 
#> [[6]]
#> [1] "You have undefined letters in your string"
#> 
#> [[7]]
#> [1] "You have undefined letters in your string"

Created on 2022-08-25 by the reprex package (v2.0.1)

Maybe this already solves your issue

Kind regards