# Detecting complete and partial patterns in a vector of numbers

Hello,

I am looking for ways to find patterns in a set of numbers. Each number can only take a value from 1 to 5. Unfortunately, I don't know all the ways in which patterns will present themselves, but I want to at least pick up partial patterns and quantify them.

Below I have added some sequences. Any idea how to approach this?

```r
# should find the repetition of 1,2,3
c(1,2,3,1,2,3,1,2,3)

# should find the repetition of 3,2,1
c(3,2,1,3,2,1,3,2,1)

# should find the 3,2 and 5,3 repetition
c(3,2,3,2,5,3,5,3,4)

# should find the larger pattern 1,4,5,4,1 and/or 5,4,1 repeating
c(1,4,5,4,1,5,4,1,2)

# should not find any patterns
c(5,3,1,4,2,1,1,3,4)
```
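```r
library(tidyverse)  # attaches purrr (map, set_names) and the %>% pipe

# Return every distinct n-gram of length `len` that occurs more than once
# in `numvec`, or NA if no n-gram of that length repeats.
get_ngram <- function(numvec, len) {
  result <- NA

  # number of windows of length `len` that fit into `numvec`
  inside_count <- length(numvec) - len + 1

  if (inside_count > 1) {
    # slide a window over the vector and collect every n-gram
    first_pass <- list()
    for (i in seq_len(inside_count)) {
      first_pass[[i]] <- numvec[i:(i + len - 1)]
    }

    # keep each n-gram that was already seen earlier, listed once
    second_pass <- unique(first_pass[duplicated(first_pass)])
    if (length(second_pass) > 0) {
      result <- second_pass
    }
  }

  result
}

# Run get_ngram() for every n-gram length from 2 to length(numvec) - 1
# (assumes numvec has at least 3 elements).
get_ngrams <- function(numvec) {
  l <- 2:(length(numvec) - 1)
  map(l, ~ get_ngram(numvec, .)) %>%
    set_names(paste0("length_", l))
}

# should find the repetition of 1,2,3
get_ngrams(c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# should find the repetition of 3,2,1
get_ngrams(c(3, 2, 1, 3, 2, 1, 3, 2, 1))

# should find the 3,2 and 5,3 repetition
get_ngrams(c(3, 2, 3, 2, 5, 3, 5, 3, 4))

# should find the larger pattern 1,4,5,4,1 and/or 5,4,1 repeating
get_ngrams(c(1, 4, 5, 4, 1, 5, 4, 1, 2))
# actually only 5,4,1 repeats; 1,4,5,4,1 occurs just once, so there is
# no repeated 1,4,5,4,1 pattern to find

# should not find any patterns
get_ngrams(c(5, 3, 1, 4, 2, 1, 1, 3, 4))
```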
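The functions above report which n-grams repeat, but not how often. As a rough way to quantify a pattern, one option is to count the occurrences of each n-gram. The sketch below uses base R only; `count_ngrams` is a hypothetical helper name, not something from a package, and it counts overlapping windows.

```r
# Hypothetical helper: count how often each n-gram of length `len`
# occurs in `numvec`, keeping only n-grams seen at least twice.
count_ngrams <- function(numvec, len) {
  n_windows <- length(numvec) - len + 1
  if (n_windows < 2) return(integer(0))

  # turn each window into a text key such as "1-2-3"
  keys <- vapply(
    seq_len(n_windows),
    function(i) paste(numvec[i:(i + len - 1)], collapse = "-"),
    character(1)
  )

  # tabulate the keys and convert to a plain named integer vector
  tab <- table(keys)
  counts <- setNames(as.integer(tab), names(tab))

  # drop n-grams that occur only once, most frequent first
  counts <- counts[counts > 1]
  sort(counts, decreasing = TRUE)
}

# "1-2-3" occurs 3 times; "2-3-1" and "3-1-2" occur twice each
count_ngrams(c(1, 2, 3, 1, 2, 3, 1, 2, 3), 3)

# no 2-gram repeats here, so the result is empty
count_ngrams(c(5, 3, 1, 4, 2, 1, 1, 3, 4), 2)
```

Dividing a count by the number of windows would give a relative frequency, if a normalised score is more useful than a raw count.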

Ahhh, you're amazing! It is like you have answers to everything. I understand your point regarding the `c(1,4,5,4,1)` example. Is there some way to get R to work out which number should likely come next in a set, so that if we had, say, `c(1,4,5,4,x)`, it would substitute in 1? I want to be able to find less obvious patterns like that too. N-grams in general make a lot of sense for this. I think I am going to use your solution and also flip the set around to read it from right to left (given that the numbers aren't parts of a word here).
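For the `c(1,4,5,4,x)` case, one possible reading is that the vector is mirror-symmetric, so a missing value can be copied from its mirror position. The sketch below rests on that assumption and is not a general next-value predictor. On the right-to-left idea: the repeated n-grams of `rev(x)` are just the reverses of the repeated n-grams of `x`, so a second helper instead looks for n-grams whose reverse also appears somewhere in the vector. Both `fill_mirror` and `mirrored_ngrams` are hypothetical names used for illustration.

```r
# Hypothetical helper: fill NA entries under the ASSUMPTION that the whole
# vector reads the same in both directions. If the known values contradict
# that assumption, the input is returned unchanged.
fill_mirror <- function(numvec) {
  mirrored <- rev(numvec)
  if (!all(numvec == mirrored, na.rm = TRUE)) return(numvec)
  missing <- is.na(numvec)
  numvec[missing] <- mirrored[missing]
  numvec
}

# Hypothetical helper: n-grams of length `len` whose reverse also occurs
# in `numvec`. Palindromic n-grams trivially match themselves and are skipped.
mirrored_ngrams <- function(numvec, len) {
  n_windows <- length(numvec) - len + 1
  if (n_windows < 1) return(list())

  windows <- lapply(seq_len(n_windows), function(i) numvec[i:(i + len - 1)])

  is_palindrome <- vapply(windows, function(w) identical(w, rev(w)), logical(1))
  has_mirror <- vapply(
    windows,
    function(w) any(vapply(windows, identical, logical(1), rev(w))),
    logical(1)
  )

  unique(windows[has_mirror & !is_palindrome])
}

# the NA in position 5 is filled from position 1, giving 1 4 5 4 1
fill_mirror(c(1, 4, 5, 4, NA))

# 1,4,5 and its reverse 5,4,1 both occur in this vector
mirrored_ngrams(c(1, 4, 5, 4, 1, 5, 4, 1, 2), 3)
```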

Yes, that would be it - basically a detectable pattern. In some cases I will have vectors up to 20 long. I was hoping there was some sort of mathematical solver, or a clever way to apply `diff`, to derive that pattern. I suppose fitting an `lm` or similar wouldn't work, as you can't know the shape of the line beforehand or readily find a way to solve for it?
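One way `diff` might help is through lagged differences: `diff(x, lag = p)` is zero wherever a value equals the value `p` steps earlier, so the share of zeros gives a rough score for how periodic the vector is at lag `p`. This is only a sketch for exact repeats at a fixed spacing, and `lag_match` is a hypothetical helper name. A score of 1 means the vector repeats perfectly with that period, and values in between indicate a partial pattern.

```r
# Hypothetical helper: for each lag p, the share of positions whose value
# equals the value p steps earlier (1 = perfectly periodic at lag p).
lag_match <- function(numvec, max_lag = length(numvec) %/% 2) {
  lags <- seq_len(max_lag)
  score <- vapply(
    lags,
    function(p) mean(diff(numvec, lag = p) == 0),
    numeric(1)
  )
  data.frame(lag = lags, score = score)
}

# score is 1 at lag 3 and 0 at lags 1, 2 and 4
lag_match(c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# partial pattern: lag 2 matches in 4 of 7 positions
lag_match(c(3, 2, 3, 2, 5, 3, 5, 3, 4))
```

Because the default `max_lag` is half the vector length, this stays cheap for vectors of length 20, and the lag with the highest score is a candidate period to inspect further with the n-gram functions.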