Similarity between sentences in two columns

I have two columns in a data frame, occ1 and occ2, and I want to measure how similar their sentences are: the first sentence in occ1 against every sentence in occ2, the second sentence in occ1 against every sentence in occ2, and so on.
Either cosine similarity or Jaro-Winkler (jw) similarity would be fine.

OCC1 = c(" Appoint department heads or managers and assign or delegate responsibilities to them", "Directing or coordinating business activities involved in the purchase or sale of investment products or financial services",  "Analyze operations to assess the performance of a company or its staff in meeting objectives or to determine areas of potential cost reduction, program improvement, or policy change", "Directing, planning or implementing policies, objectives or activities of organizations or businesses to ensure continuity of operations, maximize return on investment or increase productivity", "Negotiate or approve contracts or agreements with suppliers, distributors, federal or state agencies or other organizational entities", "Coordinate the development or implementation of budget control systems, record keeping systems or other administrative control processes")
OCC2 = c("Define unit to participate in the production process", "Recommend types of investments to make", " Analyze political-economic, national and international trends", " Analyze industry of potential customers", " Implement human resources development policy", " Supervise the execution of commercial, industrial, administrative and financial activity plans", "Discuss results and their corrections with direct reports", "manage conflicts", " Manage the implementation of the quality system"," analyze scenarios", " Plan contracting services")


# Pad the shorter vector with NA so both columns have the same length
max_ln <- max(length(OCC1), length(OCC2))
gfg_data <- data.frame(col1 = c(OCC1, rep(NA, max_ln - length(OCC1))),
                       col2 = c(OCC2, rep(NA, max_ln - length(OCC2))))
gfg_data
is.data.frame(gfg_data)


library(textTinyR)

# some example text
vec1 = c('use this', 'function to compute the',"a")
vec2 = c('use this', 'between to sequences',"b")

# cosine similarity of each element of vec1 to the element of vec2 in the same position
(out = COS_TEXT(text_vector1 = vec1, text_vector2 = vec2, separator = " "))

# to compare every element of vec1 against every element of vec2,
# build all pairs first and then score each pair:
(df_of_all_combinations <- expand.grid(vec1, vec2, stringsAsFactors = FALSE))

df_of_all_combinations$cos_sim_val <- COS_TEXT(
  text_vector1 = df_of_all_combinations$Var1,
  text_vector2 = df_of_all_combinations$Var2
)
df_of_all_combinations
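
If you would rather get a matrix (one row per occ1 sentence, one column per occ2 sentence) than a long table, here is a minimal base-R sketch of the same all-pairs idea. It does not use textTinyR. `tokenize` and `cosine_sim` are helper names made up for this example, and the plain word-count cosine is an assumption about what "cosine similarity" should mean here, so the values may not match COS_TEXT exactly. The two short vectors are stand-ins for your columns.

```r
# Hypothetical helper: lower-case a sentence and split it into word tokens.
# NA (e.g. from padding a shorter column) yields no tokens.
tokenize <- function(x) {
  if (is.na(x)) return(character(0))
  words <- strsplit(tolower(trimws(x)), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

# Hypothetical helper: cosine similarity of the word-count vectors of two sentences.
cosine_sim <- function(s1, s2) {
  tok1 <- tokenize(s1)
  tok2 <- tokenize(s2)
  vocab <- union(tok1, tok2)
  # Count each vocabulary word in each sentence (zero if absent)
  v1 <- as.numeric(table(factor(tok1, levels = vocab)))
  v2 <- as.numeric(table(factor(tok2, levels = vocab)))
  denom <- sqrt(sum(v1^2)) * sqrt(sum(v2^2))
  if (denom == 0) return(NA_real_)  # at least one sentence had no tokens
  sum(v1 * v2) / denom
}

# Stand-in data; replace with gfg_data$col1 and gfg_data$col2
a <- c("Analyze operations to assess performance",
       "Negotiate or approve contracts")
b <- c("Analyze industry of potential customers",
       "Plan contracting services",
       "manage conflicts")

# Rows follow a, columns follow b
sim <- outer(seq_along(a), seq_along(b),
             Vectorize(function(i, j) cosine_sim(a[i], b[j])))
sim

# Index of the highest-scoring sentence of b for each sentence of a
# (ties, including all-zero rows, resolve to the first column)
best_match <- apply(sim, 1, which.max)
```

Because this compares whole words, "contracts" and "contracting" count as different tokens. If you want character-level matching instead, Jaro-Winkler is probably the better fit; I believe `stringdist::stringdistmatrix(a, b, method = "jw")` returns distances, so one minus the result would give a similarity matrix of the same shape.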
