Error when using the conText() function in the conText package

I am using the new conText package in R to run a context embedding regression model. This model allows me to assess whether the context in which a focal word appears -- the words before and after it -- varies as a function of covariates. Below I provide the code I have written thus far:

# load packages
library(quanteda)
library(ldatuning)
library(topicmodels)
library(tidytext)
library(tidyverse)
library(parallel)
library(conText)
library(data.table)
library(text2vec)

# load speeches
speeches <- read_csv("speeches_final.csv")

# create corpus
# preparing speeches
speeches$text <- as.character(speeches$text)
speeches$docnames <- seq.int(nrow(speeches))
speeches_corpus <- quanteda::corpus(speeches,text_field ="text")

# tokenize corpus removing unnecessary (i.e. semantically uninformative) elements
toks <- tokens(speeches_corpus, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, 
               remove_separators = TRUE)

# clean out stopwords and words with 2 or fewer characters
toks_nostop <- tokens_select(toks, pattern = stopwords("ru", source = "snowball"), selection = "remove",
                             min_nchar = 3
)

# only use features that appear at least 5 times in the corpus
feats <- dfm(toks_nostop, tolower = TRUE, verbose = TRUE) %>% dfm_trim(min_termfreq = 5) %>% 
  featnames()

# leave the pads so that non-adjacent words will not become adjacent
toks <- tokens_select(toks_nostop, feats, padding = TRUE)

# build a tokenized corpus of contexts surrounding the target term 'economy'
economy_toks <- tokens_context(x = toks, pattern = "экономи*", window = 6L)

# build document-feature matrix
economy_dfm <- dfm(economy_toks)
economy_dfm[1:3, 1:3]

# construct the feature co-occurrence matrix for our toks object (see above)
toks_fcm <- fcm(toks, context = "window", window = 6, count = "frequency", tri = FALSE)

# estimate glove model using text2vec
glove <- GlobalVectors$new(rank = 300, x_max = 10, learning_rate = 0.05)
wv_main <- glove$fit_transform(toks_fcm, n_iter = 10, convergence_tol = 0.001, n_threads = parallel::detectCores())  # set to 'parallel::detectCores()' to use all available cores

wv_context <- glove$components
local_glove <- wv_main + t(wv_context)  # word vectors

local_transform <- compute_transform(x = toks_fcm, pre_trained = local_glove, weighting = "log")

All of the above code executes without issue. The problem occurs when I try to run the next chunk of code, the actual conText model. In this case, my focal word is the pattern экономи* (which matches forms of the Russian word for economy) and my covariates are dummy variables for date and party affiliation.

# run the context embedding regression model
set.seed(2021L)
model1 <- conText(formula = "экономи*" ~ Date_dummy + party_ur,
                  data = toks,
                  pre_trained = local_glove,
                  transform = TRUE, transform_matrix = local_transform,
                  bootstrap = TRUE, num_bootstraps = 10,
                  permute = TRUE, num_permutations = 100,
                  window = 6L, case_insensitive = TRUE,
                  verbose = TRUE)

When I run this code, I receive the following error message:

Error in solve.default(t(X_mat) %*% X_mat) : system is computationally singular: reciprocal condition number = 0

This suggests that the matrix t(X_mat) %*% X_mat, built from the design matrix, is not invertible. I have performed checks to make sure that my variables are not collinear, and I have tried debugging the code, to no avail. I am truly lost as to what is going on here.

Note that when I run the above model with Date_dummy as the only covariate, I do get results. This leads me to believe that the problem lies with the party_ur variable. I am happy to provide my full code and data if that would help. Any feedback would be greatly appreciated.
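One way to narrow this down is to check the covariates only on the contexts that contain the target word, on the assumption that the regression is fit on those contexts rather than on the full corpus. The sketch below uses base R and a toy data frame (ctx_vars and its values are hypothetical stand-ins for something like docvars(economy_toks)). It shows how a covariate that is constant within that subset duplicates the intercept, makes the design matrix rank deficient, and causes the same inversion to fail, even when the variables are not collinear in the full data.

```r
# Toy stand-in for the document variables attached to the contexts around the
# target word. Hypothetical values: in the real analysis these would come from
# something like docvars(economy_toks).
ctx_vars <- data.frame(
  Date_dummy = c(0, 1, 0, 1, 1, 0),
  party_ur   = c(1, 1, 1, 1, 1, 1)  # constant within this subset
)

# 1. Cross-tabulate the covariates; a single row or column (or NA cells)
#    means a covariate has no variation in the subset being modelled.
print(table(ctx_vars$Date_dummy, ctx_vars$party_ur, useNA = "ifany"))

# 2. Compare the rank of the design matrix with its number of columns.
X_mat <- model.matrix(~ Date_dummy + party_ur, data = ctx_vars)
design_rank <- qr(X_mat)$rank
is_rank_deficient <- design_rank < ncol(X_mat)

# 3. Attempt the same inversion named in the error message and capture the
#    error text instead of stopping.
solve_result <- tryCatch(
  solve(t(X_mat) %*% X_mat),
  error = function(e) conditionMessage(e)
)

print(is_rank_deficient)
print(solve_result)
```

If the same checks on the real context-level variables show a rank below the number of columns, the likely culprits are a covariate with no variation among documents that mention the target word, missing values, or a non-numeric coding of party_ur. These are possibilities to rule out, not a confirmed diagnosis.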
