Error when using the conText() function in the conText package

w5698 · March 31, 2022, 6:58pm

I am using the new conText package in R to run a context embedding regression model. This model allows me to assess whether the context in which a focal word appears -- the words before and after it -- varies as a function of covariates. Below I provide the code I have written thus far:

# load packages
library(quanteda)
library(ldatuning)
library(topicmodels)
library(tidytext)
library(tidyverse)
library(parallel)
library(conText)
library(data.table)
library(text2vec)

# load speeches
speeches <- read_csv("speeches_final.csv")

# create corpus
# preparing speeches
speeches$text <- as.character(speeches$text)
speeches$docnames <- seq.int(nrow(speeches))
speeches_corpus <- quanteda::corpus(speeches, text_field = "text")

# tokenize corpus removing unnecessary (i.e. semantically uninformative) elements
toks <- tokens(speeches_corpus, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
               remove_separators = TRUE)

# clean out stopwords and words with 2 or fewer characters
toks_nostop <- tokens_select(toks, pattern = stopwords("ru", source = "snowball"), selection = "remove",
                             min_nchar = 3
)

# only use features that appear at least 5 times in the corpus
feats <- dfm(toks_nostop, tolower = TRUE, verbose = TRUE) %>% dfm_trim(min_termfreq = 5) %>%
  featnames()

# leave the pads so that non-adjacent words will not become adjacent
toks <- tokens_select(toks_nostop, feats, padding = TRUE)

# build a tokenized corpus of contexts surrounding the target term 'economy'
economy_toks <- tokens_context(x = toks, pattern = "экономи*", window = 6L)

# build document-feature matrix
economy_dfm <- dfm(economy_toks)
economy_dfm[1:3, 1:3]

# construct the feature co-occurrence matrix for our toks object (see above)
toks_fcm <- fcm(toks, context = "window", window = 6, count = "frequency", tri = FALSE)

# estimate glove model using text2vec
glove <- GlobalVectors$new(rank = 300, x_max = 10, learning_rate = 0.05)
# n_threads = parallel::detectCores() uses all available cores
wv_main <- glove$fit_transform(toks_fcm, n_iter = 10, convergence_tol = 0.001, n_threads = parallel::detectCores())

wv_context <- glove$components
local_glove <- wv_main + t(wv_context)  # word vectors

local_transform <- compute_transform(x = toks_fcm, pre_trained = local_glove, weighting = "log")

All of the above code executes without issue. The problem occurs when I try to run the next chunk of code, the actual conText model. Here my focal term is the pattern экономи* (matching forms of the Russian word for "economy"), and my covariates are two dummy variables: Date_dummy for date and party_ur for party affiliation.

# run the context embedding regression model
set.seed(2021L)
model1 <- conText(formula = "экономи*" ~ Date_dummy + party_ur,
                  data = toks,
                  pre_trained = local_glove,
                  transform = TRUE, transform_matrix = local_transform,
                  bootstrap = TRUE, num_bootstraps = 10,
                  permute = TRUE, num_permutations = 100,
                  window = 6L, case_insensitive = TRUE,
                  verbose = TRUE)

When I run this code, I receive the following error message:

Error in solve.default(t(X_mat) %*% X_mat) : system is computationally singular: reciprocal condition number = 0

This suggests that t(X_mat) %*% X_mat is not invertible, i.e. that the design matrix is rank deficient. I have checked that my variables are not collinear and have tried debugging the code, to no avail. I am truly lost as to what is going on here. Note that when I run the above model with Date_dummy as the only covariate, I do get results, which leads me to believe that the problem lies with the party_ur variable.

I am happy to provide my full code and data if that would help. Any feedback would be greatly appreciated.
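For reference, a minimal sketch of a rank check on the two covariates, using base R only. The data frame `ctx` and all its values are invented for illustration and are not the data from this post. It assumes the regression uses an intercept plus the two dummies, and that the relevant rows are the contexts containing the target term rather than the full corpus.

```r
# Toy stand-in for the covariates attached to the contexts around the target term.
# All values are invented for illustration; they are NOT the poster's data.
ctx <- data.frame(
  Date_dummy = c(0, 0, 1, 1, 1, 0),
  party_ur   = c(0, 0, 1, 1, 1, 0)  # moves in lockstep with Date_dummy in this toy case
)

# Design matrix as a regression with an intercept would build it
X <- model.matrix(~ Date_dummy + party_ur, data = ctx)

# A rank below the number of columns means t(X) %*% X cannot be inverted
rank_X <- qr(X)$rank
is_rank_deficient <- rank_X < ncol(X)

# Cross-tabulation shows which covariate combinations actually occur
cell_counts <- table(Date_dummy = ctx$Date_dummy, party_ur = ctx$party_ur)
print(cell_counts)

# solve() raises an error on this toy matrix; capture the message instead of stopping
solve_result <- tryCatch(
  solve(t(X) %*% X),
  error = function(e) conditionMessage(e)
)
print(is_rank_deficient)
print(solve_result)
```

On the real data, the analogous data frame would presumably be `docvars(economy_toks)`, assuming `tokens_context()` carries the document variables over to each context. Two dummies can look non-collinear across the whole corpus and still be redundant, or one of them constant, within the subset of documents that mention the target term.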
