Arguments imply differing number of rows. data.frame() Error / STM

Hello,

I am relatively new to R and I'm currently working on a structural topic model of a large number of news headlines/leads.

I am trying to preprocess the data I have collected using the quanteda package, but I get the following error:

Error in data.frame(doc_id = rownames(iran$created_time, ), iran$created_time, : Arguments imply differing number of rows: 0, 67172

Here is my code:

iran <- read.csv("iran_complete.csv", encoding = "UTF-8") #reimporting the data I have previously collected and saved as CSV file. Three columns: created_time, snippet & headline. 67172 observations.
iran.meta <- data.frame(doc_id=rownames(iran$created_time,), iran$created_time, snippet=iran$snippet, headline=iran$headline, stringsAsFactors=FALSE)

I guess I have to convert the created_time column into a row, as I later want to use it as the metadata in my stm analysis, but this turns out to be really tricky for me.

Any help is appreciated. I tried to solve this using other threads on this issue, but without success.

Best
mmm

I do not understand what you are trying to do with the following part of your data.frame() function call:
doc_id=rownames(iran$created_time,)
Are you trying to set the row names of the new data frame iran.meta? The vector iran$created_time does not have row names, so rownames(iran$created_time,) returns NULL, which has length zero. The other columns have 67172 rows, and that mismatch is what the error message reports.
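To make the point above concrete, here is a minimal sketch in base R. The three-row data frame is a hypothetical stand-in for iran_complete.csv, and taking doc_id from the row names of the whole data frame is only one possible choice (an assumption, since the thread never settles what doc_id should be).

```r
# Hypothetical stand-in for the real data; column names follow the original post.
iran <- data.frame(
  created_time = c("2020-02-03", "2020-02-06", "2020-03-01"),
  snippet      = c("X", "Y", "Z"),
  headline     = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# A column extracted with $ is a plain vector, so it has no row names.
is.null(rownames(iran$created_time))  # TRUE

# The data frame itself does have row names ("1", "2", "3" here).
rownames(iran)

# Assumption: doc_id is simply the row name of each observation.
iran.meta <- data.frame(
  doc_id       = rownames(iran),
  created_time = iran$created_time,
  snippet      = iran$snippet,
  headline     = iran$headline,
  stringsAsFactors = FALSE
)
```

Every argument now has the same length, so data.frame() no longer complains about differing numbers of rows.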

My goal here is to create a data frame for further preprocessing of my raw data so that I can then use it in stm. This data frame should ideally consist of rows indicating the time when an article was published (created_time) and two columns (heads, leads) for the respective articles. My overall plan is to use the created_time rows as the document-level metadata in my structural topic model later on.

The function read.csv() returns a data frame, and it looks like yours already has a column named created_time. What are you trying to do by making the data frame iran.meta? It looks like iran.meta will contain the three columns from iran plus a new column named doc_id. What do you want that column to be?

With iran.meta I tried to create a new data frame consisting of two columns (heads, leads), with created_time as the rows. The doc_id column was a mistake. So I guess it's basically about converting the column created_time into row(s).

There are functions for pivoting data so that columns become rows or vice versa, but I do not see how that would work in this case. Please post the output of running

head(iran)

or show a simplified version of iran and then show how that data frame would look after you convert created_time from a column to a row. For example, here is my understanding of the form of the data frame iran.

> iran <- data.frame(created_time = c("2020-02-03", "2020-02-06", "2020-03-01"),
                    heads = c("A", "B", "C"), leads = c("X", "Y", "Z"))
> iran
  created_time heads leads
1   2020-02-03     A     X
2   2020-02-06     B     Y
3   2020-03-01     C     Z
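For comparison, here is a hedged sketch of two readings of "created_time as rows", using base R only. The data are hypothetical, the column names follow the original post, and the repeated date is an assumption added to show what happens when two articles share a timestamp. Neither option is confirmed in the thread as what the poster needs.

```r
# Hypothetical stand-in for the real data, with one repeated date.
iran <- data.frame(
  created_time = c("2020-02-03", "2020-02-06", "2020-02-06"),
  headline     = c("A", "B", "C"),
  snippet      = c("X", "Y", "Z"),
  stringsAsFactors = FALSE
)

# Option 1: keep created_time as an ordinary column, one row per document,
# and parse it as a Date so it can be used as a covariate later.
iran.meta <- iran
iran.meta$created_time <- as.Date(iran.meta$created_time)

# Option 2: use created_time as row labels. Row names must be unique,
# so repeated timestamps are disambiguated with make.unique().
iran.rows <- iran[, c("headline", "snippet")]
rownames(iran.rows) <- make.unique(iran$created_time)
rownames(iran.rows)
```

Option 1 leaves the data in the shape read.csv() already returned, with each row describing one article, so no pivoting is involved. Option 2 alters the labels of repeated timestamps, which may make them harder to use as a time variable afterwards.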
