Counting frequencies of multiple features in multiple XML files using for loops

Hi, RStudio community!
I'd like to ask how to rewrite a for loop in my script.

I want to count 39 features tagged in 1,115 XML files. Since there are many features and many files, I used for loops to repeat the counting and build a frequency table with 40 columns (39 features + file name) and 1,115 rows (one per XML text file). Many kind R wizards helped me write the following script.

```r
library(xml2)

# create a frequency table

freqTable <- data.frame(matrix(NA, ncol=40, nrow=0))

a <- read.csv('39 syntactic complexity measures.csv')
a <- c(a$tags)

colnames(freqTable) <- a

freqTable[, 2:40] <- sapply(freqTable[, 2:40], as.numeric)

# list all files in folder:

myFiles <- list.files(path = "~/text_parsed", full.names = FALSE, pattern=".xml")

# iterate counting of 39 features in 1,115 files

for (i in 1:length(myFiles)) {
  print(myFiles[i])
  freqTable[i,1] <- myFiles[i]
  text <- read_xml(x = myFiles[i])
  dependencies <- xml_find_all(text, './/dependencies')
  collapsed <- dependencies[grep('collapsed-dependencies', dependencies)]
  deps <- xml_find_all(collapsed, './/dep')
  for(j in 1:length(a)) {
    MySuperTag <- deps[grep('type="paste0(a,i)"', deps)]
    freqTable[i,j+1] <- length(MySuperTag)
  }
}
```

However, when I run the script above, the output (the frequency table) contains only 0 values in every column. I think the paste0() call inside the inner for loop needs to change, but I am not sure which function would work.

I really appreciate any help you can provide.
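The hunch about paste0() looks right. Anything inside quotes is handed to grep() as-is, so 'type="paste0(a,i)"' searches for the literal text type="paste0(a,i)", which never occurs in the files; paste0() has to be called outside the quotes to build the pattern. The inner loop also needs the tag counter j, not the file counter i. A base-R sketch, where the character vector serialized is a hypothetical stand-in for the dep nodes converted to text:

```r
# Hypothetical stand-ins: three <dep> nodes as text, and the tags to count
serialized <- c('<dep type="nsubj">', '<dep type="dobj">', '<dep type="nsubj">')
a <- c("nsubj", "poss", "dobj")

# The original pattern is one fixed string; the paste0 inside it is never run
length(grep('type="paste0(a,i)"', serialized))   # 0

# Build the pattern from the j-th tag instead
counts <- integer(length(a))
for (j in seq_along(a)) {
  pattern <- paste0('type="', a[j], '"')   # e.g. type="nsubj"
  counts[j] <- length(grep(pattern, serialized, fixed = TRUE))
}
names(counts) <- a
counts   # nsubj 2, poss 0, dobj 1
```

fixed = TRUE makes grep() treat the pattern as plain text rather than a regular expression, which avoids surprises if a tag ever contains a regex metacharacter.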

It looks like your attributes have different names from the ones you are using in grep :slight_smile:

Assuming that a is a list like a <- c("advcl", "advmod", "aux"), try replacing

```r
MySuperTag <- deps[grep('type="paste0(a,i)"', deps)]
```

with

```r
MySuperTag <- deps[grep(paste("'type=\"", a[i], "\"'", sep=""), deps)]
```

First of all, I would test it on a single file and check that MySuperTag returns the expected value.

G.
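Following the advice to test on one file first, here is a sketch that does that on a tiny inline document and prints the pattern being searched for. The XML layout (a dependencies element with type="collapsed-dependencies" holding dep elements with a type attribute) is an assumption inferred from the XPath expressions in the script, and the sentence content is made up. Printing the pattern suggests why the replacement line would still return zeros: the inner single quotes become part of the pattern, and a[i] picks the tag with the file counter instead of j.

```r
library(xml2)

# Hypothetical one-file stand-in; with real data: text <- read_xml(myFiles[1])
text <- read_xml(paste0(
  '<root><sentence>',
  '<dependencies type="basic-dependencies">',
  '<dep type="nsubj"><governor idx="2">likes</governor><dependent idx="1">She</dependent></dep>',
  '</dependencies>',
  '<dependencies type="collapsed-dependencies">',
  '<dep type="nsubj"><governor idx="2">likes</governor><dependent idx="1">She</dependent></dep>',
  '<dep type="dobj"><governor idx="2">likes</governor><dependent idx="3">tea</dependent></dep>',
  '</dependencies>',
  '</sentence></root>'))

# Same steps as the script
dependencies <- xml_find_all(text, ".//dependencies")
collapsed <- dependencies[grep("collapsed-dependencies", dependencies)]
deps <- xml_find_all(collapsed, ".//dep")

a <- c("nsubj", "poss", "dobj")
j <- 1

# Suggested pattern: the single quotes are literal characters of the pattern
suggested <- paste("'type=\"", a[j], "\"'", sep = "")
cat(suggested, "\n", sep = "")        # 'type="nsubj"'
length(deps[grep(suggested, deps)])   # 0

# Same pattern without the extra quotes
corrected <- paste0('type="', a[j], '"')
cat(corrected, "\n", sep = "")        # type="nsubj"
MySuperTag <- deps[grep(corrected, deps, fixed = TRUE)]
length(MySuperTag)                    # 1 (only the collapsed block is searched)
xml_attr(MySuperTag, "type")          # "nsubj"
```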


Hi gsapijaszko,
Thank you so much for your kind help!
The code still returned zeros after I replaced that line, so I converted a into a list (its class was character; after as.list() it became a list). The rewritten script is below.

```r
library(xml2)

# create a frequency table

freqTable <- data.frame(matrix(NA, ncol=40, nrow=0))

a <- read.csv('39 syntactic complexity measures.csv')
a <- c(a$tags)

colnames(freqTable) <- a

a <- as.list(a)

freqTable[, 2:40] <- sapply(freqTable[, 2:40], as.numeric)

# list all files in folder:

myFiles <- list.files(path = "~/text_parsed", full.names = FALSE, pattern=".xml")


for (i in 1:length(myFiles)) {
  print(myFiles[i])
  freqTable[i,1] <- myFiles[i]
  text <- read_xml(x = myFiles[i])
  dependencies <- xml_find_all(text, './/dependencies')
  collapsed <- dependencies[grep('collapsed-dependencies', dependencies)]
  deps <- xml_find_all(collapsed, './/dep')
  for(j in 1:length(a)) {
    MySuperTag <- deps[grep(paste("'type=\"", a[i], "\"'", sep=""), deps)]
    freqTable[i,j+1] <- length(MySuperTag)
  }
}
```
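For what it's worth, converting a to a list should not change the result: paste() and paste0() turn a one-element list back into its string, so the zeros more likely come from the pattern than from the class of a. A plain character vector is all the loop needs. What is worth checking is how the tags come out of read.csv(). A sketch that reads a hypothetical three-tag CSV from inline text instead of the real file:

```r
# Stand-in for read.csv('39 syntactic complexity measures.csv'), reading inline text
tagTable <- read.csv(text = "tags\nnsubj\nposs\ndobj", stringsAsFactors = FALSE)

# Before R 4.0, read.csv() made text columns factors by default, and before R 4.1
# c() on a factor returned its integer codes; as.character() is safe in either case
a <- as.character(tagTable$tags)
class(a)   # "character"

# A list gives the same pattern: paste0() converts the one-element list to its string
aList <- as.list(a)
identical(paste0('type="', a[1], '"'), paste0('type="', aList[1], '"'))   # TRUE
```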

What am I missing? Thank you so much for helping me :smiley:
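Pulling the points in this thread together, here is a minimal sketch of the same loop with the likely culprits changed. The pattern is built with paste0() from a[j] (the tag counter rather than the file counter i, and without the extra single quotes). The header reserves the first column for the file name: if the CSV holds exactly 39 tags, colnames(freqTable) <- a names only 39 of the 40 columns, so the file names sit under the first tag's heading and each tag's count ends up under the following tag's name. list.files() returns full paths, so read_xml() does not depend on the working directory. The temporary folder, the two demo files and the helper makeDoc() are hypothetical scaffolding so the sketch runs on its own, and the XML layout is assumed rather than taken from the real files.

```r
library(xml2)

# --- Hypothetical setup so the sketch runs on its own: two tiny files in a temp folder.
# With real data, use folder <- "~/text_parsed" and read the tags from the CSV.
folder <- file.path(tempdir(), "text_parsed_demo")
dir.create(folder, showWarnings = FALSE)
makeDoc <- function(types) {   # hypothetical helper building one minimal document
  deps <- paste0('<dep type="', types, '"/>', collapse = "")
  paste0('<root><dependencies type="collapsed-dependencies">', deps,
         '</dependencies></root>')
}
writeLines(makeDoc(c("nsubj", "dobj", "nsubj")), file.path(folder, "text1.xml"))
writeLines(makeDoc("poss"), file.path(folder, "text2.xml"))
a <- c("nsubj", "poss", "dobj")   # in practice: as.character(read.csv(...)$tags)

# Full paths, so read_xml() works regardless of the working directory;
# "\\.xml$" matches only names ending in .xml (in ".xml" the dot matches any character)
myFiles <- list.files(path = folder, full.names = TRUE, pattern = "\\.xml$")

# One row per file; first column for the file name, then one column per tag
freqTable <- data.frame(matrix(0L, nrow = length(myFiles), ncol = length(a) + 1))
colnames(freqTable) <- c("filename", a)
freqTable$filename <- basename(myFiles)

for (i in seq_along(myFiles)) {
  text <- read_xml(myFiles[i])
  dependencies <- xml_find_all(text, ".//dependencies")
  collapsed <- dependencies[grep("collapsed-dependencies", dependencies)]
  deps <- xml_find_all(collapsed, ".//dep")
  for (j in seq_along(a)) {
    # Pattern built from the j-th tag, without extra quote marks
    MySuperTag <- deps[grep(paste0('type="', a[j], '"'), deps, fixed = TRUE)]
    freqTable[i, j + 1] <- length(MySuperTag)
  }
}

freqTable
```

seq_along() also behaves sensibly if the folder turns out to be empty, where 1:length(myFiles) would loop over 1 and 0.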

May we have an example XML file, please? That would make it much easier.
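If uploading a file isn't possible, a trimmed excerpt pasted as text is usually just as useful as the full file. A sketch of pulling out one collapsed-dependencies block with xml2 and printing it; the inline document is a hypothetical stand-in for read_xml(myFiles[1]), and its layout is assumed:

```r
library(xml2)

# Hypothetical stand-in; with real data: text <- read_xml(myFiles[1])
text <- read_xml(paste0(
  '<root>',
  '<dependencies type="basic-dependencies"><dep type="nsubj"/></dependencies>',
  '<dependencies type="collapsed-dependencies"><dep type="nsubj"/><dep type="dobj"/></dependencies>',
  '</root>'))

# Keep only the first collapsed-dependencies block and serialize it back to text
excerpt <- xml_find_first(text, './/dependencies[@type="collapsed-dependencies"]')
snippet <- as.character(excerpt)

# Print it, then copy the output into the post as a code block
cat(snippet, "\n")
```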

Hi again,
Thank you so much for considering my question!!
One of the XML files looks like this:
[Screenshot of one of the XML files]

Unfortunately, I don't seem to be allowed to upload XML files here :neutral_face:
I don't think the screenshot of the XML file above can be of much help, though.

In this XML file, I am trying to count the frequency of each tag: "nsubj", "poss", "dobj", and so on.
Would it be possible to send it to you by email?
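Since the goal is the frequency of every tag in each file, another option is to let XPath select the dep nodes of the collapsed block and tabulate their type attribute, which counts all tags in one pass instead of running one grep() per tag. A sketch on a hypothetical inline document (layout assumed); "amod" is included only to show that a tag absent from the file comes out as 0:

```r
library(xml2)

# Hypothetical stand-in for one parsed file (layout assumed, not taken from the real data)
text <- read_xml(paste0(
  '<root><dependencies type="collapsed-dependencies">',
  '<dep type="nsubj"/><dep type="poss"/><dep type="dobj"/><dep type="nsubj"/>',
  '</dependencies></root>'))
a <- c("nsubj", "poss", "dobj", "amod")   # tags to count; "amod" does not occur here

# XPath attribute test picks the collapsed block; //dep finds its <dep> descendants
deps <- xml_find_all(text, './/dependencies[@type="collapsed-dependencies"]//dep')

# Tabulate the type attribute; using a as factor levels keeps absent tags with count 0
counts <- table(factor(xml_attr(deps, "type"), levels = a))
counts   # nsubj 2, poss 1, dobj 1, amod 0

# A single tag can also be counted by XPath itself
nsubjCount <- xml_find_num(text,
  'count(.//dependencies[@type="collapsed-dependencies"]//dep[@type="nsubj"])')
nsubjCount   # 2
```

Inside the file loop, freqTable[i, -1] <- as.vector(counts) should then be able to stand in for the inner j loop, provided the tag columns of freqTable follow the order of a.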
