Hi, RStudio community!
I would like to ask how to fix a nested for loop.
I want to count 39 features tagged in 1,115 XML files. Since there are so many features and files, I used for loops to repeat the counting and build a frequency table with 40 columns (filename + 39 features) and 1,115 rows (one per XML file). Many kind R wizards helped me write the following script.
library(xml2)
# create a frequency table
freqTable <- data.frame(matrix(NA, ncol=40, nrow=0))
a <- read.csv('39 syntactic complexity measures.csv')
a <- c(a$tags)
colnames(freqTable) <- a
freqTable[, 2:40] <- sapply(freqTable[, 2:40], as.numeric)
# list all files in folder:
myFiles <- list.files(path = "~/text_parsed", full.names = FALSE, pattern=".xml")
# iterate counting of 39 features in 1,115 files
for (i in 1:length(myFiles)) {
  print(myFiles[i])
  freqTable[i,1] <- myFiles[i]
  text <- read_xml(x = myFiles[i])
  dependencies <- xml_find_all(text, './/dependencies')
  collapsed <- dependencies[grep('collapsed-dependencies', dependencies)]
  deps <- xml_find_all(collapsed, './/dep')
  for(j in 1:length(a)) {
    MySuperTag <- deps[grep('type="paste0(a,i)"', deps)]
    freqTable[i,j+1] <- length(MySuperTag)
  }
}
However, when I run the script above, the resulting frequency table contains only zeros in every column. I suspect the paste0() call inside the inner for loop is the problem, but I am not sure what to use instead.
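A small sketch of why the pattern above can never match: everything between the outer quotes is literal text, so paste0() is never actually called. The tag names and the serialized dep strings below are made up for illustration; the real tags would come from a$tags.

```r
# Hypothetical tag names standing in for the values in a$tags.
tags <- c("nsubj", "dobj")

# Inside quotes, paste0(a,i) is just text; R never evaluates it.
literal_pattern <- 'type="paste0(a,i)"'

# Calling paste0() outside the quotes inserts the actual tag name.
built_pattern <- paste0('type="', tags[1], '"')

# Two hypothetical <dep> elements, written out as strings.
deps_as_text <- c('<dep type="nsubj">', '<dep type="dobj">')

# Count how many elements each pattern matches.
n_literal <- length(grep(literal_pattern, deps_as_text))
n_built   <- length(grep(built_pattern, deps_as_text, fixed = TRUE))
```

Here fixed = TRUE makes grep() treat the pattern as plain text rather than a regular expression, which seems the safer choice when a tag name might contain characters such as a dot or a colon.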
Hi gsapijaszko,
Thank you so much for your kind help!
The code still returned zeros after I replaced that part, so I converted a to a list (a was a character vector, and after calling as.list it became a list). The rewritten script is below.
library(xml2)
# create a frequency table
freqTable <- data.frame(matrix(NA, ncol=40, nrow=0))
a <- read.csv('39 syntactic complexity measures.csv')
a <- c(a$tags)
colnames(freqTable) <- a
a <- as.list(a)
freqTable[, 2:40] <- sapply(freqTable[, 2:40], as.numeric)
# list all files in folder:
myFiles <- list.files(path = "~/text_parsed", full.names = FALSE, pattern=".xml")
for (i in 1:length(myFiles)) {
  print(myFiles[i])
  freqTable[i,1] <- myFiles[i]
  text <- read_xml(x = myFiles[i])
  dependencies <- xml_find_all(text, './/dependencies')
  collapsed <- dependencies[grep('collapsed-dependencies', dependencies)]
  deps <- xml_find_all(collapsed, './/dep')
  for(j in 1:length(a)) {
    MySuperTag <- deps[grep(paste("'type=\"", a[i], "\"'", sep=""), deps)]
    freqTable[i,j+1] <- length(MySuperTag)
  }
}
What am I missing? Thank you so much for your help.
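Two things in the rewritten inner loop would keep the counts at zero:

- The pattern is built from a[i], which uses the file counter i rather than the tag counter j.
- The extra single quotes inside paste() become part of the pattern, so grep() searches for 'type="..."' with the quote marks included.

Also, if a holds all 40 column names (filename first), the loop over 1:length(a) writes one column too far.

Below is a rough sketch of one way around all three. It reads the type attribute with xml_attr() instead of pattern-matching serialized XML. The tag names, the two in-memory documents, and the XML layout (a type attribute on both dependencies and dep) are assumptions for illustration, not taken from the real files.

```r
library(xml2)

# Hypothetical tag names; the real ones would come from a$tags.
tags <- c("nsubj", "dobj", "amod")

# Two tiny in-memory documents standing in for the parsed XML files.
# The element and attribute layout is an assumption.
docs <- c(
  file1.xml = '<root>
    <dependencies type="basic-dependencies"><dep type="nsubj"/></dependencies>
    <dependencies type="collapsed-dependencies">
      <dep type="nsubj"/><dep type="nsubj"/><dep type="dobj"/>
    </dependencies>
  </root>',
  file2.xml = '<root>
    <dependencies type="collapsed-dependencies"><dep type="amod"/></dependencies>
  </root>'
)

# One row per document, one integer column per tag, initialised to zero.
freqTable <- data.frame(filename = names(docs), stringsAsFactors = FALSE)
freqTable[tags] <- 0L

for (i in seq_along(docs)) {
  doc <- read_xml(docs[[i]])
  # Keep only <dep> nodes under the collapsed-dependencies block.
  deps <- xml_find_all(doc, './/dependencies[@type="collapsed-dependencies"]/dep')
  # Character vector holding the type attribute of every <dep> node.
  types <- xml_attr(deps, "type")
  for (j in seq_along(tags)) {
    # Index by the tag counter j, and compare values exactly.
    freqTable[i, tags[j]] <- sum(types == tags[j])
  }
}

print(freqTable)
```

With real files, docs would presumably be replaced by the paths from list.files(path = "~/text_parsed", full.names = TRUE, pattern = "\\.xml$"), and freqTable$filename set to basename() of those paths. Note that read_xml() given a bare filename only finds the file when the working directory is the folder that contains it.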