Hello everyone,
I ran the following code, which took about 18 hours to complete:
library(tidyverse)
library(mclm) #a package for e.g. concordancing developed at a university
ES_LLT <- get_fnames("Txt") %>%
  print(n = 10) # There are two large .txt files in this folder.
pattern <- r"--[(?xi) <b>.*</b>]--" # Regex for the tagged 'matches' I was looking for in the text ((?xi) = free-spacing, case-insensitive).
cd_ES_LLT <- ES_LLT %>%
  conc(pattern, c_left = 5000, c_right = 5000) # Making the concordance list with left and right context, 5000 characters each ("Step 1").
cd_ES_LLT <- cd_ES_LLT %>%
  separate(left,
           into = c("reference", "left"), sep = "</t>(?!.*(</t>))") # Splitting the text name (reference) from the left context at the last </t> ("Step 2").
cd_ES_LLT <- cd_ES_LLT %>%
  separate(reference,
           into = c(NA, "reference"), sep = "<t>(?!.*(<t>))") # Trimming the text name.
cd_ES_LLT <- cd_ES_LLT %>%
  separate(right,
           into = c("right", NA), sep = "<t>") # Trimming the right context.
write_conc(cd_ES_LLT, "ES_LLT.tsv") #Exporting the concordance list to a .tsv file
The working directory (which includes a folder with the .txt files) is on a server location that I can only reach through a VPN connection. When the whole script had finished, the console showed the following message between Step 1 and Step 2:
Warning message: File monitoring failed for project at "(server location)"
Error 2 (The system cannot find the specified file)
Features disabled: R source file indexing, Diagnostics
However, R kept running the code and I ended up with a .tsv file that looks perfectly fine. The only problem is: can I be sure that it is complete? That is, does the concordance list contain every match it should have found? I could verify the file by running the code again, but I have more scripts like this one, so it would be useful to know for sure for future reference.
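A cheaper check than rerunning everything might be to count the matches independently and compare that number with the number of rows in the concordance. The sketch below is only an illustration in base R: count_matches is a hypothetical helper (not part of mclm), and it assumes that conc() matches a Perl-style regex against the file contents with lines glued by "\n", which is my understanding of its defaults but worth confirming in the package documentation. The demo runs on a small temporary file rather than on my real data.

```r
# Hypothetical helper (not part of mclm): count regex matches in each file.
# Assumption: lines are glued with "\n" and matched with a Perl-style regex.
count_matches <- function(files, pattern) {
  vapply(files, function(f) {
    text <- paste(readLines(f, encoding = "UTF-8", warn = FALSE), collapse = "\n")
    hits <- gregexpr(pattern, text, perl = TRUE)[[1]]
    # gregexpr() returns -1 when there is no match at all
    if (hits[1] == -1L) 0L else length(hits)
  }, integer(1))
}

# Tiny self-contained demo on a temporary file (made-up content).
demo_file <- tempfile(fileext = ".txt")
writeLines(c("<t>text one</t> some <b>match</b> here",
             "no tag on this line",
             "another <B>Match</B> there"), demo_file)

pattern <- r"--[(?xi) <b>.*</b>]--"
n_expected <- count_matches(demo_file, pattern)
print(unname(n_expected))
```

For the real files, the idea would be to compare sum(count_matches(as.character(ES_LLT), pattern)) with nrow(cd_ES_LLT). If the two numbers agree, the concordance most likely contains every match. If they differ, it is also possible that the assumptions above about how conc() reads the files are wrong, so a mismatch alone does not prove the run was incomplete.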