distance between two documents with jaccard distance

Yarnabrina · March 25, 2019, 6:21pm

This is, once again, not a reprex.

You may ask why. Here are a few reasons:

We don't have access to your local files. I understand that the allowable file formats are limited, but you could have written code to create those files and then read them back in.
You haven't included any library call. I (and possibly many others in this community) do not know what a Jaccard distance is. You used a function textrank_jaccard, but have not mentioned its package. I'm guessing textrank, but that may not be the case.
What is top? I've no idea regarding this one.

Please go through the reprex guide. A minimal reproducible example helps others to figure out what problems you may have been facing, and consequently, to help you.

There are a few problems with your code.

files_names3 is a vector. You can't use nrow with it.
You used i in both the for loops.
Why are you using all <- ''? It's a character vector, and you can't add rows with this later.
I'm unable to figure out why you expect the output to be in the format shown in your post. The documentation says it returns a single number, so why a tuple? Also, as all the files are identical, why do you think different values will be produced? I've no idea regarding this particular distance measure, but I don't think this is how it is expected to behave.
This is not a problem, but 1.txt, 2.txt, 3.txt as names of some objects is probably a bad idea. It's very confusing in my opinion.
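
For reference, as far as I can tell from the usual set-based definition, the Jaccard similarity of two term sets A and B is |A ∩ B| / |A ∪ B|, and the Jaccard distance is one minus that. Below is a minimal sketch in base R, not the textrank implementation. The helper names jaccard_similarity, jaccard_distance and split_terms are hypothetical, and I'm assuming each document is a comma-separated list of terms.

```r
# Hypothetical helpers (not part of textrank): set-based Jaccard measures.
jaccard_similarity <- function(terms_a, terms_b) {
  length(intersect(terms_a, terms_b)) / length(union(terms_a, terms_b))
}

jaccard_distance <- function(terms_a, terms_b) {
  1 - jaccard_similarity(terms_a, terms_b)
}

# Assumption: a document is a single string of comma-separated terms.
split_terms <- function(text) {
  trimws(strsplit(text, split = ",", fixed = TRUE)[[1]])
}

doc_a <- split_terms("ok, good, funny")
doc_b <- split_terms("ok, good, funny")
doc_c <- split_terms("ok, bad, boring")

jaccard_similarity(doc_a, doc_b)  # identical term sets
jaccard_distance(doc_a, doc_b)    # distance between identical term sets
jaccard_similarity(doc_a, doc_c)  # one shared term out of five distinct terms
```

Under this definition, identical documents always get the same value for every pair, which is why I would not expect different numbers from identical files.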

Since it is your first post, I'm providing a reprex after modifying your code a little bit.

A working version

# loading required library
library(textrank)

# creating files
write.table(x = "ok, good, funny",
            file = "1.txt",
            row.names = FALSE,
            col.names = FALSE)
write.table(x = "ok, good, funny",
            file = "2.txt",
            row.names = FALSE,
            col.names = FALSE)
write.table(x = "ok, good, funny",
            file = "3.txt",
            row.names = FALSE,
            col.names = FALSE)

# listing files
# pattern is a regular expression, not a glob, so match the ".txt" extension explicitly
file_names <- list.files(pattern = "\\.txt$")

# reading files
file_contents <- vector(mode = "list",
                        length = length(x = file_names))

for (i in seq_len(length.out = length(x = file_names)))
{
  file_contents[[i]] <- read.delim(file = file_names[i])
}

# calculation
all <- matrix(ncol = 3,
              nrow = ((length(x = file_names)) ^ 2))

for(i in seq_len(length.out = length(x = file_names)))
{
  for(j in seq_len(length.out = length(x = file_names)))
  {
    all[((i - 1) * length(x = file_names) + j), ] <- c(file_names[i],
                                                       file_names[j],
                                                       textrank_jaccard(termsa = file_contents[[i]],
                                                                        termsb = file_contents[[j]]))
  }
}

all <- as.data.frame(x = all)

all
#>      V1    V2 V3
#> 1 1.txt 1.txt  1
#> 2 1.txt 2.txt  1
#> 3 1.txt 3.txt  1
#> 4 2.txt 1.txt  1
#> 5 2.txt 2.txt  1
#> 6 2.txt 3.txt  1
#> 7 3.txt 1.txt  1
#> 8 3.txt 2.txt  1
#> 9 3.txt 3.txt  1

Created on 2019-03-25 by the reprex package (v0.2.1)