Double loop to create a dataframe

#1

Hello!
I am new to R and I am facing a problem that I hope someone here can help me with ^^

I have several .txt documents that I want to study.

I import them and create one data frame per document:

folder <- "/Users/sylvain/Desktop/php/linking-interne-semantique/"      # path to the folder that holds the .txt files
files_names3 <- list.files(path = folder, pattern = "\\.txt$")          # pattern is a regular expression

for (i in seq_along(files_names3)) {
  # one data frame per document, named after its file;
  # file.path() makes the read independent of the working directory
  assign(files_names3[i],
         read.delim(file.path(folder, files_names3[i])))
}

I have 33 documents in total. Now I want to create a data frame in which each document is paired with every document (including itself), so that I can compare them.

Example: I have "doc1.txt", "doc2.txt", "doc3.txt".

I need a result similar to:
doc1.txt doc1.txt
doc1.txt doc2.txt
doc1.txt doc3.txt
doc2.txt doc1.txt
doc2.txt doc2.txt
doc2.txt doc3.txt
doc3.txt doc1.txt
doc3.txt doc2.txt
doc3.txt doc3.txt

I have this code:

for (i in 1:length(files_names3)) {
  for (j in 1:length(files_names3)) {
   print(paste(files_names3[i],files_names3[j],sep=","))
  }
}

When I print, it works, but I don't know how to store the whole result in a data frame.

Any help? :slight_smile:

Thanks!

#2

Welcome to the community!

I'm not sure, but is this what you want?

file_names <- c("doc1.txt ", "doc2.txt ", "doc3.txt")
df <- expand.grid(file_names, file_names)
df
#>        Var1      Var2
#> 1 doc1.txt  doc1.txt 
#> 2 doc2.txt  doc1.txt 
#> 3  doc3.txt doc1.txt 
#> 4 doc1.txt  doc2.txt 
#> 5 doc2.txt  doc2.txt 
#> 6  doc3.txt doc2.txt 
#> 7 doc1.txt   doc3.txt
#> 8 doc2.txt   doc3.txt
#> 9  doc3.txt  doc3.txt

Created on 2019-03-25 by the reprex package (v0.2.1)
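
Note that expand.grid() varies its first column fastest, whereas the order asked for in the first post varies the second column fastest. A minimal sketch of one way to get that order, assuming the three example names without the trailing spaces; the column names first and second are made up for illustration, and stringsAsFactors = FALSE is only there so the columns stay plain character vectors:

```r
file_names <- c("doc1.txt", "doc2.txt", "doc3.txt")

# expand.grid() varies its first argument fastest, so pass the
# "second" column first and then swap the column order.
pairs <- expand.grid(second = file_names,
                     first  = file_names,
                     stringsAsFactors = FALSE)
pairs <- pairs[, c("first", "second")]

head(pairs, 4)
```

With the real data, files_names3 would take the place of file_names.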

#3

LOL,

This is exactly what I want!

I tried with expand.grid but it doesn't work :s

I get this response: "Error in View : arguments imply differing number of rows: 1, 33"

Do you know why?

#4

It's hard to say without looking at your exact code.

Can you please turn this into a REPRoducible EXample of your problem?

In case you don't know how to make a reprex, here's a great link:

#5

Thanks so much for your help :slight_smile:

I found the problem: I had to add "df" to expand.grid!
So I now have something like:
-> expand.grid.df(doc2,doc2)

^^

But I am still curious to do it with a loop. In PHP we can do it and it works, but not in R. It's weird,
don't you think?

The whole code :

# Preface: I have several .txt documents in a folder.
# I need to create a data frame for each .txt document.

folder <- "/Users/sylvain/Desktop/php/linking-interne-semantique/"      # path to the folder that holds the .txt files
files_names3 <- list.files(path = folder, pattern = "\\.txt$")          # pattern is a regular expression

for (i in seq_along(files_names3)) {
  # one data frame per document, named after its file;
  # file.path() makes the read independent of the working directory
  assign(files_names3[i],
         read.delim(file.path(folder, files_names3[i])))
}


# Now about the loop.
# When I print it, it works! I get the same pairs as with expand.grid.

for (i in 1:length(files_names3)) {
  for (j in 1:length(files_names3)) {
   print(paste(files_names3[i],files_names3[j],sep=","))
}
}


# So now I tried to put all of that in a data frame, in order to get something like this (if I have 3 documents):
doc1.txt doc1.txt
doc1.txt doc2.txt
doc1.txt doc3.txt
doc2.txt doc1.txt
doc2.txt doc2.txt
doc2.txt doc3.txt
doc3.txt doc1.txt
doc3.txt doc2.txt
doc3.txt doc3.txt

data <- ''

for (i in 1:length(files_names3)) {
  for (j in 1:length(files_names3)) {
     for (k in 1:9) {
   data[k] <- print(paste(files_names3[i],files_names3[j],sep=","))
}
}
}

As a result I get:

doc 1 doc 1
doc 1 doc 1
doc 1 doc 1
doc 1 doc 1

etc.

#6

This is not a reprex. Please go through the link I shared earlier. Also, please format your code while posting.

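You can create the data frame via for loops, but I guess you'll have to do it via a matrix. Your code fails because the innermost loop over k writes the same pasted pair into all nine elements of data, so every new pair overwrites the previous ones; you need exactly one row index per (i, j) pair. The print() call is also unnecessary: it only echoes each value while it is being stored. Finally, data <- '' is not a good way to initialise the result, since it creates a length-one character vector rather than a container sized for all the pairs.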
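
To illustrate what the triple loop from the question leaves behind, here is a small sketch. It assumes the three example names (without trailing spaces) in place of files_names3 and drops the print() call; it is a reduced reconstruction, not the poster's exact session:

```r
file_names <- c("doc1.txt", "doc2.txt", "doc3.txt")
data <- ''

for (i in seq_along(file_names)) {
  for (j in seq_along(file_names)) {
    for (k in 1:9) {
      # every (i, j) pair is written into all nine positions,
      # overwriting whatever the previous pair stored there
      data[k] <- paste(file_names[i], file_names[j], sep = ",")
    }
  }
}

unique(data)   # only the last pair, "doc3.txt,doc3.txt", survives
```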

The following works.

file_names <- c("doc1.txt ", "doc2.txt ", "doc3.txt")

df1 <- expand.grid(file_names, file_names)
df1
#>        Var1      Var2
#> 1 doc1.txt  doc1.txt 
#> 2 doc2.txt  doc1.txt 
#> 3  doc3.txt doc1.txt 
#> 4 doc1.txt  doc2.txt 
#> 5 doc2.txt  doc2.txt 
#> 6  doc3.txt doc2.txt 
#> 7 doc1.txt   doc3.txt
#> 8 doc2.txt   doc3.txt
#> 9  doc3.txt  doc3.txt

df2 <- matrix(ncol = 2,
              nrow = (length(x = file_names) ^ 2))

for(i in seq_along(along.with = file_names))
{
  for(j in seq_along(along.with = file_names))
  {
    df2[(((i - 1) * length(x = file_names)) + j), ] <- c(file_names[i], file_names[j])
  }
}

df2 <- as.data.frame(x = df2)
df2
#>          V1        V2
#> 1 doc1.txt  doc1.txt 
#> 2 doc1.txt  doc2.txt 
#> 3 doc1.txt   doc3.txt
#> 4 doc2.txt  doc1.txt 
#> 5 doc2.txt  doc2.txt 
#> 6 doc2.txt   doc3.txt
#> 7  doc3.txt doc1.txt 
#> 8  doc3.txt doc2.txt 
#> 9  doc3.txt  doc3.txt

Created on 2019-03-25 by the reprex package (v0.2.1)
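
For comparison, a version that stays closer to the loop in the question can use a running counter instead of computing the row index. This is only a sketch under the same assumptions (three example names, no trailing spaces); the names n, k, first, second and pairs_df are made up for illustration:

```r
file_names <- c("doc1.txt", "doc2.txt", "doc3.txt")
n <- length(file_names)

# preallocate one slot per pair
first  <- character(n * n)
second <- character(n * n)

k <- 0
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    k <- k + 1                 # one row per (i, j) pair
    first[k]  <- file_names[i]
    second[k] <- file_names[j]
  }
}

pairs_df <- data.frame(first, second, stringsAsFactors = FALSE)
pairs_df
```

Keeping the two columns as separate character vectors avoids the matrix step, at the cost of one extra variable per column.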

#7

Done :slight_smile: I edited my post, sorry!
Thanks for your help!!

split this topic #9

A post was merged into an existing topic: distance between two documents with jaccard distance
