how to extract text from excel file and paste in another text file in R?

I want to extract "something" from column2 in fil1.xls using "word" matching in column3 and store it after "name:" in file2.txt, how to do this in Rstudio?

file1.xls is like this:

column1 column2 column3 column4 column5

AAAA1   BBBB2   CCCCC1  DDDD1   EEEEE1

AAAA2   BBBB3   CCCCC2  DDDD2   EEEEE2

AAAA3   something   word1   DDDD3   EEEEE3

AAAA4   BBBB5   CCCCC4  DDDD4   EEEEE4

AAAA5   BBBB6   CCCCC5  DDDD5   EEEEE5

AAAA6   BBBB7   CCCCC6  DDDD6   EEEEE6

AAAA7   BBBB8   CCCCC7  DDDD7   EEEEE7

file2.txt is like this:

name:
 
city: 

place: 

type: 

color:

Thanks for your help

There are four steps in that:

  1. read the Excel input
  2. find the word
  3. generate the content of file2
  4. save file2

1. Read xlsx

This is made easy with the package readxl.

file1_contents <- readxl::read_excel("fil1.xlsx")
file1_contents
# A tibble: 7 x 5
#  column1 column2   column3 column4 column5
#    <chr>   <chr>     <chr>   <chr>   <chr>  
# 1 AAAA1   BBBB2     CCCCC1  DDDD1   EEEEE1 
# 2 AAAA2   BBBB3     CCCCC2  DDDD2   EEEEE2 
# 3 AAAA3   something word1   DDDD3   EEEEE3 
# 4 AAAA4   BBBB5     CCCCC4  DDDD4   EEEEE4 
# 5 AAAA5   BBBB6     CCCCC5  DDDD5   EEEEE5 
# 6 AAAA6   BBBB7     CCCCC6  DDDD6   EEEEE6 
# 7 AAAA7   BBBB8     CCCCC7  DDDD7   EEEEE7

2. Extract value

That is the harder part. We will use the packages stringr and dplyr, part of the tidyverse:

library(tidyverse)

For a given vector, we can look for a specific word using str_detect():

str_detect(file1_contents$column3, "word")
# [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

(note the TRUE in 3rd position)

Now we can use filter() to keep only the row of interest:

file1_contents %>%
  filter(str_detect(column3, "word"))
# A tibble: 1 x 5
#   column1 column2   column3 column4 column5
#   <chr>   <chr>     <chr>   <chr>   <chr>  
# 1 AAAA3   something word1   DDDD3   EEEEE3 

So all we need to do is pull out the content of column2 and save it:

out_word <- file1_contents %>%
  filter(str_detect(column3, "word")) %>%
  pull(column2)

out_word
# [1] "something"

We want to make sure we have a single value, so let's create an error if several rows do match:

stopifnot(length(out_word) == 1)

So if for any reason the number of rows is not 1, we get an error and know we have to investigate (that's better than the script executing without problem but the result being horribly wrong).

3. Generate the output

Here, several methods can work, depending on what your problem really looks like. To do exactly what you describe in the question, we can simply use paste():

paste("name:",out_word,"
city: 
place: 
type: 
color:")
# [1] "name: something \ncity: \nplace: \ntype: \ncolor:"

(the "\n" indicate a newline, it will be saved to file with actual newlines)

But if the problem is more complex, it can make sense to have a template text file with a placeholder, and only replace that:

file_2 <- read_lines("file2_template.txt")
file_2
# [1] "name: %PLACEHOLDER%" "city:"              "place:"            
# [4] "type:"              "color:"

I had saved the file previously, with a big %PLACEHOLDER% where I want to insert the string. Note that this was read line-by-line, so I have a vector with one element for each line. Now we just need to replace the placeholder:

file_2_edited <- str_replace(file_2, "%PLACEHOLDER%", out_word)
file_2_edited
# [1] "name: something" "city: "          "place: "         "type: "         
# [5] "color:"   

There can be other ways to generate this, especially if you also want to fill in the other fields using R. For example:

exple_df <- tibble(name = out_word,
       city= "my_city",
       place = "my_place",
       type = "my_type",
       color = "my_color")

yaml::as.yaml(exple_df)
# [1] "name: something\ncity: my_city\nplace: my_place\ntype: my_type\ncolor: my_color\n"

4. Save to file

Once you have the content of file2, it's trivial to write it with write_lines (or yaml::write_yaml).

3 Likes

Dear AlexisW,
Many thanks, great solution. it worked perfectly.

I am wonder how it is posible to make it in a way that it repeats all process for extracting the text in column2 based on key words in column2. I meat to have a loop that repeats it for extracting BBBB, BBBB3,... using matching words in column3 (CCCCC1, CCCCC2, ..) and subsequently generate the content of file2 and save all of them in file2?

Exact match:

filter(column3 %in% c("word1", "CCCCC6"))

Approximate using an OR in the regular expression:

filter(str_detect(column3, "word|CC6"))
1 Like

Thanks, very helpful.
what if the words (all words in column3) don't have anything in common and they are totally different? I mean we only know that the are located in column 3 (for example located in column and row 5) and we want to extract their correspondent text in the column2 (text in column 2 row 5).

I don't understand. How do you tell which rows to keep or discard? Or, if you want to keep them all, what are you trying to change?

1 Like

sorry for typo, I'll try to ask my request as simple as possible:
how to extract words using their location for example in column 3 and row 5 (I mean "CCCCC5" )? and how to repeat same thing for the words located in column 3 and row 6, and so on? and how to stop it wen it is blank cell (there is no more words like end of document)?

thanks for your time

file1_contents[5,3]

That could technically be done with a for loop

for(i in 1:nrow(file1_contents)){
  word <- file1_content[i, 3]
}

But for loops are not generally the best way to do such things (in R). Also if you just want all the words in column 3 you can just do

file1_content[,3]

What do you plan to do with these words? What is your desired output?

1 Like

thanks alot for the solutions

I want to put these words in the subsequent block of data and save the file as a text file, like:
name: word1
city:
place:
type:
color:

name: word2
city:
place:
type:
color:

name: word3
city:
place:
type:
color:

and so on, but I haven't figured out how to do it yet. It would be so kind of you to advice me on that.

Bests

Something like that?

file1_contents %>%
  mutate(new_content = paste0("name: ", column3, "\ncity: \nplace: \ntype: \ncolor: "))
1 Like

Yes, its great

Many thanks

this results in adding the outcome to the end of file1 table:

file1_contents %>%

  • mutate(new_content = paste0("name: ", column3, "\ncity: \nplace: \ntype: \ncolor: "))
    

A tibble: 7 x 6

column1 column2 column3 column4 column5 new_content

1 AAAA1 BBBB2 CCCCC1 DDDD1 EEEEE1 "name: CCCCC1\ncity: \nplace: \ntype: \ncolo…
2 AAAA2 BBBB3 CCCCC2 DDDD2 EEEEE2 "name: CCCCC2\ncity: \nplace: \ntype: \ncolo…
3 AAAA3 something word1 DDDD3 EEEEE3 "name: word1\ncity: \nplace: \ntype: \ncolor…
4 AAAA4 BBBB5 CCCCC4 DDDD4 EEEEE4 "name: CCCCC4\ncity: \nplace: \ntype: \ncolo…
5 AAAA5 BBBB6 CCCCC5 DDDD5 EEEEE5 "name: CCCCC5\ncity: \nplace: \ntype: \ncolo…
6 AAAA6 BBBB7 CCCCC6 DDDD6 EEEEE6 "name: CCCCC6\ncity: \nplace: \ntype: \ncolo…
7 AAAA7 BBBB8 CCCCC7 DDDD7 EEEEE7 "name: CCCCC7\ncity: \nplace: \ntype: \ncolo…

how to save it in another separated file such as file3.txt? I mean like this:
name: CCCCC1\ncity: \nplace: \ntype: \ncolo…
name: CCCCC2\ncity: \nplace: \ntype: \ncolo…
name: word1\ncity: \nplace: \ntype: \ncolor…
name: CCCCC4\ncity: \nplace: \ntype: \ncolo…
name: CCCCC5\ncity: \nplace: \ntype: \ncolo…
name: CCCCC6\ncity: \nplace: \ntype: \ncolo…
name: CCCCC7\ncity: \nplace: \ntype: \ncolo…

thanks for your time

With write_lines() it should work.

Use pull() to extract the new column with the content of file3, if necessary collapse that vector into a single string with paste() and its argument collapse, then use write_lines() to save it to file.

1 Like

Thanks for the help.

Bests