How to separate strings separated by semicolon?

I'm trying to separate thousands of strings containing semicolons. Could anyone help me with it?
I searched previous questions, but in my case every value in the "Pest" column starts with different characters, so I don't think I can use the group() command.
Here is the dummy input data frame:

  df <- data.frame(
    Item = c("apple","pear","banana","mango"),
    Pest = c("a;b;c;d","e;f","g;h;i","j;k")
  )

And this is what I wish to get:

  df1 <- data.frame(
    Item = c("apple","pear","banana","mango"),
    pest_1=c("a","e","g","j"),
    pest_2=c("b","f","h","k"),
    pest_3=c("c",NA,"i",NA),
    pest_4=c("d",NA,NA,NA)
  )

Thank you so much.

Thanks for posting the data, @hellovivvvv -- have you tried the separate() function from the tidyr package?

Oh thanks! I didn't know about the separate() function before.
I tried, and I got this:

separate(df,Pest,into = c("pest_1","pest_2","pest_3","pest_4"), sep = ";",fill = "warn")

But how can R automatically generate column names until it reaches the last "pest" column, so that they look like "pest_n"? Because in a large dataset, I don't know how many columns it will end up generating, so I can't type the column names manually.

I think I might need to use something like 1:n()? Or maybe a loop? I'm not sure.
This is the code I'm trying to use, but it only generates the fourth column:

mydf <- data.frame(matrix(ncol = 4))
for (n in 1:ncol(mydf))
{
  colnames(mydf)[n] <- paste("col", n, sep = "")
}
# After the loop n is still 4, so only the last name ("col4") is stored in name
print(colnames(mydf)[n]) -> name
separate(df, Pest, into = name, sep = ";", fill = "warn")

Thank you!
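
One possible way to generate the names automatically, offered as a minimal sketch rather than the only approach: count the pieces in each string, take the largest count, and build that many names with paste0(). The names n_max and pest_names are made up for this illustration, and fill = "right" is chosen here so that short rows are padded with NA without a warning.

```r
library(tidyr)

df <- data.frame(
  Item = c("apple", "pear", "banana", "mango"),
  Pest = c("a;b;c;d", "e;f", "g;h;i", "j;k")
)

# Count the pieces in every row and keep the largest count
# (as.character() guards against Pest being stored as a factor)
n_max <- max(lengths(strsplit(as.character(df$Pest), ";", fixed = TRUE)))

# Build the vector pest_1, pest_2, ..., pest_<n_max>
pest_names <- paste0("pest_", seq_len(n_max))

# Rows with fewer pieces than n_max get NA in the remaining columns
df1 <- separate(df, Pest, into = pest_names, sep = ";", fill = "right")
df1
```

In tidyr 1.3.0 and later, separate() is marked as superseded. There, separate_wider_delim() with names_sep = "_" and too_few = "align_start" should number the new columns by itself, although the names then start with the original column name (Pest_1, Pest_2, and so on).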


My suggestion was based on your desired result, but you might find that keeping just the two columns, Item and Pest, and expanding the contents of Pest into one row per pest ends up being more manageable. Could you say a little about what you're hoping to use the final table for?
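
A rough sketch of that two-column layout, assuming tidyr's separate_rows() is an acceptable way to expand Pest (the name df_long is only for illustration):

```r
library(tidyr)

df <- data.frame(
  Item = c("apple", "pear", "banana", "mango"),
  Pest = c("a;b;c;d", "e;f", "g;h;i", "j;k")
)

# One row per Item/pest pair; Item is repeated for each of its pests
df_long <- separate_rows(df, Pest, sep = ";")
df_long
```

This layout needs no column names to be generated, because the number of columns stays at two however many pests an item has.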

In the meantime, if you make a complete list of all the pests, you could cross join it with df (maybe by using the complete() function) and use str_detect() from the stringr package to help you extract the rows you want. Are you familiar with these functions or joins in general?
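
As a sketch of that idea under a few assumptions: the pest list is built from the data itself with str_split(), the cross join is done with base R's merge(by = NULL) rather than complete(), and the names all_pests, pest_name, pairs and hits are invented for this example.

```r
library(stringr)

df <- data.frame(
  Item = c("apple", "pear", "banana", "mango"),
  Pest = c("a;b;c;d", "e;f", "g;h;i", "j;k")
)

# Complete list of distinct pests found anywhere in the Pest column
all_pests <- sort(unique(unlist(str_split(df$Pest, ";"))))

# Cross join: every row of df paired with every pest (by = NULL gives the Cartesian product)
pairs <- merge(df, data.frame(pest_name = all_pests), by = NULL)

# Keep only the pairs where the pest actually occurs in that item's Pest string
hits <- pairs[str_detect(pairs$Pest, fixed(as.character(pairs$pest_name))), ]
hits[order(hits$Item), c("Item", "pest_name")]
```

Note that str_detect() matches substrings, so with real pest names where one name is contained in another, this simple check could report false matches.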

P.S. You could use str_split() to get a quick and dirty version of your desired result, but with little control of the column names:

library(tidyverse)  # provides %>%, add_column() and str_split()
df %>% add_column(pest_matrix = df$Pest %>% str_split(";", simplify = TRUE))

The str_split() function is also what could help you create a complete list of all the pests.


Yes it works! Thank you so much!
