I'm trying to split thousands of semicolon-separated strings into separate columns. Could anyone help me with this?
I searched previous questions, but in my case every value in the "Pest" column starts with different characters, so I don't think I can use the group() command.
Here is the dummy input data frame:
df <- data.frame(
Item = c("apple","pear","banana","mango"),
Pest = c("a;b;c;d","e;f","g;h;i","j;k")
)
How can R automatically generate column names, up to the final number of pests, in the form "pest_n"? In a large dataset I don't know how many columns will be generated, so I can't type the column names manually.
I think I might need to use something like 1:n()? Or maybe a loop? I'm not sure.
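A minimal base-R sketch of the idea, with no loop and no packages: count the pieces in each string, take the maximum, and build the names with paste0() and seq_len(). It assumes ";" is the only delimiter and pads shorter rows with NA; the object names pieces, n_max, padded and wide are illustrative.

```r
# Dummy data from the question (stringsAsFactors = FALSE keeps Pest as
# character on R versions older than 4.0)
df <- data.frame(
  Item = c("apple", "pear", "banana", "mango"),
  Pest = c("a;b;c;d", "e;f", "g;h;i", "j;k"),
  stringsAsFactors = FALSE
)

# Split every Pest string on ";" (fixed = TRUE: no regex interpretation)
pieces <- strsplit(df$Pest, ";", fixed = TRUE)

# The row with the most pests decides how many columns are needed
n_max <- max(lengths(pieces))

# Indexing past the end of a vector yields NA, which pads the shorter rows
padded <- do.call(rbind, lapply(pieces, function(p) p[seq_len(n_max)]))

# Generate pest_1 ... pest_n instead of typing them
colnames(padded) <- paste0("pest_", seq_len(n_max))

wide <- data.frame(Item = df$Item, padded, stringsAsFactors = FALSE)
wide
```

Because n_max is computed from the data, the same code works whether the widest row has 4 pests or 40.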
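This is the code I'm trying to use, but it only generates the fourth column:
library(tidyr)  # provides separate()

# Build a 4-column placeholder just to get the names col1 ... col4
mydf <- data.frame(matrix(ncol = 4))
for (n in 1:ncol(mydf)) {
  colnames(mydf)[n] <- paste("col", n, sep = "")
}
# After the loop n is 4, so this keeps only the last name, "col4"
name <- colnames(mydf)[n]
print(name)
separate(df, Pest, into = name, sep = ";", fill = "warn")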
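The attempt above passes a single name ("col4") to into, so separate() creates only one column. A sketch of a fix that keeps the separate() approach, assuming the tidyr package is installed: size the name vector from the data and pass the whole vector. fill = "right" pads short rows with NA on the right instead of warning.

```r
library(tidyr)

df <- data.frame(
  Item = c("apple", "pear", "banana", "mango"),
  Pest = c("a;b;c;d", "e;f", "g;h;i", "j;k"),
  stringsAsFactors = FALSE
)

# Number of pieces in the longest Pest string
n_max <- max(lengths(strsplit(df$Pest, ";", fixed = TRUE)))

# Pass the WHOLE vector of names to `into`, not just the last one
name <- paste0("pest_", seq_len(n_max))

wide <- separate(df, Pest, into = name, sep = ";", fill = "right")
wide
```

If you are on tidyr 1.3.0 or newer, separate() is superseded; I believe separate_wider_delim(df, Pest, delim = ";", names_sep = "_", too_few = "align_start") generates Pest_1, Pest_2, ... without counting first, but check the documentation for your installed version.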
My suggestion was based on your desired result, but you might find that keeping just the two columns, Item and Pest, and expanding the contents of Pest into one row per pest ends up being more manageable. Could you say a little about what you're hoping to use the final table for?
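A short sketch of that two-column ("long") layout, assuming tidyr is available: separate_rows() gives one row per Item/pest pair, so there is no need to know in advance how many pests a row holds.

```r
library(tidyr)

df <- data.frame(
  Item = c("apple", "pear", "banana", "mango"),
  Pest = c("a;b;c;d", "e;f", "g;h;i", "j;k"),
  stringsAsFactors = FALSE
)

# Each ";"-separated entry becomes its own row; Item is repeated as needed
long <- separate_rows(df, Pest, sep = ";")
long
```

Filtering or counting by pest is then an ordinary operation on the Pest column, however many pests each item has.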
In the meantime, if you make a complete list of all the pests, you could cross join it with df (maybe by using the complete() function) and use str_detect() from the stringr package to help you extract the rows you want. Are you familiar with these functions or joins in general?
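A rough sketch of the cross-join idea, assuming tidyr and stringr are installed. It swaps in tidyr's expand_grid() for the cross join rather than complete(), and derives the pest list from the data only for illustration; all_pests, candidate, pairs and hits are hypothetical names. The pattern anchors each candidate with (^|;) and (;|$) so that, for example, "a" would not match inside "ab"; this assumes pest names contain no regex metacharacters.

```r
library(tidyr)
library(stringr)

df <- data.frame(
  Item = c("apple", "pear", "banana", "mango"),
  Pest = c("a;b;c;d", "e;f", "g;h;i", "j;k"),
  stringsAsFactors = FALSE
)

# Complete list of pests (in practice this might come from a lookup table)
all_pests <- sort(unique(unlist(strsplit(df$Pest, ";", fixed = TRUE))))

# Cross join: every row of df paired with every candidate pest
pairs <- expand_grid(df, candidate = all_pests)

# Keep pairs where the candidate is a whole ";"-delimited entry of Pest
keep <- str_detect(pairs$Pest, paste0("(^|;)", pairs$candidate, "(;|$)"))
hits <- pairs[keep, ]
hits
```

The cross join grows as rows times pests (here 4 x 11 = 44), so on thousands of strings the separate_rows() route is likely cheaper.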