I'm a novice in R and am struggling with extracting percentages/numbers from strings in a data frame. For example,
df <- data.frame(
  Species = c("Bidens pilosa", "Orobanche ramose"),
  Impact = c("Soyabean yield loss was 10%. A density of one plant resulted in a yield loss of 9.4%; two plants, 17.3%; and four to eight plants, 28%...In contrast, suppression of the weed by the crop was only 10%", "Cypress was estimated to have a 28% loss annually. The annual increase of the disease in some stands in the Peloponnesus, with an initial attack of 20%, ranged from 5% to 20% ")
)
My questions are the following:
In this case, I only want to extract the yield loss for the different crops, which is 10 and 28, and skip percentages and numbers that describe other aspects (such as 9.4%, 17.3%, 5%, etc.). Can I achieve this in R, or does it require natural language processing?
If it is hard to distinguish the different types of percentages, how can I extract all percentages/numbers at once so that I can pick the right number manually? I have tried to use
df %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric
or
parse_number(df$Impact)
But I think neither of them works, because they give me one continuous run of numbers.
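To make the second question concrete, here is a minimal sketch of the per-row output I am after, using base R only. The `impact` vector is a shortened, hypothetical stand-in for `df$Impact`, and the names `pct_pattern`, `pct_by_row` and `first_pct` are made up for illustration. Taking the first percentage in each row is only a guessed heuristic for the yield loss, not a reliable rule.

```r
# Hypothetical, shortened stand-in for df$Impact (assumption, not the real data)
impact <- c(
  "Yield loss was 10%. One plant gave a loss of 9.4%; two plants, 17.3%.",
  "Cypress had a 28% loss annually, ranging from 5% to 20%."
)

# An integer or decimal number that is immediately followed by a percent sign.
# The lookahead (?=%) needs perl = TRUE.
pct_pattern <- "[0-9]+(\\.[0-9]+)?(?=%)"

# gregexpr() locates every match in each string; regmatches() returns a list
# with one character vector per row, so the rows stay separate.
matches <- regmatches(impact, gregexpr(pct_pattern, impact, perl = TRUE))
pct_by_row <- lapply(matches, as.numeric)

# Guessed heuristic: keep the first percentage in each row, NA if there is none.
first_pct <- vapply(
  pct_by_row,
  function(x) if (length(x) > 0) x[1] else NA_real_,
  numeric(1)
)
```

With a list like `pct_by_row`, each row's numbers could be inspected separately (for example with `pct_by_row[[1]]`) instead of being flattened into one vector by `unlist`.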
Thanks for your help.