I'm trying to clean a dataframe that looks like this:
df <- data.frame(First = c("Mark", "John", "Anthony"),
Last = c("Joshua", "Wellberg", "Kennedy"),
Notes = c("DIS# 430477541 Plan Manager: Susan Long McArthur Community Care susan.long@mcarthur.com.au DOB: 19/04/1963 NDIS# 430477541 Start - 15/11/2018 Finish - 15/11/2019",
"Plan managed – national disability support partners – invoices@ndsp.com.au",
"Self managed
Natalia O/T
NDIS number - 431141456"),
NDIS = c(NA,NA,NA),
Col = c(NA, NA, NA))
df$Notes <- tolower(df$Notes)
I want to fill the Plan column based on the presence of certain strings in the Notes column. For example, if the Notes column contains the string "self manag", I want to fill the Plan column with "S". I've tried the following code:
for (row in 1:df) {
if (grep("self manag", df$Notes)) {
Plan == "S"
}
}
When I try this, I get an error saying:
Error in 1:df : NA/NaN argument
In addition: Warning message:
In 1:df : numerical expression has 5 elements: only the first used
Hello, the error itself comes from 1:df (df is a data frame, not a number of rows), but the underlying issue is that if() is not vectorised, so R provides ifelse() for that purpose.
There is a good tutorial to look at here: 5 Control flow | Advanced R (hadley.nz)
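A minimal sketch of the ifelse() approach in base R, next to a corrected version of the loop for comparison. The small data frame below is a stand-in for the one in the question, and leaving non-matching rows as NA is an assumption about what is wanted.

```r
# Toy data standing in for the question's data frame (assumed for illustration)
df <- data.frame(
  First = c("Mark", "John", "Anthony"),
  Notes = c("plan manager: susan long",
            "plan managed - invoices",
            "self managed\nndis number - 431141456"),
  stringsAsFactors = FALSE
)

# Vectorised: grepl() returns one TRUE/FALSE per row,
# and ifelse() picks a value for each row
df$Plan <- ifelse(grepl("self manag", df$Notes), "S", NA_character_)

# Loop equivalent: iterate over row indices (not over df itself),
# test one row at a time, and assign with <- rather than compare with ==
df$Plan_loop <- NA_character_
for (row in seq_len(nrow(df))) {
  if (grepl("self manag", df$Notes[row])) {
    df$Plan_loop[row] <- "S"
  }
}
```

Both columns end up the same; the vectorised version is shorter and is usually the one to prefer.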
To build on the previous answer, something like this might work well for you:
library(tidyverse)
df <- data.frame(First = c("Mark", "John", "Anthony"),
Last = c("Joshua", "Wellberg", "Kennedy"),
Notes = c("DIS# 430477541 Plan Manager: Susan Long McArthur Community Care susan.long@mcarthur.com.au DOB: 19/04/1963 NDIS# 430477541 Start - 15/11/2018 Finish - 15/11/2019",
"Plan managed – national disability support partners – invoices@ndsp.com.au",
"Self managed
Natalia O/T
NDIS number - 431141456"),
NDIS = c(NA,NA,NA),
Col = c(NA, NA, NA))
df$Notes <- tolower(df$Notes)
df
#> First Last
#> 1 Mark Joshua
#> 2 John Wellberg
#> 3 Anthony Kennedy
#> Notes
#> 1 dis# 430477541 plan manager: susan long mcarthur community care susan.long@mcarthur.com.au dob: 19/04/1963 ndis# 430477541 start - 15/11/2018 finish - 15/11/2019
#> 2 plan managed – national disability support partners – invoices@ndsp.com.au
#> 3 self managed\nnatalia o/t\nndis number - 431141456
#> NDIS Col
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
df %>%
  mutate(Plan = case_when(
    # When "self manag" is found, fill Plan with "S"
    grepl(pattern = "self manag", x = Notes) ~ "S",
    # Same, with "P"
    grepl(pattern = "plan manag", x = Notes) ~ "P"
  ))
#> First Last
#> 1 Mark Joshua
#> 2 John Wellberg
#> 3 Anthony Kennedy
#> Notes
#> 1 dis# 430477541 plan manager: susan long mcarthur community care susan.long@mcarthur.com.au dob: 19/04/1963 ndis# 430477541 start - 15/11/2018 finish - 15/11/2019
#> 2 plan managed – national disability support partners – invoices@ndsp.com.au
#> 3 self managed\nnatalia o/t\nndis number - 431141456
#> NDIS Col Plan
#> 1 NA NA P
#> 2 NA NA P
#> 3 NA NA S
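One thing to keep in mind with case_when(): conditions are checked in order and the first match wins, and rows that match none of them get NA unless you add a fallback. A small sketch of both behaviours; the notes below are made up, and the "U" label for unmatched rows is purely a hypothetical choice.

```r
library(dplyr)

# Made-up notes for illustration only
notes <- c("self managed", "plan managed", "agency managed",
           "was plan managed, now self managed")

plan <- case_when(
  grepl("self manag", notes) ~ "S",  # checked first, so it wins for the last note
  grepl("plan manag", notes) ~ "P",
  TRUE ~ "U"                         # hypothetical fallback label when nothing matches
)
plan
```

If a note could mention both kinds of management, put the condition you want to take priority first.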