Hi all.
I've got a large dataset with a field that combines drug names and ATC codes, like so:
> head(unique(state$drug))
[1] "Amphotericin B (A01AB04)" "Nystatin (A07AA02)" "Clotrimazole (G01AF02)" "Doxycycline (J01AA02)" "Ampicillin (J01CA01)"
[6] "Amoxicillin (J01CA04)"
The pattern is always the same: a drug name (which may or may not contain spaces, see "Amphotericin B" above), followed by a space, followed by seven characters in parentheses.
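A quick way to confirm that every value really follows this pattern is a regex check. This is only a sketch: `drug` is a hypothetical stand-in for `state$drug`, filled with values from the output above, and the regex assumes nothing follows the closing parenthesis.

```r
library(stringr)

# Hypothetical stand-in for state$drug, using values shown in the question.
drug <- c("Amphotericin B (A01AB04)", "Nystatin (A07AA02)", "Clotrimazole (G01AF02)")

# Name (one or more characters), a space, then exactly seven characters
# wrapped in literal parentheses at the end of the string.
follows_pattern <- str_detect(drug, "^.+ \\(.{7}\\)$")

# TRUE only if every value matches; inspect drug[!follows_pattern] otherwise.
all(follows_pattern)
```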
What I don't want:
> head(str_trunc(state$drug, 10, side = "right"))
[1] "Amphote..." "Nystati..." "Clotrim..." "Doxycyc..." "Ampicil..." "Amoxici..."
> head(str_trunc(state$drug, 13, side = "left"))
[1] "... (A01AB04)" "... (A07AA02)" "... (G01AF02)" "... (J01AA02)" "... (J01CA01)" "... (J01CA04)"
This is the inverse of what I want (without the ellipsis).
> head(str_split_fixed(state$drug, " \\(", n=2))
[,1] [,2]
[1,] "Amphotericin B" "A01AB04)"
[2,] "Nystatin" "A07AA02)"
[3,] "Clotrimazole" "G01AF02)"
[4,] "Doxycycline" "J01AA02)"
[5,] "Ampicillin" "J01CA01)"
[6,] "Amoxicillin" "J01CA04)"
This would also do, if the second column either kept its opening parenthesis or dropped the closing one.
What am I missing?
Thanks in advance.
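
One possible approach, offered as a sketch rather than a definitive answer: a regex with capture groups can pull the name and the code apart in one pass, and the existing `str_split_fixed()` call can be kept if the trailing parenthesis is stripped afterwards. The vector `drug` and the result names below are hypothetical stand-ins for `state$drug`, and the patterns assume the "name, space, seven characters in parentheses" layout described above holds for every row.

```r
library(stringr)

# Hypothetical stand-in for state$drug, using values shown in the question.
drug <- c("Amphotericin B (A01AB04)", "Nystatin (A07AA02)", "Clotrimazole (G01AF02)")

# Option 1: capture name and code in one pass. The literal parentheses are
# matched but sit outside the capture groups, so they are dropped.
parts <- str_match(drug, "^(.*) \\((.{7})\\)$")
drug_name <- parts[, 2]  # column 1 holds the full match
atc_code  <- parts[, 3]

# Option 2: keep the original str_split_fixed() call and strip the final ")".
halves <- str_split_fixed(drug, " \\(", n = 2)
atc_code2 <- str_remove(halves[, 2], "\\)$")

# Option 3: keep both parentheses around the code.
atc_paren <- str_extract(drug, "\\(.{7}\\)$")
```

If any row deviates from the pattern, `str_match()` and `str_extract()` return `NA` for it, which makes the odd rows easy to find.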