Harvest names from email address?

Is there a package in R that can (at a minimum) make an educated guess at splitting the first name and last name out of an email address? For example, I have johnsmith@samething.org. I can strip the @samething.org part from every address, but that leaves me with johnsmith, and I would really rather not separate a few hundred addresses into first and last names by hand if I can help it. Any thoughts or suggestions?

I think the only way to do this is to match against a reference database of first names. I've seen these available as text files online.


Here's a bit of inspirational code, @mmahoney:

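library("stringr")
# Toy reference list of first names; a real one would be much longer
db <- c("dan", "john", "kyle")
# Remove each candidate name in turn; names that don't match leave the string unchanged
s <- str_replace(string = "johnsmith", pattern = db, replacement = "")
# The shortest remainder belongs to the name that matched
i <- which.min(nchar(s))
# First name is the matched entry, last name is what remains
n <- c(db[i], s[i])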

Yielding

> n
[1] "john"  "smith"
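
To run the same idea over a few hundred addresses, here is a minimal sketch in base R rather than stringr. The helper name `split_name` is made up for illustration. It rests on a few assumptions of mine: the first name always comes first in the local part, the longest matching reference name wins (so "johnny" beats "john"), and addresses with no match return NA so they can be checked by hand.

```r
# Hypothetical helper: split the local part of an email address into
# first and last name using a reference vector of lowercase first names.
split_name <- function(email, first_names) {
  # Drop the domain and normalise case
  local <- tolower(sub("@.*$", "", email))
  # Keep only the reference names that the local part starts with
  hits <- first_names[startsWith(local, first_names)]
  if (length(hits) == 0) {
    # No reference name matched: flag for manual review
    return(c(first = NA_character_, last = NA_character_))
  }
  # Assumption: prefer the longest matching name
  first <- hits[which.max(nchar(hits))]
  c(first = first, last = substring(local, nchar(first) + 1))
}

db <- c("dan", "john", "kyle")  # toy reference list
emails <- c("johnsmith@samething.org",
            "kylejones@samething.org",
            "zedwhite@samething.org")

# One row per address, with columns "first" and "last"
result <- t(sapply(emails, split_name, first_names = db, USE.NAMES = FALSE))
```

Matching only at the start of the string avoids a short name such as "dan" being cut out of the middle of an unrelated address. The result will only be as good as the reference list, so it is worth spot-checking the rows that come back as NA.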

Hope it helps :)
