I have a problem extracting the gender information from the first name. I have a bunch of email addresses and I used one regular expression pattern to extract the first name information. It contains some non-first names information and then I need to tell whether it's a first name before extracting the gender information. It returns me an error and I cannot fix it by myself. Any suggestions or help? Really appreciate it.
Here is my dataset: https://drive.google.com/file/d/1kQAcjo2Ewuy43SLK111NwFerNiv_tbPK/view?usp=share_link
Here is my code:
library(httr)
library(jsonlite)
library(lexicon)
# API endpoint URL
url <- "https://api.genderize.io"
df <- read.csv("TESTNAME.csv")
email_test <- df$x
# Get list of common first names from reference corpus
first_names <- freq_first_names[["Name"]]
# Filter out non-first name information
names <- tolower(email_test)[tolower(email_test) %in% tolower(first_names)]
# Function to call Genderize.io API and extract gender information for a single name
get_gender <- function(name) {
response <- GET(paste0(url, "?name=", name))
data <- content(response, as = "text", encoding = "UTF-8")
json <- fromJSON(data)
if (json$gender != "") {
return(json$gender)
} else {
return(NA)
}
}
# Apply get_gender function to each name in the list
genders <- sapply(names, get_gender)
# Combine names and genders into a data frame
df <- data.frame(name = names, gender = genders)
# Print the result
df
Here is the error: