Fail to extract gender information from a single name in R

I have a problem extracting the gender information from the first name. I have a bunch of email addresses and I used one regular expression pattern to extract the first name information. It contains some non-first names information and then I need to tell whether it's a first name before extracting the gender information. It returns me an error and I cannot fix it by myself. Any suggestions or help? Really appreciate it.

Here is my dataset: TESTNAME.csv - Google Drive

Here is my code:

library(httr)
library(jsonlite)
library(lexicon)

# API endpoint URL
url <- "https://api.genderize.io"

df <- read.csv("TESTNAME.csv")
email_test <- df$x

# Get list of common first names from reference corpus
first_names <- freq_first_names[["Name"]]

# Filter out non-first name information
names <- tolower(email_test)[tolower(email_test) %in% tolower(first_names)]

# Function to call Genderize.io API and extract gender information for a single name
get_gender <- function(name) {
  response <- GET(paste0(url, "?name=", name))
  data <- content(response, as = "text", encoding = "UTF-8")
  json <- fromJSON(data)
  if (json$gender != "") {
    return(json$gender)
  } else {
    return(NA)
  }
}

# Apply get_gender function to each name in the list
genders <- sapply(names, get_gender)

# Combine names and genders into a data frame
df <- data.frame(name = names, gender = genders)

# Print the result
df

Here is the error:

1 Like

Hi, Im run your code and is well.
In my case don't show an error

The result of final df
image

Thanks for your reply and I'll check it now.

It works out on my laptop as well and thanks for your help.

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.