Fail to extract gender information from a single name in R

YuJiang · March 1, 2023, 9:06pm

I have a problem extracting gender information from first names. I have a bunch of email addresses, and I used a regular expression to extract the first-name part of each one. The extracted strings include some values that are not first names, so I need to check whether each one is a first name before looking up its gender. My code returns an error that I cannot fix myself. Any suggestions or help? I'd really appreciate it.

Here is my dataset: TESTNAME.csv (shared on Google Drive)

Here is my code:

library(httr)
library(jsonlite)
library(lexicon)

# API endpoint URL
url <- "https://api.genderize.io"

df <- read.csv("TESTNAME.csv")
email_test <- df$x

# Get list of common first names from reference corpus
first_names <- freq_first_names[["Name"]]

# Filter out non-first name information
names <- tolower(email_test)[tolower(email_test) %in% tolower(first_names)]

# Function to call Genderize.io API and extract gender information for a single name
get_gender <- function(name) {
  response <- GET(paste0(url, "?name=", name))
  data <- content(response, as = "text", encoding = "UTF-8")
  json <- fromJSON(data)
  if (json$gender != "") {
    return(json$gender)
  } else {
    return(NA)
  }
}

# Apply get_gender function to each name in the list
genders <- sapply(names, get_gender)

# Combine names and genders into a data frame
df <- data.frame(name = names, gender = genders)

# Print the result
df

Here is the error:

M_AcostaCH · March 2, 2023, 1:50am

Hi, I ran your code and it works fine.
In my case it doesn't show an error.

The resulting final df:

YuJiang · March 2, 2023, 2:53am

Thanks for your reply and I'll check it now.

YuJiang · March 2, 2023, 2:56am

It works on my laptop now as well. Thanks for your help.
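
The error text is not shown in the thread, so the following is only one plausible cause and a minimal sketch of a guard against it. When Genderize.io cannot classify a name it returns "gender": null, which jsonlite::fromJSON() turns into NULL, and if (NULL != "") stops with "argument is of length zero". The same would happen if the response body carried an error message instead of a result (for example when a request limit is reached, which is an assumption here). The helper names extract_gender and get_gender_safe are hypothetical and not part of the original post.

```r
# Pull the gender out of an already-parsed Genderize.io response.
# `parsed` is the list returned by jsonlite::fromJSON(); a JSON null
# arrives as NULL, and NULL != "" is a zero-length logical that if() rejects.
extract_gender <- function(parsed) {
  gender <- parsed[["gender"]]
  if (is.null(gender) || length(gender) != 1 || is.na(gender) || gender == "") {
    return(NA_character_)
  }
  as.character(gender)
}

# Hypothetical wrapper: the same request as get_gender() in the post,
# but the name is URL-encoded by httr and failed requests become NA.
get_gender_safe <- function(name, url = "https://api.genderize.io") {
  response <- httr::GET(url, query = list(name = name))
  if (httr::http_error(response)) {
    return(NA_character_)
  }
  body <- httr::content(response, as = "text", encoding = "UTF-8")
  extract_gender(jsonlite::fromJSON(body))
}

# vapply() guarantees a character vector with one element per name:
# genders <- vapply(names, get_gender_safe, character(1))
```

Splitting the parsing step from the HTTP call keeps the NULL handling easy to check without network access, for example with extract_gender(list(gender = NULL)).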
