How to scrape only tweets with lat_lon (geocode) information?

I'm scraping tweets using the rtweet package in R. Since I'm only interested in tweets that carry latitude and longitude (geocode) information for a specific location, I use the Google Maps API and the lookup_coords function to get the coordinates of the specified location.

Usually I do it like this:

db <- search_tweets(q = "xxx", n = 1000, lang = "it",
                    geocode = lookup_coords("Italy", apikey = apiKey))

and then, to filter out tweets without lat and lng, I do it this way:

library(dplyr)

# add lat/lng columns, then keep only tweets that actually have coordinates
db_tweets <- lat_lng(db)
db_tweets.geo <- db_tweets %>%
            filter(!is.na(lat) & !is.na(lng))

To make the last step more efficient, I would like to apply a filter while searching with the search_tweets function, so that I only get tweets with lat/lng information from the start, but I'm not sure how to do this. Do you have any suggestions?

I see you already use lookup_coords with the geocode argument; as you have noticed, this still returns many tweets without geolocation. In principle this is the official and correct way to do it, and if the API doesn't return better results there is little you or rtweet can do.

The next step could be narrowing the query with an operator, something like q = "#rstats has:geo" (the operator might be premium only); in principle this would return a higher proportion of tweets with geocode information.
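If your access level supports it, a minimal sketch would look like the following (the "xxx" query, apiKey, and the availability of the has:geo operator on your endpoint are assumptions taken from your example, not something rtweet guarantees):

library(rtweet)

# Sketch only: has:geo is a premium/v2 search operator; on the standard
# endpoint it may be ignored or rejected, so check your access level first.
db_geo <- search_tweets(q = "xxx has:geo", n = 1000, lang = "it",
                        geocode = lookup_coords("Italy", apikey = apiKey))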

Lastly, the spatial data in the tweets returned by the API is quite confusing, as the coordinates can live in several different fields, and if the geocode is more specific (e.g. Rome) the API might not return those tweets.
With API v2 it might be slightly better, but I doubt it.
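To see where the coordinates actually come from, here is a rough sketch; the column names below come from the flattened data frame of rtweet 0.7.x and the db object from your example, and may differ in other versions:

library(rtweet)
library(dplyr)

# lat_lng() can be told which coordinate columns to draw from
db_tweets <- lat_lng(db, coords = c("coords_coords", "bbox_coords", "geo_coords"))

# compare tweets with usable lat/lng against tweets that only name a place
db_tweets %>%
  summarise(with_lat_lng = sum(!is.na(lat) & !is.na(lng)),
            with_place   = sum(!is.na(place_full_name)),
            total        = n())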

Note: with rtweet we don't scrape the website; we use the API to look for tweets that match your query.

