Get Google trends data - problem with output


#1

Hi all,
I have a question relating to the output of R when downloading Google trends data.
I want to download Google trends data for the past 7 days.
My code is:

sink("output1.txt ")
print(head(gtrends(c("VCB"), geo = "VN", time = "now 7-d", gprop = c("web"), category = 7, hl = "en-US",  low_search_volume = TRUE) $interest_over_time))
sink()

And the output is:

                 date hits keyword geo gprop category
1 2018-05-01 16:00:00   41     VCB  VN   web        7
2 2018-05-01 17:00:00   73     VCB  VN   web        7
3 2018-05-01 18:00:00   81     VCB  VN   web        7
4 2018-05-01 19:00:00   72     VCB  VN   web        7
5 2018-05-01 20:00:00   60     VCB  VN   web        7
6 2018-05-01 21:00:00   44     VCB  VN   web        7

My question is: why are there only 6 observations in the output?

And I also get this:

Warning messages:
1: In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'Asia/Taipei'
2: In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'GMT'
3: In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'America/New_York'
4: In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'Asia/Taipei'
5: In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'GMT'
6: In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'America/New_York'

Do I need to specify a time zone, even though I already set geo = "VN"?

Thank you.


#2

Fun question,
With these kinds of coding questions, it's usually polite to provide a fully reproducible example to save responders' time. (If you're not familiar with what that is, see FAQ: What's a reproducible example (`reprex`) and how do I do one?)


Having said that, I couldn't help but notice that you are saving this data to a text file, and calling head.

  • head's default is to return only the first 6 rows, which is why you see 6 observations.
  • Why are you saving your data this way, when you could just save it as a csv, rdata, feather, or whatever you like?

For example

library(gtrendsR)
library(readr)
temp = gtrends(c("VCB"), geo = "VN", time = "now 7-d", gprop = c("web"), category = 7, hl = "en-US", low_search_volume = TRUE)
temp = temp$interest_over_time

write_csv(temp, "output1.csv")

Created on 2018-05-09 by the reprex package (v0.2.0).
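
On the first point: head() returns six rows unless you pass n, so the original call could only ever print six observations. A minimal sketch, assuming a made-up data frame in place of the real download (the hits values are invented for illustration and are not Google Trends data):

```r
# Hypothetical stand-in for gtrends(...)$interest_over_time:
# 7 days of hourly rows = 168 observations.
fake_trends <- data.frame(
  date = seq(as.POSIXct("2018-05-01 16:00:00", tz = "UTC"),
             by = "hour", length.out = 168),
  hits = rep(c(41, 73, 81, 72, 60, 44, 50), times = 24)
)

nrow(head(fake_trends))          # default: first 6 rows only
nrow(head(fake_trends, n = 24))  # ask for the first 24 rows instead
nrow(fake_trends)                # the full data frame
```

Dropping head() altogether, as in the write_csv call above, saves every row.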


#3

Hi,
Thank you for replying. I know the question may sound ridiculous. I've only been using R for 2 days. I have to download data from Google trends for a bunch of search terms, and I want a quicker way than downloading them manually one by one each time. Someone told me I could use R. The code I wrote is the result of reading some stuff on the Internet. I combined it from different websites, but I don't really know how the commands work.

I tried to run the code you gave me. But there are errors:

library(gtrendsR)
library(readr)
Error in library(readr) : there is no package called ‘readr’
temp = gtrends(c("VCB"), geo = "VN", time = "now 7-d", gprop = c("web"), category = 7, hl = "en-US", low_search_volume = TRUE)
temp = temp$interest_over_time

write_csv( temp, "output1.csv",)
Error in write_csv(temp, "output1.csv", ) :
could not find function "write_csv"

Could you please help me solve the problem? I need to download Google trends data for the past 7 days for a list of search terms.

Thank you.


#4

You need to install readr first:

install.packages("readr")


#5

It worked. Thank you both for helping me.
I still wonder: is there a way to write a loop so that I can download data for a list of search terms without writing separate code for each one?
And how can I adjust the timestamps of the downloaded SVI (search volume index) to match the geographic location I chose?


#6

If your question's been answered, would you mind choosing a solution? (see FAQ below for how) It makes it a bit easier to visually navigate the site and see which questions still need help.

Thanks


#7

Yes, I will keep that in mind next time. Thanks.


#8

You could create a for loop for multiple separate queries:

library(gtrendsR)
library(readr)

search_terms <- c("VCB", "ABC", "DEF")

# One query and one CSV file per search term
for (i in search_terms) {
  temp = gtrends(i, geo = "VN", time = "now 7-d", gprop = c("web"), category = 7, hl = "en-US", low_search_volume = TRUE)
  temp = temp$interest_over_time
  write_csv(temp, paste0("output_", i, ".csv"))
}

Or you could just include all your search terms into one vector and extract them together:

search_terms <- c("VCB", "ABC", "DEF")
temp = gtrends(search_terms, geo = "VN", time = "now 7-d", gprop = c("web"), category = 7, hl = "en-US", low_search_volume = TRUE)
temp = temp$interest_over_time
write_csv(temp, "output_all.csv")

(I haven't tested the code because Google API stuff always fails here at work.)

I don't know anything about your second question.
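
One caveat with the single-vector approach: as far as I know, Google Trends compares at most five terms per query, and hits within a query are scaled relative to each other, so values from different queries are not directly comparable. For a longer list, a sketch like this splits the terms into groups of five. The term names and max_per_query are placeholders of mine, and the gtrends call is left commented out because I have not run it:

```r
# Placeholder search terms; replace with your own list.
search_terms <- c("VCB", "ABC", "DEF", "GHI", "JKL", "MNO", "PQR")

# Assumption: at most five keywords are allowed per query.
max_per_query <- 5

# Assign each term a group number (1, 1, 1, 1, 1, 2, 2) and split on it.
groups <- split(search_terms, ceiling(seq_along(search_terms) / max_per_query))

lengths(groups)  # how many terms ended up in each group

# Untested: one query and one CSV file per group.
# for (g in names(groups)) {
#   temp <- gtrendsR::gtrends(groups[[g]], geo = "VN", time = "now 7-d")
#   readr::write_csv(temp$interest_over_time, paste0("output_group_", g, ".csv"))
# }
```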


#9

No problem! Just a friendly reminder


#10

Have a look at the documentation for gtrends by typing ?gtrends in your console, and check out the online docs for Package ‘gtrendsR’.

If you look at the date variable in the interest_over_time data frame you query, you'll note it contains timezone information.

library(gtrendsR)
temp = gtrends(c("VCB"), geo = "VN", time = "now 7-d", gprop = c("web"), category = 7, hl = "en-US", low_search_volume = TRUE)
temp$interest_over_time$date[1]
#> [1] "2018-05-02 10:00:00 CEST"
attr(temp$interest_over_time$date[1],"tzone")
#> [1] "Europe/Berlin"
#> I happen to be in Europe...

Created on 2018-05-09 by the reprex package (v0.2.0).

It appears that gtrends sets time to your machine's timezone...? (not sure about that).

Also, sadly, I suggested saving your files as a CSV, which strips timezone info.


I'd check out the lubridate package's handy tz() function to help with this. Wikipedia has a list of timezone names.
You can use with_tz to change the timezone to whatever zone you want.

For example:

library(gtrendsR)
temp = gtrends(c("VCB"), geo = "VN", time = "now 7-d", gprop = c("web"), category = 7, hl = "en-US", low_search_volume = TRUE)

library(dplyr)
library(lubridate)


# Get Time Zone: 
temp$interest_over_time$date %>% head(2)
#> [1] "2018-05-02 10:00:00 CEST" "2018-05-02 11:00:00 CEST"
tz(temp$interest_over_time$date)
#> [1] "Europe/Berlin"
# I happen to be in Berlin

# Set Time Zone: https://lubridate.tidyverse.org/reference/with_tz.html
temp$interest_over_time$date <- with_tz(temp$interest_over_time$date, "America/New_York")
temp$interest_over_time$date %>% head(2)
#> [1] "2018-05-02 04:00:00 EDT" "2018-05-02 05:00:00 EDT"
tz(temp$interest_over_time$date)
#> [1] "America/New_York"

Created on 2018-05-09 by the reprex package (v0.2.0).
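
If you want the local time to survive in the CSV, one option is to convert the date column to text with an explicit UTC offset before writing. As I understand it, readr's write_csv writes date-times in UTC, which may explain hours that match neither your machine nor the geo location. A minimal base R sketch, assuming made-up timestamps and hits in place of the real interest_over_time data, and "Asia/Ho_Chi_Minh" as the zone wanted for Vietnam:

```r
# Made-up timestamps standing in for interest_over_time$date.
dates <- as.POSIXct(c("2018-05-02 08:00:00", "2018-05-02 09:00:00"), tz = "UTC")

# Render the same instants as Vietnam local time, with the UTC offset (%z)
# written out so the zone is kept as plain text.
local_text <- format(dates, tz = "Asia/Ho_Chi_Minh",
                     format = "%Y-%m-%d %H:%M:%S %z")
local_text

# Write the text column; the hits values here are invented for illustration.
out <- data.frame(date_local = local_text, hits = c(41, 73))
write.csv(out, file.path(tempdir(), "output_local.csv"), row.names = FALSE)
```

The file goes to tempdir() here only to keep the sketch self-contained; use whatever path you like.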


#11

When I open the output data, which is saved as CSV, I see that the time of the last observation matches neither the geo location's time zone nor my machine's time zone. I downloaded the past 7 days of data, which is hourly, and I can see a big difference when I look at the hour. That's why I asked this question. I wonder if it has anything to do with CRAN.


#12

In the code above, lubridate's tz function will reveal the timezone of those datetime observations. As I said, saving to CSV will strip the timezone information away.

It appears to be the machine's timezone in my example.