Transforming API data into JSON using the content function in httr

httr

#1

Hi,

I can't get the following code to run:

threatfeedsj <- GET("http://isc.sans.edu/api/threatfeeds/?json")
content(threatfeedsj,"text")

Returns:

content(threatfeedsj,"text")
Error in content(threatfeedsj, "text") : is.response(x) is not TRUE

The response is JSON and comes back as a character vector, so I don't understand why this fails. I ran this exact code with another parameter of the same API and it worked.

Thanks!


#2

content doesn't want to parse the response because the MIME type is specified as text/json instead of application/json, but you can specify the type explicitly:

threatfeedsj <- httr::GET("http://isc.sans.edu/api/threatfeeds/?json")

threatfeedsdf <- httr::content(threatfeedsj, type = 'application/json', simplifyDataFrame = TRUE)

str(threatfeedsdf)
#> 'data.frame':    58 obs. of  6 variables:
#>  $ type       : chr  "bebloh" "blindferret" "blocklistde110" "blocklistde143" ...
#>  $ description: chr  "bebloh C&C servers from John Bambenek" "Project Blindferret zmap scanners" "Blocklist.de Port 110 Scanner" "Blocklist.de Port 143 Scanner" ...
#>  $ name       : chr  "bebloh C&C server" "Blindferret" "Port 110 Scanner" "Port 143 Scanner" ...
#>  $ lastupdate : chr  "2019-01-13 18:30:22" "2019-01-14 03:41:34" "2019-01-12 21:05:02" "2019-01-12 21:08:11" ...
#>  $ datatype   : chr  "is_ipv4" "is_ipv4" "is_ipv4" "is_ipv4" ...
#>  $ frequency  : int  4800 0 86400 86400 86400 86400 86400 86400 86400 86400 ...

Or just pass the text to jsonlite::fromJSON yourself.
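As a rough sketch of that second route: the snippet below parses a JSON string with jsonlite::fromJSON. To keep it runnable offline, body_text is a hand-written two-entry sample in the shape of this feed (an assumption for illustration); with a live response you would get the string from httr::content(threatfeedsj, as = "text", encoding = "UTF-8") instead.

```r
library(jsonlite)

# Hypothetical stand-in for the response body: two entries in the feed's shape.
# With a real response: body_text <- httr::content(threatfeedsj, as = "text", encoding = "UTF-8")
body_text <- paste0(
  '[{"type":"bebloh","description":"bebloh C&C servers from John Bambenek",',
  '"name":"bebloh C&C server","lastupdate":"2019-01-13 18:30:22",',
  '"datatype":"is_ipv4","frequency":4800},',
  '{"type":"blindferret","description":"Project Blindferret zmap scanners",',
  '"name":"Blindferret","lastupdate":"2019-01-14 17:35:53",',
  '"datatype":"is_ipv4","frequency":0}]'
)

# A JSON array of flat objects is simplified to a data frame by default.
feeds <- jsonlite::fromJSON(body_text)
str(feeds)
```

Parsing the text yourself sidesteps the MIME type entirely, since jsonlite only looks at the string it is given.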

Something more complicated may be going on, though; that error makes it sound like the request didn't work. What does threatfeedsj look like?


#3

It's a list of cyber threats pulled from various sources into a feed.

The other feed I pulled using the same code had the same MIME type too (text/json) and that worked so I'm not sure why...

Thanks for your help @alistaire


#4

I can confirm this works as is:

threatfeedsj <- httr::GET("http://isc.sans.edu/api/threatfeeds/?json")
httr::content(threatfeedsj,"text")
#> [1] "[{\"type\":\"bebloh\",\"description\":\"bebloh C&C servers from John Bambenek\",\"name\":\"bebloh C&C server\",\"lastupdate\":\"2019-01-13 18:30:22\",\"datatype\":\"is_ipv4\",\"frequency\":4800},{\"type\":\"blindferret\",\"description\":\"Project Blindferret zmap scanners\",\"name\":\"Blindferret\",\"lastupdate\":\"2019-01-14 17:35:53\",\"datatype\":\"is_ipv4\",\"frequency\":0},{\"type\":\"blocklistde110\",\"description\":\"Blocklist.de Port 110 Scanner\",\"name\":\"Port 110 Scanner\",\"lastupdate\":\"2019-01-14 15:33:57\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistde143\",\"description\":\"Blocklist.de Port 143 Scanner\",\"name\":\"Port 143 Scanner\",\"lastupdate\":\"2019-01-14 15:39:45\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistde21\",\"description\":\"Blocklist.de Port 21 Scanner\",\"name\":\"Port 21 Scanner\",\"lastupdate\":\"2019-01-14 12:57:30\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistde22\",\"description\":\"Blocklist.de Port 22 Scanner\",\"name\":\"Port 22 Scanner\",\"lastupdate\":\"2019-01-14 13:18:14\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistde25\",\"description\":\"Blocklist.de Port 25 Scanner\",\"name\":\"Port 25 Scanner\",\"lastupdate\":\"2019-01-14 15:25:01\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistde443\",\"description\":\"Blocklist.de Port 443 Scanner\",\"name\":\"Port 443 Scanner\",\"lastupdate\":\"2019-01-14 15:42:11\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistde80\",\"description\":\"Blocklist.de Port 80 Scanner\",\"name\":\"Port 80 Scanner\",\"lastupdate\":\"2019-01-14 15:28:24\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistde993\",\"description\":\"Blocklist.de Port 993 Scanner\",\"name\":\"Port 993 Scanner\",\"lastupdate\":\"2019-01-14 15:46:54\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistdeapache\",\"description\":\"Blocklist.de Apache 
Scanner\",\"name\":\"Apache Web Server Scanner\",\"lastupdate\":\"2019-01-14 15:48:07\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistdeasterisk\",\"description\":\"Blocklist.de Astersik VoIP Scanner\",\"name\":\"Asterisk VoIP Scanner\",\"lastupdate\":\"2019-01-14 15:48:13\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistdebots\",\"description\":\"Blocklist.de Bots\",\"name\":\"Suspect Bots\\/Infected\",\"lastupdate\":\"2019-01-14 15:48:15\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistdebruteforcelogin\",\"description\":\"Blocklist.de Bruteforce Login\",\"name\":\"Bruteforce\",\"lastupdate\":\"2019-01-14 15:48:28\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistdecourierimap\",\"description\":\"Blocklist.de Courier IMAP\",\"name\":\"courier imap attacker\",\"lastupdate\":\"2019-01-12 21:19:13\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"blocklistdecourierpop3\",\"description\":\"Blocklist.de Courier POP3\",\"name\":\"courier pop3 attacker\",\"lastupdate\":\"2019-01-12 21:19:13\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"ciarmy\",\"description\":\"CI Army List. 
Combined CINS Threat Intelligence Feed\",\"name\":\"CI Army List\",\"lastupdate\":\"2019-01-12 21:35:16\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"cryptowall\",\"description\":\"Cryptowall C&C servers from John Bambenek\",\"name\":\"Cryptowall C&C server\",\"lastupdate\":\"2019-01-12 21:35:16\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"cybergreen\",\"description\":\"Cybergreen Network Security Research Project\",\"name\":\"Cybergreen\",\"lastupdate\":\"2019-01-14 17:36:21\",\"datatype\":\"is_ipv4\",\"frequency\":0},{\"type\":\"dyreza\",\"description\":\"Dyreza List from techhelplist.com\",\"name\":\"Dyreza Servers\",\"lastupdate\":\"2019-01-13 18:32:33\",\"datatype\":\"is_ipv4\",\"frequency\":3600},{\"type\":\"emergincompromised\",\"description\":\"Emerging Threats Compromised IPs\",\"name\":\"Emergingthreats\",\"lastupdate\":\"2019-01-13 02:30:45\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"erratasec\",\"description\":\"Errata Security Masscan\",\"name\":\"Erratasec Masscan\",\"lastupdate\":\"2019-01-14 17:35:51\",\"datatype\":\"is_ipv4\",\"frequency\":0},{\"type\":\"forumspam\",\"description\":\"Forumspam.com List of forum spammers\",\"name\":\"Forum Spammers\",\"lastupdate\":\"2019-01-13 02:25:14\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"hesperbot\",\"description\":\"hesperbot C&C servers from John Bambenek\",\"name\":\"Hesperbot C&C server\",\"lastupdate\":\"2019-01-13 18:30:23\",\"datatype\":\"is_ipv4\",\"frequency\":4800},{\"type\":\"malc0de\",\"description\":\"Malc0de.com IP Blacklist\",\"name\":\"Malc0de Blacklist\",\"lastupdate\":\"2019-01-13 02:32:21\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"malwaredomainlist\",\"description\":\"Malware Domain List.com\",\"name\":\"Malwaredomainlist\",\"lastupdate\":\"2019-01-10 04:39:28\",\"datatype\":\"is_domain\",\"frequency\":3600},{\"type\":\"malwaredomains\",\"description\":\"Domain Blocklist From 
Malwaredomains\",\"name\":\"Malwaredomains\",\"lastupdate\":\"2019-01-10 04:39:28\",\"datatype\":\"is_domain\",\"frequency\":3600},{\"type\":\"malwaretrafficanalysis\",\"description\":\"Suspicious IPs and Domains from Malware Traffic Analysis\",\"name\":\"Suspect Malware Related\",\"lastupdate\":\"2011-01-01 00:00:00\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"malwaretrafficanalysisdomains\",\"description\":\"Suspicious IPs and Domains from Malware Traffic Analysis\",\"name\":\"Suspect Malware Related\",\"lastupdate\":\"2011-01-01 00:00:00\",\"datatype\":\"is_domain\",\"frequency\":86400},{\"type\":\"matsnu\",\"description\":\"matsnu C&C servers from John Bambenek\",\"name\":\"matsnu C&C server\",\"lastupdate\":\"2019-01-13 18:30:24\",\"datatype\":\"is_ipv4\",\"frequency\":4800},{\"type\":\"miner\",\"description\":\"Cryptocoin Miner Pool Addresses\",\"name\":\"MinerPool\",\"lastupdate\":\"2019-01-14 17:33:37\",\"datatype\":\"is_ipv4\",\"frequency\":3600},{\"type\":\"openbl_ftp\",\"description\":\"OpenBL.org FTP Scanners\",\"name\":\"OpenBL FTP Scanners\",\"lastupdate\":\"2019-01-13 02:30:45\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"openbl_http\",\"description\":\"OpenBL.org HTTP Scanners\",\"name\":\"OpenBL HTTP Scanners\",\"lastupdate\":\"2019-01-13 02:30:45\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"openbl_mail\",\"description\":\"OpenBL.org MAIL Scanners\",\"name\":\"OpenBL MAIL Scanners\",\"lastupdate\":\"2019-01-13 02:30:45\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"openbl_smtp\",\"description\":\"OpenBL.org SMTP Scanners\",\"name\":\"OpenBL SMTP Scanners\",\"lastupdate\":\"2019-01-13 02:30:45\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"openbl_ssh\",\"description\":\"OpenBL.org SSH Scanners\",\"name\":\"OpenBL SSH Scanners\",\"lastupdate\":\"2019-01-13 02:30:45\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"palevodomains\",\"description\":\"Palevo Command and 
Control Server Domains from Abuse.ch\",\"name\":\"Palevo C&C Domain\",\"lastupdate\":\"2011-01-01 00:00:00\",\"datatype\":\"is_domain\",\"frequency\":86400},{\"type\":\"palevoips\",\"description\":\"Palevo Command and Control Server IPs from Abuse.ch\",\"name\":\"Palevo C&C IP\",\"lastupdate\":\"2019-01-13 02:32:09\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"qakbot\",\"description\":\"Qakbot C&C servers from John Bambenek\",\"name\":\"qakbot C&C server\",\"lastupdate\":\"2019-01-13 18:30:24\",\"datatype\":\"is_ipv4\",\"frequency\":4800},{\"type\":\"ramnit\",\"description\":\"ramnit C&C servers from John Bambenek\",\"name\":\"ramnit C&C server\",\"lastupdate\":\"2019-01-13 18:30:30\",\"datatype\":\"is_ipv4\",\"frequency\":4800},{\"type\":\"ransomware\",\"description\":\"Abuse.ch Ransomware Domain Blocklist\",\"name\":\"Ransomdomains\",\"lastupdate\":\"2019-01-13 09:39:38\",\"datatype\":\"is_domain\",\"frequency\":3600},{\"type\":\"ransomwareips\",\"description\":\"Abuse.ch Ransomware IPs Blocklist\",\"name\":\"Ransomips\",\"lastupdate\":\"2019-01-13 09:39:46\",\"datatype\":\"is_ipv4\",\"frequency\":3600},{\"type\":\"rapid7sonar\",\"description\":\"Rapid 7 Project Sonar\",\"name\":\"Rapid7Sonar\",\"lastupdate\":\"2019-01-14 17:36:09\",\"datatype\":\"is_ipv4\",\"frequency\":0},{\"type\":\"shadowserver\",\"description\":\"Shadowserver Scanners. 
Consider them \\\"false positives\\\"\",\"name\":\"Shadowserver\",\"lastupdate\":\"2019-01-14 17:36:21\",\"datatype\":\"is_ipv4\",\"frequency\":0},{\"type\":\"shodan\",\"description\":\"Scanners Operated by the ShodanHQ Project\",\"name\":\"ShodanHQ\",\"lastupdate\":\"2019-01-14 17:35:50\",\"datatype\":\"is_ipv4\",\"frequency\":0},{\"type\":\"spyeye\",\"description\":\"Spyeye Command And Control Server from Abuse.ch\",\"name\":\"Spyeye C&C server\",\"lastupdate\":\"2019-01-14 10:49:23\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"spyeyedomains\",\"description\":\"Spyeye Command And Control Server from Abuse.ch\",\"name\":\"Spyeye C&C server\",\"lastupdate\":\"2011-01-01 00:00:00\",\"datatype\":\"is_domain\",\"frequency\":86400},{\"type\":\"symmi\",\"description\":\"Symmi C&C servers from John Bambenek\",\"name\":\"Symmi C&C server\",\"lastupdate\":\"2019-01-13 18:30:30\",\"datatype\":\"is_ipv4\",\"frequency\":4800},{\"type\":\"threatexpert\",\"description\":\"Threatexpert.com Malicious URLs\",\"name\":\"Threatexpert\",\"lastupdate\":\"2019-01-14 18:00:22\",\"datatype\":\"is_domain\",\"frequency\":3600},{\"type\":\"tinba\",\"description\":\"Tiny Banker C&C servers from John Bambenek\",\"name\":\"TinyBanker C&C server\",\"lastupdate\":\"2019-01-13 18:30:30\",\"datatype\":\"is_ipv4\",\"frequency\":4800},{\"type\":\"tldns\",\"description\":\"Root and Top Level Domain Name Servers\",\"name\":\"TLD Name Servers\",\"lastupdate\":\"2019-01-13 18:34:28\",\"datatype\":\"is_ipv4\",\"frequency\":3600},{\"type\":\"torexit\",\"description\":\"Tor Exit Nodes from Tor Project\",\"name\":\"Tor Exit Node\",\"lastupdate\":\"2019-01-14 12:55:34\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"univmichigan\",\"description\":\"University of Michigan scans.io zmap scans\",\"name\":\"UMichigan scans.io\",\"lastupdate\":\"2019-01-14 17:35:53\",\"datatype\":\"is_ipv4\",\"frequency\":0},{\"type\":\"upatre\",\"description\":\"Upatre List from 
techhelplist.com\",\"name\":\"Upatr Servers\",\"lastupdate\":\"2019-01-13 18:31:47\",\"datatype\":\"is_ipv4\",\"frequency\":3600},{\"type\":\"virustotal\",\"description\":\"Virustotal Domains\",\"name\":\"Virustotal\",\"lastupdate\":\"2000-01-01 00:00:00\",\"datatype\":\"is_domain\",\"frequency\":86400},{\"type\":\"webiron\",\"description\":\"WebIron web application bots\",\"name\":\"WebIron Bots\",\"lastupdate\":\"2019-01-13 18:32:47\",\"datatype\":\"is_ipv4\",\"frequency\":3600},{\"type\":\"zeuscc\",\"description\":\"Zeus Command And Control Server from Abuse.ch\",\"name\":\"Zeus C&C server\",\"lastupdate\":\"2019-01-14 10:49:22\",\"datatype\":\"is_ipv4\",\"frequency\":86400},{\"type\":\"zeusdomains\",\"description\":\"Zeus Command And Control Server from Abuse.ch\",\"name\":\"Zeus C&C server\",\"lastupdate\":\"2019-01-14 12:53:46\",\"datatype\":\"is_domain\",\"frequency\":86400}]"

Created on 2019-01-14 by the reprex package (v0.2.1)

You may just have had an old threatfeedsj that was not an httr response object, so the call failed.

Using httr::content(threatfeedsj, "parsed") (the default) could be a better idea here, since it gives you the parsed JSON directly. Otherwise, using as = "text" as you did works, provided you pass the result to jsonlite::fromJSON afterwards.
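One way to catch a stale object early is to check its class before calling content. This is only a sketch: safe_content_text is a hypothetical helper name, not part of httr, and it relies on httr response objects carrying the class "response".

```r
# Hypothetical helper (not an httr function): fail with a clearer message
# when the object is not an httr response.
safe_content_text <- function(x) {
  if (!inherits(x, "response")) {
    stop(
      "`x` is a ", class(x)[1],
      ", not an httr response object; re-run GET() to get a fresh one.",
      call. = FALSE
    )
  }
  httr::content(x, as = "text", encoding = "UTF-8")
}

# Simulate a leftover variable that holds plain text instead of a response.
stale <- "[{\"type\":\"bebloh\"}]"
msg <- tryCatch(safe_content_text(stale), error = function(e) conditionMessage(e))
print(msg)
```

With a real response from httr::GET, the check passes and the function simply returns the body as text.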


#5

Hey @alistaire,

Thanks for that. It worked for some feeds but not others, and I can't figure out why, as the MIME types are the same.

Firewall_log <- GET("http://isc.sans.edu/api/openiocsources?json")

content(Firewall_log, type = "application/json", simplifyDataFrame = TRUE)

Returns:

content(Firewall_log, type = "application/json", simplify2array = TRUE)
Error: lexical error: invalid char in json text.
<?xml version="1.0" encoding="U
(right here) ------^

Thanks!


#6

This new URL is returning an XML response:

Firewall_log  <- httr::GET("https://isc.sans.edu/api/openiocsources")
# this is now xml
httr::content(Firewall_log, type = "text/xml")
#> No encoding supplied: defaulting to UTF-8.
#> {xml_document}
#> <ioc id="44233BFE-2019-0115-e82565f68762" last-modified="2019-01-15T05:21:45Z" xmlns="http://schemas.mandiant.com/2010/ioc">
#> [1] <short_description>Firewall Logs</short_description>
#> [2] <description>Firewall logs from 2019-01-15</description>
#> [3] <authored_by>SANS Internet Storm Center</authored_by>
#> [4] <authored_date>2019-01-15T05:21:45Z</authored_date>
#> [5] <links/>
#> [6] <definition>\n  <Indicator operator="OR" id="44233BFE-2019-0115-e825 ...
# it is ok as default here
httr::content(Firewall_log)
#> {xml_document}
#> <ioc id="44233BFE-2019-0115-e82565f68762" last-modified="2019-01-15T05:21:45Z" xmlns="http://schemas.mandiant.com/2010/ioc">
#> [1] <short_description>Firewall Logs</short_description>
#> [2] <description>Firewall logs from 2019-01-15</description>
#> [3] <authored_by>SANS Internet Storm Center</authored_by>
#> [4] <authored_date>2019-01-15T05:21:45Z</authored_date>
#> [5] <links/>
#> [6] <definition>\n  <Indicator operator="OR" id="44233BFE-2019-0115-e825 ...

Created on 2019-01-14 by the reprex package (v0.2.1)

Adding ?json does not change the response format, even though the documentation says it should. :thinking:


#7

Hi Christophe,

I thought specifying type overrides the default as per the httr documentation? At least it did in previous data pulls... ?


#8

This depends on your API: what format does it return by default, and what does it support for changing that?

From https://isc.sans.edu/api/

Note: Output formats include xml (default), json , text and php . For some feeds that are simple enough, csv and tab (TAB delimited) are available. Just add on to the url as a parameter such as http://isc.sans.edu/api/handler?text

However, it seems this does not work for all feeds. Specifying ?json did not change the response type.

If an API offers several output formats, httr will help you handle and parse each of them. But it can't change what the API returns. I can't get JSON in a browser for your example.
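Since the server decides the format, one defensive option is to look at the content type the response actually reports and pick the parser from that, rather than forcing application/json. A minimal sketch follows: parse_body is a hypothetical helper, and the sample strings stand in for real response bodies so the snippet runs offline. With a live response, the two arguments would come from httr::http_type(resp) and httr::content(resp, as = "text", encoding = "UTF-8").

```r
# Hypothetical helper: choose a parser from the reported MIME type.
parse_body <- function(mime, text) {
  if (grepl("json", mime, fixed = TRUE)) {
    jsonlite::fromJSON(text)   # covers application/json and text/json
  } else if (grepl("xml", mime, fixed = TRUE)) {
    xml2::read_xml(text)       # covers application/xml and text/xml
  } else {
    text                       # unknown type: hand back the raw text
  }
}

# Made-up sample bodies, for illustration only.
json_result  <- parse_body("text/json", '[{"type":"bebloh","frequency":4800}]')
plain_result <- parse_body("text/plain", "bebloh")
print(json_result)

# The XML branch needs the xml2 package, so only run it when available.
if (requireNamespace("xml2", quietly = TRUE)) {
  xml_result <- parse_body("text/xml", "<ioc><description>Firewall logs</description></ioc>")
  print(xml_result)
}
```

This would have sent the openiocsources response to the XML parser instead of producing the lexical error from the JSON parser.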

