Basic API response for teaching demo

bensoltoff · October 9, 2017, 4:05pm

In my computing class I spend two days on obtaining data from the web (i.e. API requests, web scraping). I used to use the OMDB API because it didn't require authentication, queries were built very easily in the URL, and the response converted very nicely to a one-row data frame without any hassle. However, a) it moved last year to a paid API (which wasn't a huge problem; they gave me an API key for teaching demo purposes), and b) they added a new Ratings field, which causes the response to take on a nested list format:

{
    "Title": "Sharknado",
    "Year": "2013",
    "Rated": "TV-14",
    "Released": "11 Jul 2013",
    "Runtime": "86 min",
    "Genre": "Comedy, Horror, Sci-Fi",
    "Director": "Anthony C. Ferrante",
    "Writer": "Thunder Levin",
    "Actors": "Ian Ziering, Tara Reid, John Heard, Cassandra Scerbo",
    "Plot": "When a freak hurricane swamps Los Angeles, nature's deadliest killer rules sea, land, and air as thousands of sharks terrorize the waterlogged populace.",
    "Language": "English",
    "Country": "USA",
    "Awards": "1 win & 2 nominations.",
    "Poster": "https://images-na.ssl-images-amazon.com/images/M/MV5BOTE2OTk4MTQzNV5BMl5BanBnXkFtZTcwODUxOTM3OQ@@._V1_SX300.jpg",
    "Ratings": [
        {
            "Source": "Internet Movie Database",
            "Value": "3.3/10"
        },
        {
            "Source": "Rotten Tomatoes",
            "Value": "82%"
        }
    ],
    "Metascore": "N/A",
    "imdbRating": "3.3",
    "imdbVotes": "38,601",
    "imdbID": "tt2724064",
    "Type": "movie",
    "DVD": "03 Sep 2013",
    "BoxOffice": "N/A",
    "Production": "NCM Fathom",
    "Website": "http://www.mtivideo.com/TitleView.aspx?TITLE_ID=728",
    "Response": "True"
}

which does not convert easily to a flat data frame using fromJSON() and as_tibble() (I have to set the Ratings list element to NULL first). Obviously most APIs have this type of structure, and I plan to go through these more complex structures with the students, using purrr functions to assist in the flattening process (I love Jenny Bryan's purrr tutorials). However, for just an initial demo I'd prefer an API that returns a simple, flat structure. Anyone have recommendations for such an API?
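
For anyone following along, here is a minimal sketch of the workaround described above. It assumes the jsonlite and tibble packages, and it uses a shortened, hard-coded copy of the response (only a few of the fields) in place of a live OMDB request, since the real call needs an API key. The object names are just for illustration.

```r
library(jsonlite)
library(tibble)

# Abbreviated copy of the OMDB response shown above (assumption: a handful
# of fields is enough to reproduce the nesting problem).
json <- '{
  "Title": "Sharknado",
  "Year": "2013",
  "Ratings": [
    {"Source": "Internet Movie Database", "Value": "3.3/10"},
    {"Source": "Rotten Tomatoes", "Value": "82%"}
  ],
  "imdbRating": "3.3"
}'

# fromJSON() returns a named list; Ratings is parsed as a two-row data frame,
# which is what gets in the way of a flat one-row result.
movie <- fromJSON(json)

# Drop the nested element, leaving only length-one fields.
movie$Ratings <- NULL

# The remaining list converts to a one-row tibble.
flat <- as_tibble(movie)
flat
```

The cost of this approach is that the ratings are simply thrown away, which is fine for a first demo but not for real analysis.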

pgensler · October 10, 2017, 8:05pm

I'm not sure if this is too complicated, but there seems to be quite a bit of public data hosted on data.world, and you can easily connect to it via their R package. It's free to use, and I think the JSON responses are pretty easy to work with.

Check out this blog for more information on how someone has used it at Data for Democracy:

If you are looking for a good resource to parse JSON with ease, I would encourage you to look at the tidyjson package, which is not on CRAN, but you can install it from GitHub:

jennybryan · October 10, 2017, 11:08pm

How about the Random User Generator https://randomuser.me?

library(httr)
res <- GET("https://randomuser.me/api/?format=csv&inc=gender,name,nat&noinfo")
content(res)
#> Parsed with column specification:
#> cols(
#>   gender = col_character(),
#>   name.title = col_character(),
#>   name.first = col_character(),
#>   name.last = col_character(),
#>   nat = col_character()
#> )
#> # A tibble: 1 x 5
#>   gender name.title name.first name.last   nat
#>    <chr>      <chr>      <chr>     <chr> <chr>
#> 1   male         mr      marco     marin    ES

#> Created on Tue Oct 10 16:06:59 2017 using the reprex package

mmuurr · October 11, 2017, 1:25am

FYI, I believe the up-to-date (i.e. maintained) tidyjson package is here: https://github.com/jeremystan/tidyjson.