Using JSONlite to extract cast members from a movie DB

Hi,

I am trying to convert a JSON (i think) file into a tidy frame format of some description. The dataset is taken from Kaggle and is the file credits.csv

The first couple of rows look like this and when i try and extract the JSON i get the error

Error: lexical error: invalid char in json text.
[{'cast_id': 14, 'character': 'Wo
(right here) ------^

a <- "[{'cast_id': 14, 'character': 'Woody (voice)', 'credit_id': '52fe4284c3a36847f8024f95', 'gender': 2, 'id': 31, 'name': 'Tom Hanks', 'order': 0, 'profile_path': '/pQFoyx7rp09CJTAb932F2g8Nlho.jpg'}, {'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'credit_id': '52fe4284c3a36847f8024f99', 'gender': 2, 'id': 12898, 'name': 'Tim Allen', 'order': 1, 'profile_path': '/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg'}, {'cast_id': 16, 'character': 'Mr. Potato Head (voice)', 'credit_id': '52fe4284c3a36847f8024f9d', 'gender': 2, 'id': 7167, 'name': 'Don Rickles', 'order': 2, 'profile_path': '/h5BcaDMPRVLHLDzbQavec4xfSdt.jpg'}, {'cast_id': 17, 'character': 'Slinky Dog (voice)', 'credit_id': '52fe4284c3a36847f8024fa1', 'gender': 2, 'id': 12899, 'name': 'Jim Varney', 'order': 3, 'profile_path': '/eIo2jVVXYgjDtaHoF19Ll9vtW7h.jpg'}, {'cast_id': 18, 'character': 'Rex (voice)', 'credit_id': '52fe4284c3a36847f8024fa5', 'gender': 2, 'id': 12900, 'name': 'Wallace Shawn', 'order': 4, 'profile_path': '/oGE6JqPP2xH4tNORKNqxbNPYi7u.jpg'}, {'cast_id': 19, 'character': 'Hamm (voice)', 'credit_id': '52fe4284c3a36847f8024fa9', 'gender': 2, 'id': 7907, 'name': 'John Ratzenberger', 'order': 5, 'profile_path': '/yGechiKWL6TJDfVE2KPSJYqdMsY.jpg'}, {'cast_id': 20, 'character': 'Bo Peep (voice)', 'credit_id': '52fe4284c3a36847f8024fad', 'gender': 1, 'id': 8873, 'name': 'Annie Potts', 'order': 6, 'profile_path': '/eryXT84RL41jHSJcMy4kS3u9y6w.jpg'}, {'cast_id': 26, 'character': 'Andy (voice)', 'credit_id': '52fe4284c3a36847f8024fc1', 'gender': 0, 'id': 1116442, 'name': 'John Morris', 'order': 7, 'profile_path': '/vYGyvK4LzeaUCoNSHtsuqJUY15M.jpg'}, {'cast_id': 22, 'character': 'Sid (voice)', 'credit_id': '52fe4284c3a36847f8024fb1', 'gender': 2, 'id': 12901, 'name': 'Erik von Detten', 'order': 8, 'profile_path': '/twnF1ZaJ1FUNUuo6xLXwcxjayBE.jpg'}, {'cast_id': 23, 'character': 'Mrs. Davis (voice)', 'credit_id': '52fe4284c3a36847f8024fb5', 'gender': 1, 'id': 12133, 'name': 'Laurie Metcalf', 'order': 9, 'profile_path': '/unMMIT60eoBM2sN2nyR7EZ2BvvD.jpg'}, {'cast_id': 24, 'character': 'Sergeant (voice)', 'credit_id': '52fe4284c3a36847f8024fb9', 'gender': 2, 'id': 8655, 'name': 'R. Lee Ermey', 'order': 10, 'profile_path': '/r8GBqFBjypLUP9VVqDqfZ7wYbSs.jpg'}, {'cast_id': 25, 'character': 'Hannah (voice)', 'credit_id': '52fe4284c3a36847f8024fbd', 'gender': 1, 'id': 12903, 'name': 'Sarah Freeman', 'order': 11, 'profile_path': None}, {'cast_id': 27, 'character': 'TV Announcer (voice)', 'credit_id': '52fe4284c3a36847f8024fc5', 'gender': 2, 'id': 37221, 'name': 'Penn Jillette', 'order': 12, 'profile_path': '/zmAaXUdx12NRsssgHbk1T31j2x9.jpg'}]"


b <- "[{'cast_id': 2, 'character': 'Max Goldman', 'credit_id': '52fe466a9251416c75077a8d', 'gender': 2, 'id': 6837, 'name': 'Walter Matthau', 'order': 0, 'profile_path': '/xJVkvprOnzP5Zdh5y63y8HHniDZ.jpg'}, {'cast_id': 3, 'character': 'John Gustafson', 'credit_id': '52fe466a9251416c75077a91', 'gender': 2, 'id': 3151, 'name': 'Jack Lemmon', 'order': 1, 'profile_path': '/chZmNRYMtqkiDlatprGDH4BzGqG.jpg'}, {'cast_id': 4, 'character': 'Ariel Gustafson', 'credit_id': '52fe466a9251416c75077a95', 'gender': 1, 'id': 13567, 'name': 'Ann-Margret', 'order': 2, 'profile_path': '/jx5lTaJ5VXZHYB52gaOTAZ9STZk.jpg'}, {'cast_id': 5, 'character': 'Maria Sophia Coletta Ragetti', 'credit_id': '52fe466a9251416c75077a99', 'gender': 1, 'id': 16757, 'name': 'Sophia Loren', 'order': 3, 'profile_path': '/emKLhbji1c7BjcA2DdbWf0EP9zH.jpg'}, {'cast_id': 6, 'character': 'Melanie Gustafson', 'credit_id': '52fe466a9251416c75077a9d', 'gender': 1, 'id': 589, 'name': 'Daryl Hannah', 'order': 4, 'profile_path': '/4LLmp6AQdlj6ueGCRbVRSGvvFSt.jpg'}, {'cast_id': 9, 'character': 'Grandpa Gustafson', 'credit_id': '53e5fcc2c3a3684430000d65', 'gender': 2, 'id': 16523, 'name': 'Burgess Meredith', 'order': 5, 'profile_path': '/lm98oKloU33Q7QDIIMSyc4Pr2jA.jpg'}, {'cast_id': 10, 'character': 'Jacob Goldman', 'credit_id': '53e5fcd4c3a3684433000e1a', 'gender': 2, 'id': 7166, 'name': 'Kevin Pollak', 'order': 6, 'profile_path': '/kwu2T8CDnThZTzE88uiSgJ5eHXf.jpg'}])"

credits <- c(a, b)
jsonlite::fromJSON(credits)

Can anyone see why this is the case?

Thanks

json uses double quotes " but here are single quotes ' which are invalid.

you may try replace the ' in the string by " before going into fromJSON()

a <- gsub("'","\"",a)

======================
supplement:
I tried your json string, there may be one more problem, you should replace all the None by null in the string, too.

gsub(": None}",": null}",a)
1 Like

Hi @yifanliu

I tried this

a <- gsub("'",'"', a)
jsonlite::fromJSON(a)

# Error: lexical error: invalid char in json text.
#          "order": 11, "profile_path": None}, {"cast_id": 27, "charact
#                     (right here) ------^

I've just found out, see in my last reply

1 Like

Thank you so much. This is exactly what i was looking for

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.