Chompjs for parsing tricky JavaScript objects

python
data
Published

April 28, 2021

Modern JavaScript web frameworks often embed the data used to render each webpage in the HTML. This means an easy way of extracting data is to capture the string representation of the object with a pushdown automaton and then parse it. Sometimes Python’s json.loads won’t cut it, because the embedded object is a JavaScript literal rather than strict JSON; one option is demjson, but another, much faster option is chompjs.
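As a rough illustration of the capture step, here is a minimal sketch of a brace-matching scanner using only the standard library. The function name extract_js_object, the window.__DATA__ variable, and the sample HTML are all made up for this example, and the scanner only understands quoted strings (not comments, regex literals, or template literals), so treat it as a starting point rather than a robust extractor.

```python
import json


def extract_js_object(text: str, start: int = 0) -> str:
    """Return the first balanced {...} substring at or after `start`.

    Hypothetical helper: tracks brace depth and skips over quoted strings
    so that braces inside string values are not counted.
    """
    begin = text.index("{", start)
    depth = 0
    in_string = None  # quote character of the string we are inside, if any
    escaped = False   # True when the previous character was a backslash
    for i in range(begin, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == in_string:
                in_string = None
        elif ch in "\"'":
            in_string = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i + 1]
    raise ValueError("unbalanced braces")


# Made-up page source; real pages use their own variable names.
html = (
    '<script>window.__DATA__ = '
    '{"title": "A {brace} in a string", "tags": ["a", "b"]};</script>'
)
raw = extract_js_object(html, html.index("window.__DATA__"))
data = json.loads(raw)  # works here only because this sample is strict JSON
print(data["tags"])
```

In this sample the captured string happens to be valid JSON, so json.loads is enough; the cases where it is not are where a more lenient parser comes in.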

Chompjs converts a JavaScript string into something that json.loads can handle. It’s a little less strict than demjson; for example, chompjs.parse_js_object converts {"key": undefined} to {"key": "undefined"}, whereas demjson gives {"key": demjson.undefined}, which preserves the type. However, it’s much faster, about 20x in the tests I’ve done, which makes it a much better drop-in replacement for json.loads on messy data.
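A small sketch of the difference, assuming chompjs is installed (pip install chompjs). The input string is my own illustration, with an unquoted key and single quotes added to the undefined example above; the import is guarded so the snippet still runs without the package.

```python
import json

try:
    import chompjs  # third-party package, assumed installed
except ImportError:
    chompjs = None

# A JavaScript object literal that is not valid JSON:
# unquoted key, bare `undefined`, and single-quoted strings.
js = "{key: undefined, 'name': 'x'}"

strict_failed = False
try:
    json.loads(js)
except json.JSONDecodeError as exc:
    strict_failed = True
    print("json.loads failed:", exc)

parsed = None
if chompjs is not None:
    # parse_js_object returns plain Python objects; per the description
    # above, the bare `undefined` should come back as a string.
    parsed = chompjs.parse_js_object(js)
    print(parsed)
```

If keeping undefined distinct from the string "undefined" matters for your data, that is the case where the slower demjson route may still be worth it.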