Scrape-It is a Node.js package for extracting just the data you need from a web page.
It uses only a simple request module to fetch pages and does not execute JavaScript. That means you cannot directly scrape AJAX-loaded pages with it, but in general you will be in one of these scenarios:
- The AJAX response is in JSON format. In this case, you can make the request directly, without needing a scraping library.
- The AJAX response gives you HTML back. Instead of calling the main website (e.g. example.com), pass the AJAX URL (e.g. example.com/api/that-endpoint) to `scrape-it` and you will be able to parse the response.
- The AJAX request is so complicated that you don’t want to reverse-engineer it. In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content, and then use the `.scrapeHTML` method from `scrape-it` once the HTML has loaded on the page.
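
The three scenarios above could look roughly like the sketch below. It is only an illustration under some assumptions: Node.js 18+ (for the global `fetch`), the `scrape-it` and `cheerio` packages installed, and a CommonJS file. The function names (`fetchJson`, `scrapeAjaxHtml`, `scrapeRenderedHtml`) and the selectors in `itemOptions` are hypothetical placeholders, so adapt them to the markup your endpoint actually returns.

```javascript
// Scenario 1: the endpoint returns JSON, so no scraping library is needed.
// Assumes Node.js 18+, where fetch is available as a global.
async function fetchJson(url) {
    const res = await fetch(url);
    if (!res.ok) {
        throw new Error(`Request failed with status ${res.status}`);
    }
    return res.json();
}

// Hypothetical scraping options: these selectors are placeholders and must
// be changed to match the HTML you are scraping.
const itemOptions = {
    items: {
        listItem: ".item",
        data: {
            title: ".title",
            link: { selector: "a", attr: "href" }
        }
    }
};

// Scenario 2: the endpoint returns HTML, so pass the AJAX URL
// (not the main page URL) to scrape-it.
async function scrapeAjaxHtml(ajaxUrl) {
    // Loaded here so the JSON-only path above has no dependencies.
    const scrapeIt = require("scrape-it");
    const { data } = await scrapeIt(ajaxUrl, itemOptions);
    return data;
}

// Scenario 3: a headless browser has already rendered the page and you have
// its HTML as a string. Load it with cheerio and hand it to scrapeHTML.
function scrapeRenderedHtml(html) {
    const cheerio = require("cheerio");
    const scrapeIt = require("scrape-it");
    return scrapeIt.scrapeHTML(cheerio.load(html), itemOptions);
}
```

For the third case, how you obtain `html` depends on the headless browser you pick; the sketch only covers the step after the page has finished loading.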
Reference: https://github.com/IonicaBizau/scrape-it#clipboard-example