Thursday, June 25, 2015 - Next Generation Web Crawler

We had used many open source web crawlers in the past, but recently a friend of mine referred me to a cool tool at essentially parses the data on any website and structures it into a table of rows/columns - "Turn web pages into data". This data can be exported as an CSV file and it also provides a REST API to extract the data. This kind of higher abstraction over raw web crawling can be extremely useful for developers.

We can use the magic tool for automatic extraction or use their free tool to teach it how to extract data. 

No comments:

Post a Comment