Thursday, June 25, 2015

import.io - Next Generation Web Crawler

We have used many open source web crawlers in the past, but recently a friend of mine referred me to a cool tool at import.io.

Import.io essentially parses the data on any website and structures it into a table of rows and columns ("Turn web pages into data"). The data can be exported as a CSV file, and import.io also provides a REST API to extract it. This kind of higher-level abstraction over raw web crawling can be extremely useful for developers.
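To give a feel for why the table form is handy, here is a minimal sketch of consuming such a CSV export in Python. The sample data and the column names (title, price, url) are made up for illustration and are not taken from import.io; a real export will have whatever columns were extracted from the page.

```python
import csv
import io

# Hypothetical sample standing in for a CSV export. The column names
# ("title", "price", "url") are assumptions made for this illustration.
SAMPLE_EXPORT = """title,price,url
Blue Widget,9.99,http://example.com/widgets/blue
Red Widget,12.50,http://example.com/widgets/red
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one dict per table row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def total_price(rows, column="price"):
    """Sum a numeric column, skipping rows where the cell is blank."""
    return sum(float(row[column]) for row in rows if row.get(column))


rows = load_rows(SAMPLE_EXPORT)
for row in rows:
    print(row["title"], row["price"])
```

With a real export, the same load_rows function could be fed the contents of the downloaded file instead of the inline sample.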

We can use the magic tool for automatic extraction, or use their free tool to teach import.io how to extract the data.