crawler

Apache Nutchâ„¢

Nutch is a highly extensible, highly scalable, matured, production-ready Web crawler which enables fine grained configuration and accomodates a wide variety of data acquisition tasks.

Common Crawl - Open Repository of Web Crawl Data

We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.

Wiby - Search Engine for the Classic Web

Wiby is a search engine for older style pages, lightweight and based on a subject of interest. Building a web more reminiscent of the early internet.