Anthelion is a plugin for Apache Nutch to crawl semantic annotations within HTML pages.

nutch-anth: Anthelion is a Nutch plugin for focused crawling of semantic data. It is an open-source project released under the Apache License 2.0. Note: this project contains the complete Nutch 1.6 distribution. The plug…
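Anthelion's own page-selection strategy is not described in this section. As a generic illustration of what "focused crawling of semantic data" means, the sketch below uses only the Python standard library. It counts common semantic-annotation markers in a page: Microdata (`itemscope`/`itemprop`/`itemtype`), RDFa (`typeof`/`property`/`vocab`) and JSON-LD (`<script type="application/ld+json">`). It then queues the page's outgoing links in a priority frontier, so links found on annotated pages are fetched first. The names `SemanticMarkupDetector`, `analyze` and `FocusedFrontier`, and the scoring rule (total markers on the parent page), are hypothetical. They are not part of the Anthelion or Nutch APIs.

```python
import heapq
from html.parser import HTMLParser


class SemanticMarkupDetector(HTMLParser):
    """Hypothetical helper: counts semantic-annotation markers and collects links.

    Heuristic only; it does not validate the annotations it finds.
    """

    MICRODATA_ATTRS = {"itemscope", "itemprop", "itemtype"}
    RDFA_ATTRS = {"typeof", "property", "vocab"}

    def __init__(self):
        super().__init__()
        self.counts = {"microdata": 0, "rdfa": 0, "jsonld": 0}
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; valueless attributes map to None.
        names = {name for name, _ in attrs}
        attr_map = dict(attrs)
        if names & self.MICRODATA_ATTRS:
            self.counts["microdata"] += 1
        if names & self.RDFA_ATTRS:
            self.counts["rdfa"] += 1
        if tag == "script" and (attr_map.get("type") or "").lower() == "application/ld+json":
            self.counts["jsonld"] += 1
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])


def analyze(html):
    """Return (marker counts, outgoing links) for one HTML page."""
    detector = SemanticMarkupDetector()
    detector.feed(html)
    detector.close()
    return detector.counts, detector.links


class FocusedFrontier:
    """Hypothetical priority frontier: higher-scored URLs are popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, score):
        if url in self._seen:
            return  # never enqueue the same URL twice
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    page = '<div itemscope itemtype="https://schema.org/Product"><a href="/p/2">next</a></div>'
    counts, links = analyze(page)
    frontier = FocusedFrontier()
    for link in links:
        # Assumed heuristic: a link inherits its parent page's annotation count.
        frontier.push(link, score=sum(counts.values()))
    frontier.push("/about", score=0)
    print(counts, frontier.pop())
```

A real focused crawler would replace the fixed score with a learned estimate of how likely a link is to lead to annotated pages. The priority-queue frontier is only the simplest structure that expresses "fetch promising URLs first".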

Related Repos

NandanDesai SocialInfo4J - fetch data from Facebook, Instagram and LinkedIn

USCDataScience - A web crawler is a bot program that fetches resources from the web to build applications such as search engines and knowledge bases.