WebCollector is an open-source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
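To make the idea of a multi-threaded crawler concrete, here is a minimal sketch that uses only the Java standard library. It is not WebCollector's API: the class `MiniCrawler`, the method `crawl`, and the `linkFetcher` function (which stands in for downloading a page and extracting its links) are hypothetical names chosen for illustration, and the "web" is a small in-memory link graph so the example runs offline.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical illustration of a level-by-level parallel crawl; not part of WebCollector. */
final class MiniCrawler {

    /**
     * Breadth-first crawl starting at seed. Each level of the frontier is fetched
     * in parallel on a fixed-size thread pool; maxDepth limits the number of levels fetched.
     * linkFetcher (an assumption) maps a URL to the links found on that page.
     * Returns every URL discovered, in discovery order.
     */
    static Set<String> crawl(String seed, int maxDepth, int threads,
                             Function<String, List<String>> linkFetcher) {
        // Only the coordinating thread touches this set, so no synchronization is needed.
        Set<String> discovered = new LinkedHashSet<>();
        discovered.add(seed);
        List<String> frontier = List.of(seed);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
                // Submit one fetch task per URL in the current level.
                List<Future<List<String>>> pending = new ArrayList<>();
                for (String url : frontier) {
                    Callable<List<String>> task = () -> linkFetcher.apply(url);
                    pending.add(pool.submit(task));
                }
                // Collect results; URLs not seen before form the next level.
                List<String> next = new ArrayList<>();
                for (Future<List<String>> result : pending) {
                    for (String link : result.get()) {
                        if (discovered.add(link)) {
                            next.add(link);
                        }
                    }
                }
                frontier = next;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("crawl interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return discovered;
    }

    public static void main(String[] args) {
        // In-memory link graph standing in for real pages (illustrative data only).
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("a", "d"),
                "d", List.of("e"));
        Set<String> found = crawl("a", 2, 4, url -> graph.getOrDefault(url, List.of()));
        System.out.println(found);
    }
}
```

In a real crawler the `linkFetcher` would perform an HTTP request and parse the response; keeping it as a plain function makes the threading logic easy to test in isolation.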


Related Repos



reggoodwin Ferrit is an API-driven web crawler service written in Scala using Akka, Spray, and Cassandra. I created it to help me learn more about small service design using Akka and the functional reactive programming style.
 

dyweb scrala scrala is a web crawling framework for Scala, inspired by Scrapy. Installation From Docker: gaocegege/scrala on Docker Hub. Create a Dockerfile in your project. FROM gaoceg
 

internetarchive Heritrix Introduction Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix
 

fcannizzaro jsoup-annotations Jsoup Annotations POJO Gradle Dependency Step 1. Add the JitPack repository to your build file allprojects { repositories { ... maven { url 'https://jitpack.io'
 

code4craft A scalable crawler framework. It covers the whole lifecycle of a crawler: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler. Features:
 

NandanDesai SocialInfo4J - fetch data from Facebook, Instagram and LinkedIn
 

USCDataScience A web crawler is a bot program that fetches resources from the web for the sake of building applications like search engines, knowledge bases, etc.