A high-level distributed crawling framework.

Cola is a high-level distributed crawling framework used to crawl pages and extract structured data from websites. It is simple and fast, yet flexible.

Related Repos

snjoer Broad Crawler: a project aiming to crawl a variety of web pages (especially news pages) with a single spider, a.k.a. a broad crawler.
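The defining trait of a broad crawler is a frontier of unvisited URLs that keeps growing as new links are discovered. A minimal sketch of that loop, using a hypothetical in-memory link table (`LINKS`) in place of real HTTP fetching:

```python
from collections import deque

# Hypothetical stand-in for the web: URL -> outgoing links.
# A real broad crawler would fetch each page and extract links instead.
LINKS = {
    "https://news.example/a": ["https://news.example/b", "https://news.example/c"],
    "https://news.example/b": ["https://news.example/c"],
    "https://news.example/c": [],
}

def broad_crawl(seed, max_pages=100):
    """Breadth-first crawl from a seed URL, visiting each page once."""
    seen = {seed}
    frontier = deque([seed])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)                  # "process" the page
        for link in LINKS.get(url, []):    # enqueue newly discovered links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order
```

The `seen` set is what keeps a broad crawl from looping forever on cyclic link graphs; production crawlers add politeness delays and per-domain queues on top of this skeleton.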

howie6879 A Google search results crawler: fetch the Google search results you need.
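A crawler like this typically builds a search URL from the query and paging parameters, then fetches and parses the result HTML. A sketch of the URL-building step with the standard library (the parameter names `q`, `num`, and `start` are Google's standard query parameters; the function name is illustrative, not this repo's API):

```python
from urllib.parse import urlencode

def google_search_url(query, num=10, start=0):
    """Build a Google search URL for the given query and result page."""
    params = {"q": query, "num": num, "start": start}
    return "https://www.google.com/search?" + urlencode(params)
```

`urlencode` takes care of escaping the query string, so multi-word or non-ASCII queries stay valid URLs.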

LinuxTerminali A terminal-based program to follow live cricket scores by scraping crickbuzz.com.

hellysmile fake-useragent: an up-to-date, simple user-agent faker with a real-world database. It grabs current user-agent strings from useragentstring.com and randomizes them according to real-world usage statistics.
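"Randomize with real-world statistics" means drawing a user agent with probability proportional to how often it is seen in the wild. A stdlib sketch of that idea, with a tiny made-up stats table in place of the large database fake-useragent fetches:

```python
import random

# Tiny illustrative database: user-agent string -> assumed usage share.
# fake-useragent maintains a much larger, up-to-date table.
UA_STATS = {
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0": 0.55,
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15": 0.25,
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0": 0.20,
}

def random_user_agent(stats=UA_STATS):
    """Pick a user agent, weighted by its real-world usage share."""
    agents = list(stats)
    weights = list(stats.values())
    return random.choices(agents, weights=weights, k=1)[0]
```

Rotating user agents this way makes a scraper's traffic resemble the real browser population rather than a single repeated client string.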

gaojiuli Gain: a web crawling framework for everyone, written with asyncio, uvloop and aiohttp. Requires Python 3.5+; install with pip install gain (and pip install uvloop, Linux only).
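The asyncio-based design means many pages are fetched concurrently on one event loop instead of one at a time. A sketch of that core pattern with plain asyncio (the `fetch` coroutine simulates a download; a real crawler such as gain would use an aiohttp session there):

```python
import asyncio

async def fetch(url):
    """Stand-in for an aiohttp request; a real crawler would await
    an HTTP response here instead of sleeping."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return url, f"<html>{url}</html>"

async def crawl(urls):
    """Fetch all URLs concurrently and map URL -> page body."""
    return dict(await asyncio.gather(*(fetch(u) for u in urls)))

pages = asyncio.run(crawl(["https://a.example", "https://b.example"]))
```

Because the coroutines only wait on I/O, total crawl time is governed by the slowest response rather than the sum of all responses, which is the point of building a crawler on asyncio.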

untwisted Sukhoi: a minimalist and powerful web crawler. Sukhoi is built on the concept of miners, similar to Scrapy and its spiders; however, in Sukhoi the miners can be placed in structures like lists or dicts.

pythad Selenium Extensions: tools that make writing tests, bots and scrapers with Selenium much easier. Free software under the MIT license; documentation at https://selenium-extensions.readthedocs.io.

jullrich pcap2curl: read a packet capture, extract HTTP requests, and turn them into cURL commands for replay (see https://isc.sans.edu/diary.html?storyid=22900). A simple (too simple?) Python script that reads a pcap and finds the HTTP requests inside it.
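Once a request's method, URL, and headers have been pulled out of the capture, turning them into a replayable curl command is plain string assembly. A sketch of that last step (the packet parsing itself, which pcap2curl handles, is assumed already done; the function name is illustrative):

```python
def request_to_curl(method, url, headers):
    """Render one extracted HTTP request as a curl command for replay."""
    parts = ["curl"]
    if method != "GET":
        parts.append(f"-X {method}")       # curl defaults to GET
    for name, value in headers.items():
        parts.append(f"-H '{name}: {value}'")
    parts.append(f"'{url}'")
    return " ".join(parts)
```

Replaying captured requests this way reproduces the original headers (cookies, user agent, referer), which is exactly what makes the technique useful for incident analysis.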