A high-level distributed crawling framework.

Overview

Cola is a high-level distributed crawling framework, used to crawl pages and extract structured data from websites. It provides a simple and fast yet flexible way to implement crawling tasks.
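The core idea of a distributed crawler is a shared frontier of URLs consumed by many workers. The toy sketch below illustrates that pattern with threads and a simulated fetch so it runs offline; the URLs, the fetch function, and the worker count are illustrative only, and a real framework like Cola coordinates workers across machines, not threads.

```python
import queue
import threading

def fetch(url):
    # Stand-in for an HTTP request; returns fake "extracted data".
    return {"url": url, "title": "page at " + url}

def worker(frontier, results, lock):
    # Each worker pulls URLs from the shared frontier until it is empty.
    while True:
        try:
            url = frontier.get_nowait()
        except queue.Empty:
            return
        data = fetch(url)
        with lock:                      # results list is shared across workers
            results.append(data)
        frontier.task_done()

def crawl(urls, n_workers=4):
    frontier = queue.Queue()
    for u in urls:
        frontier.put(u)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(frontier, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

pages = crawl(["http://example.com/page/%d" % i for i in range(8)])
print(len(pages))  # 8
```

Because each URL is taken from the queue exactly once, adding workers changes only throughput, not the set of pages crawled.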

Related Repos

LinuxTerminali A terminal-based program to follow live cricket scores by scraping cricbuzz.com

hellysmile fake-useragent Up-to-date, simple user-agent faker with a real-world database. Features: grabs up-to-date user agents from useragentstring.com and randomizes them using real-world usage statistics
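The idea behind a user-agent faker can be sketched with the stdlib alone: pick a UA string weighted by browser market share. The strings and weights below are made up for illustration and are not fake-useragent's actual database.

```python
import random

# Illustrative (user-agent string, weight) pairs; a real faker would draw
# these from an up-to-date database of observed browser shares.
UA_POOL = [
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/120.0 Safari/537.36", 65),
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
     "Gecko/20100101 Firefox/121.0", 20),
    ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
     "(KHTML, like Gecko) Version/17.0 Safari/605.1.15", 15),
]

def random_user_agent(rng=random):
    # Weighted choice approximates "randomize with real-world statistics".
    agents = [ua for ua, _ in UA_POOL]
    weights = [w for _, w in UA_POOL]
    return rng.choices(agents, weights=weights, k=1)[0]
```

A scraper would then set this string as its `User-Agent` request header on each fetch.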

gaojiuli gain Web crawling framework for everyone, written with asyncio, uvloop and aiohttp. Requirements: Python 3.5+. Installation: pip install gain, plus pip install uvloop (Linux only). Usage: Write
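A minimal sketch of the asyncio pattern such a framework builds on: fetch several pages concurrently, then parse each one. This is not gain's API; fetch() is simulated here so the example runs offline, where a real crawler would use aiohttp.

```python
import asyncio

async def fetch(url):
    # Stand-in for an aiohttp GET; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return "<html><title>" + url + "</title></html>"

def parse(html):
    # Extract the text between <title> and </title>.
    start = html.index("<title>") + len("<title>")
    end = html.index("</title>")
    return html[start:end]

async def crawl(urls):
    # gather() runs all fetches concurrently and preserves input order.
    pages = await asyncio.gather(*(fetch(u) for u in urls))
    return [parse(p) for p in pages]

titles = asyncio.run(crawl(["http://a.example", "http://b.example"]))
```

Because the fetches overlap, total wall time is close to one request's latency rather than the sum of all of them.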

untwisted Sukhoi Minimalist and powerful web crawler. Sukhoi is built on the concept of miners, similar to Scrapy and its spiders; in Sukhoi, however, the miners can be placed in structures like lists or dicts

pythad Selenium extensions Tools that make writing tests, bots and scrapers using Selenium much easier. Free software: MIT license. Documentation: https://selenium-extensions.readthedocs.io. Install

jullrich pcap2curl Read a packet capture, extract HTTP requests and turn them into cURL commands for replay. See https://isc.sans.edu/diary.html?storyid=22900. This is a simple (too simple?) Python script that reads a pcap and finds HTTP requests
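The core transformation can be sketched with the stdlib: given the parts of an HTTP request (as such a tool would extract them from a capture), emit an equivalent curl command. The function name and the sample request are illustrative, not pcap2curl's actual code.

```python
import shlex

def to_curl(method, url, headers, body=None):
    # Build a curl command line; shlex.quote guards against shell
    # metacharacters in URLs, header values, and bodies.
    parts = ["curl", "-X", method, shlex.quote(url)]
    for name, value in headers.items():
        parts += ["-H", shlex.quote(name + ": " + value)]
    if body:
        parts += ["--data", shlex.quote(body)]
    return " ".join(parts)

cmd = to_curl("GET", "http://example.com/index.html",
              {"Host": "example.com", "User-Agent": "test"})
```

Replaying the capture is then just pasting the emitted command into a shell.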

rivermont spidy Web Crawler. Spidy (/spˈɪdi/) is a simple, easy-to-use command-line web crawler. Given a list of web links, it uses Python requests to query the webpages and lxml to extract all links from each page. Pretty simple!
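The extraction step can be sketched with the stdlib's html.parser standing in for lxml, with the HTML inlined so the example runs offline (spidy itself uses requests + lxml; the class and function names here are made up for illustration).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    # Collects the href of every <a> tag, resolved against a base URL.
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

links = extract_links('<a href="/about">About</a> <a href="http://x.test/">X</a>',
                      "http://example.com/")
```

A crawler then feeds the resolved links back into its queue of pages to visit.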

nicodds Introduction: In the era of Big Data, the web is an endless source of information. For this reason, there are plenty of good tools/frameworks for scraping web pages. So, I guess, in an ideal world there should be no need