A high-level distributed crawling framework.

Cola: high-level distributed crawling framework

Overview

Cola is a high-level distributed crawling framework used to crawl pages and extract structured data from websites. It provides a simple and fast yet flexible way to crawl websites.
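To make the overview concrete, here is a minimal, standard-library sketch of the crawl-and-extract workflow that a framework like Cola automates and distributes across workers. This is not Cola's API; the seed URL and the extracted fields are illustrative assumptions.

```python
# A standard-library sketch of "crawl a page, extract structured data".
# NOT Cola's API; URL and fields are illustrative assumptions.
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href attributes from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(url):
    """Fetch one page and return the structured data we care about."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = LinkExtractor()
    parser.feed(html)
    return {"url": url, "links": parser.links}

if __name__ == "__main__":
    print(crawl("https://example.com"))  # hypothetical seed URL
```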

Related Repos



LinuxTerminali: A terminal-based program for following live cricket scores by scraping crickbuzz.com.
 

hellysmile fake-useragent: An up-to-date, simple user-agent faker backed by a real-world database. Features: grabs current user-agent strings from useragentstring.com and randomizes them according to real-world usage statistics.
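A minimal sketch of typical fake-useragent usage: pick a realistic User-Agent string and attach it to an outgoing request. The target URL is an illustrative assumption.

```python
# Typical fake-useragent usage: fake a realistic User-Agent header.
from fake_useragent import UserAgent
import urllib.request

ua = UserAgent()
headers = {"User-Agent": ua.random}  # a randomly chosen real-world user agent
# ua.chrome, ua.firefox, etc. pick a browser family instead of randomizing.

req = urllib.request.Request("https://example.com", headers=headers)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```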
 

gaojiuli gain: A web crawling framework for everyone, written with asyncio, uvloop and aiohttp. Requirements: Python 3.5+. Installation: pip install gain (plus pip install uvloop, Linux only).
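Since gain's usage section is truncated above, here is a sketch of the asyncio + aiohttp fetch pattern it builds on rather than gain's own API; the URLs are examples.

```python
# The asyncio + aiohttp concurrent-fetch pattern gain is built on
# (not gain's own API).
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        return url, await resp.text()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(fetch(session, u) for u in urls))
        for url, body in pages:
            print(url, len(body))

if __name__ == "__main__":
    asyncio.run(main(["https://example.com", "https://example.org"]))
```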
 

untwisted Sukhoi: A minimalist and powerful web crawler. Sukhoi is built around the concept of miners, similar to Scrapy and its spiders; however, in Sukhoi the miners can be placed in structures like lists or dicts.
 

pythad selenium-extensions: Tools that make writing tests, bots and scrapers with Selenium much easier. Free software: MIT license. Documentation: https://selenium-extensions.readthedocs.io.
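For context, this is the kind of plain-Selenium boilerplate such extensions aim to reduce; it does not use the extension's own API, and the URL and selector are examples.

```python
# Plain Selenium scraping boilerplate (not selenium-extensions' API).
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes chromedriver is on PATH
try:
    driver.get("https://example.com")
    # Wait explicitly for an element before reading it.
    heading = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
    )
    print(heading.text)
finally:
    driver.quit()
```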
 

jullrich pcap2curl: Reads a packet capture, extracts HTTP requests and turns them into cURL commands for replay (see https://isc.sans.edu/diary.html?storyid=22900). It is a simple (perhaps too simple) Python script that reads a pcap and finds HTTP requests.
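A rough sketch of the same idea using scapy, not the original script: read a capture, find plain-HTTP GET requests, and print equivalent curl commands. The capture filename is an illustrative assumption.

```python
# pcap -> curl sketch with scapy (not the pcap2curl script itself).
from scapy.all import rdpcap, TCP, Raw

def pcap_to_curl(path):
    for pkt in rdpcap(path):
        if not (pkt.haslayer(TCP) and pkt.haslayer(Raw)):
            continue
        payload = bytes(pkt[Raw].load)
        # Only handle the start of simple, unencrypted GET requests.
        if not payload.startswith(b"GET "):
            continue
        lines = payload.split(b"\r\n")
        method, uri, _version = lines[0].split(b" ", 2)
        host = b""
        for line in lines[1:]:
            if line.lower().startswith(b"host:"):
                host = line.split(b":", 1)[1].strip()
                break
        if host:
            print(f"curl 'http://{host.decode()}{uri.decode()}'")

if __name__ == "__main__":
    pcap_to_curl("capture.pcap")  # hypothetical capture file
```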
 

rivermont spidy: Spidy (/spˈɪdi/) is a simple, easy-to-use command-line web crawler. Given a list of web links, it uses Python requests to fetch the pages and lxml to extract all links from each page. Pretty simple!
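The core requests + lxml step that description refers to looks roughly like the sketch below, written independently of spidy's own code; the starting URL is an example.

```python
# Fetch a page with requests and extract all links with lxml.
import requests
from lxml import html
from urllib.parse import urljoin

def extract_links(url):
    resp = requests.get(url, timeout=10)
    doc = html.fromstring(resp.content)
    # Resolve relative hrefs against the page URL.
    return [urljoin(url, href) for href in doc.xpath("//a/@href")]

if __name__ == "__main__":
    for link in extract_links("https://example.com"):
        print(link)
```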
 

nicodds: In the era of Big Data, the web is an endless source of information. For this reason, there are plenty of good tools and frameworks for scraping web pages, so, in an ideal world, there should be no need