A pure-Python HTML screen-scraping library

Scrapely

Scrapely is a library for extracting structured data from HTML pages. Given some example web pages and the data to be extracted, Scrapely constructs a parser for all similar pages.

Overview

Scrapinghub wr
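The example-driven idea described above (learn from a sample page plus the values you want, then apply what was learned to similar pages) can be illustrated with a toy sketch. This is not Scrapely's algorithm or API: the functions `learn_template` and `extract`, the `context` parameter, and the sample pages are all hypothetical, and the sketch assumes the markup immediately around each value is identical across pages.

```python
def learn_template(html, data, context=16):
    """Record the markup immediately before and after each example value.

    Hypothetical helper for illustration only; not part of Scrapely.
    """
    template = {}
    for field, value in data.items():
        start = html.find(value)
        if start == -1:
            raise ValueError(f"example value for {field!r} not found in page")
        end = start + len(value)
        prefix = html[max(0, start - context):start]
        suffix = html[end:end + context]
        template[field] = (prefix, suffix)
    return template


def extract(html, template):
    """Pull out whatever sits between each learned prefix and suffix."""
    result = {}
    for field, (prefix, suffix) in template.items():
        start = html.find(prefix)
        if start == -1:
            continue  # surrounding markup not present on this page
        start += len(prefix)
        end = html.find(suffix, start)
        if end == -1:
            continue
        result[field] = html[start:end]
    return result


# Made-up sample pages that share the same layout.
EXAMPLE_PAGE = (
    '<html><body><h1 class="title">Blue Widget</h1>'
    '<span class="price">$10</span></body></html>'
)
SIMILAR_PAGE = (
    '<html><body><h1 class="title">Red Gadget</h1>'
    '<span class="price">$25</span></body></html>'
)

template = learn_template(EXAMPLE_PAGE, {"name": "Blue Widget", "price": "$10"})
result = extract(SIMILAR_PAGE, template)
print(result)
```

A fixed-width text context is the simplest possible "template"; it breaks as soon as the neighbouring markup varies between pages, which is the kind of problem a real library has to handle more robustly.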

Related Repos

guptarohit cryptoCMD: cryptoCurrency Market Data Cryptocurrency historical market price data scraper written in Python. Installation $ pip install cryptocmd To install from the latest source, use the following c

not-kennethreitz PySoundCloud SoundCloud is a single-page webapp (all content served via JavaScript). This repo serves as an experiment to see how to scrape and parse a website like this using requests-html, which features a full web browser f

initstring linkedin2username OSINT Tool: Generate username lists from companies on LinkedIn. This is a pure web scraper; no API key is required. You log in with your valid LinkedIn username and password, and it will create several lists of possi

hardikvasa Google Images Download Python Script for 'searching' and 'downloading' hundreds of Google images to the local hard disk!

indrajithi A Tiny Web Crawler A web crawler written in Python. Install requirements: pip install validators beautifulsoup4 lxml. Python version: Python 3.6.3 :: Anaconda, Inc. Run: python crawler.py Starts c

s0md3v ORBIT Blockchain Transactions Investigation Tool Introduction Orbit is designed to explore the network of a blockchain wallet by recursively crawling through transact

s0md3v Photon Incredibly fast crawler designed for OSINT. Key Features Data Extraction Photon

anjia0532 Random proxy middleware for Scrapy (http://scrapy.org/), based on https://github.com/aivarsk/scrapy-proxies, with support for loading proxies from https://github.com/qiyeboy/IPProxyPool. Processes Scrapy requests using a random proxy from lis