RoboBrowser: Your friendly neighborhood web scraper

Homepage: http://robobrowser.readthedocs.org/
RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser can fetch a pag
Information
Category: Python / Web Content Extracting
Watchers: 114
Star: 3.6k
Fork: 338
Last update: Oct 21, 2021

Related Repos



google-research-datasets WebRED is a large and diverse manually annotated dataset for extracting relationships from a variety of text found on the World Wide Web.
 

MayankPandey01 BrokenLinkHijacker (BLH) is a fast broken-link hijacker tool written in Python. It crawls a website and searches for all broken links. This tool is mainly designed for bug bounty hunters.
 

LaundroMat Extract and index information about movies found in open directories posted on r/opendirectories.
 

Alir3z4 html2text html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format). Usage: html2text [(filename
 

michaelhelmick Lassie Lassie is a Python library for retrieving basic content from websites. Usage >>> import lassie >>> lassie.fetch('http://www.youtube.com/watch?v=dQw4w9WgXcQ') { 'description':
 

coleifer A small library for extracting rich content from URLs. What does it do? micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. micawber also provides functions
 

buriy python-readability Given an HTML document, it pulls out the main body text and cleans it up. This is a Python port of a Ruby port of arc90's readability project. Installation It's easy using pip, just run: $ pi
 

federicotdn wikiquote The wikiquote Python 3 module allows you to search and retrieve quotes from any Wikiquote article, and also retrieve the quote of the day. Please keep in mind that due to Wikiquote's varying HTML article layouts,
 

datalib Libextract: extract data from websites
 
