Web Content Extraction

Libraries for extracting content from web pages.

Newest releases

jmcarp/RoboBrowser: Your friendly neighborhood web scraper
Homepage: http://robobrowser.readthedocs.org/
RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser …
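
Browsing without a browser means following links and filling in forms from parsed HTML. The sketch below is not RoboBrowser's API. It is a minimal standard-library illustration, assuming reasonably well-formed HTML, of the parsing step such a tool needs. It resolves relative links against the page URL and collects each form's action, method and default field values. `PageParser` and its attributes are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Hypothetical helper: collects links and forms from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []    # (absolute URL, link text) pairs
        self.forms = []    # dicts with "action", "method" and "fields"
        self._href = None  # absolute URL of the <a> currently open, if any
        self._text = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Relative hrefs are resolved against the page URL, as a browser would.
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []
        elif tag == "form":
            self._in_form = True
            self.forms.append({
                # An empty or missing action submits back to the current page.
                "action": urljoin(self.base_url, attrs.get("action") or ""),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Record the field's default value; unnamed inputs are not submitted.
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
        elif tag == "form":
            self._in_form = False

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
```

To "submit" a GET form from here, one could fill in `page.forms[0]["fields"]` and append `urllib.parse.urlencode(...)` of it to the form's action URL. Actually sending requests and keeping session state is left out.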

federicotdn/wikiquote
The wikiquote Python 3 module allows you to search for and retrieve quotes from any Wikiquote article, and also to retrieve the quote of the day. Please keep in mind that due to Wikiquote's varying HTML article …
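
Wikiquote, like other Wikimedia wikis, exposes the MediaWiki web API at `/w/api.php`, which supports full-text search. The sketch below does not use the wikiquote module and makes no claim about how it works internally. It only builds a standard MediaWiki search request (`action=query`, `list=search`) and reads page titles out of the JSON reply. The function names are hypothetical. Sending the request (for example with `urllib.request.urlopen`) is omitted so the sketch runs offline.

```python
import json
from urllib.parse import urlencode


def wikiquote_search_url(query, lang="en", limit=10):
    """Hypothetical helper: MediaWiki search URL for a Wikiquote language edition."""
    params = {
        "action": "query",   # generic MediaWiki query module
        "list": "search",    # full-text search over page content
        "srsearch": query,   # the search terms
        "srlimit": limit,    # maximum number of hits to return
        "format": "json",
    }
    return f"https://{lang}.wikiquote.org/w/api.php?{urlencode(params)}"


def titles_from_search(payload):
    """Extract page titles from a list=search response (JSON text or decoded dict)."""
    if isinstance(payload, (str, bytes)):
        payload = json.loads(payload)
    return [hit["title"] for hit in payload.get("query", {}).get("search", [])]
```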

buriy/python-readability
Given an HTML document, it pulls out the main body text and cleans it up. This is a Python port of a Ruby port of arc90's readability project. Installation: it's easy using pip, just …
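
Readability-style extractors score regions of a page and keep the one that looks most like article text. The sketch below is a deliberately crude stand-in for that idea, not python-readability's algorithm or API. It credits each paragraph's text length to its innermost `div`, `article`, `section`, `main` or `body` element and returns the paragraphs of the highest-scoring one. The scoring rule and the `main_text` name are assumptions for illustration. The sketch assumes those container tags are properly closed; real extractors use richer heuristics.

```python
from html.parser import HTMLParser

CONTAINERS = {"article", "body", "div", "main", "section"}


class _ParagraphScorer(HTMLParser):
    """Groups <p> text by the innermost enclosing container element."""

    def __init__(self):
        super().__init__()
        self.groups = []   # one list of paragraph strings per container seen
        self._open = []    # indices into self.groups for currently open containers
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.groups.append([])
            self._open.append(len(self.groups) - 1)
        elif tag == "p":
            self._in_p, self._buf = True, []

    def handle_endtag(self, tag):
        if tag in CONTAINERS and self._open:
            self._open.pop()
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text and self._open:
                self.groups[self._open[-1]].append(text)

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def main_text(html):
    """Return the paragraphs of the container with the most paragraph text."""
    scorer = _ParagraphScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.groups:
        return ""
    best = max(scorer.groups, key=lambda paras: sum(len(p) for p in paras))
    return "\n\n".join(best)
```

Short navigation paragraphs lose to a block of longer prose under this rule, which is the intuition behind readability-style scoring.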

coleifer/micawber: A small library for extracting rich content from URLs.
What does it do? micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. micawber also provides …
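
Rich metadata for links such as YouTube videos is commonly obtained through oEmbed. The consumer matches the link against a provider's URL pattern, then calls that provider's endpoint with the link passed as the `url` parameter. The sketch below is not micawber's API; it only shows that lookup step using a one-entry provider table. The table, the regular expression and `oembed_request_url` are illustrative assumptions. Fetching and decoding the JSON reply is omitted so the code runs offline.

```python
import re
from urllib.parse import urlencode

# Hypothetical provider table: (URL pattern, oEmbed endpoint).
PROVIDERS = [
    (re.compile(r"https?://(?:www\.)?youtube\.com/watch\?v=[\w-]+"),
     "https://www.youtube.com/oembed"),
]


def oembed_request_url(url, maxwidth=None):
    """Return the oEmbed request URL for `url`, or None if no provider matches."""
    for pattern, endpoint in PROVIDERS:
        if pattern.match(url):
            params = {"url": url, "format": "json"}
            if maxwidth is not None:
                params["maxwidth"] = maxwidth  # optional size hint defined by oEmbed
            return f"{endpoint}?{urlencode(params)}"
    return None
```

A fuller registry would simply hold more (pattern, endpoint) pairs; links that match nothing are left alone.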

michaelhelmick/Lassie
Lassie is a Python library for retrieving basic content from websites.
Usage:
    >>> import lassie
    >>> lassie.fetch('http://www.youtube.com/watch?v=dQw4w9WgXcQ')
    { 'des …
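
Basic content such as a page's title, description and preview image usually comes from the `<title>` element and from Open Graph or standard `<meta>` tags. The sketch below is a standard-library approximation of that idea, not Lassie's implementation. The `basic_content` name and the keys of the returned dictionary are our own choices and need not match what `lassie.fetch` returns. It parses an HTML string rather than fetching a URL, so it runs offline.

```python
from html.parser import HTMLParser


class _MetaParser(HTMLParser):
    """Collects <meta> name/property values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph tags use "property"; classic meta tags use "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta.setdefault(key.lower(), attrs["content"])  # first one wins
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def basic_content(html):
    """Hypothetical helper: title, description and image URLs from an HTML string."""
    parser = _MetaParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    return {
        "title": meta.get("og:title") or parser.title.strip(),
        "description": meta.get("og:description") or meta.get("description", ""),
        "images": [meta["og:image"]] if "og:image" in meta else [],
    }
```

Preferring `og:*` values and falling back to `<title>` and `name="description"` is one reasonable precedence; other tools may order these differently.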

Alir3z4/html2text
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format). Usage: html2text …
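
Converting HTML to Markdown amounts to walking the markup and emitting a Markdown equivalent for each element encountered. The sketch below shows that approach for a handful of tags: headings, paragraphs, bold, italics, links and list items. It is not html2text's code or API, and `to_markdown` is a hypothetical name. Anything it does not recognize is passed through as plain text, including tables, images, nested lists and Markdown characters that would need escaping.

```python
import re
from html.parser import HTMLParser


class _MarkdownWriter(HTMLParser):
    """Emits Markdown fragments for a small subset of HTML tags."""

    def __init__(self):
        super().__init__()
        self.out = []
        self._href = None
        self._skip = 0  # depth inside <script>/<style>, whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")  # ordered lists are also rendered as bullets here
        elif tag == "li":
            self.out.append("\n* ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("_")
        elif tag == "a":
            self._href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("_")
        elif tag == "a" and self._href is not None:
            self.out.append(f"]({self._href})")
            self._href = None

    def handle_data(self, data):
        if not self._skip:
            self.out.append(re.sub(r"\s+", " ", data))  # HTML collapses whitespace


def to_markdown(html):
    writer = _MarkdownWriter()
    writer.feed(html)
    writer.close()
    # Trim each line, then squeeze runs of blank lines to a single blank line.
    lines = [line.strip() for line in "".join(writer.out).split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```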

datalib/Libextract: extract data from websites
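
A common case of extracting data from websites is recovering repeated, tabular structure from the markup. The sketch below handles that case in the most direct way, reading `<table>` rows into lists of cell strings. It is not Libextract's method or API, and `extract_tables` is a hypothetical name. It assumes non-nested tables with explicit `</td>`, `</th>` and `</tr>` end tags.

```python
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collects the text of <th>/<td> cells, grouped by row and by table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of strings
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so each cell is a clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```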