BrokenLinkHijacker (BLH) is a fast broken-link hijacking tool written in Python. It crawls a website and searches for all broken links. The tool is designed mainly for bug bounty hunters.
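To make "crawls the website and searches for broken links" concrete, here is a minimal standard-library sketch of one crawl step. It is not BLH's code. The names `LinkCollector`, `extract_links` and `is_broken` are hypothetical. The sketch assumes that "broken" means an HTTP status of 400 or higher or a network failure. It collects absolute `http(s)` links from a page and checks each one with a `HEAD` request.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute href targets from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Keep only web links (skip mailto:, javascript:, etc.).
                if absolute.startswith(("http://", "https://")):
                    self.links.add(absolute)


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def is_broken(url, timeout=10.0):
    """Assumption: HTTP errors >= 400 and network failures count as broken."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status >= 400
    except HTTPError as err:
        return err.code >= 400
    except (URLError, TimeoutError, ValueError):
        # Unreachable host, timeout, or a malformed URL.
        return True
```

A real crawler would also queue same-site links for further crawling. It would retry with `GET` when a server rejects `HEAD` (for example, with a 405 response). For hijacking specifically, it would flag links whose domains no longer resolve.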
RoboBrowser: Your friendly neighborhood web scraper
Homepage: http://robobrowser.readthedocs.org/
RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser …
wikiquote
The wikiquote Python 3 module lets you search for and retrieve quotes from any Wikiquote article, as well as retrieve the quote of the day. Please keep in mind that due to Wikiquote's varying HTML article …
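The search step can be illustrated against Wikiquote's standard MediaWiki API. This is not the wikiquote module's API and may not match how the module works internally. The endpoint assumes English Wikiquote, and `search_url` and `titles_from_opensearch` are hypothetical helpers. The sketch builds an `opensearch` request and reads article titles out of the reply.

```python
from urllib.parse import urlencode

# Assumption: English Wikiquote's standard MediaWiki API endpoint.
WIKIQUOTE_API = "https://en.wikiquote.org/w/api.php"


def search_url(query, limit=10):
    """Hypothetical helper: build an opensearch request for matching article titles."""
    params = {"action": "opensearch", "search": query, "limit": limit, "format": "json"}
    return f"{WIKIQUOTE_API}?{urlencode(params)}"


def titles_from_opensearch(payload):
    """opensearch replies with [query, titles, descriptions, urls]; return the titles."""
    return list(payload[1]) if len(payload) > 1 else []
```

To run a real search, pass the URL to `urllib.request.urlopen` and decode the body with `json.load`. Extracting the quotes themselves from an article's HTML is the fragile part that the original note about varying article layouts warns about.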
python-readability
Given an HTML document, it extracts the main body text and cleans it up.
This is a Python port of a Ruby port of arc90's Readability project.
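The core idea of "pulling out the main body text" can be shown with a deliberately crude heuristic. This is not Readability's scoring algorithm. The sketch attributes each `<p>` to its innermost container element and skips `script`, `style`, `nav`, `footer` and `aside`. It then keeps the container whose paragraphs hold the most text. The names `MainTextFinder` and `main_text` and both tag sets are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumed tag sets for this toy heuristic.
CONTAINERS = {"article", "div", "section", "main", "body"}
SKIP = {"script", "style", "nav", "footer", "aside"}


class MainTextFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # ids of currently open containers
        self.paragraphs = {}   # container id -> list of paragraph strings
        self.next_id = 0
        self.skip_depth = 0
        self.in_p = False
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in CONTAINERS:
            self.stack.append(self.next_id)
            self.next_id += 1
        elif tag == "p":
            self.in_p, self.buffer = True, []

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if text and self.stack and self.skip_depth == 0:
                self.paragraphs.setdefault(self.stack[-1], []).append(text)

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.buffer.append(data)


def main_text(html):
    finder = MainTextFinder()
    finder.feed(html)
    finder.close()
    if not finder.paragraphs:
        return ""
    # Pick the container whose own paragraphs contain the most characters.
    best = max(finder.paragraphs.values(), key=lambda ps: sum(len(p) for p in ps))
    return "\n\n".join(best)
```

Readability-style tools also weigh class names, link density and punctuation. This sketch uses only paragraph length, which is enough to separate a short navigation block from an article body.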
Installation
It's easy using pip, just …
micawber
A small library for extracting rich content from URLs.
What does it do?
micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. micawber also provides …
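Rich metadata for links like YouTube videos is commonly obtained through the oEmbed protocol. In oEmbed, a provider's endpoint is queried with the target URL and returns JSON fields such as `type`, `title` and `html`. The sketch below shows only the provider-lookup step. It is not micawber's API. The `PROVIDERS` table and `oembed_request_url` are hypothetical, and only YouTube's public oEmbed endpoint is listed.

```python
import re
from urllib.parse import urlencode

# Hypothetical provider registry: URL pattern -> oEmbed endpoint.
PROVIDERS = [
    (re.compile(r"https?://(?:www\.)?youtube\.com/watch\?v=[\w-]+"), "https://www.youtube.com/oembed"),
    (re.compile(r"https?://youtu\.be/[\w-]+"), "https://www.youtube.com/oembed"),
]


def oembed_request_url(url, maxwidth=None):
    """Return the oEmbed request URL for a supported link, or None if no provider matches."""
    for pattern, endpoint in PROVIDERS:
        if pattern.match(url):
            params = {"url": url, "format": "json"}
            if maxwidth is not None:
                params["maxwidth"] = maxwidth  # optional parameter defined by the oEmbed spec
            return f"{endpoint}?{urlencode(params)}"
    return None
```

Fetching the returned URL with `urllib.request.urlopen` and decoding it with `json.load` yields the provider's metadata. A library in this space typically also scans text or HTML for bare links and replaces them with the embed `html`.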
Lassie
Lassie is a Python library for retrieving basic content from websites.
Usage
>>> import lassie
>>> lassie.fetch('http://www.youtube.com/watch?v=dQw4w9WgXcQ')
{
'des
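"Basic content" here means page-level metadata such as the title, description and preview images. The sketch below extracts those from HTML with the standard library. It is not Lassie's implementation, and its output shape is not Lassie's documented return value. `MetaExtractor`, `extract_meta` and the result keys are hypothetical. It assumes that Open Graph `og:title` overrides `<title>` and that `og:description` or `description` supplies the description.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Hypothetical helper: read <title>, meta description and og:image tags."""

    def __init__(self):
        super().__init__()
        self.result = {"title": None, "description": None, "images": []}
        self._og_title = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if not key or content is None:
                return
            if key == "og:title":
                self._og_title = content
            elif key in ("og:description", "description") and self.result["description"] is None:
                self.result["description"] = content
            elif key == "og:image":
                self.result["images"].append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_meta(html):
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    plain_title = "".join(parser._title_parts).strip() or None
    # Assumption: prefer the Open Graph title when present.
    parser.result["title"] = parser._og_title or plain_title
    return parser.result
```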
html2text
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: html2text
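To show what "HTML in, readable Markdown out" involves, here is a toy converter that handles only headings, paragraphs, bold/italic and links. It is not html2text's algorithm. The real package also exposes an `html2text.html2text()` function for use as a library. `MiniMarkdown`, `html_to_markdown` and the `_` marker for italics are choices made for this sketch.

```python
import re
from html.parser import HTMLParser

# Assumed inline markers for this toy converter.
INLINE = {"strong": "**", "b": "**", "em": "_", "i": "_"}
HEADINGS = {"h1", "h2", "h3"}


class MiniMarkdown(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.href_stack = []  # hrefs of currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.out.append("#" * int(tag[1]) + " ")
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "a":
            self.href_stack.append(dict(attrs).get("href") or "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.out.append("\n\n")  # block elements end with a blank line
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "a" and self.href_stack:
            self.out.append(f"]({self.href_stack.pop()})")

    def handle_data(self, data):
        # Collapse HTML whitespace runs to single spaces, as a browser would.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

For example, `<h1>Title</h1><p>Hello, <b>world</b>!</p>` becomes a `# Title` heading followed by a `Hello, **world**!` paragraph.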