HTML Manipulation

Libraries for working with HTML and XML.

Newest releases

felixonmars zhwiki dictionary for fcitx5-pinyin and rime
 

Knio Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure Python very concisely, which eliminates the need to learn another template language…
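To illustrate the idea of writing HTML in pure Python through a DOM-like API, here is a minimal stdlib-only sketch. The `Tag` class and its `add()` and `render()` methods are hypothetical stand-ins written for this example, not Dominate's actual API.

```python
from html import escape


class Tag:
    """A tiny DOM-like node: a tag name, attributes, and child nodes or text.

    Hypothetical helper for illustration; this is not Dominate's API.
    """

    def __init__(self, name, *children, **attrs):
        self.name = name
        self.children = list(children)
        # Allow the Python-safe keyword cls= for the HTML "class" attribute.
        self.attrs = {("class" if k == "cls" else k): v for k, v in attrs.items()}

    def add(self, *children):
        """Append children and return self so calls can be nested inline."""
        self.children.extend(children)
        return self

    def render(self):
        """Serialize to an HTML string, escaping text and attribute values."""
        attrs = "".join(f' {k}="{escape(str(v))}"' for k, v in self.attrs.items())
        inner = "".join(
            c.render() if isinstance(c, Tag) else escape(str(c)) for c in self.children
        )
        return f"<{self.name}{attrs}>{inner}</{self.name}>"


page = Tag("html").add(
    Tag("body").add(
        Tag("h1", "Hello & welcome"),
        Tag("ul", *(Tag("li", f"Item {i}") for i in range(2)), cls="items"),
    )
)
print(page.render())
```

Because text children pass through `html.escape`, the `&` in the heading is rendered as `&amp;`; building markup from objects rather than string concatenation is what makes that escaping automatic.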
 

MagicStack httptools httptools is a Python binding for the Node.js HTTP parser. The package is available on PyPI: pip install httptools. APIs: httptools contains two classes, httptools.HttpRequestParser and httptools.HttpResponseParser, and a…
 

postlight Mercury Parser: Extracting content from chaos. The Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts…
 

psf Requests-HTML: HTML Parsing for Humans™. This library intends to make parsing HTML (e.g., scraping the web) as simple and intuitive as possible. When using this library you automatically get: full JavaScript support…
 

rushter selectolax A fast HTML5 parser with CSS selectors, using the Modest engine. Installation: from PyPI using pip, pip install selectolax. Development version from GitHub: git clone --recursive https://git…
 

matiskay HTML Similarity This package provides a set of functions to measure the similarity between web pages. Install: the quick way is pip install html-similarity. How it works: Struct…
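One simple way to score structural similarity is to reduce each page to its sequence of tag names and compare the sequences. The sketch below does this with the standard library (`html.parser` and `difflib.SequenceMatcher`); the helper names are hypothetical, and this is not necessarily how html-similarity computes its scores.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collect start-tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html_text):
    parser = TagSequence()
    parser.feed(html_text)
    parser.close()
    return parser.tags


def structural_similarity(html_a, html_b):
    """Return a ratio in [0, 1]; 1.0 means identical tag sequences."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()


a = "<html><body><h1>News</h1><p>One</p><p>Two</p></body></html>"
b = "<html><body><h1>Blog</h1><p>Other text</p><p>More</p></body></html>"
c = "<html><body><table><tr><td>x</td></tr></table></body></html>"
print(structural_similarity(a, b))  # 1.0: same tags, different text
print(structural_similarity(a, c))  # 0.4: only <html><body> match
```

Ignoring text entirely is the design choice here: two articles from the same site template score as identical even though their content differs.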
 

MechanicalSoup Home page: https://mechanicalsoup.readthedocs.io/. Overview: A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms…
 

Hrabal TemPy Fast Object-Oriented HTML templating with Python! What? Build HTML without writing a single tag. TemPy dynamically generates HTML and accesses it in a pure-Python or jQuery fashion…
 

googleprojectzero Domato A DOM fuzzer. Written and maintained by Ivan Fratric. Copyright 2017 Google Inc. All Rights Reserved. Licensed under the Apache License, Version 2.0.
 

gaojiuli tomd When crawling online articles such as news and blogs, I want to save them as Markdown files rather than in databases. Tomd can convert HTML that was generated from Markdown back into Markdown. If an HTML document can't be described…
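The core of an HTML-to-Markdown converter can be sketched as an event-driven parser that emits Markdown markers as tags open and close. This stdlib-only version is a hypothetical illustration (`MiniMarkdown` and `to_markdown` are not tomd's API) and handles only headings, paragraphs, bold, italics, and links.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Convert a tiny HTML subset (h1-h3, p, strong/b, em/i, a) to Markdown.

    Illustrative only: lists, tables, code, and images are not handled.
    """

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.out.append("#" * int(tag[1]) + " ")  # h2 -> "## "
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n\n")  # block elements end with a blank line
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)


def to_markdown(html_text):
    parser = MiniMarkdown()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out).strip()


sample = ('<h1>Title</h1><p>Some <strong>bold</strong> and <em>italic</em> '
          'text, <a href="https://example.com">a link</a>.</p>')
print(to_markdown(sample))
```

This prints a `# Title` heading, a blank line, and the paragraph with `**bold**`, `*italic*`, and `[a link](https://example.com)`. It also shows why such converters only round-trip HTML that Markdown can describe: any tag outside the handled subset simply loses its markup.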
 

sihaelov Harser Harser is a library for easily extracting data from HTML and building XPath. Installation: pip install harser. Examples: >>> from harser import Harser >>> HTML = '…
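Building an XPath for a given element amounts to walking from the element up to the root and recording each tag with its position among same-named siblings. A minimal sketch with `xml.etree.ElementTree`, assuming well-formed markup; `xpath_of` is a hypothetical helper, not part of Harser.

```python
import xml.etree.ElementTree as ET


def xpath_of(root, target):
    """Build an absolute, index-qualified XPath for `target` inside `root`.

    Hypothetical helper (not Harser's API); assumes well-formed input.
    """
    # ElementTree nodes have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same = [c for c in parent if c.tag == node.tag]
        # Add a 1-based [n] predicate only when siblings share the tag.
        steps.append(f"{node.tag}[{same.index(node) + 1}]" if len(same) > 1 else node.tag)
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


doc = ET.fromstring(
    "<html><body><div><p>a</p></div><div><p>b</p><p>c</p></div></body></html>"
)
target = doc.findall(".//p")[2]  # the <p> containing "c"
print(xpath_of(doc, target))     # /html/body/div[2]/p[2]
```

ElementTree itself accepts the relative form of such a path, for example `doc.find("./body/div[2]/p[2]")`, which makes the generated expression easy to check.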
 

matiasb demiurge PyQuery-based scraping micro-framework. Supports Python 2.x and 3.x. Documentation: http://demiurge.readthedocs.org. Installing demiurge: $ pip install demiurge. Quick start: Define item…
 

martinblech xmltodict xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec": >>> print(json.dumps(xmltodict.parse(""" ... <mydocument has="an attribute">…
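For simple documents, the JSON-like mapping can be approximated with the standard library. This sketch follows xmltodict's conventions of `@`-prefixed attribute keys, `#text` for text next to attributes, and lists for repeated child tags, but `element_to_dict` is a simplified, hypothetical re-implementation rather than xmltodict itself; namespaces and mixed content are ignored.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map an Element to plain dicts, lists, and strings (simplified)."""
    node = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)[child.tag]
        if child.tag in node:
            # Repeated tag: collect all same-named children in a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if node:
            node["#text"] = text  # text alongside attributes or children
        else:
            return {elem.tag: text}  # text-only element becomes a plain string
    return {elem.tag: node or None}  # empty element becomes None


doc = ET.fromstring(
    '<mydocument has="an attribute">'
    "<and><many>elements</many><many>more elements</many></and>"
    '<plus a="complex">element as well</plus>'
    "</mydocument>"
)
print(json.dumps(element_to_dict(doc), indent=2))
```

The output nests `"@has"`, `"and": {"many": [...]}`, and `"plus": {"@a": ..., "#text": ...}` under `"mydocument"`, so the result can be passed straight to `json.dumps`.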
 

xhtml2pdf XHTML2PDF The current release of xhtml2pdf is xhtml2pdf 0.2.4, which is the first stable version with Python 3 support. As with all open-source software, its use in production depends on many factors, s…
 

stchris untangle Converts XML to a Python object. Siblings with similar names are grouped into a list. Children can be accessed with parent.child, attributes with element['attribute']. You can call t…
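The access pattern described above (children as attributes, same-named siblings grouped into a list, XML attributes via indexing) can be sketched on top of `xml.etree.ElementTree`. The `Node` class is hypothetical and wraps the root element directly; it is not untangle's implementation. Tag names that are not valid Python identifiers would need `getattr(node, "tag-name")`.

```python
import xml.etree.ElementTree as ET


class Node:
    """Attribute-style view of one XML element (hypothetical, not untangle's class).

    - node.child returns a child Node, or a list of Nodes for same-named siblings
    - node["attr"] returns an XML attribute value
    - node.cdata holds the element's stripped text
    """

    def __init__(self, elem):
        self._attrs = dict(elem.attrib)
        self._children = {}
        self.cdata = (elem.text or "").strip()
        for child in elem:
            wrapped = Node(child)
            existing = self._children.get(child.tag)
            if existing is None:
                self._children[child.tag] = wrapped
            elif isinstance(existing, list):
                existing.append(wrapped)
            else:
                # Second sibling with this tag: switch to a list.
                self._children[child.tag] = [existing, wrapped]

    def __getattr__(self, name):
        # Called only when normal lookup fails, so it serves child-tag names.
        try:
            return self.__dict__["_children"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key):
        return self._attrs[key]


root = Node(ET.fromstring(
    '<catalog><book id="b1"><title>Dune</title></book>'
    '<book id="b2"><title>Emma</title></book><owner>Ada</owner></catalog>'
))
print(root.owner.cdata)              # Ada
print([b["id"] for b in root.book])  # ['b1', 'b2']
print(root.book[1].title.cdata)      # Emma
```

One consequence of grouping siblings this way is that a tag's type depends on the data: `root.book` is a list here, but would be a single `Node` if the catalog held only one book.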
 

gawel pyquery: a jQuery-like library for Python. pyquery allows you to make jQuery queries on XML documents. The API is as similar to jQuery as possible. pyquery uses lxml for fast XML and HTML manipulation. Thi…
 

pallets MarkupSafe Implements a unicode subclass that supports HTML strings: >>> from markupsafe import Markup, escape >>> escape("<script>alert(document.cookie);</script>") Markup(u'&lt;scr…
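The key idea behind a class like `Markup` is a string subclass that marks text as already safe, so escaping it a second time is a no-op. A minimal sketch of that idea using the standard library's `html.escape`; the `Safe` class and `escape` function here are hypothetical and omit most of what `Markup` provides (for example, escaping arguments passed to `format()`).

```python
from html import escape as html_escape


class Safe(str):
    """A str subclass marking text as already-safe HTML (hypothetical)."""


def escape(value):
    """Escape untrusted text once; pass through anything already marked Safe."""
    if isinstance(value, Safe):
        return value
    return Safe(html_escape(str(value), quote=True))


user_input = "<script>alert(document.cookie);</script>"
once = escape(user_input)
twice = escape(once)  # not double-escaped, because `once` is already Safe
print(once)           # &lt;script&gt;alert(document.cookie);&lt;/script&gt;
print(twice == once)  # True
```

Carrying the "already escaped" flag in the type, rather than in a separate boolean, is what lets templates escape every value unconditionally without corrupting fragments that were escaped earlier.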
 

html5lib html5lib html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers. Usage: simple usage follows this patte…