Convert LaTeX math expressions to HTML for use in Markdown documents or package manual pages. The rendering is done in R using the V8 engine, which eliminates the need to embed the MathJax library in the web pages.
rmrl is a Python library for rendering reMarkable documents to PDF files. It takes the original PDF document and the files describing your annotations, combining them to produce a document close to what reMarkable itself would out
Dominate
Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure Python very concisely, which eliminates the need to learn another te
httptools is a Python binding for the Node.js HTTP parser.
The package is available on PyPI: pip install httptools.
APIs
httptools contains two classes httptools.HttpRequestParser, httptools.HttpResponseParser and a
Mercury Parser - Extracting content from chaos
The Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpt
Requests-HTML: HTML Parsing for Humans™
This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.
When using this library you automatically get:
Full JavaScript su
selectolax
A fast HTML5 parser with CSS selectors, using the Modest engine.
Installation
From PyPI using pip:
pip install selectolax
Development version from GitHub:
git clone --recursive https://git
HTML Similarity
This package provides a set of functions to measure the similarity between web pages.
Install
The quick way:
pip install html-similarity
How it works?
Struct
Home page
https://mechanicalsoup.readthedocs.io/
Overview
A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can fo
TemPy
Fast Object-Oriented HTML templating With Python!
What?
Build HTML without writing a single tag. TemPy dynamically generates HTML and accesses it in a pure Python, or jQuery fas
Domato
A DOM fuzzer
Written and maintained by Ivan Fratric
Copyright 2017 Google Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this f
tomd
When crawling online articles such as news and blogs, I want to save them as Markdown files rather than in a database. Tomd can convert HTML that was itself generated from Markdown. If a HTML can't be descri
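As a rough illustration of the HTML-to-Markdown idea, here is a minimal sketch built on the standard library's `html.parser`. It is not Tomd's implementation or API: `TinyMarkdownConverter` and `html_to_markdown` are hypothetical names, and the sketch only handles headings, paragraphs, and bold/italic text.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Hypothetical converter: handles only <h1>-<h3>, <p>, and bold/italic tags."""

    BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
    INLINE_MARK = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_PREFIX:
            self.parts.append(self.BLOCK_PREFIX[tag])
        elif tag in self.INLINE_MARK:
            self.parts.append(self.INLINE_MARK[tag])

    def handle_endtag(self, tag):
        if tag in self.BLOCK_PREFIX or tag == "p":
            # Block elements end with a blank line in Markdown.
            self.parts.append("\n\n")
        elif tag in self.INLINE_MARK:
            self.parts.append(self.INLINE_MARK[tag])

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html):
    """Convert a small HTML fragment to Markdown text (illustrative only)."""
    converter = TinyMarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip()


md = html_to_markdown("<h1>Title</h1><p>Some <strong>bold</strong> text.</p>")
print(md)
```

Tags the sketch does not know about are simply dropped while their text is kept, which is one plausible way to degrade when a construct has no Markdown equivalent.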
Harser
Harser is a library for easily extracting data from HTML and building XPath.
Installation
pip install harser
Examples
>>> from harser import Harser
>>> HTML = '
xmltodict
xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":
>>> print(json.dumps(xmltodict.parse("""
... <mydocument has="an attribute">
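The XML-to-dictionary mapping can be sketched with the standard library alone. This is not xmltodict's implementation: `element_to_dict` is a hypothetical helper, and the child elements in the sample document are made up for illustration. The `@` prefix for attributes and the grouping of repeated tags into a list mirror the conventions xmltodict uses in its output.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Hypothetical helper: turn an Element into nested dicts, lists, and strings."""
    # Attributes become keys prefixed with "@".
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag is collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # Leaf element: just its text (or None when empty).
        return text or None
    if text:
        node["#text"] = text
    return node


# Sample document; everything inside <mydocument> is an assumption for this sketch.
xml_source = """
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
</mydocument>
"""

root = ET.fromstring(xml_source)
result = {root.tag: element_to_dict(root)}
print(json.dumps(result, indent=2))
```

Because the result is built from plain dicts, lists, and strings, it can be passed straight to `json.dumps`, which is the "feels like JSON" property the description refers to.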
XHTML2PDF
The current release of xhtml2pdf is xhtml2pdf 0.2.4 which is the first stable version that has Python 3 support. As with all open-source software, its use in production depends on many factors, s
untangle
Documentation
Converts XML to a Python object.
Siblings with similar names are grouped into a list.
Children can be accessed with parent.child, attributes with element['attribute'].
You can call t
pyquery: a jquery-like library for python
pyquery allows you to make jQuery queries on XML documents. The API is as similar to jQuery as possible. pyquery uses lxml for fast XML and HTML manipulation.
Thi
MarkupSafe
Implements a unicode subclass that supports HTML strings:
>>> from markupsafe import Markup, escape
>>> escape("<script>alert(document.cookie);</script>")
Markup(u'&lt;scr
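To show what the escaping step achieves without depending on MarkupSafe itself, the sketch below uses `html.escape` from the standard library as a stand-in. It is not MarkupSafe's API: it returns a plain `str` rather than a `Markup` object, and `render_comment` is a hypothetical helper.

```python
import html


def render_comment(user_text):
    """Hypothetical helper: embed untrusted text in an HTML fragment safely."""
    # Escaping turns markup characters into entities so the browser
    # displays them as text instead of interpreting them as tags.
    return "<p>" + html.escape(user_text) + "</p>"


untrusted = "<script>alert(document.cookie);</script>"
escaped = html.escape(untrusted)
fragment = render_comment(untrusted)

print(escaped)
print(fragment)
```

The trusted `<p>` wrapper is left untouched while the untrusted input is escaped. A dedicated string subclass exists to track that distinction automatically, so already-safe markup is not escaped a second time.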
html5lib
html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.
Usage
Simple usage follows this patte