Specific Formats Processing

Libraries for parsing and manipulating specific text formats.

Newest releases

dividuum Generates self-contained HTML files protecting secret text content.

alan-turing-institute CleverCSV: A drop-in replacement for the Python csv package with improved dialect detection for messy CSV files. It also provides a handy command-line tool that can standardize a messy file or generate Python code to import it.
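For context, the standard library already ships a dialect detector, csv.Sniffer, which is the baseline CleverCSV aims to improve on for messy files. The sketch below uses only the stdlib (not CleverCSV's own API) to show what "dialect detection" means: guessing the delimiter from a sample and then reading with the detected dialect.

```python
import csv
import io

# A small semicolon-delimited sample (made-up data for illustration).
sample = "name;city;score\nAda;London;9\nAlan;Manchester;8\n"

# Restricting the candidate delimiters makes the stdlib guess more stable.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
rows = list(csv.reader(io.StringIO(sample), dialect))

print(dialect.delimiter)  # ;
print(rows[1])            # ['Ada', 'London', '9']
```

Messier inputs (mixed quoting, stray delimiters inside fields) are where the stdlib heuristic tends to struggle, which is the gap CleverCSV targets.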

mrtzh Unbuch: A simple pandoc setup to compile a book from Markdown sources into HTML pages and PDF, based on pandoc and Python filters. Features: Tufte-inspired layout with sidenotes; LaTeX formulas via the KaTeX plugin; …

Acidham Alfred Markdown Notes: Markdown Notes is a comprehensive note-taking tool embedded in Alfred, with powerful full-text search (supports & and |), tag search, and search for to-dos (- [ ] or * [ ]). …

camelot-dev Excalibur: A web interface to extract tabular data from PDFs, written in Python 3 and powered by Camelot. Note: Excalibur only works with text-based PDFs, not scanned documents.

Hultner ♜ safemd: A Markdown renderer focusing on security first. It builds upon the strong foundation of GitHub's fork of cmark while adding extra security precautions so that it is safe out of the box. When auditing …

erinxocon Requests-XML: XML Parsing for Humans. This library intends to make parsing XML as simple and intuitive as possible. Requests-XML is related to the amazing Requests-HTML and delivers the same quality of user experience.

hit9 img2txt: Converts an image to ASCII text, with output to HTML or an ANSI terminal. See also gif2txt for an animated version. Example: img2txt.py jiaozhu.jpg > without-color.html …
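The core idea behind image-to-ASCII converters can be sketched in a few lines: map each pixel's brightness onto a "ramp" of characters ordered from dense to sparse. This is a hypothetical helper, not img2txt's code; it assumes the image has already been loaded, resized, and converted to a grid of 0-255 grayscale values, and it ignores colour.

```python
from __future__ import annotations

# Hypothetical character ramp, ordered from densest (dark) to sparsest (light).
RAMP = "@%#*+=-:. "

def to_ascii(pixels: list[list[int]]) -> str:
    """Render a grid of grayscale values (0 = black, 255 = white) as text."""
    lines = []
    for row in pixels:
        # Scale 0..255 linearly onto ramp indices 0..len(RAMP) - 1.
        lines.append("".join(RAMP[v * (len(RAMP) - 1) // 255] for v in row))
    return "\n".join(lines)

print(to_ascii([[0, 255], [128, 64]]))
```

Real tools add resizing (terminal characters are taller than wide) and, for HTML or ANSI output, per-character colour.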

sevagas macro_pack: macro_pack is a tool used to automate obfuscation and generation of retro formats such as MS Office documents or VBS-like formats. It now also handles various shortcuts for …

santalu Mask EditText: Sample usage (Gradle): allprojects { repositories { maven { url 'https://jitpack.io' } } } dependencies { implementation 'com.github.santalu:mask-edittext:1. …

fxsjy jparser: A readability parser that can extract the title, content, and images from HTML pages. Install: pip install jparser (requirement: lxml). Usage example (Python 2): import urllib2; from jparser import PageModel; html = urllib2. …

facebook Duckling: Duckling is a Haskell library that parses text into structured data, for example "the first Tuesday of October" => {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}. Requirements: a Haskell environment …
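To see what resolving a phrase like "the first Tuesday of October" involves, here is the date arithmetic behind the sample output, sketched in Python with only the stdlib. It is not Duckling's API and does no text parsing; it assumes the reference year is 2017, as implied by the sample value, and the helper name nth_weekday is hypothetical.

```python
from datetime import date, timedelta

def nth_weekday(year: int, month: int, weekday: int, n: int = 1) -> date:
    """Return the n-th occurrence of `weekday` (Monday=0 ... Sunday=6) in a month.

    No check is made that the result stays inside the month (e.g. n=5).
    """
    first = date(year, month, 1)
    # Days from the 1st of the month to the first occurrence of `weekday`.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

print(nth_weekday(2017, 10, 1))  # Tuesday is weekday 1 -> 2017-10-03
```

The result matches the "2017-10-03" value with "grain": "day" in the example above.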

sloria TextBlob: Simplified Text Processing (homepage: https://textblob.readthedocs.io/). TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks.

lark-parser Lark: A modern parsing library for Python. Parse any context-free grammar, fast and easy! For beginners: Lark is not just another parser. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and …

Jonwing morphling: Morphling is a convenient tool that converts Markdown to HTML. Usage in command-line mode: python -m morphling <markdown file> [options...]. You can also use morphling in your code …

aio-libs yarl: A URL is constructed from a str: >>> from yarl import URL >>> url = URL('https://www.python.org/~guido?arg=1#frag') >>> url URL('https://www.python. …
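For comparison, the same URL can be decomposed with the standard library's urllib.parse. This is a stdlib sketch, not yarl's API; it only shows which components a URL object exposes, which yarl wraps in a more convenient immutable class.

```python
from urllib.parse import parse_qs, urlsplit

url = urlsplit("https://www.python.org/~guido?arg=1#frag")

print(url.scheme, url.hostname, url.path)  # https www.python.org /~guido
print(parse_qs(url.query), url.fragment)   # {'arg': ['1']} frag
```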

mwhite Resume: A simple Markdown resumé template, LaTeX header, and pre-processing script that can be used with Pandoc to generate professional-looking PDF and HTML output. The Markdown flavor supported is Pandoc Markdown …

xlzd xart: Generate ASCII art text. xart is a pure Python library that provides an easy way to generate ASCII art text. Life is short, be cool.

r1chardj0n3s parse: Parse strings using a specification based on the Python format() syntax; parse() is the opposite of format(). The module is set up to only export parse(), search(), findall(), and with_pattern() when import * is used.
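The idea of "parse() is the opposite of format()" can be sketched by turning a format string into a regular expression. The helper simple_parse below is hypothetical and much smaller than the real library: it supports only bare named fields like {name}, with no format specs and no type conversion, and it returns every captured value as a string.

```python
import re

def simple_parse(pattern: str, text: str):
    """Invert a "{name}"-style format string; return a dict of strings or None."""
    # re.split with a capturing group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text, matched verbatim
        else:
            regex += f"(?P<{part}>.+?)"    # named field, captured lazily
    match = re.fullmatch(regex, text)
    return match.groupdict() if match else None

fmt = "{name} is {age} years old"
print(simple_parse(fmt, fmt.format(name="Bob", age=42)))
# {'name': 'Bob', 'age': '42'}
```

Round-tripping through format() and back shows the inversion; note that age comes back as the string '42', since this sketch performs no type conversion.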

eeue56 json-to-elm: Create Elm type aliases and decoders based on JSON input. This project automates the creation of type aliases from JSON data, decoders from type aliases and some union types, and encoders …
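As a rough illustration of the first step, inferring an Elm type alias from JSON data, here is a hypothetical Python sketch (json-to-elm itself is a separate tool; none of these function names come from it). It assumes a flat JSON object whose fields are booleans, numbers, strings, or non-empty lists, and it raises for null, nested objects, and empty lists.

```python
import json

def elm_type(value) -> str:
    """Map one JSON value to an Elm type name (lists use their first element)."""
    if isinstance(value, bool):  # check bool first: in Python, True is also an int
        return "Bool"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list) and value:
        inner = elm_type(value[0])
        # Compound element types such as "List Int" need parentheses in Elm.
        return f"List ({inner})" if " " in inner else f"List {inner}"
    raise ValueError(f"unsupported JSON value: {value!r}")  # null, objects, []

def type_alias(name: str, obj: dict) -> str:
    """Render an Elm record type alias for a flat JSON object."""
    fields = [f"{key} : {elm_type(val)}" for key, val in obj.items()]
    return f"type alias {name} =\n    {{ " + "\n    , ".join(fields) + "\n    }"

print(type_alias("User", json.loads('{"name": "Ada", "age": 36, "admin": false}')))
```

Generating decoders would follow the same pattern, emitting a decoder expression per field instead of a type name.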

reubano meza: A Python toolkit for processing tabular data.

davidmogar normalizr: Normalizr is a Python library for text normalization that offers a range of actions for manipulating text. With normalizr you can replace symbols and punctuation, remove stop words, and much more.

pudo normality: Normality is a Python micro-package that contains a small set of text normalization functions for easier re-use. These functions accept a snippet of Unicode or UTF-8 encoded text and remove various classes of …
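A typical text-normalization pipeline of the kind normalizr and normality offer can be sketched with the stdlib alone. The helper normalize_text is hypothetical and is not either library's API; the chosen steps (strip accents, lowercase, drop punctuation, collapse whitespace) are one common combination, assumed for illustration.

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    """Strip accents, lowercase, replace punctuation with spaces, collapse whitespace."""
    # NFKD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Anything that is not a word character or whitespace becomes a space.
    cleaned = re.sub(r"[^\w\s]", " ", stripped.lower())
    return " ".join(cleaned.split())

print(normalize_text("  Héllo,   Wörld! "))  # hello world
```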

mitsuhiko unp: unp is a command-line tool that can unpack archives easily. It mainly acts as a wrapper around other shell tools found on various POSIX systems, and it figures out how to invoke an unpacker to achieve the …
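The "figures out how to invoke an unpacker" step can be sketched as a lookup from file suffix to command line. The table and the function unpack_command are hypothetical and are not unp's implementation; the sketch only builds the argument list and does not run it.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical suffix -> command table (unp's real table is larger and differs).
UNPACKERS = {
    ".tar.gz": ["tar", "-xzf"],
    ".tar.bz2": ["tar", "-xjf"],
    ".tgz": ["tar", "-xzf"],
    ".tar": ["tar", "-xf"],
    ".zip": ["unzip"],
}

def unpack_command(filename: str) -> list[str] | None:
    """Return the argv that would unpack `filename`, or None if unknown."""
    name = Path(filename).name.lower()
    # Try longer suffixes first so multi-part suffixes like ".tar.gz" win.
    for suffix in sorted(UNPACKERS, key=len, reverse=True):
        if name.endswith(suffix):
            return UNPACKERS[suffix] + [filename]
    return None

print(unpack_command("release-1.0.tar.gz"))  # ['tar', '-xzf', 'release-1.0.tar.gz']
```

The returned list could then be passed to subprocess.run, which is where a wrapper like unp delegates the real work to the system tools.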

jazzband Tablib: Format-agnostic tabular dataset library.
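"Format-agnostic" means one in-memory table that can be serialized to several formats on demand. The hypothetical export helper below sketches that idea with the stdlib for CSV and JSON only; it is not Tablib's API, which wraps the same idea in a dataset object supporting many more formats.

```python
import csv
import io
import json

def export(headers: list, rows: list, fmt: str) -> str:
    """Serialize one in-memory table (headers + rows) to CSV or JSON text."""
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf)  # default line terminator is "\r\n"
        writer.writerow(headers)
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "json":
        # One JSON object per row, keyed by the column headers.
        return json.dumps([dict(zip(headers, row)) for row in rows])
    raise ValueError(f"unsupported format: {fmt}")

headers, rows = ["name", "age"], [["Ada", 36], ["Alan", 41]]
print(export(headers, rows, "json"))
```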

lepture Mistune: The fastest Markdown parser in pure Python with renderer features, inspired by marked. Features: pure Python; tested in Python 2.6+, Python 3.3+ and PyPy; very fast. It is the …

daviddrysdale phonenumbers Python Library: This is a Python port of Google's libphonenumber library. It supports Python 2.5-2.7 and Python 3.x (in the same codebase, with no 2to3 conversion needed). Original Java code is Copyright …

wireservice csvkit: A suite of command-line tools for converting to and working with CSV, the king of tabular file formats. It is inspired by pdftk, GDAL and the original csvcut tool by Joe Germuska and Aaron Bycoffe. …
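To show the kind of operation csvkit's csvcut performs (selecting and reordering columns, roughly what csvcut -c name,id does on the command line), here is a hypothetical stdlib-only helper. It is not csvkit's code and handles only columns selected by name.

```python
from __future__ import annotations

import csv
import io

def cut_columns(text: str, columns: list[str]) -> str:
    """Keep only the named CSV columns, in the order given."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops the columns that were not requested.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()

data = "id,name,city\n1,Ada,London\n2,Alan,Wilmslow\n"
print(cut_columns(data, ["name", "id"]), end="")
```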

Python-Markdown Python-Markdown: A Python implementation of John Gruber's Markdown. It is almost completely compliant with the reference implementation, though there are a few known issues. See Features for information on …

trentm python-markdown2: Markdown is a light text markup format and a processor to convert that to HTML. The originator describes it as follows: "Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)."

py-pdf PyPDF2: PyPDF2 is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.

unoconv: Automated conversion and styling using LibreOffice. Universal Office Converter (unoconv) is a command-line tool to convert any document format that LibreOffice can import to any document format that LibreOffice can export.