A tiny library for Python text normalisation. Useful for ad-hoc text processing.

normality Normality is a Python micro-package that contains a small set of text normalization functions for easier re-use. These functions accept a snippet of unicode or utf-8 encoded text and remove various classes of characte
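To give a feel for the kind of clean-up described above, here is a minimal sketch built only on the Python standard library. It is not normality's own API: the function name `normalise_text` and its `lowercase` parameter are hypothetical, and the choice of which character classes to strip (combining marks, control and format characters) is an assumption made for illustration.

```python
import re
import unicodedata


def normalise_text(text, lowercase=True):
    """Hypothetical helper for illustration; not part of normality's API."""
    # Accept utf-8 encoded bytes as well as unicode strings.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # Decompose characters so accents become separate combining marks.
    text = unicodedata.normalize("NFKD", text)
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Mn":
            # Drop combining marks (the detached accents).
            continue
        if category in ("Cc", "Cf"):
            # Replace control and format characters with a space.
            kept.append(" ")
            continue
        kept.append(ch)
    text = "".join(kept)
    if lowercase:
        text = text.lower()
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalise_text("Café\u0000  Zürich\n"))
```

The same function can be called with `lowercase=False` to keep the original casing, or with a `bytes` value, which is decoded as utf-8 first.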

Related Repos

facebook Duckling Duckling is a Haskell library that parses text into structured data. "the first Tuesday of October" => {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"} Requirements A Haskell environment is req

fxsjy jparser A readability parser that can extract the title, content, and images from HTML pages. Install: pip install jparser (requirement: lxml) Usage Example: import urllib2 from jparser import PageModel html = urllib2.

santalu Mask EditText Sample Usage Gradle allprojects { repositories { maven { url 'https://jitpack.io' } } } dependencies { implementation 'com.github.santalu:mask-edittext:1.1.1' }

sevagas macro_pack Short description The macro_pack is a tool used to automate the obfuscation and generation of retro formats such as MS Office documents or VBS-like formats. It now also handles various shortcut formats. This

hit9 img2txt Image to ASCII text; can output to HTML or an ANSI terminal. See also gif2txt for the animated version. Example img2txt.py jiaozhu.jpg > without-color.html : demo img2txt.py jiaozhu.jpg --dith

erinxocon Requests-XML: XML Parsing for Humans This library intends to make parsing XML as simple and intuitive as possible. Requests-XML is related to the amazing Requests-HTML and delivers the same quality of user experience — wi

Hultner ♜ safemd A markdown renderer focusing on security first Building upon the strong foundation of GitHub's fork of cmark while adding additional security precautions to be safe out of the box. When auditing applications r

camelot-dev Excalibur: A web interface to extract tabular data from PDFs Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It is powered by Camelot. Note: Excalibur only works with text-based P