Text normalization library for Python

normalizr: Normalizr is a Python library for text normalization that provides a set of actions for manipulating text. With normalizr you can replace symbols and punctuation, remove stop words, and much more.
Category: Python / Specific Formats Processing
Watchers: 11
Stars: 198
Forks: 28
Last update: Jan 24, 2023
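
The description above mentions replacing punctuation and removing stop words. The sketch below illustrates those two operations using only the Python standard library. It does not use normalizr's API: the function names, the replacement character, and the tiny stop-word list are all assumptions made for illustration.

```python
import string

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}


def replace_punctuation(text, replacement=" "):
    """Replace every ASCII punctuation character with `replacement`."""
    table = str.maketrans({ch: replacement for ch in string.punctuation})
    return text.translate(table)


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop whitespace-separated tokens whose lowercase form is a stop word."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


def normalize(text):
    """Apply the two steps in order: punctuation first, then stop words."""
    return remove_stop_words(replace_punctuation(text))


if __name__ == "__main__":
    print(normalize("The cafe, is open!"))  # prints "cafe open"
```

Punctuation is replaced before stop words are removed so that tokens such as "is," are split cleanly before the stop-word check runs.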

Related Repos

dividuum Generates self-contained HTML files protecting secret text content.

alan-turing-institute CleverCSV provides a drop-in replacement for the Python csv package with improved dialect detection for messy CSV files. It also provides a handy command line tool that can standardize a messy file or generate Python

mrtzh Unbuch A simple pandoc setup to compile a book from Markdown sources into HTML pages and PDF, based on pandoc and Python filters. Features: Tufte-inspired layout with sidenotes; LaTeX formulas via KaTeX plugin; Environments

Acidham Alfred Markdown Notes Markdown Notes is a comprehensive note-taking tool embedded into Alfred with powerful full-text search (supports & and |), tag search, and search capabilities for todos (- [ ] or * [ ]). With MD Notes y

camelot-dev Excalibur: A web interface to extract tabular data from PDFs Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It is powered by Camelot. Note: Excalibur only works with text-based P

Hultner ♜ safemd A markdown renderer focusing on security first Building upon the strong foundation of GitHub's fork of cmark while adding additional security precautions to be safe out of the box. When auditing applications r

erinxocon Requests-XML: XML Parsing for Humans This library intends to make parsing XML as simple and intuitive as possible. Requests-XML is related to the amazing Requests-HTML and delivers the same quality of user experience — wi

hit9 img2txt Image to ASCII text; can output to HTML or an ANSI terminal. See also gif2txt for the animated version. Example img2txt.py jiaozhu.jpg > without-color.html : demo img2txt.py jiaozhu.jpg --dith

sevagas macro_pack Short description: macro_pack is a tool used to automate the obfuscation and generation of retro formats such as MS Office documents or VBS-like formats. It now also handles various shortcut formats. This

santalu Mask EditText Sample Usage Gradle allprojects { repositories { maven { url 'https://jitpack.io' } } } dependencies { implementation 'com.github.santalu:mask-edittext:1.1.1' }

fxsjy jparser A readability parser which can extract title, content, images from html pages Install: pip install jparser (requirement: lxml) Usage Example: import urllib2 from jparser import PageModel html = urllib2.

facebook Duckling Duckling is a Haskell library that parses text into structured data. "the first Tuesday of October" => {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"} Requirements A Haskell environment is req

sloria TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing

lark-parser Lark - a modern parsing library for Python Parse any context-free grammar, FAST and EASY! Beginners: Lark is not just another parser. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so ef

Jonwing morphling Morphling is a convenient tool that converts Markdown to HTML. Usage Command Line Mode python -m morphling <markdown file> [options...] Use morphling in your code fro