Natural Language Processing

Libraries for working with human languages.

Newest releases

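youzanai T'rex Park (霸王龙公园) The Trexpark project is open-sourced by the Youzan Data Intelligence team and is the first open-source NLP and image project in China trained on e-commerce big data. We expect to gradually release NLP and image models pre-trained on NLP corpora such as product titles, reviews, and customer-service dialogues, as well as on product main images, brand logos, and similar data. Why a T-rex? The T-rex is Youzan's mascot. Well, to be precise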
 

quoll remorse Clojure to morse code conversion Usage Dependencies This can be included in deps.edn with the following entry in the :deps map: com.github.quo
 

dakrone Clojure library interface to OpenNLP - https://opennlp.apache.org/ A library to interface with the OpenNLP (Open Natural Language Processing) library
 

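lancopku pkuseg: a multi-domain Chinese word segmentation toolkit (English Version) pkuseg is a toolkit based on the paper [Luo et al., 2019]. It is simple to use, supports segmentation for specialized domains, and effectively improves segmentation accuracy.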
 

columbia-applied-data-science Rosetta Tools for data science with a focus on text processing. Focuses on "medium data", i.e. data too big to fit into memory but too small to necess
 

proycon This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the
 

gugarosa NALP: Natural Adversarial Language Processing Welcome to NALP. Have you ever wanted to create natural text from raw sources? If yes, NALP is for you!
 

chartbeat-labs textacy: NLP, before and after spaCy textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the hig
 

proycon Frog for Python This is a Python binding to the Natural Language Processing suite Frog. Frog is intended for Dutch and performs part-of-speech tagging
 

machinalis About Yalign is a tool for extracting parallel sentences from comparable corpora. Statistical Machine Translation relies on parallel corpora (e.g. eur
 

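isnowfy SnowNLP: Simplified Chinese Text Processing SnowNLP is a library written in Python that makes it easy to process Chinese text. It was inspired by TextBlob: because most natural language processing libraries today are aimed at English, the author wrote a library suited to processing Chinese, and with TextBlob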
 

machinalis quepy
 

JuliaText WordTokenizers Some basic tokenizers for Natural Language Processing. Installation: As per standard Julia package installation: pkg> add WordTokenizer
 

JuliaText CorpusLoaders A collection of various means for loading various different corpora used in NLP. Installation As per the standard Julia package installa
 

GrowingGit GitHub English Top Charts 「Help you discover excellent English projects and get rid of the disturbance of other spoken languages.」 Features • Definition of
 

BlackKakapo Icelandic Word Embeddings. Here you can find pre-trained corpora of word embeddings. Current methods: CBOW, Skip-Gram, Fast-Text (from Gensim library). The .vec and .model files are available for download (all in one archive).
 

dizzyliam tome A natural language library for Nim. import tome const text = """ There should be one and only one programming language for everything. That lan
 

yousefelmahdy SiGnAl About SiGnAl is a project to modulate three speech signals using the following scheme: s(t) = x_1(t) cos(ω_1 t) + x_2(t) cos(ω_2 t) + x_3(t)
 

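blcuicall Multi-dimensional Annotated Dataset of Chinese Learner Texts, YACLC V1.0 中文 | English The multi-dimensional annotated dataset of Chinese learner texts (Yet Another Chinese Learner Corpus, YACLC) is jointly released by a team from Beijing Language and Culture University, Tsinghua University, Beijing Normal University, Yunnan Normal University, Northeastern University, Shanghai University of Finance and Economics, and other universities. The main project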
 

ISGNeuroTeam СоАвтор СоАвтор is a platform and open set of tools for editorial teams and freelance journalists, designed to make the content creation process
 

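ctripcorp Flybird | English Version Behavior-driven development (BDD) is a software process philosophy or method, and an agile software development technique. Flybird is a front-end UI automation testing framework based on the BDD model, providing a set of out-of-the-box tools and thorough documentation. Based on Be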
 

Bellman281 Stat4ML Statistics and Mathematics for Machine Learning, Deep Learning, Deep NLP Registration Form: https://forms.gle/uV3qL9Wngtxxca9C6 Statistics an
 

BM-K Korean-Sentence-Embedding 🍭 Korean sentence embedding repository. You can download the pre-trained models and run inference right away; it also provides
 

Bellman281 Satat4ML Statistics and Mathematics for Machine Learning, Deep Learning, Deep NLP Registration Form: https://forms.gle/uV3qL9Wngtxxca9C6 Statistics a
 

soimort Translate Shell Translate Shell (formerly Google Translate CLI) is a command-line translator powered by Google Translate (default), Bing Translator, Y
 

kon9chunkit GitHub English Top Charts 「Help you discover excellent English projects and get rid of the interference of other spoken languages.」 Features • Definiti
 

ashishpatel26 Awesome Treasure of Transformers Models Collection 🧑‍💻 👩‍💻 Collection of All NLP Deep learning algorithm list with Code 🧑‍💻 👩‍💻 Sr No Algorith
 

BlackKakapo Romanian Word Embeddings These vectors were trained with 3 different methods (CBOW, Skip-Gram, FastText). The dataset is a bunch of text that was taken
 

adarshmalviya News-Classifier Webapp link: https://newsclassifier-nlp.herokuapp.com The News Classifier web app was built using concepts of NLP (Natural Language
 

ablaamim born2beroot 🔁 Down the rabbit hole. In this repo you will find all the docu
 

PatrickMassot Lean verbose This project provides tactics for Lean in a very controlled natural language. The original version of those tactics was written in Frenc
 

TakeLab A framework-agnostic Python NLP library for data loading and preprocessing. What is Podium? Podium is a framework-agnostic Python natural language pro