Natural Language Processing

Libraries for working with human languages.

Newest releases

KuangDD Text To Speech Toolkit: a speech synthesis tool offering a choice of multiple voices.

openspeech-team OpenSpeech provides reference implementations of various ASR modeling papers and recipes in three languages for automatic speech recognition tasks. We aim to make ASR technology easier to use for everyone.

SteveMCarroll A practical guide to pronouncing non-English names for English speakers

Hironsan This Python library helps you augment text data for named entity recognition.

LianjiaTech Athena is an open-source implementation of an end-to-end speech processing engine. Our vision is to empower both industrial applications and academic research on end-to-end models for speech processing. To make speech processing avail

WelkinYang Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arxiv)

tupleblog Automatic Slim Classifier using WangchanBERTa (RoBERTa trained on Thai text)

thunlp Survey of Surveys for Natural Language Processing (SOS4NLP)

zhijing-jin A repo of open resources and information to help people succeed in a CS PhD and a career in AI/NLP

GEM-benchmark The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language

abhishekkrthakur Approaching (Almost) Any Natural Language Processing Problem

Aryagm Stocksent is a Python library for sentiment analysis of various tickers from the latest news from trusted sources. It also has options for plotting results.

IBM Resources for the IBM Airlines Table-Question-Answering Benchmark

eubinecto A BERT-based reverse-dictionary of Korean proverbs

jaywalnut310 VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

asteroid-team Asteroid is a PyTorch-based audio source separation toolkit that enables fast experimentation on common datasets. It comes with source code that supports a large range of datasets and architectures, and a set of recipes to repro

instadeepai TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (SA), Tunisian Dialect Identifica

google-research-datasets TimeDial presents a crowdsourced English challenge set for temporal commonsense reasoning, formulated as a multiple-choice cloze task with around 1.5k carefully curated dialogs. The dataset is derived from the DailyDialog (Li et

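yangjianxin1 CPM (Chinese Pretrained Models) is a large-scale Chinese pretrained model released by the Beijing Academy of Artificial Intelligence and Tsinghua University. The official release includes models at three scales, with 109M, 334M, and 2.6B parameters; users must apply and pass review before downloading them. Because the original project has to support training and using large models, it requires a fairly complex set of environment dependencies and is fairly complex to use. This project uses the 109M CPM model (the 334M model is also an option if resources allow) and simplifies training and using the model.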

rahuln Using pretrained language models for biomedical knowledge graph completion.

JerichoWorld Dataset corresponding to the paper Modeling Worlds in Text

PrithivirajDamodaran A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. For instance, understand what makes text formal or casual

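teddylee777 Built on the GPU Docker image (gpu.Dockerfile) of Kaggle/docker-python, the Docker image also well known as the Kaggle Notebook kernel. The Docker image published by Kaggle is missing Korean fonts, natural language processing packages, morphological analyzers, and more.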

shahules786 Extract the phrase in the given text that expresses the sentiment. Capturing sentiment in language is important in these times where decisions and reactions are created and updated in seconds. But, which words actually lead t

graph4ai This repo presents various code demos on how to use our Graph4NLP library.

Zasder3 A PyTorch Lightning solution to training CLIP from scratch.

sinanuozdemir This repository contains code for the O'Reilly Live Online Training for BERT

lucidrains Implementation of ProteinBERT in PyTorch

graph4ai Graph4NLP is an easy-to-use library for R&D at the intersection of Deep Learning on Graphs and Natural Language Processing

graph4ai This repo provides a list of literature on Deep Learning on Graphs for NLP

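TsinghuaAI Pre-train CPM-2. This branch contains the pre-training code for the 11-billion-parameter non-MoE model; for the MoE model's pre-training code, switch to the moe branch.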

PrithivirajDamodaran A framework for detecting, highlighting, and correcting grammatical errors in natural language text.

salesforce This repository maintains the QAConv dataset, a question-answering dataset on informative conversations including business emails, panel discussions, and work channels.