Natural Language Processing

Libraries for working with human languages.

Newest releases

sberbank-ai Russian GPT trained with a 2048-token context length (ruGPT2048), Russian GPT-3 Large (ruGPT3Large) trained with a 1024-token context length, and Russian GPT-3 Medium trained with a 2048-token context length (ruGPT3Medium2048).
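The checkpoints are distributed in 🤗 Transformers format, so generation follows the usual GPT-2 API. A minimal sketch, assuming the large model is published on the Hub as "sberbank-ai/rugpt3large_based_on_gpt2" (the hub id is an assumption, check the repo):

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # Hub id below is an assumption about where the checkpoint is published.
    name = "sberbank-ai/rugpt3large_based_on_gpt2"
    tokenizer = GPT2Tokenizer.from_pretrained(name)
    model = GPT2LMHeadModel.from_pretrained(name)

    inputs = tokenizer("Александр Сергеевич Пушкин родился в", return_tensors="pt")
    outputs = model.generate(**inputs, max_length=50, do_sample=True, top_p=0.95)
    print(tokenizer.decode(outputs[0]))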

PluviophileYU Code for "Counterfactual Variable Control for Robust and Interpretable Question Answering"

jackaduma A PyTorch implementation of the paper "CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion", a nice piece of work on voice conversion/voice cloning.

yumeng5 [EMNLP 2020] Text Classification Using Label Names Only: A Language Model Self-Training Approach

gsarti A 🤗-style implementation of BERT using lambda layers instead of self-attention

lxk00 A PyTorch implementation of the model presented in the paper "BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance".

wangle1218 Implementations of several deep text-matching (text similarity) models for Keras: CDSSM, ARC-II, MatchPyramid, MV-LSTM, ESIM, DRCN, BiMPM, BERT, ALBERT, and RoBERTa.

didi Shared tasks, datasets and state-of-the-art results for Chinese Natural Language Processing (NLP)

dmis-lab This repository provides the PyTorch implementation of BioBERT. You can easily use BioBERT with 🤗 Transformers. This project is supported by members of DMIS-Lab @ Korea University, including Jinhyuk Lee, Wonjin Yoon, and Minbyul Jeon.
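Since the entry notes BioBERT works with 🤗 Transformers, a minimal loading sketch; "dmis-lab/biobert-v1.1" is the checkpoint id the authors publish on the Hub (worth verifying against the repo):

    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
    model = AutoModel.from_pretrained("dmis-lab/biobert-v1.1")

    inputs = tokenizer("Aspirin inhibits platelet aggregation.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # contextual embeddings for each token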

BADBADBADBOY A PyTorch-based OCR algorithm library, including PSENet, PAN, DBNet, SAST, and CRNN.

wyu97 A list of recent publications on knowledge-enhanced text generation.

nateshmbhat pyttsx3 is an offline text-to-speech conversion library in Python. Unlike alternative libraries, it works without an internet connection.
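Basic usage follows pyttsx3's standard engine API:

    import pyttsx3

    engine = pyttsx3.init()          # selects a platform driver (SAPI5, NSSpeechSynthesizer, or eSpeak)
    engine.setProperty("rate", 150)  # speaking rate in words per minute
    engine.say("Hello from pyttsx3")
    engine.runAndWait()              # blocks until all queued utterances are spoken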

mlech26l Code repository for the paper "Neural circuit policies enabling auditable autonomy", published in Nature Machine Intelligence.

facebookresearch SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in combination with self-training and knowledge distillation, or for retrieving paraphrases.
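Not SentAugment's own tooling (which ships its own SASE sentence embeddings and retrieval scripts), but a sketch of the underlying idea, finding nearest neighbors in a sentence bank by embedding similarity, here with sentence-transformers as a stand-in encoder:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Stand-in encoder; SentAugment uses its own SASE embeddings.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    bank = ["The movie was great.", "Stocks fell sharply today.", "I loved this film."]
    bank_emb = encoder.encode(bank, normalize_embeddings=True)

    query_emb = encoder.encode(["What a wonderful movie!"], normalize_embeddings=True)
    scores = bank_emb @ query_emb.T         # cosine similarity (vectors are unit-normalized)
    print(bank[int(np.argmax(scores))])     # closest sentence in the bank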

artitw Text2Text: generate questions and summaries for your texts

spotify Klio is an ecosystem that allows you to process audio files – or any binary files – easily and at scale.

yya518 FinBERT is a BERT model pre-trained on financial communication text. The purpose is to enhance financial NLP research and practice. It is trained on three financial communication corpora, with a total corpus size of 4.9B tokens.
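Since this is a masked-language-model checkpoint, one quick way to poke at it is a fill-mask pipeline; the hub id "yiyanghkust/finbert-pretrain" is an assumption about where the checkpoint is published:

    from transformers import pipeline

    # Hub id below is an assumption; check the repo for the official checkpoint.
    fill = pipeline("fill-mask", model="yiyanghkust/finbert-pretrain")
    print(fill("Net income [MASK] by 10% year over year."))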

indobenchmark The first-ever vast natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained models, and starter code! (AACL-IJCNLP 2020)

Debdut A Global Exhaustive First and Last Name Database

easonnie What Can We Learn from Collective HumAn OpinionS on Natural Language Inference Data (ChaosNLI)?

wenhuchen This repository contains the dataset used in "Open Table-and-Text Question Answering" and the baseline code for the dataset (OTT-QA). The dataset contains open questions which require retrieving tables and text from the web to answer.

ACM-VIT Getting Shakespeare into the Modern Era with the magic of NLP

kakaobrain The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)

CogStack MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. A preprint is available on arXiv.
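A minimal annotation sketch, assuming a downloaded model pack and the CAT.load_model_pack/get_entities API from recent MedCAT releases; the pack path is hypothetical:

    from medcat.cat import CAT

    # Hypothetical local path to a downloaded MedCAT model pack.
    cat = CAT.load_model_pack("medcat_model_pack.zip")

    text = "Patient presents with type 2 diabetes and hypertension."
    entities = cat.get_entities(text)
    for ent in entities["entities"].values():
        print(ent["pretty_name"], ent["cui"])  # linked concept name and its CUI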

MaartenGr BERTopic is a topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
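Basic usage mirrors the scikit-learn fit/transform idiom (example data from scikit-learn's 20 Newsgroups loader):

    from sklearn.datasets import fetch_20newsgroups
    from bertopic import BERTopic

    docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))["data"]

    topic_model = BERTopic()
    topics, probs = topic_model.fit_transform(docs)  # one topic id per document
    print(topic_model.get_topic(0))                  # top words for topic 0 with their c-TF-IDF scores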

bojone Unsupervised Chinese word segmentation and syntactic parsing based on BERT.

prajjwal1 A deep learning library based on PyTorch, focused on low-resource language research and robustness.

keras-team Layers are the fundamental building blocks for NLP models. They can be used to assemble new layers, networks, or models.
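To illustrate the point, a small Keras text classifier assembled from standard layers:

    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(None,), dtype="int32")             # variable-length token-id sequences
    x = layers.Embedding(input_dim=20000, output_dim=128)(inputs)
    x = layers.Bidirectional(layers.LSTM(64))(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()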

EdinburghNLP OPUS-100 is an English-centric multilingual corpus covering 100 languages. It was randomly sampled from the OPUS collection.
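A loading sketch via 🤗 Datasets; the "opus100" dataset id and the "en-fr" language-pair configuration are assumptions about the Hub packaging:

    from datasets import load_dataset

    # Dataset id and language-pair config are assumptions; adjust as needed.
    opus = load_dataset("opus100", "en-fr", split="train")
    print(opus[0]["translation"])  # e.g. {"en": "...", "fr": "..."}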

EssayKillerBrain A first-generation creative-writing AI based on open-source GPT-2 | extensible and evolvable.

qingkongzhiqian A GPT-2-based Chinese abstractive summarization model.

barissayil Sentiment analysis neural network trained by fine-tuning BERT, ALBERT, or DistilBERT on the Stanford Sentiment Treebank.
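Not this repo's own training script, but the same idea is easy to try at inference time with a public DistilBERT checkpoint fine-tuned on SST-2:

    from transformers import pipeline

    classify = pipeline("sentiment-analysis",
                        model="distilbert-base-uncased-finetuned-sst-2-english")
    print(classify("A gorgeous, witty, seductive movie."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]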

sshin23 MadNLP is a nonlinear programming (NLP) solver, purely implemented in Julia. MadNLP implements a filter line-search algorithm, similar to the one used in Ipopt. MadNLP seeks to streamline the development of modeling and algorithmic paradigms.