Sentence Embeddings with BERT & XLNet

Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch

BERT / RoBERTa / XLM-RoBERTa produce rather poor sentence embeddings out of the box. This repository fine-tunes BER
Information
Category: Python / Natural Language Processing
Watchers: 120
Star: 10.6k
Fork: 2k
Last update: May 23, 2023

Related Repos



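youzanai T'rex Park (霸王龙公园) The Trexpark project is open-sourced by the Youzan data intelligence team and is the first open-source NLP and image project in China trained on e-commerce big data. We expect to gradually release NLP and image models pretrained on NLP corpora such as product titles, reviews, and customer-service dialogues, as well as on product main images, brand logos, and similar data. Why a T. rex? The T. rex is Youzan's mascot. Er, to be precise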
 

quoll remorse Clojure to morse code conversion Usage Dependencies This can be included in deps.edn with the following entry in the :deps map: com.github.quo
 

dakrone Clojure library interface to OpenNLP - https://opennlp.apache.org/ A library to interface with the OpenNLP (Open Natural Language Processing) library
 

JuliaText WordTokenizers Some basic tokenizers for Natural Language Processing. Installation: As per standard Julia package installation: pkg> add WordTokenizer
 

JuliaText CorpusLoaders A collection of various means for loading various different corpora used in NLP. Installation As per the standard Julia package installa
 

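lancopku pkuseg: a multi-domain Chinese word segmentation toolkit (English Version) pkuseg is a toolkit based on the paper [Luo et al., 2019]. It is simple and easy to use, supports domain-specific segmentation, and effectively improves segmentation accuracy.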
 

machinalis quepy
 

machinalis About: Yalign is a tool for extracting parallel sentences from comparable corpora. Statistical Machine Translation relies on parallel corpora (e.g. eur
 

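isnowfy SnowNLP: Simplified Chinese Text Processing SnowNLP is a library written in Python that makes it easy to process Chinese text. It was inspired by TextBlob: since most natural language processing libraries today target English, the author wrote a library for conveniently processing Chinese, and, with TextBlob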
 

columbia-applied-data-science Rosetta Tools for data science with a focus on text processing. Focuses on "medium data", i.e. data too big to fit into memory but too small to necess
 

proycon This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++
 

proycon Frog for Python This is a Python binding to the Natural Language Processing suite Frog. Frog is intended for Dutch and performs part-of-speech tagging
 

chartbeat-labs textacy: NLP, before and after spaCy textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the hig
 

gugarosa NALP: Natural Adversarial Language Processing Welcome to NALP. Have you ever wanted to create natural text from raw sources? If yes, NALP is for you!
 

GrowingGit GitHub English Top Charts 「Helps you discover excellent English projects without being distracted by projects in other languages.」 Features • Definition of