Natural Language Processing

Libraries that specialize in processing text.

pemistahl This library tries to solve language detection of very short words and phrases, even shorter than tweets. It makes use of both statistical and rule-based approaches. It outperforms Apache Tika, Apach

scalanlp Chalk NOTE: This project is currently dormant with no current prospect for further development. Suggestion: check out OpenNLP or StanfordNLP for the JVM or spaCy for Python. (If anyone would like to do something like sp

mimno Mallet MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other

stanfordnlp Stanford CoreNLP Stanford CoreNLP provides a set of natural language analysis tools written in Java. It can take raw human language text input and give the base forms of words, their parts of speech, whether they ar

dkpro DKPro C4CorpusTools DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate removal, language detection, and near-duplicate removal.