Tantivy is a full text search engine library written in Rust.
Tantivy is, in fact, strongly inspired by Lucene's design.
The following benchmark breaks down performance for different types of queries and collections.
In general, Tantivy tends to be
- slower than Lucene on unions with a Top-K collector, because Lucene benefits from the Block-WAND optimization.
- faster than Lucene on intersection and phrase queries.
Your mileage WILL vary depending on the nature of queries and their load.
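For readers less familiar with these query types: an intersection query such as michael AND jackson only matches documents that appear in the posting lists of both terms. The sketch below shows the idea with a plain two-pointer merge over sorted document ids. It is an illustration in standard-library Rust, not Tantivy's code; the function name is hypothetical, and a real engine would typically work on compressed blocks of doc ids and skip ahead rather than step one id at a time.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: intersects two posting lists of document ids.
/// Both inputs must be sorted in strictly increasing order,
/// which is how posting lists are usually laid out.
pub fn intersect(left: &[u32], right: &[u32]) -> Vec<u32> {
    let mut result = Vec::with_capacity(left.len().min(right.len()));
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        match left[i].cmp(&right[j]) {
            // The left list is behind: advance it.
            Ordering::Less => i += 1,
            // The right list is behind: advance it.
            Ordering::Greater => j += 1,
            // Both terms occur in this document: keep it and advance both.
            Ordering::Equal => {
                result.push(left[i]);
                i += 1;
                j += 1;
            }
        }
    }
    result
}
```

The cost is linear in the combined length of both lists, which is why skipping over whole blocks matters so much in practice.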
- Full-text search
- Configurable tokenizer (stemming available for 17 Latin languages, with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera and tantivy-tokenizer-tiny-segmenter), and Korean (lindera + lindera-ko-dic-builder))
- Fast (check out the 🐎 ✨benchmark✨ 🐎)
- Tiny startup time (<10ms), perfect for command line tools
- BM25 scoring (the same as Lucene)
- Natural query language (e.g. (michael AND jackson) OR "king of pop")
- Phrase queries search
- Incremental indexing
- Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
- Mmap directory
- SIMD integer compression when the platform/CPU includes the SSE2 instruction set
- Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
- Text, i64, u64, f64, dates, and hierarchical facet fields
- LZ4 compressed document store
- Range queries
- Faceted search
- Configurable indexing (optional term frequency and position indexing)
- Cheesy logo with a horse
- Distributed search is out of the scope of Tantivy. That being said, Tantivy is a library upon which one could build a distributed search engine. Serializable/mergeable collector state, for instance, is within the scope of Tantivy.
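BM25, listed above as Tantivy's scoring function, combines how rare a term is across the index (IDF) with how often it appears in a document, normalized by field length. The following is a minimal sketch of the classic Okapi BM25 formula with a Lucene-style IDF, assuming the common defaults k1 = 1.2 and b = 0.75. The function names are hypothetical, and Tantivy's actual implementation (for instance how it stores and quantizes field lengths) may differ in details.

```rust
/// Assumed tuning constants: 1.2 and 0.75 are the usual BM25 defaults.
/// They are not read from Tantivy here.
pub const K1: f32 = 1.2;
pub const B: f32 = 0.75;

/// Lucene-style inverse document frequency:
/// ln(1 + (N - n + 0.5) / (n + 0.5)), which never goes negative.
/// `num_docs` is N, `doc_freq` is n (documents containing the term).
pub fn idf(num_docs: u64, doc_freq: u64) -> f32 {
    let n = num_docs as f32;
    let df = doc_freq as f32;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one term to one document's score.
/// `term_freq`: occurrences of the term in the document's field.
/// `field_len`: number of tokens in that field.
/// `avg_field_len`: average field length over the index (must be > 0).
pub fn bm25_term_score(term_idf: f32, term_freq: u32, field_len: u32, avg_field_len: f32) -> f32 {
    let tf = term_freq as f32;
    // Length normalization: fields longer than average are penalized.
    let norm = K1 * (1.0 - B + B * field_len as f32 / avg_field_len);
    term_idf * (tf * (K1 + 1.0)) / (tf + norm)
}
```

The term-frequency component saturates: however often a term repeats, its contribution stays below idf × (k1 + 1), so a single repeated word cannot dominate the ranking.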
Tantivy works on stable Rust (>= 1.27) and supports Linux, MacOS, and Windows.
- Tantivy's simple search example
- tantivy-cli and its tutorial: tantivy-cli is an actual command line interface that makes it easy for you to create a search engine, index documents, and search via the CLI or a small server with a REST API. It walks you through getting a Wikipedia search engine up and running in a few minutes.
- Reference doc for the last released version
How can I support this project?
There are many ways to support this project.
- Use Tantivy and tell us about your experience on Gitter or by email
- Report bugs
- Write a blog post
- Help with documentation by asking questions or submitting PRs
- Contribute code (you can join our Gitter)
- Talk about Tantivy around you
We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.
Clone and build locally
Tantivy compiles on stable Rust but requires Rust >= 1.27. To check out and build it, you can simply run:
git clone https://github.com/tantivy-search/tantivy.git
cd tantivy
cargo build
Some tests will not run with just cargo test because of fail-rs. To run the tests exhaustively, run
You might find it useful to step through the programme with a debugger.
A failing test
Make sure you haven't run cargo clean after the most recent cargo test or cargo build, so that the target/ directory still exists. Use this bash script to find the most recent debug build of Tantivy and run it under rust-gdb (command substitution keeps the terminal attached to gdb, which piping into xargs would not):
rust-gdb "$(find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | head -n 1 | cut -d " " -f 3)"
Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source code, then run the debug executable with the arguments you would normally pass to the test harness after cargo test --, like this:
(gdb) run --test-threads 1 --test $NAME_OF_TEST
Cargo also compiles everything in the examples/ directory in debug mode (cargo test builds examples by default). This makes it easy to write examples that reproduce bugs and debug them the same way:
rust-gdb target/debug/examples/$EXAMPLE_NAME
(gdb) run