Pure Ruby implementation of the Cuckoo Filter - a probabilistic datastructure which is objectively better than Bloom Filters for set-membership queries.
What the heck is a Cuckoo Filter?
It is a probabilistic data structure which is used to determine set-membership, i.e. finding out if a given element exists in a given set.
For practical uses think - checking if an item is present in a cache, if it is present in a database or YouTube trying to figure out if you have already watched some video before it is recommended to you.
It is closely related to Bloom Filters - but outshines them in performance and space efficiency. You can read more from the original paper that introduced it in 2014. Yes, it is that young!
Add this line to your application's Gemfile:
And then execute:
Or install it yourself as:
$ gem install cuckoo_filter
# Create a filter with 1024 (next power of 2) slots with each bucket holding 4 cf = CuckooFilter.make(size: 1000, kicks: 500, bucket_size: 4) # => returns a CuckooFilter::CuckooFilter instance # Insert an element into the filter cf.insert("foo") # => true # Lookup the existence of an element cf.lookup("foo") # => true cf.lookup("bar") # => false # Delete an existing element cf.delete("foo") # => true
Frequenty Anticipated Questions
Q: Is this useful?
A: Yes, but have a look at the benchmarks below to see if it satisfies your performance requirements before using it in a real world app.
Q: Why did you make this?
A: For fun and education, of course! Although that is not stopping it from having any real world usage!
Q: But why Ruby? Why not Go/Rust/Elixir/FooBar
A: Because why not? I like Ruby and I couldn't find a full blown implementation in it.
Q: Can I use it in a real project?
A: If it satisfies your criteria, then why not? Have a look at the benchmarks for performance stats. Let me know if you do!
You can run the benchmark script to see the iterations per second performance of different methods on an initially half full filter.
$ ruby test/benchmark.rb Setting up for benchmarking... Done. Warming up -------------------------------------- # warmup stats -> can be ignored Calculating ------------------------------------- Iterations per second - Insertions 214.863M (±47.1%) i/s - 1.214B in 9.859735s Iterations per second - Lookups 355.392M (±10.9%) i/s - 3.371B in 9.697599s Iterations per second - Deletions 343.890M (± 9.6%) i/s - 3.286B
After checking out the repo, run
bin/setup to install dependencies. Then, run
rake test to run the tests. You can also run
bin/console for an interactive prompt that will allow you to experiment.
Bug reports and pull requests are welcome on GitHub at https://github.com/pawandubey/cuckoo_filter.
Copyright 2017 Pawan Dubey
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.