HPPC: High Performance Primitive Collections -------------------------------------------- Collections of primitive types (maps, sets, stacks, lists) with open internals and an API twist (no java.util.collections.* compatibility). See the following for more information: Wiki: https://github.com/carrotsearch/hppc/wiki Bugs: http://issues.carrot2.org/browse/HPPC/ See ALTERNATIVES.txt if you're just shopping around. See LICENSE.txt to make your company's lawyer happy. See CHANGES.txt for API changes and updates. (c) Carrot Search s.c., http://carrotsearch.com/
High Performance Primitive Collections for Java
HPPC: High Performance Primitive Collections -------------------------------------------- Collections of primitive types (maps, sets, stacks, lists) with open internals and an API twist (no java.util.collections.* compatibility). See tSample code for https://issues.carrot2.org/browse/HPPC-186
WormMap implements the same interfaces as HashMap. But not all of the methods function in the same way.
- Currently, there is no key distribution visualization, so visualizeKeyDistribution() throws an exception if called.
- WormMap doesn't have a constant load factor, this is why the ensureCapacity() method is not 100% accurate.
Some of the methods that don't exist in hppc HashMap:
- remove(KType key, VType value)
- removeValue(VType value)
- containsValue(VType value)
- trimToSize()
- getLoad()
add the maven-bundle-plugin to make hppc-collections a valid OSGI bundle
Create Interface Accountable
and helper class RamUsageEstimator
, most of them forked from Lucene.
Then estimate the memory usage using those utils
I implemented robin hood hashing using the existing open hash map and the algorithm discussed here: http://sebastiansylvan.com/2013/05/08/robin-hood-hashing-should-be-your-default-hash-table-implementation/.
Removes are still done via the shiftConflicingKeys method. As discussed here http://sebastiansylvan.com/2013/08/05/more-on-robin-hood-hashing-2/ doing tombstones leads to long probe lengths when there are many removes and inserts of the same key. This shows up pretty well in the test suite. The existing shift scheme seems to be correct so I simply reused that.
Would like to do some performance testing and some comparison of probe length variance between this and the open hash map implementation.
There are many Java libraries licensed under "Apache License, Version 2.0" that do not use its official spelling. This causes issues like https://issues.apache.org/jira/browse/MPIR-382: with every library defining its own spelling, it's difficult in large projects to have a clear view of all licenses in use. This PR changes the license spelling to the official one, as advised by Maven developers.
We have intergrated hppc and made good success after setting a constant HashOrderMixing as we really need iteration to be deterministic. Now I think we had a chat where you recommended against such a solution, unfortunately I cannot find it anymore.
But we found some more explanation here.
There are still a few things that I do not understand and maybe you can help us :) ?
Why is this (partial) copying into the same implementation such a big problem that the design of the collection is heavily influenced? It seems you even plan to introduce an iterationSeed in 0.9?
Is the deterministic iteration really the problem and not something "good"? Wouldn't it be better to solve the problem when adding many elements e.g. a special "addAll" method that takes the collection and the element count or some kind of an "Acceptor" or "Filter"? Then it can internally handle the cluster problem when copying into other collections and you can still maintain deterministic iteration.
We probably have too seldom uses where we (partially) copy maps into other maps, but when exactly are those clusters created? Because as soon as the map has a different size the hashing should be different and the clustering should be less of a problem (?)
0.9.0.RC2(Dec 14, 2020)
Release candidate preview for 0.9.0.
New features and API changes
- GH-13: Add automatic module name to the generated JAR's manifest ("com.carrotsearch.hppc")
Improvements
- HPPC-191: Improve Accountable implementation (Haoyu Zhai)
0.9.0.RC1(Sep 8, 2020)
Release candidate preview for 0.9.0.
New features and API changes
-
HPPC-179: Update java template parser to support Java 8.
-
HPPC-186: A different strategy has been implemented for collision avalanche avoidance. This results in removal of Scatter* maps and sets and their unification with their Hash* counterparts. This change should not affect any existing code unless it relied on static, specific ordering of keys. A side effect of this change is that key/value enumerators will return a different ordering of their container's values on each invocation. If your code relies on the order of values in associative arrays, it must order them after they are retrieved. (Bruno Roustant).
-
HPPC-176: A new set of associative containers implementing Worm Hashing has been added. This strategy is appropriate for a medium sized maps and sets (less than 2M entries). It takes more time to put entries in the map because it maintains chains of entries having the same hash. Then the lookup speed is fast even if the map is heavy loaded or hashes are clustered. On average it takes slightly less memory than KTypeVTypeHashMap: even though it allocates more data structures, the reasonable load factor is higher (it varies around 80%) so containers enlarge later. (Bruno Roustant, Aleksandr Danilin).
Improvements
-
HPPC-183: Simplify IndirectSort comparator to use IntBinaryOperator.
-
HPPC-177: Modernize the build system to gradle and make it work with IntelliJ.
-
HPPC-184: Use closures where possible to make the resulting JAR smaller.
Bugs
- HPPC-187: ObjectIdentityHashSet redistributes keys according to key.hashCode rather than object identity's hash code. (Bruno Roustant).
0.8.2(Jun 1, 2020)
This release adds utility methods for estimating allocated and used memory.
- HPPC-175: Method estimating memory usage (Haoyu Zhai)
0.8.1(May 24, 2018)
A bug fix release removing Intrisics class that somehow got included in the 0.8.0.
0.8.0(May 24, 2018)
This is a maintenance release that drops the esoteric JARA completely, updates and modernizes project build process and brings a contractual improvement to map.get (return of default values is now guaranteed).
0.7.3(Oct 22, 2017)
This is a bugfix release improving performance and documentation.
Check out the changelog.
Category: Java / High Performance |
Watchers: 60 |
Star: 750 |
Fork: 117 |
Last update: Feb 24, 2021 |