Hash buckets and % vs. & ops
FROM SEAJUG MAILING LIST
------------------------------->
Re: hashmap in java7?
Stuart Maclean (<<redacted&rt;&rt;)
Schedule cleanup
11:39 AM
[Keep this message at the top of your inbox]
Groups
To: seajug@yahoogroups.com
Picture of Stuart Maclean
In the old days, the bucket count for a map was advised to be prime,
e.g. C = 101, so that after the hash() method was called on the object
to be either inserted or lookup, you would get a hopefully 'uniform'
distribution of bucketIDs, since the hash value mod C and bucket count C
would be co-prime. 101 was chosen i think because if you had to
increase capacity C, then C = 2C + 1 would still be prime for C = 203, 407.
Then, in Java 7 I think, it was realised that the expensive operation is
the mod operation needed in identifying the bucket. It would be MUCH
cheaper if C was a power of 2, since it reduces this
int bucket = hash % C
to this
int bucket = hash & (C-1)
which is is much more 'machine-oriented'. I can vouch for this since
one search problem I had was reduced from 2.5 hours to 1.5 hours solely
by replacing a 'mod' with an 'and', where the operation was called
millions of times in an inner loop.
Of course this may not be relevant to the current discussion but I
thought it worth sharing...
Stuart