Scott Shipp : Notes on Hash Buckets

Hash buckets and % vs. & ops

FROM SEAJUG MAILING LIST
        ------------------------------->
        Re: hashmap in java7?

        Stuart Maclean (<<redacted&rt;&rt;)
        Schedule cleanup
        11:39 AM
        [Keep this message at the top of your inbox]
        Groups
        To: seajug@yahoogroups.com
        Picture of Stuart Maclean

        In the old days, the bucket count for a map was advised to be prime,
        e.g. C = 101, so that after the hash() method was called on the object
        to be either inserted or lookup, you would get a hopefully 'uniform'
        distribution of bucketIDs, since the hash value mod C and bucket count C
        would be co-prime. 101 was chosen i think because if you had to
        increase capacity C, then C = 2C + 1 would still be prime for C = 203, 407.

        Then, in Java 7 I think, it was realised that the expensive operation is
        the mod operation needed in identifying the bucket. It would be MUCH
        cheaper if C was a power of 2, since it reduces this

        int bucket = hash % C

        to this

        int bucket = hash & (C-1)

        which is is much more 'machine-oriented'. I can vouch for this since
        one search problem I had was reduced from 2.5 hours to 1.5 hours solely
        by replacing a 'mod' with an 'and', where the operation was called
        millions of times in an inner loop.

        Of course this may not be relevant to the current discussion but I
        thought it worth sharing...

        Stuart
$ cd -