Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752416AbcD2EMl (ORCPT ); Fri, 29 Apr 2016 00:12:41 -0400 Received: from ns.horizon.com ([71.41.210.147]:13207 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751469AbcD2EMk (ORCPT ); Fri, 29 Apr 2016 00:12:40 -0400 Date: 29 Apr 2016 00:12:36 -0400 Message-ID: <20160429041236.6211.qmail@ns.horizon.com> From: "George Spelvin" To: linux@horizon.com, torvalds@linux-foundation.org Subject: Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3196 Lines: 99 Linus wrote: > Having looked around at other hashes, I suspect we should look at the > ones that do five or six shifts, and a mix of add/sub and xor. And > because they shift the bits around more freely you don't have the > final shift (that ends up being dependent on the size of the target > set). I'm not sure that final shift is a problem. You need to mask the result to the desired final size somehow, and a shift is no more cycles than an AND. > It really would be lovely to hear that we can just replace > hash_int/long() with a better hash. And I wouldn't get too hung up on > the multiplication trick. I suspect it's not worth it. My main concern is that the scope of the search grows enormously if we include such things. I don't want to discourage someone from looking, but I volunteered to find a better multiplication constant with an efficient add/subtract chain, not start a thesis project on more general hash functions. Two places one could look for ideas, though: http://www.burtleburtle.net/bob/hash/integer.html https://gist.github.com/badboy/6267743 Here's Thomas Wang's 64-bit hash, which is reputedly quite good, in case it helps: uint64_t hash(uint64_t key) { key = ~key + (key << 21); // key = (key << 21) - key - 1; key ^= key >> 24; key += (key << 3)) + (key << 8); // key *= 265 key ^= key >> 14; key += (key << 2)) + (key << 4); // key *= 21 key ^= key >> 28; key += key << 31; return key; } And his slightly shorter 64-to-32-bit function: unsigned hash(uint64_t key) { key = ~key + (key << 18); // key = (key << 18) - key - 1; key ^= key >> 31; key *= 21; // key += (key << 2)) + (key << 4); key ^= key >> 11; key += key << 6; key ^= key >> 22; return (uint32_t)key; } Sticking to multiplication, using the heuristics in the current comments (prime near golden ratio = 9e3779b9 = 2654435769,) I can come up with this for multiplying by 2654435599 = 0x9e37790f: // ----------------------------------------------------------------------------- // This code was generated by Spiral Multiplier Block Generator, www.spiral.net // Copyright (c) 2006, Carnegie Mellon University // All rights reserved. // The generated code is distributed under a BSD style license // (see http://www.opensource.org/licenses/bsd-license.php) // ----------------------------------------------- // Cost: 6 adds/subtracts 6 shifts 0 negations // Depth: 5 // Input: // int t0 // Outputs: // int t1 = 2654435599 * t0 // ----------------------------------------------- t3 = shl(t0, 11); /* 2048*/ t2 = sub(t3, t0); /* 2047*/ t5 = shl(t2, 8); /* 524032*/ t4 = sub(t5, t2); /* 521985*/ t7 = shl(t0, 25); /* 33554432*/ t6 = add(t4, t7); /* 34076417*/ t9 = shl(t0, 9); /* 512*/ t8 = sub(t9, t0); /* 511*/ t11 = shl(t6, 4); /* 545222672*/ t10 = sub(t11, t6); /* 511146255*/ t12 = shl(t8, 22); /* 2143289344*/ t1 = add(t10, t12); /* 2654435599*/ Which translates into C as uint32_t multiply(uint32_t x) { unsigned y = (x << 11) - x; y -= y << 8; y -= x << 25; x -= x << 9; y -= y << 4; y -= x << 22; return y; } Unfortunately, that utility bogs like hell on 64-bit constants.