Date: 28 Apr 2016 22:57:51 -0400
Message-ID: <20160429025751.8368.qmail@ns.horizon.com>
From: "George Spelvin"
To: tglx@linutronix.de
Cc: linux@horizon.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org
Subject: Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

Thomas Gleixner wrote:
> I'm not a hashing wizard and I completely failed to understand why
> hash_long/ptr are so horrible for the various test cases I ran.

It's very simple: the constants chosen are bit-sparse, *particularly* in
the least significant bits, and only the low 32/64 bits of the product
are kept.

Using the high word of a double-width multiply would be even better, but
some machines (*cough* SPARCv9 *cough*) don't have hardware support for
that.

So, for an input whose low 12 bits are zero, what you get is:

	(0x9e370001 * (x << 12)) & 0xffffffff
		= (0x9e370001 * x & 0xfffff) << 12
		= (0x70001 * x & 0xfffff) << 12

*Now* does it make sense?

64 bits is just as bad... 0x9e37fffffffc0001 becomes
0x7fffffffc0001, which is 2^51 - 2^18 + 1.

The challenge is the !CONFIG_ARCH_HAS_FAST_MULTIPLIER case, when the
multiply has to be done with shifts and adds/subtracts.

Now, what's odd is that the option is only relevant for 64-bit
platforms, and currently only x86 and POWER7+ have it set. SPARCv9,
MIPS64, ARM64, SH64, PPC64, and IA64 all have it turned off.

Is this a bug that should be fixed?

In fact, do *any* 64-bit platforms need multiply emulation? How many
32-bit platforms need a multiplier that's easy for GCC to evaluate via
shifts and adds?
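As a sanity check on the 32-bit reduction worked out earlier in this message, here is a minimal sketch in plain C. The helper names (mul_lo32, reduced, check_identity) are hypothetical and are not kernel functions; the sketch only assumes that the hash keeps the low 32 bits of the product, as the "& 0xffffffff" in the derivation says.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiply by the 32-bit constant, keep the low 32 bits. */
static uint32_t mul_lo32(uint32_t x)
{
	return 0x9e370001u * x;
}

/* Hypothetical helper: the reduced form, (0x70001 * x & 0xfffff) << 12. */
static uint32_t reduced(uint32_t x)
{
	return ((0x70001u * x) & 0xfffffu) << 12;
}

/*
 * For every 20-bit x, hashing the 4096-aligned value (x << 12) gives
 * exactly the reduced form, i.e. only the sparse low 20 bits of the
 * constant take part in the result.
 */
static void check_identity(void)
{
	uint32_t x;

	for (x = 0; x < (1u << 20); x++)
		assert(mul_lo32(x << 12) == reduced(x));
}
```

Calling check_identity() from a small main() or a test harness runs through all 2^20 aligned inputs; an assertion failure would mean the reduction does not hold.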
Generally, by the time you've got a machine grunty enough to need 64
bits, a multiplier is quite affordable.

Anyway, assuming there exists at least one platform that needs the
shift-and-add sequence, it's quite easy to get a higher Hamming weight;
you just have to use a few more registers to save some intermediate
results. E.g.

	u64 x = val, t = val, u;

	x <<= 2;	/* val * 4 */
	u = x += t;	/* val * 5 */
	x <<= 4;	/* val * 80 */
	x -= u;		/* val * 75 = 0b1001011 */

Shall I try to come up with something?

Footnote: useful web pages on shift-and-add/subtract multiplication
http://www.vinc17.org/research/mulbyconst/index.en.html
http://www.spiral.net/hardware/multless.html
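The multiply-by-75 sequence above can be tried outside the kernel with a small sketch like the following. It assumes uint64_t in place of the kernel's u64, and the function names (mul75_shift_add, check_mul75) are made up for illustration. The multiplier 75 is only the example from this message, not a proposed hash constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 75 with two shifts, one add and one subtract. */
static uint64_t mul75_shift_add(uint64_t val)
{
	uint64_t x = val, u;

	x <<= 2;	/* val * 4 */
	x += val;	/* val * 5 */
	u = x;		/* keep val * 5 as the intermediate result */
	x <<= 4;	/* val * 80 */
	x -= u;		/* val * 75 = 0b1001011 */
	return x;
}

/* Compare against an ordinary multiply; both wrap modulo 2^64. */
static void check_mul75(void)
{
	static const uint64_t samples[] = {
		0, 1, 2, 75, 0x1000, 0xdeadbeefULL, UINT64_MAX
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(mul75_shift_add(samples[i]) == samples[i] * 75u);
}
```

The one extra variable (u) is the "few more registers" cost: it lets the subtract reuse val * 5, so four operations yield a multiplier with four bits set.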