Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752714AbcD1Scu (ORCPT ); Thu, 28 Apr 2016 14:32:50 -0400 Received: from mail-ig0-f195.google.com ([209.85.213.195]:36749 "EHLO mail-ig0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751920AbcD1Scs (ORCPT ); Thu, 28 Apr 2016 14:32:48 -0400 MIME-Version: 1.0 In-Reply-To: <20160428163525.495514517@linutronix.de> References: <20160428161742.363543816@linutronix.de> <20160428163525.495514517@linutronix.de> Date: Thu, 28 Apr 2016 11:32:47 -0700 X-Google-Sender-Auth: 4veNW0bv5bsMCjKgahGxMm23-sM Message-ID: Subject: Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism From: Linus Torvalds To: Thomas Gleixner Cc: LKML , Peter Zijlstra , Ingo Molnar , Andrew Morton , Sebastian Andrzej Siewior , Darren Hart , Michael Kerrisk , Davidlohr Bueso , Chris Mason , "Carlos O'Donell" , Torvald Riegel , Eric Dumazet Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1379 Lines: 32 On Thu, Apr 28, 2016 at 9:42 AM, Thomas Gleixner wrote: > hash_long/hash_ptr() provide really bad hashing for small hash sizes and for > cases where the lower 12 bits (Page size aligned) of the hash value are 0. Hmm. hash_long/ptr really shouldn't care about the low bits being zero at all, because it should mix in all the bits (using a prime multiplier and taking the high bits). That said, numbers rule, so clearly we need to do something. It does strike me that we would be better off just trying to improve hash_long(). In particular, there are people and projects that have worked on nothing but hashing. I'm not sure we should add a new hash algorithm even if the whole "modulo prime" sounds obviously fine in theory. For example, your 64-bit code has obvious problems if there are common patterns in the low and the high 32 bits.. Not a problem for something like hash_ptr(), but it can certainly be a problem for other cases. It would be a really good idea to have some real hard numbers on the hashing in general, but _particularly_ so if/when we start adding new ones. Have you tested the modulus version with SMhasher, for example? For example, there's Thomas Wang's hash function which should cascade all the bits. I'd really hate to add *another* ad-hoc hash when the previous ad-hoc hash has been shown to be bad. Linus