Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752825AbcD2CZI (ORCPT ); Thu, 28 Apr 2016 22:25:08 -0400 Received: from mail-io0-f194.google.com ([209.85.223.194]:36677 "EHLO mail-io0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752244AbcD2CZG (ORCPT ); Thu, 28 Apr 2016 22:25:06 -0400 MIME-Version: 1.0 In-Reply-To: References: <20160428161742.363543816@linutronix.de> <20160428163525.495514517@linutronix.de> Date: Thu, 28 Apr 2016 19:25:04 -0700 X-Google-Sender-Auth: tX3Syy-bHtDOvAei5N2pix2JG_E Message-ID: Subject: Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism From: Linus Torvalds To: Thomas Gleixner Cc: LKML , Peter Zijlstra , Ingo Molnar , Andrew Morton , Sebastian Andrzej Siewior , Darren Hart , Michael Kerrisk , Davidlohr Bueso , Chris Mason , "Carlos O'Donell" , Torvald Riegel , Eric Dumazet Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1494 Lines: 40 On Thu, Apr 28, 2016 at 4:26 PM, Thomas Gleixner wrote: > > I'll try to dig up some time to analyze the hash_long failure unless someone > familiar with the problem is beating me to it. I'm not sure you need to spend time analyzing failure: if you get bad hashing with hash_long(), then we know that is a bad hash without having to really try to figure out why. It's the hashes that _look_ like they might be good hashes, but there's not a lot of analysis behind it, that I would worry about. The simple prime modulus _should_ be fine, but at the same time I kind of suspect we can do better. Especially since it has two multiplications. Looking around, there's http://burtleburtle.net/bob/hash/integer.html and that 32-bit "full avalanche" hash in six shifts looks like it could be better. You wouldn't want to inline it, but the point of a full avalanche bit mixing _should_ be that you could avoid the whole "upper bits" part, and it should work independently of the target set size. So if that hash works better, it would be a pretty good replacement option for hash_int(). There is also https://gist.github.com/badboy/6267743 that has a 64 bit to 32 bit hash function that might be useful for "hash_long()". Most of the people who worry about hashes tend to have strings to hash, not just a single word like a pointer, but there's clearly people around who have tried to search for good hashes that really spread out the bits. Linus