Message-ID: <1462037825.5535.185.camel@edumazet-glaptop3.roam.corp.google.com>
Subject: Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>, LKML <linux-kernel@vger.kernel.org>,
        Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
        Darren Hart <darren@dvhart.com>,
        Michael Kerrisk <mtk.manpages@googlemail.com>,
        Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>,
        "Carlos O'Donell" <carlos@redhat.com>,
        Torvald Riegel <triegel@redhat.com>,
        Eric Dumazet <edumazet@google.com>
Date: Sat, 30 Apr 2016 10:37:05 -0700
In-Reply-To: <CA+55aFyncfSsV=bmqS4gHAiKg8GwQEnjB2Fpn0OPqozm5QsxmA@mail.gmail.com>
References: <20160428161742.363543816@linutronix.de>
	 <20160428163525.495514517@linutronix.de>
	 <CA+55aFzMtEVc_5QS8EUvdKYfNMmJ9fpgcM4iEhdwaY1Oe=JYmw@mail.gmail.com>
	 <alpine.DEB.2.11.1604290119320.3718@nanos>
	 <CA+55aFys0Qtca-dwSVMDf0q4UEsXteFtCNcLsi=9rOqLr2b7ZQ@mail.gmail.com>
	 <alpine.DEB.2.11.1604301432570.3692@nanos>
	 <1462034711.5535.169.camel@edumazet-glaptop3.roam.corp.google.com>
	 <CA+55aFyncfSsV=bmqS4gHAiKg8GwQEnjB2Fpn0OPqozm5QsxmA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2806
Lines: 76

On Sat, 2016-04-30 at 10:12 -0700, Linus Torvalds wrote:
> On Sat, Apr 30, 2016 at 9:45 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I use hash_32() in net/sched/sch_fq.c, for all packets sent by Google
> > servers. (Note that I did _not_ use hash_ptr())
> >
> > That's gazillions of packets per second, and the current multiply worked
> > just fine in term of hash spreading.
> 
> So hash_32() really is much better than hash_64(). I think we'll tweak
> it a bit, but largely leave it alone.
> 
> The 64-bit case needs to be tweaked a _lot_.

Agreed ;)

> 
> For the 32-bit case, I like the one that George Spelvin suggested:
> 
>    #define GOLDEN_RATIO_32 0x61c88647      /* phi^2 = 1-phi */
> 
> because of his slow multiplier fallback version that we could also use:
> 
>   /* Returns x * GOLDEN_RATIO_32 without a hardware multiplier */
>   unsigned hash_32(unsigned x)
>   {
>           unsigned y, z;
>                                 /* Path length */
>           y = (x << 19) + x;      /* 1 shift + 1 add */
>           z = (x << 9) + y;       /* 1 shift + 2 add */
>           x = (x << 23) + z;      /* 1 shift + 3 add */
>           z = (z << 8) + y;       /* 2 shift + 3 add */
>           x = (x << 6) - x;       /* 2 shift + 4 add */
>           return (z << 3) + x;    /* 3 shift + 4 add */
>   }
> 
> and I don't think that we really need the several big constants with
> the fancy "full cascade" function.
> 
> If you have a test-case for that sch_fq.c case, it might be a good
> idea to test the above GOLDEN_RATIO_32 value, but quite frankly, I
> don't see any way it would be materially different from the one we use
> now. It does avoid that long series of zeroes in the low bits, but
> that's actually not a huge problem for the 32-bit hash to begin with.
> It's not nearly as long a series (or in the wrong bit positions) as
> the 64-bit hash multiplier value had.
> 
> Also, I suspect that since you hash the kernel "struct sock" pointers,
> you actually never get the kinds of really bad patterns that Thomas
> had.
> 
> But maybe you use hash_32() on a pointer because you noticed that
> hash_long() or hash_ptr() (which use hash_64 on 64-bit architectures,
> and would have been more natural) gave worse performance?

Not at all.

At the time I did sch_fq (for linux-3.12), hash_64() was not using a
multiply yet, but this long series of shifts/add/sub

I used hash_32() because it was simply faster on my servers.

You added this multiply in linux-3.17, and I did not noticed at that
time.


> 
> Maybe you thought that it was the bigger multiply that caused the
> performance problems? If you did performance work, I suspect it really
> could have been that hash_64() did a bad job for you.

Really not ;)

I could test hash_64(), more entropy can not be bad.