Date: Fri, 23 Jan 2009 04:55:03 +0100
From: Nick Piggin
To: Hugh Dickins
Cc: Pekka Enberg, Linux Memory Management List,
	Linux Kernel Mailing List, Andrew Morton, Lin Ming,
	"Zhang, Yanmin", Christoph Lameter
Subject: Re: [patch] SLQB slab allocator
Message-ID: <20090123035503.GD20098@wotan.suse.de>
References: <20090121143008.GV24891@wotan.suse.de>

On Wed, Jan 21, 2009 at 06:10:12PM +0000, Hugh Dickins wrote:
> On Wed, 21 Jan 2009, Nick Piggin wrote:
> >
> > Since last posted, I've cleaned up a few bits and pieces, (hopefully)
> > fixed a known bug where it wouldn't boot on memoryless nodes (I don't
> > have a system to test with), and improved performance and reduced
> > locking somewhat for node-specific and interleaved allocations.
>
> I haven't reviewed your postings, but I did give the previous version
> of your patch a try on all my machines. Some observations and one patch.

Great, thanks!

> I was initially _very_ impressed by how well it did on my venerable
> tmpfs loop swapping loads, where I'd expected next to no effect; but
> that turned out to be because on three machines I'd been using SLUB,
> without remembering how default slub_max_order got raised from 1 to 3
> in 2.6.26 (hmm, and Documentation/vm/slub.txt not updated).
>
> That's been making SLUB behave pretty badly (e.g. elapsed time 30%
> more than SLAB) with swapping loads on most of my machines. Though
> oddly one seems immune, and another takes four times as long: guess
> it depends on how close to thrashing, but probably more to investigate
> there. I think my original SLUB versus SLAB comparisons were done on
> the immune one: as I remember, SLUB and SLAB were equivalent on those
> loads when SLUB came in, but even with boot option slub_max_order=1,
> SLUB is still slower than SLAB on such tests (e.g. 2% slower).
>
> FWIW - swapping loads are not what anybody should tune for.

Yeah, that's to be expected with higher order allocations, I think.
Does your immune machine simply have fewer CPUs, and thus not use
such high order allocations?

> So in fact SLQB comes in very much like SLAB, as I think you'd expect:
> slightly ahead of it on most of the machines, but probably in the noise.
> (SLOB behaves decently: not a winner, but no catastrophic behaviour.)
>
> What I love most about SLUB is the way you can reasonably build with
> CONFIG_SLUB_DEBUG=y, very little impact, then switch on the specific
> debugging you want with a boot option when you want it. That was a
> great stride forward, which you've followed in SLQB: so I'd have to
> prefer SLQB to SLAB (on debuggability) and to SLUB (on high orders).

It is nice. All credit to Christoph for that (and the fine-grained
sysfs code).
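The boot-time interface is the one Documentation/vm/slub.txt describes
for SLUB. A couple of examples, with the caveat that the slqb_debug
line is my assumption that SLQB mirrors slub_debug flag-for-flag, so
treat it as a sketch:

	slub_debug=FZPU			sanity checks, red zoning,
					poisoning, user tracking
	slub_debug=P,kmalloc-64		poison only the kmalloc-64 cache
	slqb_debug=FZ			assumed SLQB equivalent

The debug paths cost next to nothing until one of these is passed on
the kernel command line.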
> I do hate the name SLQB. Despite having no experience of databases,
> I find it almost impossible to type, coming out as SQLB most times.
> Wish you'd invented a plausible vowel instead of the Q; but probably
> too late for that.

Yeah, apologies for the name :P

> init/Kconfig describes it as "Qeued allocator": should say "Queued".

Thanks.

> Documentation/vm/slqbinfo.c gives several compilation warnings:
> I'd rather leave it to you to fix them, maybe the unused variables
> are about to be used, or maybe there's much worse wrong with it
> than a few compilation warnings, I didn't investigate.

OK.

> The only bug I found (but you'll probably want to change the patch
> - which I've rediffed to today's slqb.c, but not retested).
>
> On fake NUMA I hit kernel BUG at mm/slqb.c:1107! claim_remote_free_list()
> is doing several things without remote_free.lock: that VM_BUG_ON is unsafe
> for one, and even if others are somehow safe today, it will be more robust
> to take the lock sooner.

Good catch, thanks. The BUG should be OK where it is if we only claim
the remote free list when remote_free_check is set, but some of the
periodic reaping and teardown code calls it unconditionally. It's not
critical, though, so it should definitely go inside the lock.

> I moved the prefetchw(head) down to where we know it's going to be the head,
> and replaced the offending VM_BUG_ON by a later WARN_ON which you'd probably
> better remove altogether: once we got the lock, it's hardly interesting.

Right, I'll probably do that. Thanks!

> Signed-off-by: Hugh Dickins
> ---
>
>  mm/slqb.c |   17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> --- slqb/mm/slqb.c.orig	2009-01-21 15:23:54.000000000 +0000
> +++ slqb/mm/slqb.c	2009-01-21 15:32:44.000000000 +0000
> @@ -1115,17 +1115,12 @@ static void claim_remote_free_list(struc
>  	void **head, **tail;
>  	int nr;
>  
> -	VM_BUG_ON(!l->remote_free.list.head != !l->remote_free.list.tail);
> -
>  	if (!l->remote_free.list.nr)
>  		return;
>  
> +	spin_lock(&l->remote_free.lock);
>  	l->remote_free_check = 0;
>  	head = l->remote_free.list.head;
> -	/* Get the head hot for the likely subsequent allocation or flush */
> -	prefetchw(head);
> -
> -	spin_lock(&l->remote_free.lock);
>  	l->remote_free.list.head = NULL;
>  	tail = l->remote_free.list.tail;
>  	l->remote_free.list.tail = NULL;
> @@ -1133,9 +1128,15 @@ static void claim_remote_free_list(struc
>  	l->remote_free.list.nr = 0;
>  	spin_unlock(&l->remote_free.lock);
>  
> -	if (!l->freelist.nr)
> +	WARN_ON(!head + !tail != !nr + !nr);
> +	if (!nr)
> +		return;
> +
> +	if (!l->freelist.nr) {
> +		/* Get head hot for likely subsequent allocation or flush */
> +		prefetchw(head);
>  		l->freelist.head = head;
> -	else
> +	} else
>  		set_freepointer(s, l->freelist.tail, head);
>  	l->freelist.tail = tail;
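P.S. For anyone who wants to see the race that VM_BUG_ON could trip,
it reproduces in miniature outside the kernel. The sketch below is NOT
SLQB code: the pthread locking and every name in it are invented for
illustration. But the shape matches claim_remote_free_list() above:
head and tail are published as two separate stores under the lock, so
a checker that reads them *without* the lock can catch them half-updated.

	/* racedemo.c: standalone sketch, not SLQB code.
	 * Build: gcc -O2 -pthread racedemo.c -o racedemo
	 * On an SMP machine the assert fires sooner or later.
	 */
	#include <assert.h>
	#include <pthread.h>
	#include <stddef.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static void * volatile head;	/* volatile: make the unlocked */
	static void * volatile tail;	/* reader reload every check   */

	/* Stands in for a remote CPU freeing onto the list: head and
	 * tail are two stores, consistent only while holding the lock.
	 */
	static void *remote_freer(void *arg)
	{
		static int obj;

		(void)arg;
		for (;;) {
			pthread_mutex_lock(&lock);
			head = &obj;	/* window opens: head set,  */
			tail = &obj;	/* tail still NULL above    */
			head = NULL;	/* second window: head NULL */
			tail = NULL;	/* while tail still set     */
			pthread_mutex_unlock(&lock);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, remote_freer, NULL);
		for (;;)
			/* the old VM_BUG_ON, done without the lock */
			assert(!head == !tail);
	}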