Subject: Re: [RFC PATCH] greatly reduce SLOB external fragmentation
From: Matt Mackall
To: Linus Torvalds
Cc: Pekka J Enberg, Christoph Lameter, Ingo Molnar, Hugh Dickins, Andi Kleen, Peter Zijlstra, Linux Kernel Mailing List
Date: Thu, 10 Jan 2008 12:42:45 -0600
Message-Id: <1199990565.5331.130.camel@cinder.waste.org>

On Thu, 2008-01-10 at 10:28 -0800, Linus Torvalds wrote:
>
> On Thu, 10 Jan 2008, Matt Mackall wrote:
> > >
> > > (I'm not a fan of slabs per se - I think all the constructor/destructor
> > > crap is just that: total crap - but the size/type binning is a big deal,
> > > and I think SLOB was naïve to think a pure first-fit makes any sense. Now
> > > you guys are size-binning by just two or three bins, and it seems to make
> > > a difference for some loads, but compared to SLUB/SLAB it's a total hack).
> >
> > Here I'm going to differ with you.
> > The premises of the SLAB concept
> > (from the original paper) are:
>
> I really don't think we differ.
>
> The advantage of slab was largely the binning by type. Everything else was
> just a big crock. SLUB does the binning better, by really just making the
> type binning be about what really matters - the *size* of the type.
>
> So my argument was that the type/size binning makes sense (size more so
> than type), but the rest of the original Sun arguments for why slab was
> such a great idea were basically just the crap.
>
> Hard type binning was a mistake (but needed by slab due to the idiotic
> notion that constructors/destructors are "good for caches" - bleargh). I
> suspect that hard size binning is a mistake too (ie there are probably
> cases where you do want to split unused bigger size areas), but the fact
> that all of our allocators are two-level (with the page allocator acting
> as a size-agnostic free space) may help it somewhat.
>
> And yes, I do agree that any current allocator has problems with the big
> sizes that don't fit well into a page or two (like task_struct). That
> said, most of those don't have lots of allocations under many normal
> circumstances (even if there are uses that will really blow them up).
>
> The *big* slab users at least for me tend to be ext3_inode_cache and
> dentry. Everything else is orders of magnitude less. And of the two bad
> ones, ext3_inode_cache is the bad one at 700+ bytes or whatever (resulting
> in ~10% fragmentation just due to the page thing, regardless of whether
> you use an order-0 or order-1 page allocation).
>
> Of course, dentries fit better in a page (due to being smaller), but then
> the bigger number of dentries per page make it harder to actually free
> pages, so then you get fragmentation from that. Oh well. You can't win.

One idea I've been kicking around is pushing the boundary for the buddy
allocator back a bit (to 64k, say) and using SL*B under that.
The page allocators would call into buddy for larger than 64k (rare!) and
SL*B otherwise. This would let us greatly improve our handling of things
like task structs and skbs and possibly also things like 8k stacks and
jumbo frames. As SL*B would never be competing with the page allocator for
contiguous pages (the buddy allocator's granularity would be 64k), I
don't think this would exacerbate the page-level fragmentation issues.

Crazy?

--
Mathematics is the supreme nostalgia of our time.
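
For concreteness, the 64k boundary idea in this message amounts to a size
check in front of the two allocators. What follows is only a rough sketch
under stated assumptions: the names BUDDY_MIN_BYTES, enum backend and
pick_backend() are hypothetical and are not kernel APIs, and the 64k value
is just the "say" figure from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boundary: the buddy allocator only sees requests above this. */
#define BUDDY_MIN_BYTES ((size_t)64 * 1024)

/* Hypothetical labels for the two layers discussed in the thread. */
enum backend { BACKEND_SLXB, BACKEND_BUDDY };

/*
 * Decide which layer would service a request of `size` bytes.
 * Only requests strictly larger than 64k go to buddy; everything
 * else (task structs, skbs, 8k stacks, jumbo frames) stays in SL*B.
 */
enum backend pick_backend(size_t size)
{
    assert(size > 0); /* zero-byte requests are out of scope for this sketch */
    return size > BUDDY_MIN_BYTES ? BACKEND_BUDDY : BACKEND_SLXB;
}
```

Under this sketch an 8192-byte stack or a 9000-byte jumbo frame maps to
BACKEND_SLXB, and only something like a 65537-byte request reaches
BACKEND_BUDDY. A request of exactly 64k stays with SL*B, matching
"larger than 64k" above.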