Date: Thu, 10 Jan 2008 11:46:56 -0800 (PST)
From: Christoph Lameter
To: Linus Torvalds
cc: Matt Mackall, Pekka J Enberg, Ingo Molnar, Hugh Dickins, Andi Kleen,
    Peter Zijlstra, Linux Kernel Mailing List
Subject: Re: [RFC PATCH] greatly reduce SLOB external fragmentation

On Thu, 10 Jan 2008, Linus Torvalds wrote:

> It's not even clear that a buddy allocator even for the high-order pages
> is at all the right choice. Almost nobody actually wants >64kB blocks, and
> the ones that *do* want bigger allocations tend to want *much* bigger
> ones, so it's quite possible that it could be worth it to have something
> like a three-level allocator:

Excellent! I am definitely on board with this.

>  - huge pages (superpages for those crazy db people)
>
>    Just a simple linked list of these things is fine, we'd never care
>    about coalescing large pages together anyway.
>
>  - "large pages" (on the order of ~64kB) - with *perhaps* a buddy bitmap
>    setup to try to coalesce back into huge-pages, but more likely just
>    admitting that you'd need something like migration to ever get back a
>    hugepage that got split into large-pages.
>
>    So maybe a simple bitmap allocator per huge-page for large pages. Say
>    you have a 4MB huge-page, and just a 64-bit free-bitmap per huge-page
>    when you split it into large pages.
>
>  - slab/slub/slob for anything else, and "get_free_page()" ends up being
>    just a shorthand for saying "naturally aligned kmalloc of size
>    "PAGE_SIZE<
>
> and maybe it would all work out ok.

Hmmm... a 3 level allocator? Basically we would have BASE_PAGE
STANDARD_PAGE and HUGE_PAGE? We could simply extend the page allocator to
have 3 pcp lists for these sizes and go from there?

Thinking about the arches this would mean

		BASE_PAGE	STANDARD_PAGE	HUGE_PAGE
x86_64		4k		64k		2M
i386		4k		16k		4M
ia64		16k		256k		1G ?
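
To make the quoted "64-bit free-bitmap per huge-page" idea concrete, here is a rough user-space sketch in C. It assumes the 4MB huge page and 64kB large page sizes from the example, which conveniently gives exactly 64 large pages per huge page, so one 64-bit word covers them all. The names (struct huge_page, huge_page_split(), large_page_alloc(), large_page_free(), huge_page_is_whole()) are made up for illustration and are not kernel interfaces; locking, per-cpu lists and migration are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes taken from the example in the mail; other arches would differ. */
#define HUGE_PAGE_SIZE	(4UL << 20)	/* 4MB */
#define LARGE_PAGE_SIZE	(64UL << 10)	/* 64kB */
#define LARGE_PER_HUGE	(HUGE_PAGE_SIZE / LARGE_PAGE_SIZE)	/* 64 */

/* Hypothetical descriptor for one split huge page (not a kernel struct). */
struct huge_page {
	unsigned char *base;	/* start of the 4MB region */
	uint64_t free_map;	/* bit i set => large page i is free */
};

/* Split a huge page: every large page in it starts out free. */
static void huge_page_split(struct huge_page *hp, void *base)
{
	hp->base = base;
	hp->free_map = ~(uint64_t)0;
}

/* Hand out the lowest-numbered free large page, or NULL if none is left. */
static void *large_page_alloc(struct huge_page *hp)
{
	unsigned int idx;

	for (idx = 0; idx < LARGE_PER_HUGE; idx++) {
		uint64_t bit = (uint64_t)1 << idx;

		if (hp->free_map & bit) {
			hp->free_map &= ~bit;
			return hp->base + (size_t)idx * LARGE_PAGE_SIZE;
		}
	}
	return NULL;
}

/* Return a large page to the huge page it was carved from. */
static void large_page_free(struct huge_page *hp, void *p)
{
	size_t idx = (size_t)((unsigned char *)p - hp->base) / LARGE_PAGE_SIZE;

	assert(idx < LARGE_PER_HUGE);
	hp->free_map |= (uint64_t)1 << idx;
}

/* All bits set again means the huge page could be handed back whole. */
static int huge_page_is_whole(const struct huge_page *hp)
{
	return hp->free_map == ~(uint64_t)0;
}
```

Note that there is no buddy-style coalescing here: a huge page only becomes available again once all 64 bits are set, which matches the point that getting a split huge page back would likely need something like migration.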