Date: Fri, 16 Apr 2010 19:51:26 +0200
From: Andrea Arcangeli
To: Christoph Lameter
Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, LKML, linux-mm, Mel Gorman,
    Rik van Riel, Minchan Kim, Balbir Singh, KOSAKI Motohiro, Andrew Morton
Subject: Re: Interleave policy on 2M pages (was Re: [RFC][BUGFIX][PATCH 1/2]
    memcg: fix charge bypass route of migration)
Message-ID: <20100416175126.GP32034@random.random>

On Fri, Apr 16, 2010 at 11:13:10AM -0500, Christoph Lameter wrote:
> On Thu, 15 Apr 2010, Andrea Arcangeli wrote:
>
> > 2) add alloc_pages_vma for numa awareness in the huge page faults
>
> How do interleave policies work with alloc_pages_vma? So far the
> semantics is to spread 4k pages over different nodes. With 2M pages
> this can no longer work the way it was.

static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
                                          unsigned nid)

See the order parameter, so I hope it's already solved.

I assume the idea would be to interleave 2M pages to avoid the CPU and
memory overhead of the pte layer and to decrease TLB misses, while
still maxing out the bandwidth of the system when multiple threads
access memory stored on different nodes in a random-access pattern. It
should be ideal for hugetlbfs too, for the large shared memory pools
of the DB. Surely it'll be better than having all hugepages come from
the same node even when MPOL_INTERLEAVE is set.
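To illustrate, just a sketch and not an actual patch: interleave_nid,
get_vma_policy and alloc_page_interleave are the static helpers in
mm/mempolicy.c, and the alloc_hugepage_vma entry point here is
hypothetical. The huge page fault path could derive the interleave
node from the faulting address at 2M granularity, so consecutive
hugepages rotate across the allowed nodes:

/*
 * Sketch only: same shape as the MPOL_INTERLEAVE branch of today's
 * alloc_page_vma, but with the shift bumped from PAGE_SHIFT to
 * HPAGE_PMD_SHIFT and the order from 0 to HPAGE_PMD_ORDER, so a
 * whole 2M compound page is allocated on the selected node.
 */
static struct page *alloc_hugepage_vma(gfp_t gfp,
                                       struct vm_area_struct *vma,
                                       unsigned long addr)
{
        struct mempolicy *pol = get_vma_policy(current, vma, addr);

        if (pol->mode == MPOL_INTERLEAVE) {
                /* node chosen from addr >> HPAGE_PMD_SHIFT */
                unsigned nid = interleave_nid(pol, vma, addr,
                                              HPAGE_PMD_SHIFT);
                mpol_cond_put(pol);
                return alloc_page_interleave(gfp, HPAGE_PMD_ORDER, nid);
        }
        mpol_cond_put(pol);
        return alloc_pages(gfp, HPAGE_PMD_ORDER);
}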
Said that, it'd also be possible to disable hugepages if the vma has
MPOL_INTERLEAVE set, but I doubt we want to do that by default. Maybe
we can add a sysfs control for that later, which can be further
tweaked at boot time by per-arch quirks, dunno...

It's really up to you, you know NUMA better, but I've no doubt that
MPOL_INTERLEAVE can also make sense with hugepages (both hugetlbfs and
the transparent hugepage support).

Thanks,
Andrea