Date: Tue, 5 Feb 2008 04:17:55 -0600
From: Paul Jackson
To: Lee Schermerhorn
Cc: kosaki.motohiro@jp.fujitsu.com, andi@firstfloor.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, akpm@linux-foundation.org, clameter@sgi.com,
    rientjes@google.com, mel@csn.ul.ie
Subject: Re: [2.6.24-rc8-mm1][regression?] numactl --interleave=all doesn't works on memoryless node.
Message-Id: <20080205041755.3411b5cc.pj@sgi.com>
In-Reply-To: <1202149243.5028.61.camel@localhost>
References: <20080202165054.F491.KOSAKI.MOTOHIRO@jp.fujitsu.com>
    <20080202090914.GA27723@one.firstfloor.org>
    <20080202180536.F494.KOSAKI.MOTOHIRO@jp.fujitsu.com>
    <1202149243.5028.61.camel@localhost>
Organization: SGI

Lee wrote:
> I don't know the current state of Paul's rework of cpusets and
> mems_allowed.  That probably resolves this issue, if he still plans on
> allowing a fully populated mask to indicate interleaving over all
> allowed nodes.

It got a bit stalled out for the last month (my employer had other
designs on my time.)  But I'd really like to drive it home.

What happened so far, in December 2007 and earlier, is that a few of us:

    David Rientjes
    Lee.Schermerhorn@hp.com
    Christoph Lameter
    Andi Kleen

had a discussion, motivated in good part by the need to allow a
mempolicy of MPOL_INTERLEAVE over all nodes currently available in the
cpuset, where that interleave policy was robustly preserved if the
cpuset changed (without requiring the application to somehow "know"
its cpuset had changed and reissue the set_mempolicy call.)

But that discussion touched on some other long-standing deficiencies
in the way that I had originally glued cpusets and memory policies
together.  The current mechanism doesn't handle changing cpusets very
well, especially if the number of nodes in the cpuset increases.

Obviously, I can't change the current behaviour, especially of the
mempolicy system calls.  I can only add new options that provide new
alternatives.

The patchset I'd like to drive home addresses these issues with a
couple of additional MPOL_* flags, upward compatible, that alter the
way that nodemasks are mapped into cpusets, and remapped if the cpuset
subsequently changes.

The next two steps I need to take are:

 1) propose this patch, with careful explanation (it's easy to lose
    one's bearings in the mappings and remappings of node numberings)
    to a wider audience, such as linux-mm or linux-kernel, and

 2) carefully test this, especially on each code path I touched in
    mm/mempolicy.c, where the changes were delicate, to ensure I
    didn't break any existing code.

There were also some other, smaller patches proposed, by myself and
others.  I was preferring to address a wider set of the long-standing
issues in this area, but the others above mostly preferred the smaller
patches.  This needs to be discussed in a wider forum, and a consensus
reached.

Hopefully this week or next, I will publish this patch proposal.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson 1.940.382.4214
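
As a rough illustration of the remapping question discussed above, here is a toy model in Python. It is not code from the patchset and does not reproduce what mm/mempolicy.c does; the function names `effective_nodes` and `remap_relative` are hypothetical, and the wrap-around rule in `remap_relative` is only one assumed way a cpuset-relative nodemask could be interpreted. It shows why remembering only the already-narrowed nodemask loses an "interleave over everything" intent when the cpuset grows, and how a relative request could follow the cpuset instead.

```python
def effective_nodes(requested, allowed):
    """Toy model: nodes actually used = requested nodes the cpuset allows."""
    return set(requested) & set(allowed)


def remap_relative(relative, allowed):
    """Hypothetical cpuset-relative mapping (an assumption, not kernel code):
    relative node i means the i-th allowed node in ascending order,
    wrapping around when i >= the number of allowed nodes."""
    ordered = sorted(allowed)
    if not ordered:
        return set()
    return {ordered[i % len(ordered)] for i in relative}


if __name__ == "__main__":
    all_nodes = set(range(8))            # request: interleave over everything
    small, large = {4, 5}, {4, 5, 6, 7}  # cpuset before and after it grows

    # Keeping only the narrowed result cannot pick up the new nodes.
    frozen = effective_nodes(all_nodes, small)
    print(sorted(effective_nodes(frozen, large)))     # [4, 5]

    # Keeping the original request and re-deriving it can.
    print(sorted(effective_nodes(all_nodes, large)))  # [4, 5, 6, 7]

    # A relative request follows the cpuset wherever it moves.
    print(sorted(remap_relative({0, 1}, small)))      # [4, 5]
    print(sorted(remap_relative({0, 1}, {2, 3, 6})))  # [2, 3]
```

The point of the sketch is only that the application's original request has to be remembered somewhere if the policy is to survive a cpuset change without the application reissuing set_mempolicy.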