From: Nai Xia
To: Andrea Arcangeli
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Hillf Danton,
 Dan Smith, Linus Torvalds, Andrew Morton, Thomas Gleixner, Ingo Molnar,
 Paul Turner, Suresh Siddha, Mike Galbraith, "Paul E. McKenney",
 Lai Jiangshan, Bharata B Rao, Lee Schermerhorn, Rik van Riel,
 Johannes Weiner, Srivatsa Vaddagiri, Christoph Lameter, Alex Shi,
 Mauricio Faria de Oliveira, Konrad Rzeszutek Wilk, Don Morris,
 Benjamin Herrenschmidt
Subject: Re: [PATCH 13/40] autonuma: CPU follow memory algorithm
Date: Sat, 30 Jun 2012 05:02:25 +0800

On Sat, Jun 30, 2012 at 12:30 AM, Andrea Arcangeli wrote:
> Hi Nai,
>
> On Fri, Jun 29, 2012 at 10:11:35PM +0800, Nai Xia wrote:
>> If one process does a very intensive visit of a small set of pages in
>> this node, but only an occasional visit of a large set of pages in
>> another node, will this algorithm make a very bad judgment? I guess
>> the answer would be: it's possible, and the judgment depends on the
>> racing pattern between the process and your knuma_scand.
>
> Depending on whether knuma_scand/scan_pass_sleep_millisecs makes the
> scan more or less occasional than the visits to the large set of
> pages, it may behave differently, correct.
>
> Note that every algorithm will have a limit on how smart it can be.
>
> Just to make a random example: if you look up some pagecache a million
> times and some other pagecache a dozen times, their "aging"
> information in the pagecache will end up identical. Yet we know one
> set of pages is clearly higher priority than the other. We've only so
> many levels of LRUs and so many referenced/active bitflags per page.
> Once you get to the top, all is equal.
>
> Does this mean the "active" list working set detection is useless just
> because we can't differentiate a million lookups on a few pages from a
> dozen lookups on lots of pages?
>
> Last but not least, in the very example you mention it's not even
> clear whether the process should be scheduled on the CPU where the
> small, frequently accessed set of pages is, or on the CPU where the
> large, occasionally accessed set is. If the small set of pages fits in
> the 8MB of L2 cache, then it's better to put the process on the other
> CPU, where the large set of pages can't fit in the L2 cache. Lots of
> hardware details would have to be evaluated to really know what's the
> right thing in such a case, even if it were you having to decide.
>
> But the real reason why the above isn't an issue, and why we don't
> need to solve that problem perfectly, is that there isn't just a CPU
> follow memory algorithm in AutoNUMA. There's also the memory follow
> CPU algorithm. AutoNUMA will do its best to change the layout of your
> example to one that has only one clear solution: the occasional
> lookups of the large set of pages will eventually make those pages
> move to the node holding the small set (or the other way around), and
> this is how it's solved.
>
> In any case, whatever wrong decision it takes, it will at least be a
> better decision than numa/sched, where there's absolutely zero
> information about what pages the process is accessing. And best of
> all, with AutoNUMA you also know which pages each _thread_ is
> accessing, so it can also take optimal decisions if there are more
> threads than CPUs in a node (as long as not all thread accesses are
> shared).
>
> Hope this explains things better.
> Andrea

Hi Andrea,

Sorry for being so negative, but this problem seems quite clear to me.
I might have pointed all of this out earlier if you had CCed me on the
first version; I am not always on the list watching posts....
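
That said, to make the saturation example above concrete for anyone
following along, here is a toy userspace sketch; the names (toy_page,
touch()) are invented for illustration and this is not the real LRU or
AutoNUMA code. With only a referenced/active bit per page and no
counter, a million touches and a dozen touches end up looking identical
to whatever scans the bit:

#include <stdio.h>
#include <stdbool.h>

struct toy_page {
	bool referenced;	/* stands in for PG_referenced/PG_active */
};

static void touch(struct toy_page *p)
{
	p->referenced = true;	/* a bit, not a counter: it saturates */
}

int main(void)
{
	struct toy_page hot = { false }, cold = { false };
	long i;

	for (i = 0; i < 1000000; i++)
		touch(&hot);	/* "a million lookups" */
	for (i = 0; i < 12; i++)
		touch(&cold);	/* "a dozen lookups" */

	/* The aging code only sees the bit, so both look the same. */
	printf("hot: %d  cold: %d\n", hot.referenced, cold.referenced);
	return 0;
}

The same one-bit limit applies to any per-page flag a scanner can
consult, which is exactly the limit you describe.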

Sincerely,

Nai
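
P.S. The hardware point about the 8MB L2 can be sketched the same way.
The constant and the pick_node() helper below are invented for
illustration and are not AutoNUMA's placement code; the idea is only
that if the frequently accessed set fits in L2 anyway, running next to
the large, uncacheable set can be the better CPU placement:

#include <stdio.h>

#define L2_BYTES (8UL << 20)	/* the 8MB L2 from the example above */

/*
 * Toy placement decision: stay near the hot set, unless the hot set is
 * small enough to live in L2 while the cold set is not.
 */
static int pick_node(unsigned long hot_bytes, int hot_node,
		     unsigned long cold_bytes, int cold_node)
{
	if (hot_bytes <= L2_BYTES && cold_bytes > L2_BYTES)
		return cold_node;	/* hot set is cache-resident anyway */
	return hot_node;
}

int main(void)
{
	/* e.g. a 4MB hot set on node 0 and a 1GB cold set on node 1 */
	printf("prefer node %d\n", pick_node(4UL << 20, 0, 1UL << 30, 1));
	return 0;
}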