Subject: Re: [PATCH 13/35] autonuma: add page structure fields
From: Peter Zijlstra
To: KOSAKI Motohiro
Cc: Rik van Riel, Andrea Arcangeli, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Hillf Danton, Dan Smith, Linus Torvalds, Andrew Morton, Thomas Gleixner, Ingo Molnar,
 Paul Turner, Suresh Siddha, Mike Galbraith, "Paul E. McKenney", Lai Jiangshan,
 Bharata B Rao, Lee Schermerhorn, Johannes Weiner, Srivatsa Vaddagiri, Christoph Lameter
Date: Wed, 30 May 2012 11:06:03 +0200
Message-ID: <1338368763.26856.207.camel@twins>
In-Reply-To: <4FC5D973.3080108@gmail.com>
References: <1337965359-29725-1-git-send-email-aarcange@redhat.com>
 <1337965359-29725-14-git-send-email-aarcange@redhat.com>
 <1338297385.26856.74.camel@twins>
 <4FC4D58A.50800@redhat.com>
 <1338303251.26856.94.camel@twins>
 <4FC5D973.3080108@gmail.com>

On Wed, 2012-05-30 at 04:25 -0400, KOSAKI Motohiro wrote:
> (5/29/12 10:54 AM), Peter Zijlstra wrote:
> > On Tue, 2012-05-29 at 09:56 -0400, Rik van Riel wrote:
> >> On 05/29/2012 09:16 AM, Peter Zijlstra wrote:
> >>> On Fri, 2012-05-25 at 19:02 +0200, Andrea Arcangeli wrote:
> >>
> >>> 24 bytes per page.. or ~0.6% of memory gone. This is far too great a
> >>> price to pay.
> >>>
> >>> At LSF/MM Rik already suggested you limit the number of pages that can
> >>> be migrated concurrently and use this to move the extra list_head out of
> >>> struct page and into a smaller number of extra structures, reducing the
> >>> total overhead.
> >>
> >> For THP, we should be able to track this NUMA info on a
> >> 2MB page granularity.
> >
> > Yeah, but that's another x86-only feature. _IF_ we're going to do this
> > it must be done for all archs that have CONFIG_NUMA, thus we're stuck
> > with 4k (or other base page size).
>
> Even if THP=n, we don't need 4k granularity. All modern malloc implementations
> have a per-thread heap (glibc calls it an arena), usually 1-8MB in size. So, as
> long as it is larger than 2MB, we can always use per-pmd tracking. IOW, memory
> consumption drops to 1/512.

Yes, and we all know objects allocated in one thread are never shared with
other threads.. the producer-consumer pattern seems fairly popular and will
destroy your argument.

> My suggestion is: track at per-pmd (i.e. 2M) granularity and fix glibc too
> (current glibc malloc dynamically adjusts the arena size, so arenas often end
> up smaller than 2M).

The trouble with making this per-pmd is that you then get false sharing per
pmd, so if there's shared data on the 2M page you'll not know where to put it.

I also know of some folks who did a strict per-cpu allocator based on some
kernel patches I hope to see posted sometime soon. This is because if you have
many more threads than cpus the wasted space in your arenas is tremendous.
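[Editorial note: a minimal, standalone sketch of the arithmetic being argued over above. The struct name, its layout, and the idea of a fixed slot pool are illustrative stand-ins for Rik's "limit concurrent migrations" suggestion, not code from Andrea's patch series.]

```c
/*
 * Back-of-the-envelope numbers from the thread:
 *  - 24 bytes of AutoNUMA tracking per 4KiB page  -> ~0.6% of all memory
 *  - per-pmd (2MiB) tracking                      -> 1/512 of that cost
 *  - a bounded pool of migration slots            -> cost independent of RAM
 */
#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE          4096UL       /* base page size assumed in the thread */
#define PMD_SIZE           (2UL << 20)  /* 2MiB, one pmd worth of base pages    */
#define PER_PAGE_OVERHEAD  24UL         /* bytes the patch adds to struct page  */

/*
 * Hypothetical migration slot: the list_head lives here instead of in every
 * struct page, so the total overhead is nr_slots * sizeof(struct migrate_slot)
 * rather than nr_pages * 24 bytes.
 */
struct migrate_slot {
	void *page;                          /* page currently being migrated */
	struct { void *next, *prev; } list;  /* stand-in for a list_head      */
};

int main(void)
{
	double per_page_pct = 100.0 * PER_PAGE_OVERHEAD / PAGE_SIZE;
	unsigned long pages_per_pmd = PMD_SIZE / PAGE_SIZE;

	printf("per-4k overhead: %lu/%lu bytes = %.2f%% of memory\n",
	       PER_PAGE_OVERHEAD, PAGE_SIZE, per_page_pct);
	printf("per-pmd tracking covers %lu base pages, i.e. 1/%lu of the cost\n",
	       pages_per_pmd, pages_per_pmd);
	printf("a pool of N migrate slots costs N * %zu bytes, regardless of RAM size\n",
	       sizeof(struct migrate_slot));
	return 0;
}
```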