Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758999Ab3HMRd4 (ORCPT ); Tue, 13 Aug 2013 13:33:56 -0400
Received: from relay1.sgi.com ([192.48.179.29]:35914 "EHLO relay.sgi.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755919Ab3HMRdz (ORCPT ); Tue, 13 Aug 2013 13:33:55 -0400
Message-ID: <520A6DFC.1070201@sgi.com>
Date: Tue, 13 Aug 2013 10:33:48 -0700
From: Mike Travis
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6
MIME-Version: 1.0
To: Linus Torvalds
CC: Nathan Zimmer, Peter Anvin, Ingo Molnar, Linux Kernel Mailing List,
	linux-mm, Robin Holt, Rob Landley, Daniel J Blueman, Andrew Morton,
	Greg Kroah-Hartman, Yinghai Lu, Mel Gorman
Subject: Re: [RFC v3 0/5] Transparent on-demand struct page initialization embedded in the buddy allocator
References: <1375465467-40488-1-git-send-email-nzimmer@sgi.com> <1376344480-156708-1-git-send-email-nzimmer@sgi.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2612
Lines: 61

On 8/13/2013 10:09 AM, Linus Torvalds wrote:
> On Mon, Aug 12, 2013 at 2:54 PM, Nathan Zimmer wrote:
>>
>> As far as extra overhead. We incur an extra function call to
>> ensure_page_is_initialized but that is only really expensive when we find
>> uninitialized pages, otherwise it is a flag check once every PTRS_PER_PMD.
>> To get a better feel for this we ran two quick tests.
>
> Sorry for coming into this late and for this last version of the
> patch, but I have to say that I'd *much* rather see this delayed
> initialization using another data structure than hooking into the
> basic page allocation ones..
>
> I understand that you want to do delayed initialization on some TB+
> memory machines, but what I don't understand is why it has to be done
> when the pages have already been added to the memory management free
> list.
>
> Could we not do this much simpler: make the early boot insert the
> first few gigs of memory (initialized) synchronously into the free
> lists, and then have a background thread that goes through the rest?
>
> That way the MM layer would never see the uninitialized pages.
>
> And I bet that *nobody* cares if you "only" have a few gigs of ram
> during the first few minutes of boot, and you mysteriously end up
> getting more and more memory for a while until all the RAM has been
> initialized.
>
> IOW, just don't call __free_pages_bootmem() on all the pages all at
> once. If we have to remove a few __init markers to be able to do some
> of it later, does anybody really care?
>
> I really really dislike this "let's check if memory is initialized at
> runtime" approach.
>
>               Linus
>

Initially this patch set consisted of diverting a major portion of the
memory to an "absent" list during e820 processing.  A very late initcall
was then used to dispatch a cpu per node to add that node's absent
memory.  Because these ran in parallel, Nathan did the work to convert
various global resource locks into per-node locks.

This sped up insertion considerably.  And by disabling the "auto-start"
of the insertion process and using a manual start command, you could
monitor the insertion process and find hot spots in the memory
initialization code.

There were also small updates to the sys/devices/{memory,node} drivers
to display the amount of memory still "absent".

-Mike
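
For readers who want a concrete picture of the per-node insertion
described above, here is a minimal user-space sketch, not the code from
the patch set.  It assumes pthreads stand in for the cpus dispatched by
the late initcall, and every name in it (struct node_state,
insert_absent_memory, insert_all_absent, absent_remaining, NR_NODES,
ABSENT_PAGES_PER_NODE) is hypothetical and chosen only for illustration.
The point it tries to show is that each worker takes only its own node's
lock, so the nodes do not contend with each other.

```c
#include <pthread.h>
#include <stddef.h>

#define NR_NODES 4                  /* hypothetical node count */
#define ABSENT_PAGES_PER_NODE 1024  /* hypothetical deferred pages per node */

/* Hypothetical per-node state: each node carries its own lock. */
struct node_state {
    pthread_mutex_t lock;   /* per-node lock instead of one global lock */
    size_t absent_pages;    /* pages still parked on the "absent" list */
    size_t free_pages;      /* pages already handed to the allocator */
};

static struct node_state nodes[NR_NODES];

/* Worker standing in for the cpu dispatched to one node. */
static void *insert_absent_memory(void *arg)
{
    struct node_state *node = arg;

    for (;;) {
        pthread_mutex_lock(&node->lock);
        if (node->absent_pages == 0) {
            pthread_mutex_unlock(&node->lock);
            break;
        }
        /* Move one page from the absent side to the free side. */
        node->absent_pages--;
        node->free_pages++;
        pthread_mutex_unlock(&node->lock);
    }
    return NULL;
}

/*
 * Stand-in for the late initcall: reset every node, start one worker
 * per node, wait for them, and return the number of pages now free.
 */
size_t insert_all_absent(void)
{
    pthread_t workers[NR_NODES];
    size_t total = 0;
    int i, started = 0;

    for (i = 0; i < NR_NODES; i++) {
        pthread_mutex_init(&nodes[i].lock, NULL);
        nodes[i].absent_pages = ABSENT_PAGES_PER_NODE;
        nodes[i].free_pages = 0;
    }
    for (i = 0; i < NR_NODES; i++) {
        if (pthread_create(&workers[i], NULL,
                           insert_absent_memory, &nodes[i]) != 0)
            break;  /* stop dispatching if a worker cannot be started */
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(workers[i], NULL);
    for (i = 0; i < NR_NODES; i++) {
        total += nodes[i].free_pages;
        pthread_mutex_destroy(&nodes[i].lock);
    }
    return total;
}

/* Pages still "absent" across all nodes (call after the workers finish). */
size_t absent_remaining(void)
{
    size_t left = 0;
    int i;

    for (i = 0; i < NR_NODES; i++)
        left += nodes[i].absent_pages;
    return left;
}
```

Calling insert_all_absent() from a main() built with -pthread and then
absent_remaining() gives the kind of "how much is still absent" figure
mentioned above.  With these made-up constants, and assuming every
worker starts, all 4 x 1024 pages end up on the free side.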