Date: Wed, 8 Dec 2021 09:30:34 +0100
From: Michal Hocko
To: Alexey Makhalov
Cc: David Hildenbrand, Dennis Zhou, Eric Dumazet, linux-mm@kvack.org,
 Andrew Morton, Oscar Salvador, Tejun Heo, Christoph Lameter,
 linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v3] mm: fix panic in __alloc_pages
In-Reply-To: <0E315E66-7EE2-42D8-B1B5-BD49E21AD67E@vmware.com>
References: <2E174230-04F3-4798-86D5-1257859FFAD8@vmware.com>
 <21539fc8-15a8-1c8c-4a4f-8b85734d2a0e@redhat.com>
 <78E39A43-D094-4706-B4BD-18C0B18EB2C3@vmware.com>
 <0E315E66-7EE2-42D8-B1B5-BD49E21AD67E@vmware.com>

On Wed 08-12-21 08:19:16, Alexey Makhalov wrote:
> Hi Michal,
>
> > On Dec 8, 2021, at 12:04 AM, Michal Hocko wrote:
> >
> > On Tue 07-12-21 17:17:27, Alexey Makhalov wrote:
> >>
> >>
> >>> On Dec 7, 2021, at 9:13 AM, David Hildenbrand wrote:
> >>>
> >>> On 07.12.21 18:02, Alexey Makhalov wrote:
> >>>>
> >>>>
> >>>>> On Dec 7, 2021, at 8:36 AM, Michal Hocko wrote:
> >>>>>
> >>>>> On Tue 07-12-21 17:27:29, Michal Hocko wrote:
> >>>>> [...]
> >>>>>> So your proposal is to drop set_node_online from the patch and add
> >>>>>> it as a separate one which handles
> >>>>>> - the sysfs part (i.e. do not register a node which doesn't span a
> >>>>>>   physical address space)
> >>>>>> - the hotplug side of it (drop the pgdat allocation, register the
> >>>>>>   node lazily when the first memblock is registered)
> >>>>>
> >>>>> In other words, the first stage
> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>>> index c5952749ad40..f9024ba09c53 100644
> >>>>> --- a/mm/page_alloc.c
> >>>>> +++ b/mm/page_alloc.c
> >>>>> @@ -6382,7 +6382,11 @@ static void __build_all_zonelists(void *data)
> >>>>>  	if (self && !node_online(self->node_id)) {
> >>>>>  		build_zonelists(self);
> >>>>>  	} else {
> >>>>> -		for_each_online_node(nid) {
> >>>>> +		/*
> >>>>> +		 * All possible nodes have pgdat preallocated in
> >>>>> +		 * free_area_init
> >>>>> +		 */
> >>>>> +		for_each_node(nid) {
> >>>>>  			pg_data_t *pgdat = NODE_DATA(nid);
> >>>>>
> >>>>>  			build_zonelists(pgdat);
> >>>>
> >>>> Will it blow up memory usage for the nodes which might never be
> >>>> onlined? I prefer the idea of init on demand.
> >>>>
> >>>> Even now there is an existing problem. In my experiments, I observed
> >>>> a _huge_ memory consumption increase when increasing the number of
> >>>> possible NUMA nodes. I'm going to report it in a separate mail thread.
> >>>
> >>> I already raised that PPC might be problematic in that regard. Which
> >>> architecture / setup do you have in mind that can have a lot of
> >>> possible nodes?
> >>>
> >> It is an x86_64 VMware VM, not a regular one, but specially configured
> >> (1 vCPU per node, with hot-plug support, 128 possible nodes).
> >
> > This is slightly tangential, but could you elaborate more on this setup
> > and the reasoning behind it? I was already curious when you mentioned
> > this previously. Why would you want to have so many nodes, 1:1 with
> > CPUs? What is the resulting NUMA topology?
>
> This setup with 128 nodes was used purely for development purposes. That
> is when the issue with hot-adding NUMA nodes was found.

OK, I see.

> The original issue is present even with a feasible number of nodes.

Yes, the issue is currently independent of the number of offline nodes.
The number of nodes is only interesting for the wasted amount of memory
if we are to allocate a pgdat for each possible node.
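To make the "init on demand" alternative above concrete, here is a minimal
sketch of the lazy direction, loosely following what the existing
hotadd_new_pgdat() path in mm/memory_hotplug.c already does at hotplug
time. The wrapper name is made up for illustration and error unwinding
is elided:

/*
 * Illustrative sketch only, not the actual patch under discussion:
 * allocate and wire up a pgdat the first time memory is registered
 * for the node, rather than preallocating one for every possible
 * node at boot.
 */
static pg_data_t *pgdat_init_on_demand(int nid)
{
	pg_data_t *pgdat = NODE_DATA(nid);

	if (pgdat)
		return pgdat;	/* this node already has its pgdat */

	pgdat = arch_alloc_nodedata(nid);
	if (!pgdat)
		return NULL;

	/* make NODE_DATA(nid) point at the new pgdat */
	arch_refresh_nodedata(nid, pgdat);

	pgdat->node_id = nid;
	/* init the node's zones as empty, no present pages yet */
	free_area_init_core_hotplug(nid);
	/* rebuild fallback lists so allocations can reach other nodes */
	build_all_zonelists(pgdat);

	return pgdat;
}

The tradeoff is the one discussed above: preallocating for every possible
node wastes a pgdat (plus per-cpu node stats) per node that never comes
online, while the lazy variant has to guarantee that nothing dereferences
NODE_DATA(nid) or its zonelists before this runs, which is exactly the
class of panic the patch is fixing.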
-- 
Michal Hocko
SUSE Labs