Date: Mon, 11 Jun 2018 10:52:37 +0200
From: Michal Hocko
To: Xie XiuQi
Cc: Hanjun Guo, Bjorn Helgaas, Will Deacon, Catalin Marinas,
        Greg Kroah-Hartman, "Rafael J. Wysocki", Jarkko Sakkinen,
        linux-arm, Linux Kernel Mailing List, wanghuiqiang@huawei.com,
        tnowicki@caviumnetworks.com, linux-pci@vger.kernel.org,
        Andrew Morton, linux-mm@kvack.org, zhongjiang
Subject: Re: [PATCH 1/2] arm64: avoid alloc memory on offline node
Message-ID: <20180611085237.GI13364@dhcp22.suse.cz>
References: <1527768879-88161-1-git-send-email-xiexiuqi@huawei.com>
        <1527768879-88161-2-git-send-email-xiexiuqi@huawei.com>
        <20180606154516.GL6631@arm.com>
        <20180607105514.GA13139@dhcp22.suse.cz>
        <5ed798a0-6c9c-086e-e5e8-906f593ca33e@huawei.com>
        <20180607122152.GP32433@dhcp22.suse.cz>
Wysocki" , Jarkko Sakkinen , linux-arm , Linux Kernel Mailing List , wanghuiqiang@huawei.com, tnowicki@caviumnetworks.com, linux-pci@vger.kernel.org, Andrew Morton , linux-mm@kvack.org, zhongjiang Subject: Re: [PATCH 1/2] arm64: avoid alloc memory on offline node Message-ID: <20180611085237.GI13364@dhcp22.suse.cz> References: <1527768879-88161-1-git-send-email-xiexiuqi@huawei.com> <1527768879-88161-2-git-send-email-xiexiuqi@huawei.com> <20180606154516.GL6631@arm.com> <20180607105514.GA13139@dhcp22.suse.cz> <5ed798a0-6c9c-086e-e5e8-906f593ca33e@huawei.com> <20180607122152.GP32433@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.5 (2018-04-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 11-06-18 11:23:18, Xie XiuQi wrote: > Hi Michal, > > On 2018/6/7 20:21, Michal Hocko wrote: > > On Thu 07-06-18 19:55:53, Hanjun Guo wrote: > >> On 2018/6/7 18:55, Michal Hocko wrote: > > [...] > >>> I am not sure I have the full context but pci_acpi_scan_root calls > >>> kzalloc_node(sizeof(*info), GFP_KERNEL, node) > >>> and that should fall back to whatever node that is online. Offline node > >>> shouldn't keep any pages behind. So there must be something else going > >>> on here and the patch is not the right way to handle it. What does > >>> faddr2line __alloc_pages_nodemask+0xf0 tells on this kernel? > >> > >> The whole context is: > >> > >> The system is booted with a NUMA node has no memory attaching to it > >> (memory-less NUMA node), also with NR_CPUS less than CPUs presented > >> in MADT, so CPUs on this memory-less node are not brought up, and > >> this NUMA node will not be online (but SRAT presents this NUMA node); > >> > >> Devices attaching to this NUMA node such as PCI host bridge still > >> return the valid NUMA node via _PXM, but actually that valid NUMA node > >> is not online which lead to this issue. > > > > But we should have other numa nodes on the zonelists so the allocator > > should fall back to other node. If the zonelist is not intiailized > > properly, though, then this can indeed show up as a problem. Knowing > > which exact place has blown up would help get a better picture... > > > > I specific a non-exist node to allocate memory using kzalloc_node, > and got this following error message. > > And I found out there is just a VM_WARN, but it does not prevent the memory > allocation continue. > > This nid would be use to access NODE_DADA(nid), so if nid is invalid, > it would cause oops here. > > 459 /* > 460 * Allocate pages, preferring the node given as nid. The node must be valid and > 461 * online. For more general interface, see alloc_pages_node(). > 462 */ > 463 static inline struct page * > 464 __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) > 465 { > 466 VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); > 467 VM_WARN_ON(!node_online(nid)); > 468 > 469 return __alloc_pages(gfp_mask, order, nid); > 470 } > 471 > > (I wrote a ko, to allocate memory on a non-exist node using kzalloc_node().) OK, so this is an artificialy broken code, right. You shouldn't get a non-existent node via standard APIs AFAICS. The original report was about an existing node which is offline AFAIU. That would be a different case. If I am missing something and there are legitimate users that try to allocate from non-existing nodes then we should handle that in node_zonelist. [...] -- Michal Hocko SUSE Labs