Date: Tue, 4 Dec 2018 08:22:51 +0100
From: Michal Hocko
To: Pingfan Liu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Vlastimil Babka, Mike Rapoport, Bjorn Helgaas, Jonathan Cameron
Subject: Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline
Message-ID: <20181204072251.GT31738@dhcp22.suse.cz>
References: <1543892757-4323-1-git-send-email-kernelfans@gmail.com>
In-Reply-To: <1543892757-4323-1-git-send-email-kernelfans@gmail.com>

On Tue 04-12-18 11:05:57, Pingfan Liu wrote:
> During my test on some AMD machine, with kexec -l nr_cpus=x option, the
> kernel failed to bootup, because some node's data struct can not be allocated,
> e.g, on x86, initialized by init_cpu_to_node()->init_memory_less_node(). But
> device->numa_node info is used as preferred_nid param for
> __alloc_pages_nodemask(), which causes NULL reference
>   ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
> This patch tries to fix the issue by falling back to the first online node,
> when encountering such corner case.

We have seen similar issues already and the bug was usually that the
zonelists were not initialized yet or the node is completely bogus.
Zonelists should be initialized by build_all_zonelists quite early so I
am wondering whether the latter is the case. What is the actual node
number the device is associated with?

Your patch is not correct btw, because we want to fall back to the nearest
node in distance order rather than to the first online node.
-- 
Michal Hocko
SUSE Labs
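
For illustration only, here is a minimal userspace sketch of how a distance-order fallback differs from picking the first online node. It is not kernel code and not the fix discussed in this thread: the distance table, MAX_NODES, NO_NODE and nearest_online_node() are all hypothetical names invented for the example, and the table values merely imitate a SLIT-style layout.

```c
#include <stdbool.h>

#define MAX_NODES 4
#define NO_NODE   (-1)

/*
 * Hypothetical SLIT-style table: distance[a][b] is the relative cost of
 * reaching node b from node a (10 means local). Values are made up.
 */
static const int distance[MAX_NODES][MAX_NODES] = {
    {10, 16, 32, 32},
    {16, 10, 32, 32},
    {32, 32, 10, 16},
    {32, 32, 16, 10},
};

/*
 * Return 'wanted' if it is online, otherwise the online node with the
 * smallest distance from 'wanted'. Returns NO_NODE when the id is out of
 * range or no node is online. Ties go to the lowest node id.
 */
static int nearest_online_node(int wanted, const bool online[MAX_NODES])
{
    int best = NO_NODE;

    if (wanted < 0 || wanted >= MAX_NODES)
        return NO_NODE;          /* bogus node id */
    if (online[wanted])
        return wanted;

    for (int n = 0; n < MAX_NODES; n++) {
        if (!online[n])
            continue;
        if (best == NO_NODE || distance[wanted][n] < distance[wanted][best])
            best = n;
    }
    return best;
}
```

With nodes 0 and 2 online and node 3 requested, this sketch returns node 2 (distance 16), whereas a first-online-node fallback would return node 0 (distance 32).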