Date: Tue, 2 Nov 2021 12:44:40 +0100
From: Michal Hocko
To: David Hildenbrand
Cc: Alexey Makhalov, "linux-mm@kvack.org", Andrew Morton,
	"linux-kernel@vger.kernel.org", "stable@vger.kernel.org",
	Oscar Salvador
Subject: Re: [PATCH] mm: fix panic in __alloc_pages
References: <20211101201312.11589-1-amakhalov@vmware.com>
	<7136c959-63ff-b866-b8e4-f311e0454492@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 02-11-21 12:00:57, David Hildenbrand wrote:
> On 02.11.21 11:34, Alexey Makhalov wrote:
[...]
> >> The node onlining logic when onlining a CPU sounds bogus as well: Let's
> >> take a look at try_offline_node(). It checks that:
> >> 1) That no memory is *present*
> >> 2) That no CPU is *present*
> >>
> >> We should online the node when adding the CPU ("present"), not when
> >> onlining the CPU.
> >
> > Possible.
> > Assuming try_online_node was moved under add_cpu(), let’s
> > take look on this call stack:
> > add_cpu()
> >   try_online_node()
> >     __try_online_node()
> >       hotadd_new_pgdat()
> > At line 1190 we'll have a problem:
> > 1183         pgdat = NODE_DATA(nid);
> > 1184         if (!pgdat) {
> > 1185                 pgdat = arch_alloc_nodedata(nid);
> > 1186                 if (!pgdat)
> > 1187                         return NULL;
> > 1188
> > 1189                 pgdat->per_cpu_nodestats =
> > 1190                         alloc_percpu(struct per_cpu_nodestat);
> > 1191                 arch_refresh_nodedata(nid, pgdat);
> >
> > alloc_percpu() will go for all possible CPUs and will eventually end up
> > calling alloc_pages_node() trying to use subject nid for corresponding CPU
> > hitting the same state #2 problem as NODE_DATA(nid) is still NULL and nid
> > is not yet online.
>
> Right, we will end up calling pcpu_alloc_pages()->alloc_pages_node() for
> each possible CPU. We use cpu_to_node() to come up with the NID.

Shouldn't this be numa_mem_id instead?

Memory less nodes are odd little critters crafted into the MM code
without wider considerations.
From time to time we struggle with some fallout, but the primary thing
is that zonelists should be valid for all memory less nodes. If that is
not the case then there is a problem with the initialization code. If
somebody is providing a bogus node to allocate from then this should be
fixed. It is still not clear to me which case we are hitting here.
-- 
Michal Hocko
SUSE Labs
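
To make the distinction discussed above concrete, here is a small userspace toy model, not kernel code. It only illustrates the difference between the node a CPU belongs to and the nearest node that actually has memory, and what goes wrong when a node's fallback list was never built. Every name in it (toy_node, toy_cpu_to_node, toy_mem_node, the topology tables) is hypothetical and invented for this sketch; the real kernel structures and zonelist handling are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NODES 4
#define TOY_NO_NODE   (-1)

/* Toy stand-in for a NUMA node: only what the fallback walk needs. */
struct toy_node {
	bool   has_memory;               /* false models a memory-less node  */
	int    fallback[TOY_MAX_NODES];  /* candidate nodes, nearest first   */
	size_t nr_fallback;              /* 0 models a list never built      */
};

/* Hypothetical topology: nodes 1 and 3 have no memory of their own. */
static const struct toy_node toy_nodes[TOY_MAX_NODES] = {
	[0] = { .has_memory = true,  .fallback = { 0, 2 },    .nr_fallback = 2 },
	[1] = { .has_memory = false, .fallback = { 1, 0, 2 }, .nr_fallback = 3 },
	[2] = { .has_memory = true,  .fallback = { 2, 0 },    .nr_fallback = 2 },
	[3] = { .has_memory = false, .fallback = { 0 },       .nr_fallback = 0 },
};

/* Hypothetical CPU-to-node map: CPUs 2 and 3 sit on memory-less node 1. */
static const int toy_cpu_node[] = { 0, 0, 1, 1, 3 };
#define TOY_NR_CPUS (sizeof(toy_cpu_node) / sizeof(toy_cpu_node[0]))

/* Node the CPU belongs to, whether or not that node has memory. */
static int toy_cpu_to_node(size_t cpu)
{
	assert(cpu < TOY_NR_CPUS);
	return toy_cpu_node[cpu];
}

/*
 * Walk the node's fallback list and return the first node with memory.
 * Returns TOY_NO_NODE when the list is empty or holds no usable node,
 * which models a memory-less node whose list was never initialized.
 */
static int toy_mem_node(int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);
	const struct toy_node *node = &toy_nodes[nid];

	for (size_t i = 0; i < node->nr_fallback; i++) {
		int candidate = node->fallback[i];

		if (toy_nodes[candidate].has_memory)
			return candidate;
	}
	return TOY_NO_NODE;
}
```

In this model, an allocation for CPU 2 that trusts toy_cpu_to_node() targets node 1, which has no memory, while toy_mem_node(1) resolves to node 0 because node 1 has a valid fallback list. Node 3 has no list at all, so nothing can be resolved for CPU 4; that is the toy analogue of the "problem with the initialization code" case.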