Date: Wed, 1 Aug 2018 14:50:20 -0700
From: Andrew Morton
To: Jeremy Linton
Cc: linux-mm@kvack.org, cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, mhocko@suse.com, vbabka@suse.cz, Punit.Agrawal@arm.com, Lorenzo.Pieralisi@arm.com, linux-arm-kernel@lists.infradead.org, bhelgaas@google.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC 0/2] harden alloc_pages against bogus nid
Message-Id: <20180801145020.8c76a490c1bf9bef5f87078a@linux-foundation.org>
In-Reply-To: <20180801200418.1325826-1-jeremy.linton@arm.com>

On Wed, 1 Aug 2018 15:04:16 -0500 Jeremy Linton wrote:

> The thread "avoid alloc memory on offline node"
>
> https://lkml.org/lkml/2018/6/7/251
>
> asked at one point why kzalloc_node was crashing rather than
> returning memory from a valid node. The thread ended up fixing
> the immediate causes of the crash but left open the case of bad
> proximity values in DSDT tables without corresponding
> SRAT/SLIT entries, as is happening on another machine.
>
> It's also easy to fix that, but we should also harden the allocator
> sufficiently that it doesn't crash when passed an invalid node id.
> There are a couple of possible ways to do this, and I've attached two
> separate patches which individually fix that problem.
>
> The first detects the offline node before calling into
> the new_slab code path, once it becomes apparent that the allocation
> isn't going to succeed. The second hardens node_zonelist() and
> prepare_alloc_pages() in the face of NODE_DATA(nid) returning a NULL
> zonelist. This latter case happens if the node has never been
> initialized or is out of range. There are other places (NODE_DATA &
> online_node) which should be checking whether node ids are out of
> range (>= MAX_NUMNODES).

What is it that leads to a caller requesting memory from an invalid
node?  A race against offlining?  If so, then that's a lack of
appropriate locking, isn't it?

I don't see a problem with emitting a warning and then selecting a
different node so we can keep running.  But we do want that warning,
so that we can understand the root cause and fix it?
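A minimal userspace sketch of the approach discussed above (reject a node id that is out of range or whose node data was never initialized, warn, then fall back to a valid node) follows. It is a model, not kernel code: MAX_NUMNODES, NUMA_NO_NODE and the node_data[] array borrow kernel names, but their sizes and contents are invented here, and nid_is_valid(), first_valid_node(), resolve_nid() and the bogus_nid_warnings counter are hypothetical helpers, not part of either RFC patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model only: these names mimic the kernel's, but the values
 * and the node_data[] contents are made up for illustration.
 */
#define MAX_NUMNODES 4
#define NUMA_NO_NODE (-1)

struct pglist_data {
	int node_id;
};

/*
 * Nodes 0 and 1 were initialized; nodes 2 and 3 never were (NULL
 * entries), modelling a proximity domain that shows up in DSDT but has
 * no matching SRAT entry.
 */
static struct pglist_data node0 = { .node_id = 0 };
static struct pglist_data node1 = { .node_id = 1 };
static struct pglist_data *node_data[MAX_NUMNODES] = { &node0, &node1, NULL, NULL };

/* Count of bogus node ids seen, so a caller can confirm the warning fired. */
static unsigned int bogus_nid_warnings;

/* A nid is usable only if it is in range and its node data exists. */
static bool nid_is_valid(int nid)
{
	return nid >= 0 && nid < MAX_NUMNODES && node_data[nid] != NULL;
}

/* Lowest-numbered initialized node, or -1 if there is none. */
static int first_valid_node(void)
{
	for (int nid = 0; nid < MAX_NUMNODES; nid++)
		if (node_data[nid] != NULL)
			return nid;
	return -1;
}

/*
 * Hypothetical helper: return nid unchanged if it is valid; otherwise
 * fall back to a valid node so the allocation can proceed. A bogus nid
 * is reported; NUMA_NO_NODE ("no preference") falls back silently.
 */
static int resolve_nid(int nid)
{
	int fallback;

	if (nid_is_valid(nid))
		return nid;

	fallback = first_valid_node();
	assert(fallback >= 0);	/* this model always has node 0 */

	if (nid != NUMA_NO_NODE) {
		bogus_nid_warnings++;
		fprintf(stderr, "warning: bogus nid %d, using node %d\n",
			nid, fallback);
	}
	return fallback;
}
```

For example, resolve_nid(1) returns 1 with no warning, while resolve_nid(2) and resolve_nid(7) both return 0 and each bump the warning count. In the kernel the report would more naturally be a WARN_ONCE(), and the fallback would likely be a local or nearby node rather than the lowest-numbered one; both choices here are simplifications for the model.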