Subject: Re: [RFC v3] sched/topology: fix kernel crash when a CPU is hotplugged in a memoryless node
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Suravee Suthikulpanit, Srikar Dronamraju, Borislav Petkov, David Gibson, Michael Ellerman, Nathan Fontenot, Michael Bringmann, linuxppc-dev@lists.ozlabs.org, Ingo Molnar
References: <20190304195952.16879-1-lvivier@redhat.com> <77b142fe-0886-1510-28bd-d432ea2c796a@redhat.com> <20190315122556.GE6058@hirez.programming.kicks-ass.net>
From: Laurent Vivier
Message-ID: <18d517b6-0554-f241-abf8-ed085998c6d8@redhat.com>
Date: Fri, 15 Mar 2019 14:05:00 +0100
In-Reply-To: <20190315122556.GE6058@hirez.programming.kicks-ass.net>

On 15/03/2019 13:25, Peter Zijlstra wrote:
> On Fri, Mar 15, 2019 at 12:12:45PM +0100, Laurent Vivier wrote:
>
>> Another way to avoid the nodes overlapping for the offline nodes at
>> startup is to ensure the default values don't define a distance that
>> merge all offline nodes into node 0.
>>
>> A powerpc specific patch can workaround the kernel crash by doing this:
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 87f0dd0..3ba29bb 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -623,6 +623,7 @@ static int __init parse_numa_properties(void)
>>         struct device_node *memory;
>>         int default_nid = 0;
>>         unsigned long i;
>> +       int nid, dist;
>>
>>         if (numa_enabled == 0) {
>>                 printk(KERN_WARNING "NUMA disabled by user\n");
>> @@ -636,6 +637,10 @@ static int __init parse_numa_properties(void)
>>
>>         dbg("NUMA associativity depth for CPU/Memory: %d\n",
>>             min_common_depth);
>>
>> +       for (nid = 0; nid < MAX_NUMNODES; nid ++)
>> +               for (dist = 0; dist < MAX_DISTANCE_REF_POINTS; dist++)
>> +                       distance_lookup_table[nid][dist] = nid;
>> +
>>         /*
>>          * Even though we connect cpus to numa domains later in SMP
>>          * init, we need to know the node ids now. This is because
>
> What does that actually do? That is, what does it make the distance
> table look like before and after you bring up the CPUs?

By default the table is full of zeros.
When a CPU is brought up, the value is read from the device tree and the
table is updated. What I've seen is that this value is the same for two
nodes at a given level if they share that level. So, as the table is
initialized with 0, all offline nodes (no memory, no CPU) are merged with
node 0. My fix initializes the table with a unique value for each node, so
by default no nodes are merged.

>
>> Any comment?
>
> Well, I had a few questions here:
>
>   20190305115952.GH32477@hirez.programming.kicks-ass.net
>
> that I've not yet seen answers to.

I didn't answer because:

- I thought this was not a good way to fix the problem, as you said "it
  seems very fragile and unfortunate",

- I don't have the answers. I'd really like someone from IBM who knows
  the NUMA part of powerpc well to answer these questions...

and perhaps find a better solution.

Thanks,
Laurent
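
To illustrate the explanation of the default table values above, here is a
small userspace model in C. It is only a sketch: the names model_table,
model_init_zero, model_init_unique and model_distance, the table sizes and
the base distance of 10 are all hypothetical, and the rule that the distance
doubles for every level at which two nodes differ is an assumption about how
the table is consumed, not a copy of the kernel code.

```c
#include <assert.h>

/* Sizes and constants are illustrative assumptions, not the kernel's. */
#define MODEL_MAX_NODES      4
#define MODEL_REF_POINTS     2
#define MODEL_LOCAL_DISTANCE 10

/* One row per node, one column per associativity level. */
static int model_table[MODEL_MAX_NODES][MODEL_REF_POINTS];

/* Default state described in the mail: every entry is 0. */
void model_init_zero(void)
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++)
		for (int level = 0; level < MODEL_REF_POINTS; level++)
			model_table[nid][level] = 0;
}

/* State after the proposed fix: each node's entries hold its own id. */
void model_init_unique(void)
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++)
		for (int level = 0; level < MODEL_REF_POINTS; level++)
			model_table[nid][level] = nid;
}

/*
 * Assumed distance rule: walk the levels in order, stop at the first
 * level where both nodes carry the same value, and double the distance
 * for every level where they differ.
 */
int model_distance(int a, int b)
{
	assert(a >= 0 && a < MODEL_MAX_NODES);
	assert(b >= 0 && b < MODEL_MAX_NODES);

	int distance = MODEL_LOCAL_DISTANCE;

	for (int level = 0; level < MODEL_REF_POINTS; level++) {
		if (model_table[a][level] == model_table[b][level])
			break;
		distance *= 2;
	}
	return distance;
}
```

In this model, calling model_init_zero() and then model_distance(1, 2)
returns the base distance, so two nodes that were never populated look like
one and the same node. After model_init_unique() the same call returns the
largest distance the model can produce, which keeps the nodes apart until
real values are read.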