Subject: Re: [PATCH v2] sched/topology: fix kernel crash when a CPU is hotplugged in a memoryless node
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Suravee Suthikulpanit, Srikar Dronamraju,
    Borislav Petkov, David Gibson, Michael Ellerman, Nathan Fontenot,
    Michael Bringmann, linuxppc-dev@lists.ozlabs.org, Ingo Molnar
References: <20190220165520.5241-1-lvivier@redhat.com>
    <20190220170841.GM32494@hirez.programming.kicks-ass.net>
From: Laurent Vivier
Message-ID: <11997145-4718-ed17-6085-54be18bf85ba@redhat.com>
Date: Wed, 20 Feb 2019 18:57:01 +0100
In-Reply-To: <20190220170841.GM32494@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 20/02/2019 18:08, Peter Zijlstra wrote:
> On Wed, Feb 20, 2019 at 05:55:20PM +0100, Laurent Vivier wrote:
>> index 3f35ba1d8fde..372278605f0d 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -1651,6 +1651,7 @@ void sched_init_numa(void)
>>  	 */
>>  	tl[i++] = (struct sched_domain_topology_level){
>>  		.mask = sd_numa_mask,
>> +		.flags = SDTL_OVERLAP,
>
> This makes no sense what so ever. The numa identify node should not have
> overlap with other domains.
>
> Are you sure this is not because of the utterly broken powerpc nonsense
> where they move CPUs between nodes?

No, I'm not sure. This is why I've Cc'ed the powerpc folks.

My conclusion is based only on the before/after behaviour with this change.

I've tested some patches from the powerpc ML, but they don't fix this
problem:

  powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
  powerpc/pseries: Perform full re-add of CPU for topology update post-migration

So the only reason I can see for a corrupted sched_group list is that
sched_domain_span() doesn't return a correct cpumask for the domain once
a new CPU is added.

Thanks,
Laurent
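
Illustration of the hypothesis in the last paragraph above: a toy
userspace model, not kernel code. It only shows how a per-node CPU mask
that is filled in at boot can go stale when a CPU is later hotplugged into
a node that had no CPUs, so that the "span" looked up for the new CPU does
not contain that CPU. Every toy_* name, the 8-CPU/2-node layout and the
refresh_mask switch are assumptions made up for this sketch; it says
nothing about where the real kernel or powerpc code actually goes wrong.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS  8
#define TOY_NR_NODES 2

/* Hypothetical per-node CPU bitmaps, filled in once at "boot". */
uint64_t toy_node_mask[TOY_NR_NODES];
/* Hypothetical cpu -> node map. */
int toy_cpu_to_node[TOY_NR_CPUS];

/* Boot: CPUs 0-3 sit on node 0; node 1 is memoryless and has no CPUs. */
void toy_boot(void)
{
	int cpu;

	toy_node_mask[0] = 0;
	toy_node_mask[1] = 0;
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_cpu_to_node[cpu] = 0;
	for (cpu = 0; cpu < 4; cpu++)
		toy_node_mask[0] |= UINT64_C(1) << cpu;
}

/*
 * Hotplug a CPU into a node. refresh_mask == 0 models the suspected
 * failure: the cpu -> node map changes but the node mask is left stale.
 */
void toy_hotplug_cpu(int cpu, int node, int refresh_mask)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(node >= 0 && node < TOY_NR_NODES);

	toy_cpu_to_node[cpu] = node;
	if (refresh_mask)
		toy_node_mask[node] |= UINT64_C(1) << cpu;
}

/* Stand-in for "the span of this CPU's NUMA-level domain". */
uint64_t toy_domain_span(int cpu)
{
	return toy_node_mask[toy_cpu_to_node[cpu]];
}

/* Returns 1 if the span looked up for cpu contains cpu itself, else 0. */
int toy_span_has_cpu(int cpu)
{
	return (int)((toy_domain_span(cpu) >> cpu) & 1);
}
```

In this model, calling toy_boot() and then toy_hotplug_cpu(4, 1, 0) leaves
CPU 4 outside its own span, while repeating the hotplug with refresh_mask
set to 1 puts it back in. Code that builds groups by walking a span and
assumes the current CPU is a member of it would misbehave in the first
case.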