Subject: Re: [PATCH] cpu-topology: warn if NUMA configurations conflicts with lower layer
To: "Zengtao (B)", Sudeep Holla
Cc: Linuxarm, Greg Kroah-Hartman, "Rafael J. Wysocki",
 "linux-kernel@vger.kernel.org", Morten Rasmussen
References: <1577088979-8545-1-git-send-email-prime.zeng@hisilicon.com>
 <20191231164051.GA4864@bogus>
 <678F3D1BB717D949B966B68EAEB446ED340AE1D3@dggemm526-mbx.china.huawei.com>
 <20200102112955.GC4864@bogus>
 <678F3D1BB717D949B966B68EAEB446ED340AEB67@dggemm526-mbx.china.huawei.com>
 <678F3D1BB717D949B966B68EAEB446ED340AFCA0@dggemm526-mbx.china.huawei.com>
From: Valentin Schneider
Message-ID: <7b375d79-2d3c-422b-27a6-68972fbcbeaf@arm.com>
Date: Fri, 3 Jan 2020 10:57:43 +0000
In-Reply-To: <678F3D1BB717D949B966B68EAEB446ED340AFCA0@dggemm526-mbx.china.huawei.com>

On 03/01/2020 04:24, Zengtao (B) wrote:
>>> From your example, core 7 and core 8 have different LLCs but the
>>> same low level cache?
>>
>> AFAIA what matters here is memory controllers, less so LLCs. Cores within
>> a single die could have private LLCs and separate memory controllers, or
>> shared LLC and separate memory controllers.
>>
>>> From the scheduling point of view, a lower level sched domain should be
>>> a subset of the higher level sched domain.
>>>
>>
>> Right, and that is checked when you have sched_debug on the cmdline
>> (or write 1 to /sys/kernel/debug/sched_debug & regenerate the sched
>> domains)
>>
>
> No, here I think you don't get my issue, please try to understand my example
> first:
>
> *************************************
> NUMA:           0-2,  3-7
> core_siblings:  0-3,  4-7
> *************************************
> When we are building the sched domains, per the current code:
> (1) For core 3:
>     MC sched domain falls back to 3~7
>     DIE sched domain is 3~7
> (2) For core 4:
>     MC sched domain is 4~7
>     DIE sched domain is 3~7
>
> When we build the sched groups for the MC level:
> (1) core3's sched group chain is built as: 3->4->5->6->7->3
> (2) core4's sched group chain is built as: 4->5->6->7->4
> So after (2), core3's sched groups overlap and no longer form a chain.
> Any subsequent use of core3's sched groups then ends up in a deadloop.
>
> And it's difficult for the scheduler to find out such errors,
> that is why I think a warning is necessary here.
>

I was mostly commenting on Sudeep's example, sorry for the confusion.

The case you describe in the changelog (and again here in your reply) is
indeed slightly different, and quite a bit more dramatic (we "corrupt" the
sched_group list).

I believe that could still be detected by the scheduler: the problem is
that we have non-equal yet non-disjoint non-NUMA sched domain spans; AFAIA
the only accepted configurations are either completely equal or completely
disjoint (otherwise we end up "corrupting" the sched group chain as above,
because we violate the assumptions used for building groups, see
topology.c::get_group()).

I'm juggling with other things atm, but let me have a think and see if we
couldn't detect that in the scheduler itself.
>> Now, I don't know how this plays out for the numa-in-package topologies
>> like the one suggested by Sudeep. x86 and AMD had to play some games to
>> get numa-in-package topologies working, see e.g.
>>
>>   cebf15eb09a2 ("x86, sched: Add new topology for multi-NUMA-node CPUs")
>>
>> Perhaps you need to "lie" here and ensure sub-NUMA sched domain levels
>> are a subset of the NUMA levels, i.e. lie for core_siblings. I might go
>> and play with this to see how it behaves.
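As for the "lie" quoted just above, a possible (untested) illustration of
clamping core_siblings so the MC span can never straddle a NUMA node would
be something like the below; the function name and where arch code would
actually hook it up are made up:

#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/topology.h>

/* One scratch mask per CPU to hold the clamped sibling span. */
static DEFINE_PER_CPU(cpumask_t, clamped_core_mask);

/*
 * Untested sketch: report core siblings clipped to the CPU's NUMA node,
 * so that any sub-NUMA sched domain span stays a subset of the node span.
 */
static const struct cpumask *clamped_coregroup_mask(int cpu)
{
        struct cpumask *mask = per_cpu_ptr(&clamped_core_mask, cpu);

        cpumask_and(mask, topology_core_cpumask(cpu),
                    cpumask_of_node(cpu_to_node(cpu)));

        return mask;
}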