Date: Mon, 19 Oct 2020 15:27:15 +0100
From: Jonathan Cameron
To: Valentin Schneider
CC: Morten Rasmussen, Peter Zijlstra, Len Brown, Greg Kroah-Hartman, Sudeep Holla, Will Deacon, Brice Goglin, Jeremy Linton, Jerome Glisse
Subject: Re: [RFC PATCH] topology: Represent clusters of CPUs within a die.
Message-ID: <20201019142715.00005fb1@huawei.com>
References: <20201016152702.1513592-1-Jonathan.Cameron@huawei.com> <20201019103522.GK2628@hirez.programming.kicks-ass.net> <20201019123226.00006705@Huawei.com> <20201019131052.GC8004@e123083-lin>
Organization: Huawei tech. R&D (UK) Ltd.
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 19 Oct 2020 14:48:02 +0100
Valentin Schneider wrote:

> +Cc Jeremy
>
> On 19/10/20 14:10, Morten Rasmussen wrote:
> > Hi Jonathan,
> > The problem I see is that the benefit of keeping tasks together due to
> > the interconnect layout might vary significantly between systems. So if
> > we introduce a new cpumask for cluster it has to represent roughly
> > the same system properties otherwise generic software consuming this
> > information could be tricked.
> >
> > If there is a provable benefit of having interconnect grouping
> > information, I think it would be better represented by a distance matrix
> > like we have for NUMA.
> >
> > Morten
>
> That's my cue to paste some of that stuff I've been rambling on and off
> about!
>
> With regards to cache / interconnect layout, I do believe that if we
> want to support in the scheduler itself then we should leverage some
> distance table rather than to create X extra scheduler topology levels.
>
> I had a chat with Jeremy on the ACPI side of that some time ago. IIRC given
> that SLIT gives us a distance value between any two PXM, we could directly
> express core-to-core distance in that table. With that (and if that still
> lets us properly discover NUMA node spans), we could let the scheduler
> build dynamic NUMA-like topology levels representing the inner quirks of
> the cache / interconnect layout.

You would rapidly run into the problem SLIT had for NUMA node description.
There is no consistent description of distance and, except in the vaguest
sense of 'nearer', it wasn't any use for anything.
That is why HMAT came along. It's far from perfect but it is a step up.

I can't see how you'd generalize those particular tables to do anything
for intercore comms without breaking their use for NUMA, but something
a bit similar might work.

A lot of thought (and meeting time) has gone into trying to improve the
situation for complex topology around NUMA. Whilst there are differences
in representing the internal interconnects and caches, it seems like a
somewhat similar problem. The issue there is that it is really, really hard
to describe this stuff with enough detail to be useful, but simple enough
to be usable.

https://lore.kernel.org/linux-mm/20181203233509.20671-1-jglisse@redhat.com/

>
> It's mostly pipe dreams for now, but there seems to be more and more
> hardware where that would make sense; somewhat recently the PowerPC guys
> added something to their arch-specific code in that regard.

Pipe dream == something to work on ;)

ACPI has a nice code-first model of updating the spec now, so we can discuss
this one in public, and propose spec changes only once we have a proven
implementation.

Note I'm not proposing we put the cluster stuff in the scheduler, just
provide it as a hint to userspace.

Jonathan
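
For anyone following the thread, the quoted idea of deriving NUMA-like levels from a core-to-core distance table can be sketched roughly as below. This is only an illustration: the 4-CPU table, its values, and the name topology_levels are all made up here, and it is not taken from the scheduler code or from any patch in this thread. It produces one level per distinct distance, where a CPU's span at a level is every CPU no further away than that distance.

```python
# Hypothetical core-to-core distance table in the spirit of a SLIT:
# 10 means "local", larger means "further away". Values are invented.
DIST = [
    [10, 12, 20, 20],
    [12, 10, 20, 20],
    [20, 20, 10, 12],
    [20, 20, 12, 10],
]


def topology_levels(dist):
    """Return {distance: [span of each CPU at that distance]}.

    One level is produced per distinct distance value. The span of CPU i
    at a level is the set of CPUs whose distance from i does not exceed
    that level's distance.
    """
    ncpus = len(dist)
    distances = sorted({d for row in dist for d in row})
    return {
        d: [
            frozenset(j for j in range(ncpus) if dist[i][j] <= d)
            for i in range(ncpus)
        ]
        for d in distances
    }


levels = topology_levels(DIST)
for d, spans in levels.items():
    print(d, [sorted(s) for s in spans])
```

It also shows the weakness raised above: the grouping depends only on the ordering of the values, so a table saying 12 versus 20 gives the same levels as one saying 11 versus 200, and nothing here tells the consumer what "12" actually costs.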