From: James Morse
Date: Fri, 15 Sep 2023 18:16:11 +0100
Subject: Re: resctrl2 - status
To: Tony Luck, Reinette Chatre, Babu Moger
Cc: Amit Singh Tomar, "Yu, Fenghua", George Cherian, "robh@kernel.org", "peternewman@google.com", Drew Fustini, "linux-kernel@vger.kernel.org", "linux-arm-kernel@lists.infradead.org"

Hi Tony,

On 06/09/2023 19:21, Tony Luck wrote:
> I've just pushed an updated set of patches to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git resctrl_v65
>
> Rebased to v6.5. Includes the module auto-load code discussed in
> previous e-mail.

I've taken a look at your resctrl2_v65rc7 branch. I don't think my reply to your first series got sent, so sorry if I'm actually repeating myself.

It goes without saying that I'd prefer we refactor the existing filesystem into the shape we need, instead of a wholesale replacement.

All this focusses on the structure of the code, but we can change that anytime. I'd prefer we work out what user-space needs and go from there. I think the problems are:
* Discovering what the platform supports, including possible conflicting schemes.
* Architecture/platform-specific schema, e.g. SMBA. MPAM will have a handful of these too.
* User-space should know which schema are portable, and which are not, e.g. L2/L3/MB.
* Different control types for the same resource, e.g.
  Intel uses a percentage for bandwidth control, AMD an absolute number.
  We should standardise this sort of thing and make it discoverable.
* Conflicting schemes for the same hardware, e.g. CDP and mba_MBps.

~

I'd really like to keep the 'core' schema in the base driver. This is to ensure that 'L3' doesn't behave differently due to differing software implementations on Intel/AMD/Arm/riscv. The L{2,3} schema were really well defined as they reference other things that are already visible in sysfs; this makes them nice and portable. I think this is the standard we should aim for with any new schema, and only resort to making them arch-specific if the hardware is just too strange.

I tried to find a way to do this for SMBA, as arm platforms will certainly have something similar, but the 'any other NUMA node' behaviour was just too specific to what was built.

I suspect this could be done with a frontend/backend split, with a 'common' directory holding the frontend driver for schema that apply to all architectures if the backend has shown up.

~

Because I think the types a schema is configured with should be portable across architectures, I'd prefer the string parsing code lives in the core code, and isn't duplicated by every submodule! String parsing in the kernel is bad enough!

The obvious types we have so far are: bitmap/percentage/number/bandwidth. I think it makes sense for user-space to be told what the schema is configured with, and let the core code do that parsing.

~

I don't have a good answer for conflicting drivers for the same hardware resource. I think we'd need to find a way of making the existing command-line arguments work, causing the corresponding module to auto-load. But this sucks for distros, who would need somewhere to describe the policy of which modules get loaded. The good news is things like libvirt aren't mounting resctrl themselves.

~

I suspect the CDP schemes should be made arch-specific as x86 needs the odd/even pairing, whereas MPAM does not.
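To make the odd/even pairing concrete, here is a minimal C sketch. `cdp_config_index()` follows the x86 resctrl pairing as I understand it (data configuration at hardware index 2n, code at 2n+1), which is why enabling CDP halves the number of control groups. The shared-I-side variant is purely hypothetical: `shared_iside_index()` and the choice of reserving the last hardware ID for all code-side traffic are illustrative assumptions, not MPAM driver code.

```c
#include <assert.h>

/* Which side of a CDP-enabled cache a configuration applies to. */
enum cdp_type { CDP_NONE, CDP_DATA, CDP_CODE };

/*
 * x86-style pairing: with CDP enabled, control group 'closid' uses
 * hardware index 2n for data and 2n+1 for code.
 */
static unsigned int cdp_config_index(unsigned int closid, enum cdp_type type)
{
	switch (type) {
	case CDP_DATA:
		return closid * 2;
	case CDP_CODE:
		return closid * 2 + 1;
	case CDP_NONE:
	default:
		return closid;
	}
}

/* With paired IDs, only half of the hardware IDs can be control groups. */
static unsigned int paired_num_groups(unsigned int hw_ids)
{
	return hw_ids / 2;
}

/*
 * Hypothetical shared-I-side scheme: each group keeps its own data-side
 * ID, while every group shares one code-side ID (here, the last one).
 */
static unsigned int shared_iside_index(unsigned int closid, enum cdp_type type,
				       unsigned int hw_ids)
{
	assert(hw_ids >= 2);
	if (type == CDP_CODE)
		return hw_ids - 1;
	assert(closid < hw_ids - 1);
	return closid;
}

/* Sharing one code-side ID gives up a single hardware ID, not half. */
static unsigned int shared_iside_num_groups(unsigned int hw_ids)
{
	return hw_ids - 1;
}
```

With 16 hardware IDs the paired scheme leaves 8 control groups, while the hypothetical shared scheme leaves 15, at the cost of every group sharing one I-side configuration.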
Making CDP arch-specific would allow a scheme where the I-side CLOSIDs can be shared, to avoid halving the number of available CLOSIDs.

Having somewhere sensible to put the MPAM driver is useful. It's currently dumped in drivers/platform as it doesn't really fit anywhere!

Allowing each submodule to add extra files to the info directories looks useful. MPAM's priority control has a property to say whether 0 is the top of the scale or not; this would allow it to be exposed to user-space, instead of having to shift the range up/down to hide the difference in behaviour.

Only on some platforms does MPAM need to update the hardware from a CPU that is part of the target domain. The vast majority would allow the MMIO writes to come from anywhere. Having the applychanges behaviour specific to the submodule would reduce the number of IPIs. From what I've seen, riscv never needs an IPI here.

It looks like all the core code runs in process context without touching the pre-empt counter or masking interrupts: this is a really nice property. Most of my headaches have come from this area.

The limbo work isn't arch-specific: it's a property of caches, so it should really be core code behaviour to avoid duplication. MPAM needs it, and I expect riscv does too.

Making the CLOSID/RMID allocation behaviour arch-specific saves some headaches. MPAM is particularly different in this area. I don't know what riscv's behaviour is here.

> James:
>
> I'm now hoping for some feedback from ARM folks on whether this is a
> useful direction. Is it possible to implement MPAM features on top of
> this base architecture independent layer. If not, am I just missing
> some simple extensions that you will need. Or is there some fundamental
> problem that looks hard/impossible to resolve?

You've got an rdt_l2_cat driver which is really a symlink to rdt_l3_cat built with different pre-processor defines. It's a good trick, but it only works because you know the full range of hardware that has been built.
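The alternative to fixing the set of cache-level modules at build time is to create resource descriptors from whatever the platform reports. This is a hypothetical illustration only: `struct sketch_resource`, `build_cache_resources()` and the `levels` array (standing in for the result of parsing the ACPI tables and probing the hardware) are invented names, not code from resctrl2 or the MPAM driver. One descriptor is allocated per cache level actually found, so an unusual level needs no module of its own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical descriptor for one cache-level resource. */
struct sketch_resource {
	char name[16];     /* schema name, e.g. "L3" */
	int cache_level;   /* level reported by firmware */
};

/*
 * Allocate one resource per discovered cache level that has controls.
 * 'levels' stands in for what table parsing and hardware probing found.
 * Returns NULL if nothing was found or allocation failed; otherwise the
 * caller owns the returned array and must free() it.
 */
static struct sketch_resource *build_cache_resources(const int *levels, size_t n,
						     size_t *out_count)
{
	struct sketch_resource *res;
	size_t i;

	*out_count = 0;
	if (n == 0)
		return NULL;

	res = calloc(n, sizeof(*res));
	if (!res)
		return NULL;

	for (i = 0; i < n; i++) {
		assert(levels[i] > 0);
		res[i].cache_level = levels[i];
		/* Name the schema after the level, e.g. "L2", "L3", "L5". */
		snprintf(res[i].name, sizeof(res[i].name), "L%d", levels[i]);
	}
	*out_count = n;
	return res;
}
```

Passing levels {2, 3, 5} yields schemas named "L2", "L3" and "L5"; the last is exactly the case a per-level module chosen at build time would not anticipate.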
MPAM can't use that trick: it really is a bag of bits, and it isn't until the ACPI tables are parsed to find where the resources are, and then that hardware is probed, that we have a clue what was built. For example, the cache controls could be on any cache in the hierarchy, and the cache level number is derived from the structure of the PPTT table.

You've already got a fair amount of duplication when multiple struct resctrl_resources are defined. MPAM would have to allocate and populate these dynamically to avoid building a module for 'L5' ... just in case someone built that...

The mba_MBps logic is really just software; I see no reason to duplicate it on multiple architectures. This should be in the core of the filesystem. We already have a mount option to enable it.

I see the arch/submodules can't influence the domain numbers ... this is a problem as it hardcodes "you can have L2 or L3", which is something you were trying to avoid. MPAM will want to use NUMA IDs for memory-side caches (I'd hope this would be a core/portable schema), as well as IOMMU IDs for the I/O side of this. I don't think this is really a problem, as I'd like to add the things I need in this area as core/portable schema.

Arm's IOMMU has support to label traffic with the equivalent of CLOSID/RMID. My prototype support for this adds IOMMU groups to the resctrl tasks file so that the devices can be moved between control/monitor groups as if they were tasks. I think this would work for other architectures if they had similar support, as IOMMU groups are an existing concept.

Thanks,

James