Subject: Re: [PATCH v3 19/21] x86/resctrl: Rename and change the units of resctrl_cqm_threshold
To: Reinette Chatre, x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
 shameerali.kolothum.thodi@huawei.com, Jamie Iles, D Scott Phillips OS, lcherian@marvell.com, bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com
References: <20220217182110.7176-1-james.morse@arm.com> <20220217182110.7176-20-james.morse@arm.com> <87c00fe2-e4fc-b006-f608-3dc2a209ed77@intel.com>
From: James Morse
Message-ID: <1d4220ef-277d-fbb0-edb7-14f09bae0c23@arm.com>
Date: Wed, 30 Mar 2022 17:45:56 +0100
In-Reply-To: <87c00fe2-e4fc-b006-f608-3dc2a209ed77@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Reinette,

On 17/03/2022 17:00, Reinette Chatre wrote:
> On 2/17/2022 10:21 AM, James Morse wrote:
>> resctrl_cqm_threshold is stored in a hardware specific chunk size,
>> but exposed to user-space as bytes.
>>
>> This means the filesystem parts of resctrl need to know how the hardware
>> counts, to convert the user provided byte value to chunks. The interface
>> between the architecture's resctrl code and the filesystem ought to
>> treat everything as bytes.
>>
>> Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read()
>> still returns its value in chunks, so this needs converting to bytes.
>> As all the callers have been touched, rename the variable to
>> resctrl_rmid_realloc_threshold, which describes what the value is for.

>> @@ -762,10 +763,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	 *
>>  	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
>>  	 */
>> -	resctrl_cqm_threshold = cl_size * 1024 / r->num_rmid;
>> -
>> -	/* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
>> -	resctrl_cqm_threshold /= hw_res->mon_scale;
>> +	resctrl_rmid_realloc_threshold = cl_size * 1024 / r->num_rmid;
>>
>>  	ret = dom_data_init(r);
>>  	if (ret)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 7ec089d72ab7..93b3697027df 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1030,10 +1030,7 @@ static int rdt_delay_linear_show(struct kernfs_open_file *of,
>>  static int max_threshold_occ_show(struct kernfs_open_file *of,
>>  				  struct seq_file *seq, void *v)
>>  {
>> -	struct rdt_resource *r = of->kn->parent->priv;
>> -	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> -
>> -	seq_printf(seq, "%u\n", resctrl_cqm_threshold * hw_res->mon_scale);
>> +	seq_printf(seq, "%u\n", resctrl_rmid_realloc_threshold);
>>
>>  	return 0;
>>  }
>
>
> This change has some user visible impact that I am still digesting but thought
> that I would share for your consideration.
>
> As seen in the above two snippets, the original code did:
>
> resctrl_cqm_threshold /= hw_res->mon_scale; /* resctrl_cqm_threshold used internally */
>
> resctrl_cqm_threshold * hw_res->mon_scale; /* this is displayed to user */
>
> The original loss due to truncation during the division is not recovered
> when the value is displayed to the user the user may see significant differences
> before and after this patch.
>
> I tried this out on a system with a large cache and the before and after
> information is significant:
> Before this patch:
> info/L3_MON/max_threshold_occupancy:147456
>
> After this patch:
> info/L3_MON/max_threshold_occupancy:196608

Hmm. I hadn't considered that information would be lost by the current way
of doing this.

It looks like this happens because num_rmid isn't necessarily a power of 2.
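
To make the before/after numbers concrete, here is a small user-space sketch of the two display paths. The thread does not say what hw_res->mon_scale is on Reinette's system; the 73728 below is purely an assumption, picked because it is consistent with the reported 196608 -> 147456 difference. The function names are hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only: the thread does not give mon_scale. */
#define ASSUMED_MON_SCALE 73728u

/*
 * Old behaviour: the threshold is stored in hardware chunks and scaled
 * back up to bytes only when it is shown to user-space.
 */
static uint32_t shown_before(uint32_t threshold_bytes, uint32_t mon_scale)
{
	uint32_t chunks;

	assert(mon_scale != 0);
	chunks = threshold_bytes / mon_scale;	/* remainder is discarded */
	return chunks * mon_scale;		/* and is not recovered here */
}

/* Behaviour with the v3 patch: the byte value is stored and shown as-is. */
static uint32_t shown_after(uint32_t threshold_bytes)
{
	return threshold_bytes;
}
```

Under that assumed scale, shown_before(196608, ASSUMED_MON_SCALE) truncates 2.67 chunks to 2 and reports 147456, while shown_after(196608) reports the unquantised 196608.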
> As I understand this change indeed represents the information more accurately but
> I found it noteworthy that this is not just a simple "change the units" and
> may thus have broader impact and may indeed result in different behavior that
> should be considered.

I agree it more accurately reflects resctrl's calculation of "the number of
lines tagged per RMID if all RMIDs have the same number of lines", but if that
produces a number the hardware will never actually measure, then the rounding
is still happening, but somewhere else.

I think the right thing to do is round resctrl_rmid_realloc_threshold down to
the nearest multiple of hw_res->mon_scale in rdt_get_mon_l3_config().

This way the filesystem parts still handle things in bytes, and the
architecture code provides the quantised value that will actually get
measured. It's this value that should be reported to user-space.

It doesn't look like the 'Upscaling Factor' is guaranteed to be a power of 2,
so I can't use the round_down() helpers.

I've added this to the commit message:
| Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power
| of 2, so the existing code introduces a rounding error from resctrl's
| theoretical fraction of the cache usage. This behaviour is kept as it
| ensures the user visible value matches the value read from hardware
| when the rmid will be reallocated.

and the hunk below, which fixes it for me.

Thanks,

James

---------------%<---------------
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b18e227d585c..fb81d650c457 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -753,6 +753,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	unsigned int cl_size = boot_cpu_data.x86_cache_size;
+	u64 threshold;
 	int ret;
 
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
@@ -771,7 +772,15 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	resctrl_rmid_realloc_threshold = cl_size * 1024 / r->num_rmid;
+	threshold = cl_size * 1024 / r->num_rmid;
+
+	/*
+	 * Because num_rmid may not be a power of two, round the value down
+	 * to a multiple of hw_res->mon_scale so it matches a value the
+	 * hardware will measure. mon_scale may not be a power of 2.
+	 */
+	threshold /= hw_res->mon_scale;
+	resctrl_rmid_realloc_threshold = threshold * hw_res->mon_scale;
 
 	ret = dom_data_init(r);
 	if (ret)
---------------%<---------------
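
As a footnote on why the hunk divides and multiplies instead of using round_down(): to the best of my understanding the kernel's round_down() clears low bits with a mask, which only yields a multiple when the step is a power of two. The user-space sketch below contrasts the two approaches. mask_round_down() is a hypothetical stand-in for the mask technique, not the kernel macro itself; quantise() is likewise a made-up name that mirrors the divide-then-multiply in the hunk above; and the step of 73728 used in the note afterwards is an illustrative value only.

```c
#include <assert.h>
#include <stdint.h>

/* Divide-then-multiply: rounds down to a multiple of any non-zero step. */
static uint64_t quantise(uint64_t value, uint32_t step)
{
	assert(step != 0);
	return (value / step) * step;
}

/*
 * Mask-based rounding: clears the bits below the step. This is only a
 * multiple of the step when the step is a power of two.
 */
static uint64_t mask_round_down(uint64_t value, uint64_t step)
{
	return value & ~(step - 1);
}
```

With a power-of-two step such as 65536 the two agree: both return 196608 for an input of 196608. With a step of 73728, quantise(196608, 73728) returns 147456, whereas mask_round_down(196608, 73728) returns 131072, which is not a multiple of 73728 at all.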