Received: by 2002:ab2:7a09:0:b0:1f8:46dc:890e with SMTP id k9csp162728lqo; Wed, 15 May 2024 10:24:03 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCX85rV0EvJFEbfRodLr4FhW6c/dfc6oUtJQRPYPOrz2gaGOENtI4C1FjG6NJQGzC1+bBvfWpuLeeowidboSjbjpMhvSR4FkotGoocz9lA== X-Google-Smtp-Source: AGHT+IEcbZw3frsDQVXiLxyWB4RWU7iB6fvJ7oJ6dGUjYHMz6FA+Ja1Tloy+6cvifxfV1ZvPDDVk X-Received: by 2002:a05:6358:338c:b0:192:3802:275c with SMTP id e5c5f4694b2df-193bb519954mr1628144855d.9.1715793843211; Wed, 15 May 2024 10:24:03 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1715793843; cv=pass; d=google.com; s=arc-20160816; b=UNkO1L6ETZqke9lWZNE96sk38tonuiSB2r6PjD8+BHTTH1/FkwnumAdxnVoekS5iOI BYRFLyikTeeFC/AgfJquSNxj7aP8GiNWa8vsRQeG+hDiID7PCb7rZfap+Nm7JRR0Zr0+ yVlCdmx8eRXGWGUgL/AwtLAMiA0YFcnjTSyJUaNtJ2p6eNYVldpEEFzZgzejGgJDWRpn rZ/5dZCenkQQjv2pGndF5ZEM+2yPtzJH3TY+4BlVblJD0NtMjNJQZsoABkTJrmH12RqK 6qPadCshDoLn5unL9Jm0ljOqzrebRyAUrB/SYGZXrChuu3k2Eg3M8Izrh+pgT5D7friI mMeQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=NhqAnvHOu4yp0XvRc/IZBRRiVasAKhTfHIx14ctBpwA=; fh=MlswO9Oy1zn4InyVC+lDwvZD58uakDgHpc3HzWkB16E=; b=ynZaTc6K9ZmzaQT7NUpuZ1poA7GMMJT21HdOr2rehLAJYXaUsLPtzk/dYgSpEthy6W 7M1qfDH852DwuhX3h8bWhqOtXw3bYzO18+zuxg6hflkAVjguODZyMY9jSI6RQcu8CAoV 71A97yr1/r/Hp4Hd/1Yp9P0dnnggnHx0l5E5flxQUcL3ZW+kiACE2IAKSLzO9clx/4Zx UqoGwEnXS1YP/28Tu82S6mPAKcHyEdRsnDIVPNkny2waLZUD0U+V/LPhuSILLiHWSPPu Y9aS8v/SPtazhF1853DloxMVEYorrFMurKInPE8SSH3xM7yu1rbwW+hZPqYEF/3RbwCl Lo+Q==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="iWAu/hOS"; arc=pass (i=1 spf=pass spfdomain=intel.com dkim=pass dkdomain=intel.com dmarc=pass fromdomain=intel.com); spf=pass (google.com: domain of linux-kernel+bounces-180202-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-180202-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id 41be03b00d2f7-6576d27743bsi661646a12.737.2024.05.15.10.24.02 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 May 2024 10:24:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-180202-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="iWAu/hOS"; arc=pass (i=1 spf=pass spfdomain=intel.com dkim=pass dkdomain=intel.com dmarc=pass fromdomain=intel.com); spf=pass (google.com: domain of linux-kernel+bounces-180202-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-180202-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 3DDF0286D1C for ; Wed, 15 May 2024 17:23:17 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 70EFF85959; Wed, 15 May 2024 17:23:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iWAu/hOS" Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2EF5D39FD4 for ; Wed, 15 May 2024 17:23:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.15 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715793789; cv=none; b=shAMoyJbWVMTUb9LjnCKzcbk053IkK0b0IdTMA1dsHNQ1vRKj9we1qlB/I2NwGQRmFLjKM9R2kpdi62y0eifxn5A9UtrAJaAOZutuS0HTQP4+VrTs4PxFQ8GFidzyQWTApbKjyPq8LYrltBjgKi7kQyhWOd9dI/Aq8VS0GKh/YY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715793789; c=relaxed/simple; bh=saOhvCKUEgcFey82Kci69bV1jdk4J6sSMDYGD2dNYlc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=hmqicMqtqaXeqCxAV4SB7CiIgTlKz+czLOtUiCv/EAvGhZk/yinqLTZZB7GW928P8jAE9gm3193ovKagp7eoojMcAb1eip6JSotzAhfgZorpK2jXJagWPHKbnrt5YimWNL2a6uzYh7WFhGCqFJcYjOlDGVmOCyAoxBL6XcIuoWQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=iWAu/hOS; arc=none smtp.client-ip=192.198.163.15 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715793785; x=1747329785; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=saOhvCKUEgcFey82Kci69bV1jdk4J6sSMDYGD2dNYlc=; b=iWAu/hOSZMdTtPmSfFrebAO45Pam9z+x+PQT59uWYVuVJFBzKUS68HGl 39hT/JZUOUxPX9FwBvT6lJeE+Elyc+R7sk5SeEr8OZ3CoBxD75uNPysWT SP0bUhTYrSQGyVH91W8wO6QJxm1ZJX1eK2V9R7zXG9slR3JH0uyLcpxlg 5FPcXzbxxuBGP27kJTRYl74CKFVmVEF+igA6zOojFXqF0/2mqSKnJ0U4V HXeOuA6mXSSjffoqwZ4BVob89rpaa5Rw4MtCMTap4fjYOgDOAf/4OcbBG ht9R/bRYbPKc9vFyknA5pzf5E3u7OPwjog2s49ypK8Qp2Cqfe/5T0v/3h w==; X-CSE-ConnectionGUID: JyYg/4tjSI+vXQuSPXOPwg== X-CSE-MsgGUID: phDIplfXRl6CsshCQ2RGXA== X-IronPort-AV: E=McAfee;i="6600,9927,11074"; a="12034448" X-IronPort-AV: E=Sophos;i="6.08,162,1712646000"; d="scan'208";a="12034448" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 May 2024 10:23:04 -0700 X-CSE-ConnectionGUID: XGC1ClPRSGaZRQaN91mZPw== X-CSE-MsgGUID: Q+zEai0zQxy9w+wTFGA7YQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,162,1712646000"; d="scan'208";a="36010447" Received: from agluck-desk3.sc.intel.com (HELO agluck-desk3) ([172.25.222.105]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 May 2024 10:23:04 -0700 Date: Wed, 15 May 2024 10:23:02 -0700 From: Tony Luck To: Reinette Chatre Cc: "Yu, Fenghua" , "Wieczor-Retman, Maciej" , Peter Newman , James Morse , Babu Moger , Drew Fustini , Dave Martin , "x86@kernel.org" , "linux-kernel@vger.kernel.org" , "patches@lists.linux.dev" Subject: Re: [PATCH v17 7/9] x86/resctrl: Add new monitor files for Sub-NUMA cluster (SNC) monitoring Message-ID: References: <20240503203325.21512-8-tony.luck@intel.com> <0178e84e-d55f-47bf-b8b0-58e05fcaa108@intel.com> <2efcae46-736a-4809-8530-7dde3977f3ce@intel.com> <666470f0-e8c0-423e-bcb3-ea16359f5cfd@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <666470f0-e8c0-423e-bcb3-ea16359f5cfd@intel.com> On Wed, May 15, 2024 at 09:47:28AM -0700, Reinette Chatre wrote: > Hi Tony, > > On 5/14/2024 2:53 PM, Tony Luck wrote: > > On Tue, May 14, 2024 at 01:30:05PM -0700, Reinette Chatre wrote: > >> Hi Tony, > >> > >> On 5/14/2024 11:26 AM, Luck, Tony wrote: > >>>> On 5/13/2024 5:21 PM, Tony Luck wrote: > >>>>> On Mon, May 13, 2024 at 11:53:17AM -0700, Reinette Chatre wrote: > >>>>>> On 5/13/2024 10:05 AM, Tony Luck wrote: > >>>>>>> On Fri, May 10, 2024 at 02:24:13PM -0700, Reinette Chatre wrote: > >>>>>>> Thanks for the review. Detailed comments below. But overall I'm > >>>>>>> going to split patch 7 into a bunch of smaller changes, each with > >>>>>>> a better commit message. > >>>>>>> > >>>>>>>> On 5/3/2024 1:33 PM, Tony Luck wrote: > >>>>>>>> > >>>>>>>> (Could you please start the changelog with some context?) > >>>>>>>> > >>>>>>>>> Add a field to the rdt_resource structure to track whether monitoring > >>>>>>>>> resources are tracked by hardware at a different scope (NODE) from > >>>>>>>>> the legacy L3 scope. > >>>>>>>> > >>>>>>>> This seems to describe @mon_scope that was introduced in patch #3? > >>>>>>> > >>>>>>> Not really. Patch #3 made the change so that control an monitor > >>>>>>> functions can have different scope. That's still needed as with SNC > >>>>>>> enabled the underlying data collection is at the node level for > >>>>>>> monitoring, while control stays at the L3 cache scope. > >>>>>>> > >>>>>>> This new field describes the legacy scope of monitoring, so that > >>>>>>> resctrl can provide correctly scoped monitor files for legacy > >>>>>>> applications that aren't aware of SNC. So I'm using this both > >>>>>>> to indicate when SNC is enabled (with mon_scope != mon_display_scope) > >>>>>>> or disabled (when they are the same). > >>>>>> > >>>>>> This seems to enforce the idea that these new additions aim to be > >>>>>> generic on the surface but the only goal is to support SNC. > >>>>> > >>>>> If you have some more ideas on how to make this more generic and > >>>>> less SNC specific I'm all ears. > >>>> > >>>> It may not end up being totally generic. It should not pretend to be > >>>> when it is not. It makes the flows difficult to follow when there are > >>>> these unexpected checks/quirks in what claims to be core code. > >>> > >>> Do you want some sort of warning comments in pieces of code > >>> that are SNC specific? > >> > >> I cannot think now where warnings will be appropriate but if you > >> find instances then please do. To start the quirks can at least be > >> documented. For example, "Only user of is SNC, which does > >> not require so simplify by ..." > > > > The main spot that triggered this line of discussion was changing the > > sanity check that operations to read monitors is being done from a > > CPU within the right domain. I've added a short comment on the new > > check: > > > > - if (!cpumask_test_cpu(smp_processor_id(), &d->hdr.cpu_mask)) > > + /* Event counts can only be read from a CPU on the same L3 cache */ > > + if (d->display_id != get_cpu_cacheinfo_id(smp_processor_id(), r->mon_display_scope)) > > return -EINVAL; > > > > But my change embeds the assumption that monitor events are L3 scoped. > > > > Should it be something like this (to keep the non-SNC case generic): > > > > if (r->mon_scope == r->mon_display_scope) { > > if (!cpumask_test_cpu(smp_processor_id(), &d->hdr.cpu_mask)) > > return -EINVAL; > > Yes, keeping this check looks good to me ... > > > } else { > > /* > > * SNC: OK to read events on any CPU sharing same L3 > > * cache instance. > > */ > > if (d->display_id != get_cpu_cacheinfo_id(smp_processor_id(), r->mon_display_scope)) > > return -EINVAL; > > } > > ... while I remain unsure about where "display_id" fits in. See below. > > > >> > >>> > >>>> > >>>>>>>>> } > >>>>>>>>> + > >>>>>>>>> + return 0; > >>>>>>>>> +} > >>>>>>>>> + > >>>>>>>>> +static int mkdir_mondata_subdir(struct kernfs_node *parent_kn, > >>>>>>>>> + struct rdt_mon_domain *d, > >>>>>>>>> + struct rdt_resource *r, struct rdtgroup *prgrp) > >>>>>>>>> +{ > >>>>>>>>> + struct kernfs_node *kn, *ckn; > >>>>>>>>> + char name[32]; > >>>>>>>>> + bool do_sum; > >>>>>>>>> + int ret; > >>>>>>>>> + > >>>>>>>>> + do_sum = r->mon_scope != r->mon_display_scope; > >>>>>>>>> + sprintf(name, "mon_%s_%02d", r->name, d->display_id); > >>>>>>>>> + kn = kernfs_find_and_get_ns(parent_kn, name, NULL); > >>>>>>>>> + if (!kn) { > >>>>>>>>> + /* create the directory */ > >>>>>>>>> + kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp); > >>>>>>>>> + if (IS_ERR(kn)) > >>>>>>>>> + return PTR_ERR(kn); > >>>>>>>>> + > >>>>>>>>> + ret = rdtgroup_kn_set_ugid(kn); > >>>>>>>>> + if (ret) > >>>>>>>>> + goto out_destroy; > >>>>>>>>> + ret = mon_add_all_files(kn, d, r, prgrp, do_sum); > >>>>>>>> > >>>>>>>> This does not look right. If I understand correctly the private data > >>>>>>>> of these event files will have whichever mon domain came up first as > >>>>>>>> its domain id. That seems completely arbitrary and does not reflect > >>>>>>>> accurate state for this file. Since "do_sum" is essentially a "flag" > >>>>>>>> on how this file can be treated, can its "dom_id" not rather be > >>>>>>>> the "monitor scope domain id"? Could that not help to eliminate > >>>>>>> > >>>>>>> You are correct that this should be the "monitor scope domain id" rather > >>>>>>> than the first SNC domain that appears. I'll change to use that. I don't > >>>>>>> think it helps in removing the per-domain display_id. > >>>>>> > >>>>>> Wouldn't the file metadata then be the "display_id"? > >>>>> > >>>>> Yes. The metadata is the display_id for files that need to sum across > >>>>> SNC nodes, but the domain id for ones where no summation is needed. > >>>> > >>>> Right ... and there is a "sum" flag to tell which is which? > >>> > >>> Yes. sum==0 means the domid field is the one and only domain to > >>> report for this resctrl monitor file. sum==1 means the domid field is > >>> the display_id - all domains with this display_id must be summed to > >>> provide the result to present to the user. > >>> > >>> I've tried to capture that in the kerneldoc comment for struct mon_event. > >>> Here's what I'm planning to include in v18 (Outlook will probably mangle > >>> the formatting ... just imagine that the text lines up neatly): > >>> > >>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h > >>> index 49440f194253..3411557d761a 100644 > >>> --- a/arch/x86/kernel/cpu/resctrl/internal.h > >>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h > >>> @@ -132,14 +132,19 @@ struct mon_evt { > >>> * as kernfs private data > >>> * @rid: Resource id associated with the event file > >>> * @evtid: Event id associated with the event file > >>> - * @domid: The domain to which the event file belongs > >>> + * @sum: Set when event must be summed across multiple > >>> + * domains. > >>> + * @domid: When @sum is zero this is the domain to which > >>> + * the event file belongs. When sum is one this > >>> + * is the display_id of all domains to be summed > >> > >> Here is where I would like to understand why it cannot just be > >> "When sum is one this is the domain id of the scope at which (for which?) > >> the events must be summed." Although, you already mentioned this will be > >> clear in next posting. > >> > >>> * @u: Name of the bit fields struct > >>> */ > >>> union mon_data_bits { > >>> void *priv; > >>> struct { > >>> unsigned int rid : 10; > >>> - enum resctrl_event_id evtid : 8; > >>> + enum resctrl_event_id evtid : 7; > >>> + unsigned int sum : 1; > >>> unsigned int domid : 14; > >>> } u; > >>> }; > >>> > >>> -Tony > > > > Maybe an example might help. Assume an SNC system with two sockets, > > three SNC nodes per socket, only supporting monitoring. The only domain > > list created by resctrl is the mon_domains list on the RDT_RESOURCE_L3 > > resource. And it looks like this (with "disply_list" abbreviated to > > "dspl" to keep the picture small): > > > > > > <------ SNC NODES ON SOCKET 0 -----> <------ SNC NODES ON SOCKET 1 ------> > > ----> +----------+ +----------+ +----------+ +----------+ +----------+ +----------+ > > | id = 0 | | id = 1 | | id = 2 | | id = 3 | | id = 4 | | id = 5 | > > | | | | | | | | | | | | > > | dspl = 0 | | dspl = 0 | | dspl = 0 | | dspl = 1 | | dspl = 1 | | dspl = 1 | > > | | | | | | | | | | | | > > +----------+ +----------+ +----------+ +----------+ +----------+ +----------+ > > > > Reading the per-SNC node monitor values looks just the same as the > > non-SNC case. The struct rmid_read passed across the smp_call*() has > > the resource, domain, event, and reading the counters is essentially > > unchanged. > > > > Reading a file to sum event counts for SNC nodes on socket 1 needs to > > find each of the "struct rdt_mon_domain" that are part of socket 1. > > I'm doing that with meta data in the file that says sum=1 (need to add > > up something) and domid=1 (the things to be added are those with > > display_id = 1). So the code reads: > > > > list_for_each_entry(d, &rr->r->mon_domains, hdr.list) { > > if (d->display_id == rr->d->display_id) { > > ... call stuff to read and sum for domain "d" > > } > > } > > > > The display_id is "the domain id of the scope at which (for which?) > > the events must be summed." in your text above. > > My point remains that it is not clear (to me) why it is required to > carry the display_id around. > > list_for_each_entry(d, &rr->r->mon_domains, hdr.list) { > /* determine @id of @d at rr->r->mon_display_scope */ > if (id == domid) { > ... call stuff to read and sum for domain "d" > } > } That "determine @id of @d at rr->r->mon_display_scope" is: display_id = get_domain_id_from_scope(cpumask_first(rr->d->hdr.cpu_mask), rr->r->mon_display_scope); if (display_id < 0) { take some error action } So it certainly isn't *required* to carry display_id around. But doing so makes the code simpler. I could bury the long line into a helper macro/function. But I can't bury the error check. I'd also need to change get_domain_id_from_scope() from "static" to global so it can be used in other files besides core.c Note that there are several places where I need to use display_id, computing it at run time in each place, but it seems so much easier to do it once at domain creation time. > > Reinette -Tony