Date: Mon, 3 Apr 2017 16:52:24 -0700 (PDT)
From: Shivappa Vikas
To: Vikas Shivappa
cc: vikas.shivappa@intel.com, x86@kernel.org, linux-kernel@vger.kernel.org,
    hpa@zytor.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org,
    "Shankar, Ravi V", tony.luck@intel.com, "Yu, Fenghua",
    h.peter.anvin@intel.com
Subject: Re: [PATCH] x86/cqm: Cqm3 Documentation
In-Reply-To: <1491263247-26400-1-git-send-email-vikas.shivappa@linux.intel.com>

On Mon, 3 Apr 2017, Vikas Shivappa wrote:

> Explains the design for the interface

Explains the design for the new resctrl based cqm interface.
This is a follow-up with the design document, after the requirements for
the new cqm were reviewed:
https://marc.info/?l=linux-kernel&m=148891934720489

>
> Signed-off-by: Vikas Shivappa
> ---
>  Documentation/x86/intel_rdt_ui.txt | 210 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 197 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
> index d918d26..46a2efd 100644
> --- a/Documentation/x86/intel_rdt_ui.txt
> +++ b/Documentation/x86/intel_rdt_ui.txt
> @@ -1,12 +1,13 @@
> -User Interface for Resource Allocation in Intel Resource Director Technology
> +User Interface for Resource Allocation and Monitoring in Intel Resource
> +Director Technology
>
>  Copyright (C) 2016 Intel Corporation
>
>  Fenghua Yu
>  Tony Luck
>
> -This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
> -X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
> +This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the
> +X86 /proc/cpuinfo flag bits "rdt", "cqm", "cat_l3" and "cdp_l3".
>
>  To use the feature mount the file system:
>
> @@ -16,14 +17,20 @@ mount options are:
>
>  "cdp": Enable code/data prioritization in L3 cache allocations.
>
> +The mount succeeds if either allocation or monitoring is present.
> +Monitoring is enabled for each resource which has support in the
> +hardware. For more details on the behaviour of the interface during
> +monitoring and allocation, see resctrl group section.
>
>  Info directory
>  --------------
>
>  The 'info' directory contains information about the enabled
>  resources. Each resource has its own subdirectory. The subdirectory
> -names reflect the resource names. Each subdirectory contains the
> -following files:
> +names reflect the resource names.
> +
> +Each subdirectory contains the following files with respect to
> +allocation:
>
>  "num_closids": The number of CLOSIDs which are valid for this
>                 resource.
>                 The kernel uses the smallest number of
> @@ -35,15 +42,36 @@ following files:
>  "min_cbm_bits": The minimum number of consecutive bits which must be
>                  set when writing a mask.
>
> +Each subdirectory contains the following files with respect to
> +monitoring:
> +
> +"num_rmids": The number of RMIDs which are valid for
> +             this resource.
> +
> +"mon_enabled": Indicates if monitoring is enabled for
> +               the resource.
>
> -Resource groups
> ----------------
> +"max_threshold_occupancy": This is specific to LLC_occupancy
> +               monitoring. Provides an upper bound on
> +               the threshold and is measured in bytes
> +               because it's exposed to userland.
> +
> +Resource alloc and monitor groups (ALLOC_MON group)
> +---------------------------------------------------
>  Resource groups are represented as directories in the resctrl file
>  system. The default group is the root directory. Other groups may be
>  created as desired by the system administrator using the "mkdir(1)"
>  command, and removed using "rmdir(1)".
>
> -There are three files associated with each group:
> +The user can allocate and monitor resources via these
> +resource groups created in the root directory.
> +
> +Note that the creation of new ALLOC_MON groups is only allowed when RDT
> +allocation is supported. This means the user can still monitor the root
> +group when only RDT monitoring is supported.
> +
> +There are three files associated with each group with respect to
> +resource allocation:
>
>  "tasks": A list of tasks that belongs to this group. Tasks can be
>           added to a group by writing the task ID to the "tasks" file
> @@ -75,6 +103,56 @@ the CPU's group is used.
>
>  3) Otherwise the schemata for the default group is used.
>
> +There are three files associated with each group with respect to
> +resource monitoring:
> +
> +"data": A list of all the monitored resource data available to this
> +        group. This includes the monitored data for all the tasks in the
> +        'tasks' file and the cpus in the 'cpus' file.
> +        Each resource has its own
> +        line and format - see the 'data' file description below
> +        for details. The monitored data for
> +        the ALLOC_MON group is the sum of all the data for its sub MON
> +        groups.
> +
> +"mon_tasks": A directory in which the user can create resource monitor
> +        groups (MON groups). This lets the user create a group to
> +        monitor a subset of tasks in the above 'tasks' file.
> +
> +Resource monitor groups (MON group)
> +-----------------------------------
> +
> +Resource monitor groups are directories inside the mon_tasks directory.
> +There is one mon_tasks directory inside every ALLOC_MON group including
> +the root group.
> +
> +MON groups help the user monitor a subset of tasks and cpus within
> +the parent ALLOC_MON group.
> +
> +Each MON group has 3 files:
> +
> +"tasks": This behaves exactly as the 'tasks' file above in the ALLOC_MON
> +        group with the added restriction that only a task present in the
> +        parent ALLOC_MON group can be added and this automatically
> +        removes the task from the "tasks" file of any other MON group.
> +        When a task gets removed from the parent ALLOC_MON group the task
> +        is removed from the "tasks" file in the child MON group.
> +
> +"cpus": This behaves exactly as the 'cpus' file above in the ALLOC_MON
> +        group with the added restriction that only a cpu present in the
> +        parent ALLOC_MON group can be added and this automatically
> +        removes the cpu from the "cpus" file of any other MON group.
> +        When a cpu gets removed from the parent ALLOC_MON group the cpu
> +        is removed from the "cpus" file in the child MON group.
> +
> +"data": A list of all the monitored resource data available to
> +        this group. Each resource has its own line and format - see
> +        the 'data' file description below for details.
> +
> +data files - general concepts
> +-----------------------------
> +Each line in the file describes one resource.
> +The line starts with
> +the name of the resource, followed by monitoring data collected
> +in each of the instances/domains of that resource on the system.
>
>  Schemata files - general concepts
>  ---------------------------------
> @@ -107,21 +185,26 @@ and 0xA are not. On a system with a 20-bit mask each bit represents 5%
>  of the capacity of the cache. You could partition the cache into four
>  equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
>
> -
> -L3 details (code and data prioritization disabled)
> --------------------------------------------------
> +L3 'schemata' file format (code and data prioritization disabled)
> +----------------------------------------------------------------
>  With CDP disabled the L3 schemata format is:
>
>  	L3:=;=;...
>
> -L3 details (CDP enabled via mount option to resctrl)
> ----------------------------------------------------
> +L3 'schemata' file format (CDP enabled via mount option to resctrl)
> +------------------------------------------------------------------
>  When CDP is enabled L3 control is split into two separate resources
>  so you can specify independent masks for code and data like this:
>
>  	L3data:=;=;...
>  	L3code:=;=;...
>
> +L3 'data' file format (data)
> +---------------------------
> +When monitoring is enabled for L3 occupancy the 'data' file format is:
> +
> +	L3:=;=;...
> +
>  L2 details
>  ----------
>  L2 cache does not support code and data prioritization, so the
> @@ -129,6 +212,8 @@ schemata format is always:
>
>  	L2:=;=;...
>
> +Examples for RDT allocation usage:
> +
>  Example 1
>  ---------
>  On a two socket machine (one L3 cache per socket) with just four bits
> @@ -212,3 +297,102 @@ Finally we move core 4-7 over to the new group and make sure that the
>  kernel and the tasks running there get 50% of the cache.
>
>  # echo C0 > p0/cpus
> +
> +Examples for RDT Monitoring usage:
> +
> +Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group)
> +---------
> +On a two socket machine (one L3 cache per socket) with just four bits
> +for cache bit masks
> +
> +# mount -t resctrl resctrl /sys/fs/resctrl
> +# cd /sys/fs/resctrl
> +# mkdir p0 p1
> +# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
> +# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
> +# echo 5678 > p1/tasks
> +# echo 5679 > p1/tasks
> +
> +The default resource group is unmodified, so we have access to all parts
> +of all caches (its schemata file reads "L3:0=f;1=f").
> +
> +Tasks that are under the control of group "p0" may only allocate from the
> +"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
> +Tasks in group "p1" use the "lower" 50% of cache on both sockets.
> +
> +Create monitor groups
> +
> +# cd /sys/fs/resctrl/p1/mon_tasks
> +# mkdir m11 m12
> +# echo 5678 > m11/tasks
> +# echo 5679 > m12/tasks
> +
> +Fetch data (data shown in bytes)
> +
> +# cat m11/tasks_data
> +L3:0=16234000;1=14789000
> +# cat m12/tasks_data
> +L3:0=14234000;1=16789000
> +
> +The parent group shows the aggregated data.
> +
> +# cat /sys/fs/resctrl/p1/tasks_data
> +L3:0=31234000;1=31789000
> +
> +Example 2 (Monitor a task from its creation)
> +---------
> +On a two socket machine (one L3 cache per socket)
> +
> +# mount -t resctrl resctrl /sys/fs/resctrl
> +# cd /sys/fs/resctrl
> +# mkdir p0 p1
> +
> +An RMID is allocated to the group once it is created and hence the
> +task below is monitored from its creation.
> +
> +# echo $$ > /sys/fs/resctrl/p1/tasks
> +# echo > /sys/fs/resctrl/p1/tasks
> +
> +Fetch the data
> +
> +# cat /sys/fs/resctrl/p1/tasks_data
> +L3:0=31234000;1=31789000
> +
> +Example 3 (Monitor without CAT support or before creating CAT groups)
> +---------
> +
> +Assume a system like HSW has only CQM and no CAT support.
> +In this case
> +resctrl will still mount but the user cannot create CTRL_MON directories.
> +The user can still create different MON groups within the root group and
> +thereby monitor all tasks including kernel threads.
> +
> +This can also be used to profile a job's cache size footprint before
> +allocating it to a different allocation group.
> +
> +# mount -t resctrl resctrl /sys/fs/resctrl
> +# cd /sys/fs/resctrl
> +
> +# echo $$ > /sys/fs/resctrl/p1/tasks
> +# echo > /sys/fs/resctrl/p1/tasks
> +
> +# cat /sys/fs/resctrl/p1/tasks_data
> +L3:0=31234000;1=31789000
> +
> +Example 4 (Monitor real time tasks)
> +-----------------------------------
> +
> +A single socket system which has real time tasks running on cores 4-7
> +and non real time tasks on other cpus. We want to monitor the cache
> +occupancy of the real time threads on these cores.
> +
> +# mount -t resctrl resctrl /sys/fs/resctrl
> +# cd /sys/fs/resctrl
> +# mkdir p1
> +
> +Move the cpus 4-7 over to p1
> +# echo F0 > p1/cpus
> +
> +View the llc occupancy snapshot
> +
> +# cat /sys/fs/resctrl/p1/tasks_data
> +L3:0=11234000
> --
> 1.9.1
>
>
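
For anyone scripting against the monitoring output, here is a minimal
sketch of how one line of the proposed format could be split up. It is not
part of the patch. It assumes a line keeps the layout used in the examples
quoted above (resource name, a colon, then semicolon-separated
domain=bytes pairs), and the helper name parse_data_line is hypothetical.
It only splits a string that has already been read and does not touch
resctrl itself.

```shell
# Hypothetical helper (not part of the patch): split one monitoring line
# such as "L3:0=16234000;1=14789000" into "resource domain bytes" rows.
# The body runs in a subshell, so the IFS change does not leak out.
parse_data_line() (
    resource=${1%%:*}    # text before the first ':'  -> "L3"
    fields=${1#*:}       # text after the first ':'   -> "0=16234000;1=14789000"
    IFS=';'
    for field in $fields; do
        # ${field%%=*} is the domain (cache) id, ${field#*=} the byte count
        printf '%s %s %s\n' "$resource" "${field%%=*}" "${field#*=}"
    done
)

# Sample line taken from Example 1 in the quoted patch.
parse_data_line "L3:0=16234000;1=14789000"
```

With the sample line this prints "L3 0 16234000" and "L3 1 14789000", one
row per cache domain, which is easy to feed into awk or sort. A file with
several resource lines would need the helper called once per line.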