Date: Tue, 28 Jul 2015 16:54:29 +0200
From: Peter Zijlstra
To: Vikas Shivappa
Cc: linux-kernel@vger.kernel.org, vikas.shivappa@intel.com, x86@kernel.org,
    hpa@zytor.com, tglx@linutronix.de, mingo@kernel.org, tj@kernel.org,
    matt.fleming@intel.com, will.auld@intel.com,
    glenn.p.williamson@intel.com, kanaka.d.juvva@intel.com
Subject: Re: [PATCH 3/9] x86/intel_rdt: Cache Allocation documentation and cgroup usage guide
Message-ID: <20150728145429.GQ25159@twins.programming.kicks-ass.net>
References: <1435789270-27010-1-git-send-email-vikas.shivappa@linux.intel.com>
    <1435789270-27010-4-git-send-email-vikas.shivappa@linux.intel.com>
In-Reply-To: <1435789270-27010-4-git-send-email-vikas.shivappa@linux.intel.com>

On Wed, Jul 01, 2015 at 03:21:04PM -0700, Vikas Shivappa wrote:

Please edit this document to have consistent spacing. It's really hard
to read this. Every time I spot a misplaced space my brain stumbles and
I need to restart.

> diff --git a/Documentation/cgroups/rdt.txt b/Documentation/cgroups/rdt.txt
> new file mode 100644
> index 0000000..dfff477
> --- /dev/null
> +++ b/Documentation/cgroups/rdt.txt
> @@ -0,0 +1,215 @@
> + RDT
> + ---
> +
> +Copyright (C) 2014 Intel Corporation
> +Written by vikas.shivappa@linux.intel.com
> +(based on contents and format from cpusets.txt)
> +
> +CONTENTS:
> +=========
> +
> +1. Cache Allocation Technology
> + 1.1 What is RDT and Cache allocation ?
> + 1.2 Why is Cache allocation needed ?
> + 1.3 Cache allocation implementation overview
> + 1.4 Assignment of CBM and CLOS
> + 1.5 Scheduling and Context Switch
> +2. Usage Examples and Syntax
> +
> +1. Cache Allocation Technology(Cache allocation)
> +===================================
> +
> +1.1 What is RDT and Cache allocation
> +------------------------------------
> +
> +Cache allocation is a sub-feature of Resource Director Technology(RDT)

missing ' ' before the '('.

> +Allocation or Platform Shared resource control which provides support to
> +control Platform shared resources like L3 cache. Currently L3 Cache is

Double ' ' after '.' -- which _can_ be correct, but is inconsistent
throughout the document.

> +the only resource that is supported in RDT. More information can be
> +found in the Intel SDM, Volume 3, section 17.15.

Please also include the SDM revision, like June 2015.

In fact, in the June 2015 V3 17.15 is CQM, not CAT.

> +Cache Allocation Technology provides a way for the Software (OS/VMM)
> +to restrict cache allocation to a defined 'subset' of cache which may
> +be overlapping with other 'subsets'. This feature is used when
> +allocating a line in cache ie when pulling new data into the cache.
> +The programming of the h/w is done via programming MSRs.

Double ' ' before 'MSRs'.

> +The different cache subsets are identified by CLOS identifier (class
> +of service) and each CLOS has a CBM (cache bit mask). The CBM is a
> +contiguous set of bits which defines the amount of cache resource that
> +is available for each 'subset'.
> +
> +1.2 Why is Cache allocation needed
> +----------------------------------
> +
> +In todays new processors the number of cores is continuously increasing,
> +especially in large scale usage models where VMs are used like
> +webservers and datacenters. The number of cores increase the number

Single ' ' after .

> +of threads or workloads that can simultaneously be run. When
> +multi-threaded-applications, VMs, workloads run concurrently they
> +compete for shared resources including L3 cache.
> +
> +The Cache allocation enables more cache resources to be made available

Double ' ' for no apparent reason.

> +for higher priority applications based on guidance from the execution
> +environment.
> +
> +The architecture also allows dynamically changing these subsets during
> +runtime to further optimize the performance of the higher priority
> +application with minimal degradation to the low priority app.
> +Additionally, resources can be rebalanced for system throughput benefit.
> +
> +This technique may be useful in managing large computer systems which
> +large L3 cache. Examples may be large servers running instances of

Double ' '

> +webservers or database servers. In such complex systems, these subsets
> +can be used for more careful placing of the available cache
> +resources.
> +
> +1.3 Cache allocation implementation Overview
> +--------------------------------------------
> +
> +Kernel implements a cgroup subsystem to support cache allocation.
> +
> +Each cgroup has a CLOSid <-> CBM(cache bit mask) mapping.

No ' ' before '('

> +A CLOS(Class of service) is represented by a CLOSid.CLOSid is internal

Idem, also, _no_ space after '.'

> +to the kernel and not exposed to user. Each cgroup would have one CBM

Double space after '.'

> +and would just represent one cache 'subset'.
> +
> +The cgroup follows cgroup hierarchy ,mkdir and adding tasks to the

I'm thinking the convention is ' ' _after_ ',', not before.

> +cgroup never fails. When a child cgroup is created it inherits the
> +CLOSid and the CBM from its parent. When a user changes the default
> +CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
> +used before. The changing of 'l3_cache_mask' may fail with -ENOSPC once
> +the kernel runs out of maximum CLOSids it can support.
> +User can create as many cgroups as he wants but having different CBMs
> +at the same time is restricted by the maximum number of CLOSids
> +(multiple cgroups can have the same CBM).
> +Kernel maintains a CLOSid<->cbm mapping which keeps reference counter

Above you had ' ' around the arrows.

> +for each cgroup using a CLOSid.
> +
> +The tasks in the cgroup would get to fill the L3 cache represented by
> +the cgroup's 'l3_cache_mask' file.
> +
> +Root directory would have all available bits set in 'l3_cache_mask' file

Random double ' '

> +by default.
> +
> +Each RDT cgroup directory has the following files. Some of them may be a
> +part of common RDT framework or be specific to RDT sub-features like
> +cache allocation.
> +
> + - intel_rdt.l3_cache_mask: The cache bitmask(CBM) is represented by this
> + file. The bitmask must be contiguous and would have a 1 or 2 bit
> + minimum length.
> +
> +1.4 Assignment of CBM,CLOS
> +--------------------------
> +
> +The 'l3_cache_mask' needs to be a subset of the parent node's
> +'l3_cache_mask'. Any contiguous subset of these bits(with a minimum of 2
> +bits on hsw SKUs) maybe set to indicate the cache mapping desired. The
> +'l3_cache_mask' between 2 directories can overlap. The 'l3_cache_mask' would
> +represent the cache 'subset' of the Cache allocation cgroup. For ex: on
> +a system with 16 bits of max cbm bits, if the directory has the least
> +significant 4 bits set in its 'l3_cache_mask' file(meaning the 'l3_cache_mask'
> +is just 0xf), it would be allocated the right quarter of the Last level
> +cache which means the tasks belonging to this Cache allocation cgroup
> +can use the right quarter of the cache to fill. If it
> +has the most significant 8 bits set ,it would be allocated the left
> +half of the cache(8 bits out of 16 represents 50%).

Random whitespace again. Also try and limit paragraphs to 5-6 lines
max.

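
The mask rules quoted above (contiguous, at least 1 or 2 bits wide, a
subset of the parent's mask) can be sketched as below. This is only an
illustration of the rules as the text states them: Python is used for
brevity, the function names are hypothetical and not taken from the
patch, and the minimum width is a parameter because the text gives it
as 1 or 2 bits depending on the SKU.

```python
def is_contiguous(mask: int) -> bool:
    """Return True if the set bits of mask form one unbroken run."""
    if mask <= 0:
        return False
    # Shift out trailing zeros so the run, if any, starts at bit 0.
    run = mask >> ((mask & -mask).bit_length() - 1)
    # A run of ones starting at bit 0 has the form 2**k - 1.
    return (run & (run + 1)) == 0


def cbm_is_valid(mask: int, parent_mask: int, min_bits: int = 1) -> bool:
    """Hypothetical check mirroring the rules described in 1.3 and 1.4."""
    if not is_contiguous(mask):
        return False
    if bin(mask).count("1") < min_bits:
        return False
    # Subset test: no bit may be set outside the parent's mask.
    return (mask & ~parent_mask) == 0
```

With a parent mask of 0xff, a request for 0xf passes and a request for
0xfff is rejected, which matches the error case in the usage section
of the patch.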
> +
> +
> +The cache portion defined in the CBM file is available to all tasks
> +within the cgroup to fill and these task are not allowed to allocate
> +space in other parts of the cache.
> +
> +1.5 Scheduling and Context Switch
> +---------------------------------
> +
> +During context switch kernel implements this by writing the
> +CLOSid (internally maintained by kernel) of the cgroup to which the
> +task belongs to the CPU's IA32_PQR_ASSOC MSR. The MSR is only written
> +when there is a change in the CLOSid for the CPU in order to minimize
> +the latency incurred during context switch.
> +
> +The following considerations are done for the PQR MSR write so that it
> +has minimal impact on scheduling hot path:
> +- This path doesnt exist on any non-intel platforms.

!x86 I think you mean, it's entirely possible to have the code present
on AMD systems for instance.

> +- On Intel platforms, this would not exist by default unless CGROUP_RDT
> +is enabled.

You can enable this just fine on AMD machines.

> +- remains a no-op when CGROUP_RDT is enabled and intel hardware does not
> +support the feature.
> +- When feature is available, still remains a no-op till the user
> +manually creates a cgroup *and* assigns a new cache mask. Since the
> +child node inherits the parents cache mask , by cgroup creation there is
> +no scheduling hot path impact from the new cgroup.
> +- per cpu PQR values are cached and the MSR write is only done when
> +there is a task with different PQR is scheduled on the CPU. Typically if
> +the task groups are bound to be scheduled on a set of CPUs , the number
> +of MSR writes is greatly reduced.

Aside from many instances of random whitespace, maybe also format like:

 - point;

 - multi line
   point;

 - another multi line
   thing.

> +
> +2. Usage examples and syntax
> +============================
> +
> +To check if Cache allocation was enabled on your system
> +
> +dmesg | grep -i intel_rdt

        $ dmesg | grep -i intel_rdt

That is, whitespace before _and_ after _and_ indent, plus a prompt, to
clarify it's a command and not part of the text and weirdly formatted.

> +should output : intel_rdt: Max bitmask length: xx,Max ClosIds: xx

        intel_rdt: Max bitmask length: xx

Again, wrap in whitespace and indent to set apart.

> +the length of l3_cache_mask and CLOS should depend on the system you use.
> +
> +Also /proc/cpuinfo would have rdt(if rdt is enabled) and cat_l3( if L3

Many more instances of random whitespace.

> + cache allocation is enabled).
> +
> +Following would mount the cache allocation cgroup subsystem and create
> +2 directories. Please refer to Documentation/cgroups/cgroups.txt on
> +details about how to use cgroups.
> +
> + cd /sys/fs/cgroup
> + mkdir rdt
> + mount -t cgroup -ointel_rdt intel_rdt /sys/fs/cgroup/rdt
> + cd rdt
> +
> +Create 2 rdt cgroups
> +
> + mkdir group1
> + mkdir group2
> +
> +Following are some of the Files in the directory
> +
> + ls
> + rdt.l3_cache_mask
> + tasks
> +

See, here you do the whitespace and indent thing, but above you didn't.
That kind of inconsistency just bugs the hell out of me.

> +Say if the cache is 2MB and cbm supports 16 bits, then setting the
> +below allocates the 'right 1/4th(512KB)' of the cache to group2

Another few random whitespace fails.

> +
> +Edit the CBM for group2 to set the least significant 4 bits. This
> +allocates 'right quarter' of the cache.
> +
> + cd group2
> + /bin/echo 0xf > rdt.l3_cache_mask
> +
> +
> +Edit the CBM for group2 to set the least significant 8 bits.This
> +allocates the right half of the cache to 'group2'.
> +
> + cd group2
> + /bin/echo 0xff > rdt.l3_cache_mask
> +
> +Assign tasks to the group2
> +
> + /bin/echo PID1 > tasks
> + /bin/echo PID2 > tasks
> +
> + Meaning now threads
> + PID1 and PID2 get to fill the 'right half' of
> + the cache as the belong to cgroup group2.

This doesn't want to be indented, right?

> +
> +Create a group under group2
> +
> + cd group2
> + mkdir group21
> + cat rdt.l3_cache_mask
> + 0xff - inherits parents mask.

And this would show the use of the prompt ($), allows one to
distinguish between commands and output.

> +
> + /bin/echo 0xfff > rdt.l3_cache_mask - throws error as mask has to parent's mask's subset

I'm betting you don't actually want us to type the "- ..." bit? Either
use a regular bash comment (#) to make it harmless, or format it
differently. Because some poor sod is going to literally type that into
his console and wonder WTF just happened.

> +
> +In order to restrict RDT cgroups to specific set of CPUs rdt can be
> +comounted with cpusets.

Either RDT is in capitals or it is not, but this is silly.
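
As a sanity check on the numbers in the usage section (2MB cache,
16-bit CBM, 0xf giving 512KB and 0xff giving half), the arithmetic can
be sketched as below. It assumes every CBM bit maps to an equal slice
of the cache, which is what the "8 bits out of 16 represents 50%"
wording implies but which the quoted text does not state outright. The
function name is hypothetical and Python is used only for brevity.

```python
def cbm_share_bytes(mask: int, max_cbm_bits: int, cache_bytes: int) -> int:
    """Bytes of cache covered by mask, assuming equal-sized slices per bit."""
    if mask < 0 or mask >> max_cbm_bits:
        raise ValueError("mask has bits outside the supported CBM width")
    # Each set bit contributes cache_bytes / max_cbm_bits.
    return bin(mask).count("1") * cache_bytes // max_cbm_bits


CACHE_BYTES = 2 * 1024 * 1024  # the 2MB cache from the example

quarter = cbm_share_bytes(0xF, 16, CACHE_BYTES)   # least significant 4 bits
half = cbm_share_bytes(0xFF, 16, CACHE_BYTES)     # least significant 8 bits
```

Under that assumption, `quarter` is 512KB and `half` is 1MB, which
agrees with the figures in the patch text.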