Subject: Cache Allocation Technology Design
From: vikas
To: linux-kernel@vger.kernel.org
Cc: "matt.fleming", "will.auld", tj@kernel.org, "vikas.shivappa"
Date: Thu, 16 Oct 2014 11:44:10 -0700
Message-ID: <1413485050.28564.14.camel@vshiva-Udesk>

Hi All,

We have put together a draft design document for Cache Allocation
Technology below. Please review it and send us any feedback.

Please cc my email vikas.shivappa@linux.intel.com when replying.

Thanks,
Vikas

What is Cache Allocation Technology (CAT)
-----------------------------------------

Cache Allocation Technology provides a way for software (OS/VMM) to
restrict cache allocation to a defined 'subset' of the cache, which may
overlap with other 'subsets'. The restriction applies when a line is
allocated in the cache, i.e. when new data is pulled into the cache.
The h/w is programmed via MSRs.

The different cache subsets are identified by a CLOS (class of service)
identifier, and each CLOS has a CBM (cache bit mask). The CBM is a
contiguous set of bits which defines the amount of cache resource that
is available to each 'subset'.
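To make the 'contiguous set of bits' and 'overlapping subsets' ideas
concrete, here is a small user-space sketch in Python. It is only an
illustration, not kernel code: the helper names is_contiguous and
overlaps are made up for this example, and treating an empty mask as
invalid is an assumption of the sketch rather than something stated in
this document.

```python
def is_contiguous(cbm: int) -> bool:
    """Return True if the set bits of cbm form one unbroken run."""
    if cbm <= 0:
        return False  # assumption: an empty mask is not a usable CBM
    # Strip trailing zero bits so that the run (if any) starts at bit 0.
    run = cbm >> ((cbm & -cbm).bit_length() - 1)
    # A run of ones starting at bit 0 has the form 2**n - 1, so adding 1
    # gives a power of two that shares no bits with the run.
    return (run & (run + 1)) == 0


def overlaps(cbm_a: int, cbm_b: int) -> bool:
    """Return True if two cache subsets share at least one cache unit."""
    return (cbm_a & cbm_b) != 0
```

With these helpers, 0xf and 0xff00 are contiguous masks while 0x5 is
not, and the masks 0xf and 0xff overlap in their 4 least significant
bits.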
Why is CAT (Cache Allocation Technology) needed
-----------------------------------------------

CAT enables more cache resources to be made available to higher
priority applications, based on guidance from the execution
environment. The architecture also allows these subsets to be changed
dynamically at runtime, to further optimize the performance of the
higher priority application with minimal degradation to the low
priority application. Additionally, resources can be rebalanced for
system throughput benefit. (Refer to Section 17.15 in the Intel SDM:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf)

This technique may be useful in managing large computer systems with a
large LLC. Examples are large servers running instances of webservers
or database servers. In such complex systems, these subsets can be used
for more careful placement of the available cache resources.

The CAT kernel patch would provide a basic kernel framework for users
to implement such cache subsets.

Kernel implementation Overview
------------------------------

The kernel implements a cgroup subsystem to support cache allocation.
Creating a CAT cgroup would create a new CLOS <-> CBM mapping. Each
cgroup would have one CBM and would represent exactly one cache
'subset'.

The user would be allowed to create as many directories as there are
CLOSs defined by the h/w. If the user tries to create more directories
than there are available CLOSs, -ENOSPC is returned. Currently only one
level of directories is supported, i.e. a directory can be created only
under the root.

There are 2 modes supported:

1. Affinitized mode: Each CAT cgroup is affinitized to a set of CPUs
specified by the 'cpus' file. The tasks in the CAT cgroup would be
constrained to run only on the CPUs in the 'cpus' file. The CPUs in
this file are used exclusively by this cgroup. Requests made by a task
through sched_setaffinity() would be filtered through the task's
'cpus'.
These tasks would get to fill the part of the LLC represented by the
cgroup's 'cbm' file. 'cpus' is a cpumask and works the same way as the
existing cpumask data structure.

2. Non-affinitized mode: Each CAT cgroup (and in turn each 'subset')
would be for a group of tasks. There is no 'cpus' file, and the CPUs
that the tasks run on are not restricted by the CAT cgroup.

Assignment of CBM, CLOS and modes
---------------------------------

The root directory would have all bits set in its 'cbm' file by
default. The 'cbm_max' file in the root defines the maximum number of
bits describing the available cache units. For example, if cbm_max is
16, then the 'cbm' cannot have more than 16 bits.

The 'affinitized' file is either 0 or 1, representing the two modes.
The system would boot in affinitized mode with all bits set in the cbm
for all CPUs, meaning all CPUs have 100% of the cache (effectively,
cache allocation is not in effect).

The 'cbm' file is restricted to having no more than its cbm_max least
significant bits set. Any contiguous subset of these bits may be set to
indicate the desired cache mapping. The 'cbm' of 2 directories can
overlap. The 'cbm' represents the cache 'subset' of the CAT cgroup.

For example, on a system with a cbm_max of 16 bits, if a directory has
the least significant 4 bits set in its 'cbm' file, it would be
allocated the right quarter of the last level cache, which means the
tasks belonging to this CAT cgroup can use the right quarter of the
cache to fill. If it has the most significant 8 bits set, it would be
allocated the left half of the cache (8 bits out of 16 represent 50%).

In affinitized mode, the cache subset would be affinitized to a set of
CPUs. The CPUs to which this allocation is affinitized are represented
by the 'cpus' file. The 'cpus' of a directory need to be mutually
exclusive from the 'cpus' of other directories. The cache portion
defined in the 'cbm' file is available to all tasks within the CAT
group, and these tasks are not allowed to allocate space in other parts
of the cache.
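The 16-bit example above can be checked with a few lines of user-space
Python. This is only a sketch of the arithmetic, not kernel code: the
function name cache_fraction is hypothetical, and it checks only that
the mask fits within cbm_max bits (contiguity is not checked here).

```python
def cache_fraction(cbm: int, cbm_max: int) -> float:
    """Fraction of the cache covered by cbm when cbm_max bits describe it."""
    # Reject negative values and masks with bits set above the cbm_max
    # least significant bits.
    if cbm < 0 or cbm >> cbm_max:
        raise ValueError("cbm must fit within the cbm_max least significant bits")
    # Each set bit stands for one equal-sized cache unit.
    return bin(cbm).count("1") / cbm_max
```

With cbm_max = 16, cache_fraction(0xf, 16) is 0.25 (the right quarter)
and cache_fraction(0xff00, 16) is 0.5 (the left half), while a mask
such as 0x10000 is rejected because it does not fit in 16 bits.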
The 'cbm' file is used in both modes, whereas the 'cpus' file is
relevant only in affinitized mode and would disappear in
non-affinitized mode.

Scheduling and Context Switch
-----------------------------

In affinitized mode, the cache 'subset' and the tasks in a CAT cgroup
are affinitized to the CPUs represented by the CAT cgroup's 'cpus'
file, i.e. when the user sets 'cbm' to 'portion', 'cpus' to c and
'tasks' to t, the tasks 't' would always be scheduled on cpus 'c' and
would get to fill the allocated 'portion' of the last level cache.

As noted above, in affinitized mode the tasks in a CAT cgroup would
also be affinitized to the CPUs in the 'cpus' file of the directory.
The following hooks in the kernel are required to implement this (along
the lines of the cpuset code):

- in sched_setaffinity, to mask the requested cpu mask with what is
  present in the task's 'cpus'
- in migrate_task, to migrate the tasks only to those CPUs in the
  'cpus' file if possible
- in select_task_rq

In non-affinitized mode, 'affinitized' is 0 and the 'tasks' file
indicates the tasks the cache subset is affinitized to. When the user
adds tasks to the 'tasks' file, the tasks would get to fill the cache
subset represented by the CAT cgroup's 'cbm' file. During context
switch, the kernel implements this by writing the corresponding CLOSid
(maintained internally by the kernel) of the CAT cgroup to the CPU's
IA32_PQR_ASSOC MSR.

Usage and Example
-----------------

The following would mount the cache allocation cgroup subsystem and
create 2 directories. Please refer to Documentation/cgroups/cgroups.txt
for details on how to use cgroups.

    cd /sys/fs/cgroup
    mkdir cachealloc
    mount -t cgroup -ocachealloc cachealloc /sys/fs/cgroup/cachealloc
    cd cachealloc

Create 2 CAT cgroups:

    mkdir group1
    mkdir group2

The following are some of the files in the directory:

    ls
    cachealloc.cbm
    cachealloc.cpus   (appears only in affinitized mode)
    cgroup.procs
    tasks
    cbm_max           (root only)
    affinitized       (root only)
The default mode is affinitized mode.

Say the cache is 2MB and the cbm supports 16 bits. Editing the CBM for
group2 to set the least significant 4 bits then allocates the 'right
quarter' (512KB) of the cache to group2:

    cd group2
    /bin/echo 0xf > cachealloc.cbm

Change the cpus in the directory:

    /bin/echo 1-4 > cachealloc.cpus

Edit the CBM for group2 to set the least significant 8 bits. This
allocates the right half of the cache to 'group2':

    /bin/echo 0xff > cachealloc.cbm

Assign tasks to group2:

    /bin/echo PID1 > tasks
    /bin/echo PID2 > tasks

Now the threads PID1 and PID2 run on CPUs 1-4 and get to fill the
'right half' of the cache. The tasks PID1 and PID2 can only have a cpu
affinity that is a subset of the CPUs defined in the 'cpus' file.

Set 'affinitized' to 0. The mode is changed in the root directory:

    cd ..
    /bin/echo 0 > cachealloc.affinitized

Now the tasks and the cache allocation are no longer affinitized to the
CPUs, and a task's cpu affinity is no longer restricted to a subset of
the 'cpus' cpumask.