From: "Yu, Fenghua"
To: Thomas Gleixner, LKML
CC: Peter Zijlstra, "x86@kernel.org", Marcelo Tosatti, Luiz Capitulino, "Shivappa, Vikas", Tejun Heo, "Shankar, Ravi V", "Luck, Tony"
Subject: RE: [RFD] CAT user space interface revisited
Date: Tue, 22 Dec 2015 18:12:05 +0000
Message-ID: <3E5A0FA7E9CA944F9D5414FEC6C712205DF4E157@ORSMSX106.amr.corp.intel.com>

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, November 18, 2015 10:25 AM
> Folks!
>
> After rereading the mail flood on CAT and staring into the SDM for a while, I
> think we all should sit back and look at it from scratch again w/o our
> preconceptions - I certainly had to put my own away.
>
> Let's look at the properties of CAT again:
>
>  - It's a per socket facility
>
>  - CAT slots can be associated to external hardware. This
>    association is per socket as well, so different sockets can have
>    different behaviour. I missed that detail when staring the first
>    time, thanks for the pointer!
>
>  - The association itself is per cpu. The COS selection happens on a
>    CPU while the set of masks which are selected via COS are shared
>    by all CPUs on a socket.
>
> There are restrictions which CAT imposes in terms of configurability:
>
>  - The bits which select a cache partition need to be consecutive
>
>  - The number of possible cache association masks is limited
>
> Let's look at the configurations (CDP omitted and size restricted)
>
> Default:   1 1 1 1 1 1 1 1
>            1 1 1 1 1 1 1 1
>            1 1 1 1 1 1 1 1
>            1 1 1 1 1 1 1 1
>
> Shared:    1 1 1 1 1 1 1 1
>            0 0 1 1 1 1 1 1
>            0 0 0 0 1 1 1 1
>            0 0 0 0 0 0 1 1
>
> Isolated:  1 1 1 1 0 0 0 0
>            0 0 0 0 1 1 0 0
>            0 0 0 0 0 0 1 0
>            0 0 0 0 0 0 0 1
>
> Or any combination thereof. Surely some combinations will not make any
> sense, but we really should not make any restrictions on the stupidity of a
> sysadmin. The worst outcome might be L3 disabled for everything, so what?
>
> Now that gets even more convoluted if CDP comes into play and we really
> need to look at CDP right now. We might end up with something which looks
> like this:
>
>     1 1 1 1 0 0 0 0    Code
>     1 1 1 1 0 0 0 0    Data
>     0 0 0 0 0 0 1 0    Code
>     0 0 0 0 1 1 0 0    Data
>     0 0 0 0 0 0 0 1    Code
>     0 0 0 0 1 1 0 0    Data
> or
>     0 0 0 0 0 0 0 1    Code
>     0 0 0 0 1 1 0 0    Data
>     0 0 0 0 0 0 0 1    Code
>     0 0 0 0 0 1 1 0    Data
>
> Let's look at partitioning itself.
> We have two options:
>
>  1) Per task partitioning
>
>  2) Per CPU partitioning
>
> So far we only talked about #1, but I think that #2 has a value as well. Let me
> give you a simple example.
>
> Assume that you have isolated a CPU and run your important task on it. You
> give that task a slice of cache. Now that task needs kernel services which run
> in kernel threads on that CPU. We really don't want to (and cannot) hunt
> down random kernel threads (think cpu bound worker threads, softirq
> threads ....) and give them another slice of cache. What we really want is:
>
>     1 1 1 1 0 0 0 0    <- Default cache
>     0 0 0 0 1 1 1 0    <- Cache for important task
>     0 0 0 0 0 0 0 1    <- Cache for CPU of important task
>
> It would even be sufficient for particular use cases to just associate a piece of
> cache to a given CPU and do not bother with tasks at all.
>
> We really need to make this as configurable as possible from userspace
> without imposing random restrictions to it. I played around with it on my new
> intel toy and the restriction to 16 COS ids (that's 8 with CDP
> enabled) makes it really useless if we force the ids to have the same meaning
> on all sockets and restrict it to per task partitioning.
>
> Even if next generation systems will have more COS ids available, there are
> not going to be enough to have a system wide consistent view unless we
> have COS ids > nr_cpus.
>
> Aside of that I don't think that a system wide consistent view is useful at all.
>
>  - If a task migrates between sockets, it's going to suffer anyway.
>    Real sensitive applications will simply pin tasks on a socket to
>    avoid that in the first place. If we make the whole thing
>    configurable enough then the sysadmin can set it up to support
>    even the nonsensical case of identical cache partitions on all
>    sockets and let tasks use the corresponding partitions when
>    migrating.
>
>  - The number of cache slices is going to be limited no matter what,
>    so one still has to come up with a sensible partitioning scheme.
>
>  - Even if we have enough cos ids the system wide view will not make
>    the configuration problem any simpler as it remains per socket.
>
> It's hard. Policies are hard by definition, but this one is harder than most
> other policies due to the inherent limitations.
>
> So now to the interface part. Unfortunately we need to expose this very
> close to the hardware implementation as there are really no abstractions
> which allow us to express the various bitmap combinations. Any abstraction I
> tried to come up with renders that thing completely useless.
>
> I was not able to identify any existing infrastructure where this really fits in. I
> chose a directory/file based representation. We certainly could do the same

Will this be under /sys/devices/system/? If so, we could create a qos/cat
directory there, and other directories, e.g. qos/mbm, could be added next
to it in the future?

Thanks.
-Fenghua
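The quoted RFD sets out several constraints. The bits of a cache partition must be consecutive. The number of masks is limited, and 16 COS ids shrink to 8 with CDP. Each socket also has its own set of masks. These constraints can be modeled in a few lines. What follows is a hypothetical user-space sketch, not a proposed kernel interface. The 8-bit mask width and the 16 COS ids are taken from the examples in the mail. The names parse_row, is_valid_cbm, cos_ids_available and SocketCat are illustrative inventions. Rejecting an all-zero mask is an additional assumption of the sketch.

```python
"""Hypothetical user-space model of the CAT constraints quoted above.

All names, the 8-bit mask width and the 16 COS ids are illustrative
assumptions taken from the mail's examples; this is not a kernel API.
"""

CBM_BITS = 8   # width of the example masks in the quoted mail
NUM_COS = 16   # COS ids per socket on the machine described in the mail


def parse_row(row: str) -> int:
    """Convert a quoted row such as '1 1 1 1 0 0 0 0' into an integer mask."""
    return int(row.replace(" ", ""), 2)


def is_valid_cbm(mask: int, width: int = CBM_BITS) -> bool:
    """Check that the set bits of a cache bit mask are consecutive.

    An empty or over-wide mask is rejected as well (assumption of this sketch).
    """
    if mask <= 0 or mask >= (1 << width):
        return False
    # Drop trailing zeros; a single run of ones then has the form 2**k - 1,
    # so ANDing it with mask + 1 yields zero.
    while not mask & 1:
        mask >>= 1
    return mask & (mask + 1) == 0


def cos_ids_available(num_cos: int, cdp: bool) -> int:
    """With CDP enabled each COS id needs both a code and a data mask."""
    return num_cos // 2 if cdp else num_cos


class SocketCat:
    """Mask table of one socket; sockets are configured independently."""

    def __init__(self, num_cos: int = NUM_COS, width: int = CBM_BITS):
        self.width = width
        # Every COS id starts with the full cache (the "Default" layout).
        self.masks = [(1 << width) - 1] * num_cos

    def set_mask(self, cos: int, mask: int) -> None:
        if not is_valid_cbm(mask, self.width):
            raise ValueError(f"mask {mask:0{self.width}b} is not contiguous")
        self.masks[cos] = mask


if __name__ == "__main__":
    isolated = ["1 1 1 1 0 0 0 0", "0 0 0 0 1 1 0 0",
                "0 0 0 0 0 0 1 0", "0 0 0 0 0 0 0 1"]
    socket0, socket1 = SocketCat(), SocketCat()
    for cos, row in enumerate(isolated):
        socket0.set_mask(cos, parse_row(row))
    # The same COS id may select a different slice on another socket.
    socket1.set_mask(0, parse_row("0 0 0 0 0 0 1 1"))
    print([f"{m:08b}" for m in socket0.masks[:4]])
    # prints ['11110000', '00001100', '00000010', '00000001']
    print(f"{socket1.masks[0]:08b}")                  # prints 00000011
    print(is_valid_cbm(parse_row("1 0 1 1 0 0 0 0")))  # prints False
    print(cos_ids_available(NUM_COS, cdp=True))       # prints 8
```

The sketch keeps one SocketCat per socket to reflect the point that COS ids need not mean the same thing on every socket. A real implementation would enumerate the mask width and COS count from CPUID instead of hard-coding them.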