Date: Wed, 4 Nov 2015 09:42:27 -0500
From: Luiz Capitulino
To: Fenghua Yu
Cc: "H Peter Anvin", "Ingo Molnar", "Thomas Gleixner", "Peter Zijlstra",
    "linux-kernel", "x86", "Vikas Shivappa", Marcelo Tosatti,
    tj@kernel.org, riel@redhat.com
Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support
Message-ID: <20151104094227.5aafdf2c@redhat.com>
In-Reply-To: <1443766185-61618-1-git-send-email-fenghua.yu@intel.com>
Organization: Red Hat

On Thu, 1 Oct 2015 23:09:34 -0700
Fenghua Yu wrote:

> This series has some preparatory patches and Intel cache allocation
> support.

Ping? What's the status of this series?

We badly need this series for KVM-RT workloads. I did try it and it
seems to work but, apart from small fixable issues which I'll point out
in replies to specific patches, there are some design issues on which I
need some clarification. They are in order of relevance:

o Cache reservations are global to all NUMA nodes

  CAT is mostly intended for real-time and high performance computing.
  For both of them the most common setup is to pin your threads to
  specific cores on a specific NUMA node.

  So, suppose I have two HPC threads pinned to specific cores on node1.
  I want to reserve 80% of the L3 cache for those threads. With the
  current patches I'd do this:

   1. Create an "all-tasks" cgroup which can only access 20% of the
      cache
   2. Create an "hpc" cgroup which can access 80% of the cache
   3. Move my HPC threads to "hpc" and all the other threads to
      "all-tasks"

  This has the intended behavior on node1: the "hpc" threads will write
  into 80% of the L3 cache and any "all-tasks" threads executing there
  will only write into 20% of the cache.

  However, this is also true for node0! So, the "all-tasks" threads can
  only write into 20% of the cache on node0 even though "hpc" threads
  will never execute there.

  Is this intended by design? Like, is this a hardware limitation
  (given that the IA32_L3_MASK_n MSRs are global anyway) or maybe a way
  to enforce cache coherence?

  I was wondering if we could have masks per NUMA node, where they are
  applied to processes whenever they migrate among NUMA nodes.

o How does this feature apply to kernel threads?

  I'm just unable to move kernel threads out of the root cgroup. This
  means that kernel threads can always write into all of the cache no
  matter what the reservation scheme is.

  Is this intended by design? Why? Unless I'm missing something,
  reservations could and should be applied to kernel threads as well.

o You can't change the root cgroup's CBM

  I can understand this makes the implementation a lot simpler.
  However, the reality is that there are far too few CBMs and losing
  one for the root group seems like a waste.

  Can we change this or are there strong reasons not to do so?

o cgroups hierarchy is limited by the number of CBMs

  Today on my Haswell system, this means that I can only have 3
  directories in my cgroups hierarchy. If the number of CBMs is
  expected to grow in the next processors, then I think having this
  feature as cgroups makes sense. However, if we're still going to be
  this limited in terms of directory structure, then it seems a bit
  overkill to me to have this as cgroups.
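
To make the 80%/20% example from the first point concrete, here is a rough sketch of how such a split might map onto two capacity bitmasks (CBMs). It assumes a 20-bit CBM and non-overlapping masks; both are assumptions for illustration, and `split_cbm` is a hypothetical helper, not something from the patch series. The one hard constraint it respects is that the set bits in a CAT bitmask must be contiguous.

```python
def split_cbm(cbm_len, hpc_fraction):
    """Split a cbm_len-bit cache bitmask into two contiguous,
    non-overlapping masks: (hpc_mask, other_mask).

    cbm_len and hpc_fraction are illustrative inputs; the real CBM
    length is reported by the CPU and is not stated in this thread.
    """
    if cbm_len < 2:
        raise ValueError("need at least 2 bits to split")
    if not 0 < hpc_fraction < 1:
        raise ValueError("hpc_fraction must be between 0 and 1")

    # Number of cache "ways" given to the HPC group, keeping at least
    # one bit for each side because an empty mask is not valid.
    hpc_bits = round(cbm_len * hpc_fraction)
    hpc_bits = min(max(hpc_bits, 1), cbm_len - 1)
    other_bits = cbm_len - hpc_bits

    # Low bits go to everything else, high bits to the HPC group.
    other_mask = (1 << other_bits) - 1
    hpc_mask = ((1 << hpc_bits) - 1) << other_bits
    return hpc_mask, other_mask


if __name__ == "__main__":
    hpc, rest = split_cbm(20, 0.8)   # assumed 20-bit CBM
    print("hpc       :", hex(hpc))
    print("all-tasks :", hex(rest))
```

With the current patches, one such pair of masks would apply on every node, which is exactly the behavior questioned above; a per-node scheme would need one pair per NUMA node.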