Date: Fri, 31 Jul 2015 09:24:58 -0700 (PDT)
From: Vikas Shivappa <vikas.shivappa@intel.com>
To: Tejun Heo <tj@kernel.org>
cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>,
        linux-kernel@vger.kernel.org, vikas.shivappa@intel.com, x86@kernel.org,
        hpa@zytor.com, tglx@linutronix.de, mingo@kernel.org,
        peterz@infradead.org, matt.fleming@intel.com, will.auld@intel.com,
        glenn.p.williamson@intel.com, kanaka.d.juvva@intel.com
Subject: Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service
 management
In-Reply-To: <20150730194458.GD3504@mtj.duckdns.org>
Message-ID: <alpine.DEB.2.10.1507301359200.921@vshiva-Udesk>
References: <1435789270-27010-1-git-send-email-vikas.shivappa@linux.intel.com> <1435789270-27010-6-git-send-email-vikas.shivappa@linux.intel.com> <20150730194458.GD3504@mtj.duckdns.org>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5694
Lines: 136


On Thu, 30 Jul 2015, Tejun Heo wrote:

> Hello, Vikas.
>
> On Wed, Jul 01, 2015 at 03:21:06PM -0700, Vikas Shivappa wrote:
>> This patch adds a cgroup subsystem for Intel Resource Director
>> Technology(RDT) feature and Class of service(CLOSid) management which is
>> part of common RDT framework.  This cgroup would eventually be used by
>> all sub-features of RDT and hence be associated with the common RDT
>> framework as well as sub-feature specific framework.  However current
>> patch series only adds cache allocation sub-feature specific code.
>>
>> When a cgroup directory is created it has a CLOSid associated with it
>> which is inherited from its parent.  The Closid is mapped to a
>> cache_mask which represents the L3 cache allocation to the cgroup.
>> Tasks belonging to the cgroup get to fill the cache represented by the
>> cache_mask.
>
> First of all, I apologize for being so late.  I've been thinking about
> it but the thoughts didn't quite crystalize (which isn't to say that
> it's very crystal now) until recently.  If I understand correctly,
> there are a couple suggested use cases for explicitly managing cache
> usage.
>
> 1. Pinning known hot areas of memory in cache.

No, cache allocation doesn't do this (nor is it expected to).

>
> 2. Explicitly regulating cache usage so that cacheline allocation can
>   be better than CPU itself doing it.

Yes, this is what we want to do using cache allocation.

>
> #1 isn't part of this patchset, right?  Is there any plan for working
> towards this too?

Cache allocation is not intended to do #1, so we don't have to support this.

>
> For #2, it is likely that the targeted use cases would involve threads
> of a process or at least cooperating processes and having a simple API
> which just goes "this (or the current) thread is only gonna use this
> part of cache" would be a lot easier to use and actually beneficial.
>
> I don't really think it makes sense to implement a fully hierarchical
> cgroup solution when there isn't the basic affinity-adjusting
> interface and it isn't clear whether fully hierarchical resource
> distribution would be necessary especially given that the granularity
> of the target resource is very coarse.
>
> I can see that how cpuset would seem to invite this sort of usage but
> cpuset itself is more of an arbitrary outgrowth (regardless of
> history) in terms of resource control and most things controlled by
> cpuset already have countepart interface which is readily accessible
> to the normal applications.

Yes, today we don't have an alternative interface, but we can always build one.
We simply don't have it because until now the Linux kernel just tolerated the
degradation that could occur from cache contention, and this is the first
interface we are building.

>
> Given that what the feature allows is restricting usage rather than
> granting anything exclusively, a programmable interface wouldn't need
> to worry about complications around priviledges while being able to
> reap most of the benefits in an a lot easier way.  Am I missing
> something?
>

For #2, with the intel_rdt cgroup we develop a framework where the user can
regulate the cache allocation. A user space app could also eventually use this
as underlying support and then build on top of it depending on the
enterprise or other requirements.

A typical use case would be an application that continuously pollutes the
cache (a low priority app from a cache usage perspective) by bringing in data
from the network (a copying/streaming app), and so prevents an app with a
legitimate need for the cache (a high priority app) from using it.

We need to map a group of tasks to a particular class of service, and we need
a way for the user to specify the cache capacity for that class of service. We
also need a default cgroup which could have all the tasks and use all the cache.
The hierarchical interface can be used by the user as required and does not
really interfere with allocating exclusive blocks of cache: all the user needs
to do is make sure the masks don't overlap.
But note that overlapping masks provide a very easy way to share the cache,
which is what you may want to do sometimes. The current implementation can be
easily extended to *enforce* exclusive capacity masks between child nodes if
required. But since the super user is expected to be the one using this, the
usage may be limited as well, or the user can still take care of it as I said
above. Some of the earlier emails may have given the impression that we cannot
do exclusive allocations, but that's not altogether true: we can configure the
masks to have exclusive cache blocks for different cgroups; it is just left to
the user.
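
To make the overlap point concrete, here is a minimal user-space sketch (not
code from the patch series) of how one might check whether a set of capacity
masks is exclusive. The group names, the 20-bit mask width, the mask values and
the helper names are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical capacity bitmasks for a cache with 20 mask bits; the group
# names and values are illustrative, not taken from the patch series.
CACHE_MASKS: Dict[str, int] = {
    "high_prio": 0xFFF00,   # upper 12 bits
    "streaming": 0x000FF,   # lower 8 bits, disjoint from high_prio
}


def overlapping_pairs(masks: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return every pair of groups whose masks share at least one bit."""
    names = sorted(masks)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if masks[a] & masks[b]
    ]


def is_exclusive(masks: Dict[str, int]) -> bool:
    """True when no two groups share any part of the cache."""
    return not overlapping_pairs(masks)
```

With the two masks above the groups are exclusive; adding a third group whose
mask straddles both (say 0x00FF0) would make it share cache with each of them,
which is the deliberate sharing case described above.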


We did have a lot of discussions during the design and V3, if you remember, and
had settled on using a separate controller. Below is one such thread where
we discussed the same. I don't want to loop through this again with this
already marathon patch :)

https://lkml.org/lkml/2015/1/27/846

Quick copy from the V3 thread:
"

> proposal but was removed as we did not get agreement on lkml.
>
> the original lkml thread is here from 10/2014 for your reference -
> https://lkml.org/lkml/2014/10/16/568

Yeap, I followed that thread and this being a separate controller 
definitely makes a lot more sense.

"


Thanks,
Vikas

> Thanks.
>
> -- 
> tejun
>