Date: Fri, 31 Jul 2015 15:38:03 -0300
From: Marcelo Tosatti
To: Vikas Shivappa
Cc: "Auld, Will", Vikas Shivappa, "linux-kernel@vger.kernel.org",
	"x86@kernel.org", "hpa@zytor.com", "tglx@linutronix.de",
	"mingo@kernel.org", "tj@kernel.org", "peterz@infradead.org",
	"Fleming, Matt", "Williamson, Glenn P", "Juvva, Kanaka D"
Subject: Re: [summary] Re: [PATCH 3/9] x86/intel_rdt: Cache Allocation documentation and cgroup usage guide
Message-ID: <20150731183803.GA29321@amt.cnet>
References: <1435789270-27010-4-git-send-email-vikas.shivappa@linux.intel.com>
	<20150728231516.GA16204@amt.cnet>
	<96EC5A4F3149B74492D2D9B9B1602C27461EB932@ORSMSX105.amr.corp.intel.com>
	<20150729193208.GC3201@amt.cnet>
	<20150730202253.GA12921@amt.cnet>
	<20150731144541.GB22948@amt.cnet>

On Fri, Jul 31, 2015 at 09:41:58AM -0700, Vikas Shivappa wrote:
>
> To summarize the ever growing thread:
>
> 1. the rdt_cgroup can be used to configure exclusive cache bitmaps
> for the child nodes which can be used for the scenarios which
> Marcelo mentions.
>
> simple examples which were mentioned:
> max bitmask length: 16, hence full mask is 0xffff
> groupx_realtime - 0xff.
> group2_systemtraffic - 0xf. : put a lot of tasks from root node to
> here or which ever is offending and thrashing.
> groupy_ - 0x0f
>
> Now the groupx has its own area of cache that can be used by the
> realtime/(specific scenario) apps. Similarly configure any groupy.
>
> 2. Can the maps let you specify in which cache ways the cache
> is allocated? - No, this is implementation specific as mentioned
> in the SDM. So when we configure a mask, you really don't know which
> ways or which exact lines are used on which SKUs .. We may not see
> any use case as well which is needed for apps to allocate cache in
> specific areas and the h/w does not support this as well.

Ok, can you comment whether the userspace interface proposed addresses
all your use cases?

> 3. Letting the user specify size in bytes instead of bitmap: we
> have already gone through this discussion in older versions. The
> user can simply check the size of the total cache and understand
> what map could be what size. I don't see a special need to specify an
> interface to enter the cache in bytes and then round off - user
> could instead use the roundoff values before hand or iow it
> automatically does when he specifies the bitmask.

When you move from processor A with CBM bitmask format X to hardware B
with CBM bitmask format Y, and the formats Y and X are different, you
have to manually adjust the format.

Please reply to the userspace proposal, the problem is very explicit
there.

> ex: find cache size from /proc/cpuinfo. - say 20MB
> bitmask max - 0xfffff.
>
> This means the roundoff (chunk) size supported is only 1MB, so when
> you specify the mask say 0x3 (2MB) that's already taken care of.
> Same applies to percentage - the masks automatically round off the percentage.
>
> Please note that this is quite different from the way we can
> allocate memory in bytes and needs to be treated differently given
> that the hardware provides interface in a particular way.
>
> 4. Letting the kernel automatically extend the bitmap may affect a
> lot of other things

Let's talk about them. What other things?
> and will need a lot of heuristics - note that we
> have overlapping masks.

I proposed a way to avoid heuristics by exposing whether the cgroup is
"expandable" or not and asked your input.

We really do not want to waste cache if we can avoid it.

> This interface lets the super-user control
> the cache allocation and it may be very confusing for the user if he
> has allocated a cache mask and suddenly from under the floor the
> kernel changes it.

Agree.

>
> Thanks,
> Vikas
>
>
> On Fri, 31 Jul 2015, Marcelo Tosatti wrote:
>
> >On Thu, Jul 30, 2015 at 04:03:07PM -0700, Vikas Shivappa wrote:
> >>
> >>
> >>On Thu, 30 Jul 2015, Marcelo Tosatti wrote:
> >>
> >>>On Thu, Jul 30, 2015 at 10:47:23AM -0700, Vikas Shivappa wrote:
> >>>>
> >>>>
> >>>>Marcelo,
> >>>>
> >>>>
> >>>>On Wed, 29 Jul 2015, Marcelo Tosatti wrote:
> >>>>>
> >>>>>How about this:
> >>>>>
> >>>>>desiredclos (closid p1 p2 p3 p4)
> >>>>>             1      1  0  0  0
> >>>>>             2      0  0  0  1
> >>>>>             3      0  1  1  0
> >>>>
> >>>>#1 Currently in the rdt cgroup, the root cgroup always has all the
> >>>>bits set and can't be changed (because the cgroup hierarchy would by
> >>>>default make this to have all bits as all the children need to have
> >>>>a subset of the root's bitmask). So if the user creates a cgroup and
> >>>>does not put any task in it, the tasks in the root cgroup could be still
> >>>>using that part of the cache. That's the reason I say we can have
> >>>>really 'exclusive' masks.
> >>>>
> >>>>Or in other words - there is always a desired clos (0) which has all
> >>>>parts set which acts like a default pool.
> >>>>
> >>>>Also the parts can overlap. Please apply this for all the below
> >>>>comments which will change the way they work.
> >>>
> >>>
> >>>>
> >>>>>
> >>>>>p means part.
> >>>>
> >>>>I am assuming p = (a contiguous cache capacity bit mask)
> >>>>
> >>>>>closid 1 is an exclusive cgroup.
> >>>>>closid 2 is a "cache hog" class.
> >>>>>closid 3 is "default closid".
> >>>>>
> >>>>>Desiredclos is what user has specified.
> >>>>>
> >>>>>Transition 1: desiredclos --> effectiveclos
> >>>>>Clean all bits of unused closids
> >>>>>(that must be updated whenever a
> >>>>>closid1 cgroup goes from empty->nonempty
> >>>>>and vice-versa).
> >>>>>
> >>>>>effectiveclos (closid p1 p2 p3 p4)
> >>>>>               1      0  0  0  0
> >>>>>               2      0  0  0  1
> >>>>>               3      0  1  1  0
> >>>>
> >>>>>
> >>>>>Transition 2: effectiveclos --> expandedclos
> >>>>>expandedclos (closid p1 p2 p3 p4)
> >>>>>              1      0  0  0  0
> >>>>>              2      0  0  0  1
> >>>>>              3      1  1  1  0
> >>>>>Then you have different inplacecos for each
> >>>>>CPU (see pseudo-code below):
> >>>>>
> >>>>>On the following events.
> >>>>>
> >>>>>- task migration to new pCPU:
> >>>>>- task creation:
> >>>>>
> >>>>>	id = smp_processor_id();
> >>>>>	for (part = desiredclos.p1; ...; part++)
> >>>>>		/* if my cosid is set and any other
> >>>>>		   cosid is clear, for the part,
> >>>>>		   synchronize desiredclos --> inplacecos */
> >>>>>		if (part[mycosid] == 1 &&
> >>>>>		    part[any_othercosid] == 0)
> >>>>>			wrmsr(part, desiredclos);
> >>>>>
> >>>>
> >>>>Currently the root cgroup would have all the bits set which will act
> >>>>like a default cgroup where all the otherwise unused parts (assuming
> >>>>they are a set of contiguous cache capacity bits) will be used.
> >>>
> >>>Right, but we don't want to place tasks in there in case one cgroup
> >>>wants exclusive cache access.
> >>>
> >>>So whenever you want an exclusive cgroup you'd do:
> >>>
> >>>create cgroup-exclusive; reserve desired part of the cache
> >>>for it.
> >>>create cgroup-default; reserve all cache minus that of cgroup-exclusive
> >>>for it.
> >>>
> >>>place tasks that belong to cgroup-exclusive into it.
> >>>place all other tasks (including init) into cgroup-default.
> >>>
> >>>Is that right?
> >>
> >>Yes you could do that.
> >>
> >>You can create cgroups to have masks which are exclusive in today's
> >>implementation, just that you could also create more cgroups to
> >>overlap the masks again..
> >>iow we don't have an exclusive flag for the
> >>cgroup mask.
> >>Is that a common use case in the server environment that you need to
> >>prevent other cgroups from using a certain mask? (since the root
> >>user should control these allocations .. he should know?)
> >
> >Yes, there are two known use-cases that have this characteristic:
> >
> >1) High performance numeric application which has been optimized
> >to a certain fraction of the cache.
> >
> >2) Low latency application in multi-application OS.
> >
> >For both cases exclusive cache access is wanted.
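
The rounding arithmetic quoted in point 3 of the summary (a 20MB cache with a maximum bitmask of 0xfffff gives a 1MB chunk per bit, so mask 0x3 covers 2MB) can be sketched in C. This is only an illustration under stated assumptions: the helper names cbm_weight, cbm_chunk_bytes and cbm_size_bytes are hypothetical and are not part of the intel_rdt patches, and the sketch assumes every bit of the mask covers an equal share of the cache, which is how the example in the thread treats it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not from the intel_rdt patch set. */

/* Count the bits set in a cache capacity bitmask. */
unsigned int cbm_weight(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Bytes of cache represented by one mask bit, assuming (as the thread's
 * example does) that the cache is split evenly across the bits of the
 * full mask. full_mask must be non-zero.
 */
uint64_t cbm_chunk_bytes(uint64_t cache_bytes, uint32_t full_mask)
{
	assert(full_mask != 0);
	return cache_bytes / cbm_weight(full_mask);
}

/* Bytes of cache covered by mask; bits outside full_mask are ignored. */
uint64_t cbm_size_bytes(uint64_t cache_bytes, uint32_t full_mask,
			uint32_t mask)
{
	return cbm_chunk_bytes(cache_bytes, full_mask) *
	       cbm_weight(mask & full_mask);
}
```

With the numbers from the thread, cbm_chunk_bytes(20ULL << 20, 0xfffff) is 1MB and cbm_size_bytes(20ULL << 20, 0xfffff, 0x3) is 2MB. The same mask on hardware with a different cache size or mask length yields a different byte count, which is the portability concern raised in the reply above.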