Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752839AbbKQBCR (ORCPT );
        Mon, 16 Nov 2015 20:02:17 -0500
Received: from mx1.redhat.com ([209.132.183.28]:33805 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751053AbbKQBCP (ORCPT );
        Mon, 16 Nov 2015 20:02:15 -0500
Date: Mon, 16 Nov 2015 23:01:43 -0200
From: Marcelo Tosatti
To: Peter Zijlstra
Cc: Luiz Capitulino, Thomas Gleixner, Vikas Shivappa, Tejun Heo,
        Yu Fenghua, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] ioctl based CAT interface
Message-ID: <20151117010143.GA12871@amt.cnet>
References: <20151113163933.GA10222@amt.cnet>
        <20151113165100.GI17308@twins.programming.kicks-ass.net>
        <20151113173303.GB13490@amt.cnet>
        <20151116090756.GM17308@twins.programming.kicks-ass.net>
        <20151116163903.GA28870@amt.cnet>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151116163903.GA28870@amt.cnet>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4393
Lines: 137

On Mon, Nov 16, 2015 at 02:39:03PM -0200, Marcelo Tosatti wrote:
> On Mon, Nov 16, 2015 at 10:07:56AM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 13, 2015 at 03:33:04PM -0200, Marcelo Tosatti wrote:
> > > On Fri, Nov 13, 2015 at 05:51:00PM +0100, Peter Zijlstra wrote:
> > > > On Fri, Nov 13, 2015 at 02:39:33PM -0200, Marcelo Tosatti wrote:
> > > > > + * * one tcrid entry can be in different locations
> > > > > + * in different sockets.
> > > >
> > > > NAK on that without cpuset integration.
> > > >
> > > > I do not want freely migratable tasks having radically different
> > > > performance profiles depending on which CPU they land.
> > >
> > > Ok, so, configuration:
> > >
> > >
> > > Socket-1                Socket-2
> > >
> > > pinned thread-A with    100% L3 free
> > > 80% of L3
> > > reserved
> > >
> > >
> > > So it is a problem if a thread running on socket-2 is scheduled to
> > > socket-1 because performance is radically different, fine.
> > >
> > > Then one way to avoid that is to not allow freely migratable tasks
> > > to move to Socket-1. Fine.
> > >
> > > Then you want to use cpusets for that.
> > >
> > > Can you fill in the blanks what is missing here?
> >
> > I'm still not seeing what the problem with CAT-cgroup is.
> >
> > /cgroups/cpuset/
> >     socket-1/cpus = $socket-1
> >              tasks = $thread-A
> >
> >     socket-2/cpus = $socket-2
> >              tasks = $thread-B
> >
> > /cgroups/cat/
> >     group-A/bitmap = 0x3F / 0xFF
> >     group-A/tasks = $thread-A
> >
> >     group-B/bitmap = 0xFF / 0xFF
> >     group-B/tasks = $thread-B
> >
> >
> > That gets you thread-A on socket-1 with 6/8 of the L3 and thread-B on
> > socket-2 with 8/8 of the L3.
>
> Going that route, might as well expose the region shared with HW
> to userspace and let userspace handle the problem of contiguous free regions,
> which means the cgroups bitmask maps one-to-one to HW bitmap.
>
> All is necessary then is to modify the Intel patches to
>
> 1) Support bitmaps per socket.

Consider the following scenario, one server with two sockets:

socket-1                socket-2

[   [***]   ]           [   [***]   ]
L3 cache bitmap         L3 cache bitmap

[*] refers to the region shared with HW, as reported by CPUID
(read the Intel documentation).

socket-1.shared_region_with_hw = [bit 2, bit 5]
socket-2.shared_region_with_hw = [bit 16, bit 18]

Given that your application is critical, you do not want it to share
any reservation with HW. I was informed that there is no guarantee
these regions end up in the same location for different sockets.

Let's say you need 15 bits of reservation, and the total is 20 bits.
One possibility would be:

socket-1.reservation = [bit 5, bit 15]
socket-2.reservation = [bit 1, bit 15]

For the current Intel CAT patchset, this restriction exists:

static int cbm_validate_rdt_cgroup(struct intel_rdt *ir, unsigned long cbmvalue)
{
        struct cgroup_subsys_state *css;
        struct intel_rdt *par, *c;
        unsigned long cbm_tmp = 0;
        int err = 0;

        if (!cbm_validate(cbmvalue)) {
                err = -EINVAL;
                goto out_err;
        }

        par = parent_rdt(ir);
        clos_cbm_table_read(par->closid, &cbm_tmp);

        if (!bitmap_subset(&cbmvalue, &cbm_tmp, MAX_CBM_LENGTH)) {
                err = -EINVAL;
                goto out_err;
        }

Can you (or the author of the patch) explain why this restriction is
here?

If the restriction has to be maintained, then one hierarchy per socket
will be necessary to support different bitmaps per socket.

If the restriction can be removed, then non-hierarchical support could
look like:

/cgroups/cat/group-A/tasks = $thread-A
/cgroups/cat/group-A/socket-1/bitmap = 0x3F / 0xFF
/cgroups/cat/group-A/socket-2/bitmap = 0x... / 0xFF

Or one l3_cbm file containing one mask per socket, separated by commas,
similar to /sys/devices/system/node/node0/cpumap.

> 2) Remove hierarchical support.

There is nothing hierarchical in CAT, it's flat. Each set of tasks is
associated with a number of bits in each socket's L3 CBM mask.

> 3) Lazy enforcement (which can be done later as an improvement).
>
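
To make the per-socket placement problem above concrete, here is a
minimal userspace sketch. It is not code from the Intel CAT patchset:
the helper names (cbm_range, cbm_is_contiguous, cbm_is_subset,
cbm_find_free) are hypothetical, bit ranges are assumed to be
inclusive, and the mask is assumed to fit in 64 bits. It picks, for one
socket, the lowest contiguous run of bits that avoids the region shared
with HW, and it mirrors the subset test that bitmap_subset() performs
in the excerpt quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of any kernel patch. */

/* Build a mask with bits lo..hi set (inclusive); requires lo <= hi < 64. */
uint64_t cbm_range(unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < 64);
    unsigned width = hi - lo + 1;
    uint64_t ones = (width == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return ones << lo;
}

/* A capacity bitmask must be non-empty with all set bits adjacent. */
int cbm_is_contiguous(uint64_t cbm)
{
    if (cbm == 0)
        return 0;
    uint64_t low = cbm & (~cbm + 1); /* isolate the lowest set bit */
    uint64_t sum = cbm + low;        /* carry clears one contiguous run */
    return (sum & cbm) == 0;
}

/* Same idea as the bitmap_subset() check: child may only use parent bits. */
int cbm_is_subset(uint64_t child, uint64_t parent)
{
    return (child & ~parent) == 0;
}

/*
 * Return the lowest run of `need` contiguous bits inside a `cbm_len`-bit
 * mask that does not overlap `shared`, or 0 if no such run exists.
 */
uint64_t cbm_find_free(unsigned cbm_len, uint64_t shared, unsigned need)
{
    if (need == 0 || cbm_len > 64 || need > cbm_len)
        return 0;
    for (unsigned lo = 0; lo + need <= cbm_len; lo++) {
        uint64_t cand = cbm_range(lo, lo + need - 1);
        if ((cand & shared) == 0)
            return cand;
    }
    return 0;
}
```

Under these assumptions, cbm_find_free(20, cbm_range(16, 18), 15)
yields bits 0 to 14 for a socket whose shared region sits at bits 16 to
18. With the shared region at bits 2 to 5, the largest free run in a
20-bit mask is 14 bits, so the same 15-bit request returns 0 there.
This is the reason the resulting mask has to be chosen per socket
rather than once for the whole machine.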