From: David Carrillo-Cisneros
Date: Fri, 23 Dec 2016 03:58:19 -0800
Subject: Re: [PATCH 11/14] x86/cqm: Add failure on open and read
To: Vikas Shivappa
Cc: Vikas Shivappa, linux-kernel, x86, Thomas Gleixner, Peter Zijlstra, "Shankar, Ravi V", "Luck, Tony", Fenghua Yu, andi.kleen@intel.com, Stephane Eranian, hpa@zytor.com
In-Reply-To: <1481929988-31569-12-git-send-email-vikas.shivappa@linux.intel.com>
References: <1481929988-31569-1-git-send-email-vikas.shivappa@linux.intel.com> <1481929988-31569-12-git-send-email-vikas.shivappa@linux.intel.com>

On Fri, Dec 16, 2016 at 3:13 PM, Vikas Shivappa wrote:
> To provide reliable output to the user, cqm throws error when it does
> not have enough RMIDs to monitor depending upon the mode user choses.
> This also takes care to not overuse RMIDs. Default is LAZY mode.
>
> NOLAZY mode: This patch adds a file mon_mask in the perf_cgroup which
> indicates the packages which the user wants guaranteed monitoring. For
> such cgroup events RMIDs are assigned at event create and we fail if
> enough RMIDs are not present. This is basically a NOLAZY allocation of
> RMIDs. This mode can be used in real time scenarios where user is sure
> that tasks that are monitored are scheduled.
>
> LAZY mode: If user did not enable the NOLAZY mode, RMIDs are allocated
> only when tasks are actually scheduled. Upon failure to obtain RMIDs it
> indicates a failure in read.
> Typical use case for this mode could be to
> start monitoring cgroups which still donot have any tasks in them and
> such cgroups are part of large number of cgroups which are monitored -
> that way we donot overuse RMIDs.
>

The proposed interface is:
  - a global boolean cqm_cont_monitoring.
  - a per-package boolean in the bitfield cqm_mon_mask.

So, for each package there will be four states, yet one of them is not
meaningful:

cont_monitoring, cqm_mon_mask[p]: meaning
------------------------------------------
0, 0 : off
0, 1 : off but reserve a RMID that is not going to be used?
1, 0 : on with NOLAZY
1, 1 : on with LAZY

The case 0,1 is problematic.

How can new cases be added in the future? Another file? What's wrong
with having a

  pkg0_flags;pkg1_flags;...;pkgn_flags

cont_monitoring file, which is more akin to the RDT Allocation format?
(There is a parser function and implementation for that format in v3 of
my CMT series.)

Below is a full discussion about how many per-package configuration
states are useful now and if/when RMID rotation is added.

There are two types of error sources introduced by not having a RMID
when one is needed:
  - E_read : Introduced into the measurement when stealing a RMID with
    non-zero occupancy.
  - E_sched: Introduced when a thread runs but no RMID is available for
    it.

A user may have two tolerance levels to errors that determine if an
event can be read or read should fail:
  - NoTol : No tolerance to error at all. If there has been any type of
    E_read or E_sched in the past, read must give an error.
  - SomeTol: Tolerate _some_ error. It can be defined in terms of time,
    magnitude or both. As an example, in v3 of my CMT patches, I assumed
    a user would tolerate an error that occurred more than an
    arbitrarily chosen time in the past. The minimum criterion is that
    there should at least be a RMID at the time of read.

The driver can follow two types of RMID allocation policies:
  - NoLazy: reserve RMID as soon as user starts monitoring (when event
    is created or cont_monitoring is set). This policy introduces no
    error.
  - Lazy: reserve RMID first time a task is sched in. May introduce
    E_sched if no RMID available on sched in.

and three RMID deallocation policies:
  - Fixed: RMID can never be stolen. This policy introduces no error
    into the measurement.
  - Reuse: RMID can be stolen when no scheduled thread is using it and
    it has non-zero occupancy. This policy may introduce E_sched when no
    RMID available on sched_in after an incidence of reuse.
  - Steal: RMID can be stolen any time. This policy introduces both
    E_sched and E_read errors into the measurement (this is the
    so-called RMID rotation).

Therefore there are three possible risk levels:
  - No Risk: possible with NoLazy & Fixed
  - Risk of E_sched: possible with either NoLazy & Reuse or Lazy & Fixed
    or Lazy & Reuse
  - Risk of E_sched and E_read: possible with NoLazy & Steal or
    Lazy & Steal

Notes:
  a) E_read only is impossible.
  b) In "No Risk" a RMID must be allocated in advance and never
     released, even if unused (e.g. a task may run only in one package
     but we allocate RMID in all of them).
  c) For the E_sched risk, Lazy & Reuse give the highest RMID
     flexibility.
  d) For the E_read and E_sched risk, NoLazy & Steal give the highest
     RMID flexibility.

Combining all three criteria, the possible configuration modes that
make sense are:
  1) No monitoring.
  2) NoLazy & Fixed & NoTol. RMID is allocated when event is created
     (or cont_monitoring is set). No possible error. May waste RMIDs.
  3) Lazy & Reusable & NoTol. RMIDs are allocated as needed, taken away
     when unused. May fail to find RMID if there is RMID contention;
     once it fails, the event/cgroup must be in error state.
  4) Lazy & Reusable & SomeTol. Similar to (3) but event/cgroup recovers
     from error state if a recovered RMID stays valid for long enough.
  5 and 6) Lazy allocation & Stealable, with and without Tol. RMID can
     be stolen even if non-empty or in use.

Q. Which modes are useful?

Stephane and I see a clear use for (2). Users of cont_monitoring look to
avoid error and may tolerate wasted RMIDs. It has the advantage of
allowing failure at event creation (or when cont_monitoring is set).
This is the same mode introduced with NOLAZY in cqm_mon_mask in this
patch.

Mode (3) can be viewed as an optimistic approach to RMID allocation that
allows more concurrent users than (2) when cache occupancy drops quickly
and/or tasks/cgroups manifest strong package locality. It still
guarantees exact measurements (within hw constraints) when read
succeeds.

Mode (4) is more useful than (3) _if_ it can be assumed that the system
will replace enough cache lines before the tolerance time expires
(otherwise it just reads garbage). Yet, it's not clear to me how often
this assumption is valid.

Modes (5) and (6) require RMID rotation, so they wouldn't be part of
this patch series.

> +static ssize_t cqm_mon_mask_write(struct kernfs_open_file *of,
> +                                  char *buf, size_t nbytes, loff_t off)
> +{
> +        cpumask_var_t tmp_cpus, tmp_cpus1;
> +        struct cgrp_cqm_info *cqm_info;
> +        unsigned long flags;
> +        int ret = 0;
> +
> +        buf = strstrip(buf);
> +
> +        if (!zalloc_cpumask_var(&tmp_cpus, GFP_KERNEL) ||
> +            !zalloc_cpumask_var(&tmp_cpus1, GFP_KERNEL)) {
> +                ret = -ENOMEM;
> +                goto out;
> +        }
> +
> +        ret = cpulist_parse(buf, tmp_cpus);
> +        if (ret)
> +                goto out;
> +
> +        if (cpumask_andnot(tmp_cpus1, tmp_cpus, &cqm_pkgmask)) {
> +                ret = -EINVAL;
> +                goto out;
> +        }
> +
> +        raw_spin_lock_irqsave(&cache_lock, flags);
> +        cqm_info = css_to_cqm_info(of_css(of));
> +        cpumask_copy(&cqm_info->mon_mask, tmp_cpus);
> +        raw_spin_unlock_irqrestore(&cache_lock, flags);

So this only copies the mask so that it can be used for the next cgroup
event in intel_cqm_setup_event? That defeats the purpose of a NON_LAZY
cont_monitoring.

There is no need to create a new cgroup file only to provide a non-lazy
event; such a flag could be passed in perf_event_attr::pinned or in a
config field.
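
To make the policy discussion earlier in this message concrete, here is a
minimal userspace sketch of how the allocation and deallocation policies
combine into the three risk levels. Every identifier in it (alloc_policy,
dealloc_policy, ERR_SCHED, ERR_READ, cqm_risk, cqm_risk_selfcheck) is
hypothetical and chosen only for illustration; none of them exist in the
kernel, in this patch, or in my CMT series.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum alloc_policy   { ALLOC_NOLAZY, ALLOC_LAZY };
enum dealloc_policy { DEALLOC_FIXED, DEALLOC_REUSE, DEALLOC_STEAL };

/* Error sources as bit flags. */
#define ERR_NONE  0u
#define ERR_SCHED (1u << 0)  /* thread ran without a RMID */
#define ERR_READ  (1u << 1)  /* RMID stolen with non-zero occupancy */

/* Return the set of error sources a policy pair may introduce. */
unsigned int cqm_risk(enum alloc_policy a, enum dealloc_policy d)
{
	unsigned int risk = ERR_NONE;

	/* Lazy allocation may find no RMID at sched in. */
	if (a == ALLOC_LAZY)
		risk |= ERR_SCHED;

	/* After reuse, a later sched in may find no RMID. */
	if (d == DEALLOC_REUSE)
		risk |= ERR_SCHED;

	/* Stealing at any time introduces both error sources. */
	if (d == DEALLOC_STEAL)
		risk |= ERR_SCHED | ERR_READ;

	return risk;
}

/* Check note (a): E_read never appears without E_sched. */
void cqm_risk_selfcheck(void)
{
	int a, d;

	for (a = ALLOC_NOLAZY; a <= ALLOC_LAZY; a++) {
		for (d = DEALLOC_FIXED; d <= DEALLOC_STEAL; d++) {
			unsigned int r = cqm_risk((enum alloc_policy)a,
						  (enum dealloc_policy)d);

			assert(r != ERR_READ);
		}
	}
}
```

Under these assumptions only NoLazy & Fixed yields an empty risk set,
which is why mode (2) is the one that can fail cleanly at event creation
instead of at read.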