Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S941659AbcLWUus (ORCPT ); Fri, 23 Dec 2016 15:50:48 -0500
Received: from bombadil.infradead.org ([198.137.202.9]:40371 "EHLO
        bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753362AbcLWUur (ORCPT );
        Fri, 23 Dec 2016 15:50:47 -0500
Date: Fri, 23 Dec 2016 21:33:18 +0100
From: Peter Zijlstra
To: Shivappa Vikas
Cc: Vikas Shivappa, linux-kernel@vger.kernel.org, x86@kernel.org,
        tglx@linutronix.de, ravi.v.shankar@intel.com, tony.luck@intel.com,
        fenghua.yu@intel.com, andi.kleen@intel.com, davidcc@google.com,
        eranian@google.com, hpa@zytor.com
Subject: Re: [PATCH 01/14] x86/cqm: Intel Resource Monitoring Documentation
Message-ID: <20161223203318.GU3107@twins.programming.kicks-ass.net>
References: <1481929988-31569-1-git-send-email-vikas.shivappa@linux.intel.com>
        <1481929988-31569-2-git-send-email-vikas.shivappa@linux.intel.com>
        <20161223123228.GQ3107@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3421
Lines: 78

On Fri, Dec 23, 2016 at 11:35:03AM -0800, Shivappa Vikas wrote:
>
> Hello Peterz,
>
> On Fri, 23 Dec 2016, Peter Zijlstra wrote:
>
> >On Fri, Dec 16, 2016 at 03:12:55PM -0800, Vikas Shivappa wrote:
> >>+Continuous monitoring
> >>+---------------------
> >>+A new file cont_monitoring is added to perf_cgroup which helps to enable
> >>+cqm continuous monitoring. Enabling this field would start monitoring of
> >>+the cgroup without perf being launched. This can be used for long term
> >>+light weight monitoring of tasks/cgroups.
> >>+
> >>+To enable continuous monitoring of cgroup p1.
> >>+#echo 1 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_cont_monitoring
> >>+
> >>+To disable continuous monitoring of cgroup p1.
> >>+#echo 0 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_cont_monitoring
> >>+
> >>+To read the counters at the end of monitoring perf can be used.
> >>+
> >>+LAZY and NOLAZY Monitoring
> >>+--------------------------
> >>+LAZY:
> >>+By default when monitoring is enabled, the RMIDs are not allocated
> >>+immediately and allocated lazily only at the first sched_in.
> >>+There are 2-4 RMIDs per logical processor on each package. So if a dual
> >>+package has 48 logical processors, there would be upto 192 RMIDs on each
> >>+package = total of 192x2 RMIDs.
> >>+There is a possibility that RMIDs can runout and in that case the read
> >>+reports an error since there was no RMID available to monitor for an
> >>+event.
> >>+
> >>+NOLAZY:
> >>+When user wants guaranteed monitoring, he can enable the 'monitoring
> >>+mask' which is basically used to specify the packages he wants to
> >>+monitor. The RMIDs are statically allocated at open and failure is
> >>+indicated if RMIDs are not available.
> >>+
> >>+To specify monitoring on package 0 and package 1:
> >>+#echo 0-1 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_mon_mask
> >>+
> >>+An error is thrown if packages not online are specified.
> >
> >I very much dislike both those for adding files to the perf cgroup.
> >Drivers should really not do that.
>
> Is the continuous monitoring the issue or the interface (adding a file in
> perf_cgroup)? I have not mentioned in the documentation but this continuous
> monitoring/monitoring mask applies only to cgroup in this patch and hence
> we thought a good place for that is in the cgroup itself because it's per
> cgroup.
>
> For task events, this won't apply and we are thinking of providing a prctl
> based interface for user to toggle the continuous monitoring ..

More fail..

> >
> >I absolutely hate the second because events already have affinity.
>
> This applies to continuous monitoring as well when there are no events
> associated.
> Meaning if the monitoring mask is chosen and user tries to
> enable continuous monitoring using the cgrp->cont_mon - all RMIDs are
> allocated immediately. The mon_mask provides a way for the user to have
> guaranteed RMIDs both for those that have events and for continuous
> monitoring (no perf event associated) (assuming user uses it when user
> knows he would definitely use it.. or else there is LAZY mode)
>
> Again this is cgroup specific and won't apply to task events and is needed
> when there are no events associated.

So no, the problem is that a driver introduces special ABI and behaviour
that radically departs from the regular behaviour.

Also, the 'whoops you ran out of RMIDs, please reboot' thing totally and
completely blows.