Date: Sat, 24 Dec 2016 17:51:14 -0800 (PST)
From: Shivappa Vikas
To: Peter Zijlstra
cc: Shivappa Vikas, Vikas Shivappa, linux-kernel@vger.kernel.org, x86@kernel.org, tglx@linutronix.de, ravi.v.shankar@intel.com, tony.luck@intel.com, fenghua.yu@intel.com, andi.kleen@intel.com, davidcc@google.com, eranian@google.com, hpa@zytor.com
Subject: Re: [PATCH 01/14] x86/cqm: Intel Resource Monitoring Documentation
In-Reply-To: <20161223203318.GU3107@twins.programming.kicks-ass.net>
References: <1481929988-31569-1-git-send-email-vikas.shivappa@linux.intel.com> <1481929988-31569-2-git-send-email-vikas.shivappa@linux.intel.com> <20161223123228.GQ3107@twins.programming.kicks-ass.net> <20161223203318.GU3107@twins.programming.kicks-ass.net>

On Fri, 23 Dec 2016, Peter Zijlstra wrote:

> On Fri, Dec 23, 2016 at 11:35:03AM -0800, Shivappa Vikas wrote:
>>
>> Hello Peterz,
>>
>> On Fri, 23 Dec 2016, Peter Zijlstra wrote:
>>
>>> On Fri, Dec 16, 2016 at 03:12:55PM -0800, Vikas Shivappa wrote:
>>>> +Continuous monitoring
>>>> +---------------------
>>>> +A new file cont_monitoring is added to perf_cgroup which helps to enable
>>>> +cqm continuous monitoring. Enabling this field would start monitoring of
>>>> +the cgroup without perf being launched.
>>>> +This can be used for long term light weight monitoring of tasks/cgroups.
>>>> +
>>>> +To enable continuous monitoring of cgroup p1.
>>>> +#echo 1 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_cont_monitoring
>>>> +
>>>> +To disable continuous monitoring of cgroup p1.
>>>> +#echo 0 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_cont_monitoring
>>>> +
>>>> +To read the counters at the end of monitoring perf can be used.
>>>> +
>>>> +LAZY and NOLAZY Monitoring
>>>> +--------------------------
>>>> +LAZY:
>>>> +By default when monitoring is enabled, the RMIDs are not allocated
>>>> +immediately and allocated lazily only at the first sched_in.
>>>> +There are 2-4 RMIDs per logical processor on each package. So if a dual
>>>> +package has 48 logical processors, there would be upto 192 RMIDs on each
>>>> +package = total of 192x2 RMIDs.
>>>> +There is a possibility that RMIDs can runout and in that case the read
>>>> +reports an error since there was no RMID available to monitor for an
>>>> +event.
>>>> +
>>>> +NOLAZY:
>>>> +When user wants guaranteed monitoring, he can enable the 'monitoring
>>>> +mask' which is basically used to specify the packages he wants to
>>>> +monitor. The RMIDs are statically allocated at open and failure is
>>>> +indicated if RMIDs are not available.
>>>> +
>>>> +To specify monitoring on package 0 and package 1:
>>>> +#echo 0-1 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_mon_mask
>>>> +
>>>> +An error is thrown if packages not online are specified.
>>>
>>> I very much dislike both those for adding files to the perf cgroup.
>>> Drivers should really not do that.
>>
>> Is the continuous monitoring the issue, or the interface (adding a file in
>> perf_cgroup)? I have not mentioned it in the documentation, but this
>> continuous monitoring / monitoring mask applies only to cgroups in this
>> patch, and hence we thought a good place for it is in the cgroup itself
>> because it's per cgroup.
>>
>> For task events, this won't apply, and we are thinking of providing a
>> prctl-based interface for the user to toggle the continuous monitoring.
>
> More fail..
>
>>>
>>> I absolutely hate the second because events already have affinity.
>>
>> This applies to continuous monitoring as well, when there are no events
>> associated. Meaning, if the monitoring mask is chosen and the user tries
>> to enable continuous monitoring using cgrp->cont_mon, all RMIDs are
>> allocated immediately. The mon_mask provides a way for the user to have
>> guaranteed RMIDs both for cgroups that have events and for continuous
>> monitoring (no perf event associated), assuming the user sets it only
>> when he knows he will definitely use it; otherwise there is LAZY mode.
>>
>> Again, this is cgroup specific and won't apply to task events, and is
>> needed when there are no events associated.
>
> So no, the problem is that a driver introduces special ABI and behaviour
> that radically departs from the regular behaviour.

OK, it looks like the interface is the problem; I will try to fix this.
We are just trying to provide a lightweight monitoring option so that it
is reasonable to monitor for a very long time (for example, the lifetime
of a process), mainly to avoid all the perf scheduling overhead.

Maybe a perf event attr option is a more reasonable way for the user to
choose this, rather than a new interface such as prctl or a cgroup file?

Thanks,
Vikas
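
For reference, the RMID arithmetic in the quoted LAZY section can be checked with a short sketch. It assumes the documentation means 48 logical processors per package (the reading under which 4 RMIDs per processor gives 192 per package). The function name rmid_budget and its parameters are illustrative only and are not part of any kernel or perf interface.

```python
def rmid_budget(logical_cpus_per_package, packages, rmids_per_cpu):
    """Return (RMIDs per package, RMIDs in the whole system).

    Hypothetical helper for illustration: it only multiplies out the
    figures quoted in the documentation and does not query hardware.
    """
    per_package = logical_cpus_per_package * rmids_per_cpu
    total = per_package * packages
    return per_package, total


# Upper bound from the quoted text: 4 RMIDs per logical processor.
upper = rmid_budget(logical_cpus_per_package=48, packages=2, rmids_per_cpu=4)
# Lower bound from the quoted text: 2 RMIDs per logical processor.
lower = rmid_budget(logical_cpus_per_package=48, packages=2, rmids_per_cpu=2)

print("upper bound (per package, total):", upper)  # (192, 384)
print("lower bound (per package, total):", lower)  # (96, 192)
```

Under these assumptions the dual-package system has between 192 and 384 RMIDs in total, which is the budget that LAZY allocation can exhaust and that NOLAZY reserves from at open.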