Date: Tue, 8 Mar 2016 09:49:13 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: "Luck, Tony"
Cc: Vikas Shivappa, "Shivappa, Vikas", linux-kernel@vger.kernel.org,
	x86@kernel.org, hpa@zytor.com, tglx@linutronix.de, mingo@kernel.org,
	"Shankar, Ravi V", "Yu, Fenghua", "Anvin, H Peter"
Subject: Re: [PATCH 4/6] x86/mbm: Memory bandwidth monitoring event management
Message-ID: <20160308084913.GV6344@twins.programming.kicks-ass.net>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F3A000C1D@ORSMSX114.amr.corp.intel.com>

On Mon, Mar 07, 2016 at 11:27:26PM +0000, Luck, Tony wrote:
> >> +		bytes = mbm_current->interval_bytes * MSEC_PER_SEC;
> >> +		do_div(bytes, diff_time);
> >> +		mbm_current->bandwidth = bytes;
> >> +		mbm_current->interval_bytes = 0;
> >> +		mbm_current->interval_start = cur_time;
> >> +	}
> >> +
> >> +	return mbm_current;
> >> +}
> >
> > How does the above time tracking deal with the event not actually having
> > been scheduled the whole time?
>
> That's been the topic of a few philosophical debates ...
> what exactly are we trying to say when we report that a process has a
> "memory bandwidth" of, say, 1523 MBytes/s? We need to know both the
> amount of data moved and to pick an interval to measure and divide by.
> Does it make a difference whether the process voluntarily gave up the
> cpu for some part of the interval (by blocking on I/O)? Or did the
> scheduler time-slice it out to run other jobs?
>
> The above code gives the average bandwidth across the last interval
> (with a minimum interval size of 100ms to avoid craziness with rounding
> errors on exceptionally tiny intervals). Some folks apparently want to
> get a "rate" directly from perf. I think many folks will find the
> "bytes" counters more helpful (where they control the sample interval
> with the '-I' flag to the perf utility).

So why didn't any of that make it into the Changelog? This is very much
different from any other perf driver; at the very least this debate
should have been mentioned and the choice defended.

Also, why are we doing the time tracking and divisions at all? Can't we
simply report the number of bytes transferred and let userspace sort out
the rest?

Userspace is provided the time the event was enabled and the time the
event was running, and it can fairly trivially obtain walltime if it so
desires; it can then compute whatever definition of bandwidth it wants
to use.