Date: Thu, 10 Mar 2016 14:49:57 -0800 (PST)
From: Vikas Shivappa
To: Peter Zijlstra
cc: Vikas Shivappa, vikas.shivappa@intel.com, linux-kernel@vger.kernel.org,
    x86@kernel.org, hpa@zytor.com, tglx@linutronix.de, mingo@kernel.org,
    ravi.v.shankar@intel.com, tony.luck@intel.com, fenghua.yu@intel.com,
    h.peter.anvin@intel.com
Subject: Re: [PATCH 4/6] x86/mbm: Memory bandwidth monitoring event management

On Mon, 7 Mar 2016, Peter Zijlstra wrote:

> On Tue, Mar 01, 2016 at 03:48:26PM -0800, Vikas Shivappa wrote:
>
>> A lot of the scheduling code was taken from Tony's patch, and 3-4
>> lines of change were added in intel_cqm_event_read. This change was
>> made since the timer is no longer added on every context switch.
>
> Is this here to confuse people or is there some actual information in
> it?

Will remove the comment.

>> +/*
>> + * MBM Counter is 24bits wide. MBM_CNTR_MAX defines max counter
>> + * value
>> + */
>> +#define MBM_CNTR_MAX 0xffffff
>
> #define MBM_CNTR_WIDTH 24
> #define MBM_CNTR_MAX ((1U << MBM_CNTR_WIDTH) - 1)

Will fix.

>> #define QOS_L3_OCCUP_EVENT_ID (1 << 0)
>> +/*
>> + * MBM Event IDs as defined in SDM section 17.15.5
>> + * Event IDs are used to program EVTSEL MSRs before reading mbm event
>> + * counters
>> + */
>> +enum mbm_evt_type {
>> +	QOS_MBM_TOTAL_EVENT_ID = 0x02,
>> +	QOS_MBM_LOCAL_EVENT_ID,
>> +	QOS_MBM_TOTAL_BW_EVENT_ID,
>> +	QOS_MBM_LOCAL_BW_EVENT_ID,
>> +};
>
> QOS_L3_*_EVENT_ID is a define, these are an enum. Rather inconsistent.

Will be changing all of them to #defines, and we are also removing the
bw events.

>> struct rmid_read {
>> 	u32 rmid;
>
> Hole, you could've filled with the enum (which ends up being an int I
> think).
>
>> 	atomic64_t value;
>> +	enum mbm_evt_type evt_type;
>> };
>
>> +static bool is_mbm_event(int e)
>> +{
>> +	return (e >= QOS_MBM_TOTAL_EVENT_ID && e <= QOS_MBM_LOCAL_BW_EVENT_ID);
>> +}
>
> You had an enum type, you might as well use it.

The enum will be gone.
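[ Editor's note: with the enum converted to #defines and the bw events
  dropped as agreed above, the result might look roughly like this
  sketch; the 0x02/0x03 values follow the enum above, the rest is
  illustrative only.

	/* Event IDs, used to program the EVTSEL MSR before a counter read */
	#define QOS_L3_OCCUP_EVENT_ID	(1 << 0)
	#define QOS_MBM_TOTAL_EVENT_ID	0x02
	#define QOS_MBM_LOCAL_EVENT_ID	0x03

	/* true for the two remaining MBM events, false for L3 occupancy */
	static bool is_mbm_event(int e)
	{
		return (e >= QOS_MBM_TOTAL_EVENT_ID &&
			e <= QOS_MBM_LOCAL_EVENT_ID);
	}
]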
>
>> +static struct sample *update_sample(unsigned int rmid,
>> +				    enum mbm_evt_type evt_type, int first)
>> +{
>> +	ktime_t cur_time;
>> +	struct sample *mbm_current;
>> +	u32 vrmid = rmid_2_index(rmid);
>> +	u64 val, bytes, diff_time;
>> +	u32 eventid;
>> +
>> +	if (evt_type & QOS_MBM_LOCAL_EVENT_MASK) {
>> +		mbm_current = &mbm_local[vrmid];
>> +		eventid = QOS_MBM_LOCAL_EVENT_ID;
>> +	} else {
>> +		mbm_current = &mbm_total[vrmid];
>> +		eventid = QOS_MBM_TOTAL_EVENT_ID;
>> +	}
>> +
>> +	cur_time = ktime_get();
>> +	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
>> +	rdmsrl(MSR_IA32_QM_CTR, val);
>> +	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
>> +		return mbm_current;
>
>> +	val &= MBM_CNTR_MAX;
>
>> +	if (val < mbm_current->prev_msr)
>> +		bytes = MBM_CNTR_MAX - mbm_current->prev_msr + val + 1;
>> +	else
>> +		bytes = val - mbm_current->prev_msr;
>
> Would not something like:
>
> 	shift = 64 - MBM_CNTR_WIDTH;
>
> 	bytes = (val << shift) - (prev << shift);
> 	bytes >>= shift;
>
> be less obtuse? (and consistent with how every other perf update
> function does it).

Will fix.

> What guarantee is there we didn't wrap multiple times? Doesn't that
> deserve a comment?

This is taken care of in the next patch, 0006. At the definition of the
timers there I have put a comment saying that the h/w guarantees
overflow won't happen within 1s, but I can add another comment here in
patch 0006.

>
>> +	bytes *= cqm_l3_scale;
>> +
>> +	mbm_current->total_bytes += bytes;
>> +	mbm_current->interval_bytes += bytes;
>> +	mbm_current->prev_msr = val;
>> +	diff_time = ktime_ms_delta(cur_time, mbm_current->interval_start);
>
> Here we do a / 1e6
>
>> +
>> +	/*
>> +	 * The b/w measured is really the most recent/current b/w.
>> +	 * We wait till enough time has passed to avoid
>> +	 * arithmetic rounding problems. Having it at >=100ms,
>> +	 * such errors would be <=1%.
>> +	 */
>> +	if (diff_time > 100) {
>
> This could well be > 100e6 instead, avoiding the above division most of
> the time.
>
>> +		bytes = mbm_current->interval_bytes * MSEC_PER_SEC;
>> +		do_div(bytes, diff_time);
>> +		mbm_current->bandwidth = bytes;
>> +		mbm_current->interval_bytes = 0;
>> +		mbm_current->interval_start = cur_time;
>> +	}
>> +
>> +	return mbm_current;
>> +}
>
> How does the above time tracking deal with the event not actually
> having been scheduled the whole time?

Will be removing the bw events, so that should address all three
comments above.

>
>> +static void init_mbm_sample(u32 rmid, enum mbm_evt_type evt_type)
>> +{
>> +	struct rmid_read rr = {
>> +		.value = ATOMIC64_INIT(0),
>> +	};
>> +
>> +	rr.rmid = rmid;
>> +	rr.evt_type = evt_type;
>
> That's just sad.. put those two in the struct init as well.

Will fix.

Thanks,
Vikas

>
>> +	/* on each socket, init sample */
>> +	on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
>> +}
>
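[ Editor's note: expanded into a self-contained helper, the wrap-safe
  delta suggested above would look roughly like the sketch below. The
  helper name mbm_delta() is hypothetical and kernel types (u64) are
  assumed; the trick itself is the standard perf counter-delta pattern.

	#define MBM_CNTR_WIDTH	24

	static u64 mbm_delta(u64 val, u64 prev)
	{
		int shift = 64 - MBM_CNTR_WIDTH;

		/*
		 * Shift both 24-bit counter values up to the top of the
		 * u64, subtract, and shift back down; a single wrap then
		 * falls out of the modular subtraction. For example,
		 * val = 5, prev = 0xfffffe yields 7.
		 */
		return ((val << shift) - (prev << shift)) >> shift;
	}

  As with the open-coded version, multiple wraps within one read
  interval remain ambiguous, which is the point of the second question
  above. ]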