Date: Thu, 10 Mar 2016 14:49:57 -0800 (PST)
From: Vikas Shivappa
To: Peter Zijlstra
cc: Vikas Shivappa, vikas.shivappa@intel.com, linux-kernel@vger.kernel.org,
    x86@kernel.org, hpa@zytor.com, tglx@linutronix.de, mingo@kernel.org,
    ravi.v.shankar@intel.com, tony.luck@intel.com, fenghua.yu@intel.com,
    h.peter.anvin@intel.com
Subject: Re: [PATCH 4/6] x86/mbm: Memory bandwidth monitoring event management

On Mon, 7 Mar 2016, Peter Zijlstra wrote:

> On Tue, Mar 01, 2016 at 03:48:26PM -0800, Vikas Shivappa wrote:
>
>> A lot of the scheduling code was taken from Tony's patch, and 3-4
>> lines of change were added in intel_cqm_event_read. This change was
>> made since the timer is no longer added on every context switch.
>
> Is this here to confuse people or is there some actual information in
> it?

Will remove the comment.

>> +/*
>> + * MBM Counter is 24bits wide. MBM_CNTR_MAX defines max counter
>> + * value
>> + */
>> +#define MBM_CNTR_MAX 0xffffff
>
> #define MBM_CNTR_WIDTH 24
> #define MBM_CNTR_MAX ((1U << MBM_CNTR_WIDTH) - 1)

Will fix.

>> #define QOS_L3_OCCUP_EVENT_ID (1 << 0)
>> +/*
>> + * MBM Event IDs as defined in SDM section 17.15.5
>> + * Event IDs are used to program EVTSEL MSRs before reading mbm event
>> + * counters
>> + */
>> +enum mbm_evt_type {
>> +	QOS_MBM_TOTAL_EVENT_ID = 0x02,
>> +	QOS_MBM_LOCAL_EVENT_ID,
>> +	QOS_MBM_TOTAL_BW_EVENT_ID,
>> +	QOS_MBM_LOCAL_BW_EVENT_ID,
>> +};
>
> QOS_L3_*_EVENT_ID is a define, these are an enum. Rather inconsistent.

Will be changing all of them to #defines, and we are also removing the
bw events.

>> struct rmid_read {
>> 	u32 rmid;
>
> Hole, you could've filled with the enum (which ends up being an int I
> think).
>
>> 	atomic64_t value;
>> +	enum mbm_evt_type evt_type;
>> };
>
>> +static bool is_mbm_event(int e)
>> +{
>> +	return (e >= QOS_MBM_TOTAL_EVENT_ID && e <= QOS_MBM_LOCAL_BW_EVENT_ID);
>> +}
>
> You had an enum type, you might as well use it.

The enum will be gone.
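[ Editor's note: with the enum converted to #defines and the bw events
  dropped as agreed above, the result might look roughly like this
  sketch; the 0x02/0x03 values follow the enum above, the rest is
  illustrative only.

	/* Event IDs, used to program the EVTSEL MSR before a counter read */
	#define QOS_L3_OCCUP_EVENT_ID	(1 << 0)
	#define QOS_MBM_TOTAL_EVENT_ID	0x02
	#define QOS_MBM_LOCAL_EVENT_ID	0x03

	/* true for the two remaining MBM events, false for L3 occupancy */
	static bool is_mbm_event(int e)
	{
		return (e >= QOS_MBM_TOTAL_EVENT_ID &&
			e <= QOS_MBM_LOCAL_EVENT_ID);
	}
]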
>
>> +static struct sample *update_sample(unsigned int rmid,
>> +				    enum mbm_evt_type evt_type, int first)
>> +{
>> +	ktime_t cur_time;
>> +	struct sample *mbm_current;
>> +	u32 vrmid = rmid_2_index(rmid);
>> +	u64 val, bytes, diff_time;
>> +	u32 eventid;
>> +
>> +	if (evt_type & QOS_MBM_LOCAL_EVENT_MASK) {
>> +		mbm_current = &mbm_local[vrmid];
>> +		eventid = QOS_MBM_LOCAL_EVENT_ID;
>> +	} else {
>> +		mbm_current = &mbm_total[vrmid];
>> +		eventid = QOS_MBM_TOTAL_EVENT_ID;
>> +	}
>> +
>> +	cur_time = ktime_get();
>> +	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
>> +	rdmsrl(MSR_IA32_QM_CTR, val);
>> +	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
>> +		return mbm_current;
>
>> +	val &= MBM_CNTR_MAX;
>
>> +	if (val < mbm_current->prev_msr)
>> +		bytes = MBM_CNTR_MAX - mbm_current->prev_msr + val + 1;
>> +	else
>> +		bytes = val - mbm_current->prev_msr;
>
> Would not something like:
>
> 	shift = 64 - MBM_CNTR_WIDTH;
>
> 	bytes = (val << shift) - (prev << shift);
> 	bytes >>= shift;
>
> be less obtuse? (and consistent with how every other perf update
> function does it).

Will fix.

> What guarantee is there we didn't wrap multiple times? Doesn't that
> deserve a comment?

This is taken care of in the next patch, 0006. At the definition of the
timers there I have put a comment saying that the h/w guarantees
overflow won't happen within 1s, but I can add another comment here in
patch 0006.

>
>> +	bytes *= cqm_l3_scale;
>> +
>> +	mbm_current->total_bytes += bytes;
>> +	mbm_current->interval_bytes += bytes;
>> +	mbm_current->prev_msr = val;
>> +	diff_time = ktime_ms_delta(cur_time, mbm_current->interval_start);
>
> Here we do a / 1e6
>
>> +
>> +	/*
>> +	 * The b/w measured is really the most recent/current b/w.
>> +	 * We wait till enough time has passed to avoid
>> +	 * arithmetic rounding problems. Having it at >=100ms,
>> +	 * such errors would be <=1%.
>> +	 */
>> +	if (diff_time > 100) {
>
> This could well be > 100e6 instead, avoiding the above division most of
> the time.
>
>> +		bytes = mbm_current->interval_bytes * MSEC_PER_SEC;
>> +		do_div(bytes, diff_time);
>> +		mbm_current->bandwidth = bytes;
>> +		mbm_current->interval_bytes = 0;
>> +		mbm_current->interval_start = cur_time;
>> +	}
>> +
>> +	return mbm_current;
>> +}
>
> How does the above time tracking deal with the event not actually
> having been scheduled the whole time?

Will be removing the bw events, so that should address all three
comments above.

>
>> +static void init_mbm_sample(u32 rmid, enum mbm_evt_type evt_type)
>> +{
>> +	struct rmid_read rr = {
>> +		.value = ATOMIC64_INIT(0),
>> +	};
>> +
>> +	rr.rmid = rmid;
>> +	rr.evt_type = evt_type;
>
> That's just sad.. put those two in the struct init as well.

Will fix.

Thanks,
Vikas

>
>> +	/* on each socket, init sample */
>> +	on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
>> +}
>
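[ Editor's note: expanded into a self-contained helper, the wrap-safe
  delta suggested above would look roughly like the sketch below. The
  helper name mbm_delta() is hypothetical and kernel types (u64) are
  assumed; the trick itself is the standard perf counter-delta pattern.

	#define MBM_CNTR_WIDTH	24

	static u64 mbm_delta(u64 val, u64 prev)
	{
		int shift = 64 - MBM_CNTR_WIDTH;

		/*
		 * Shift both 24-bit counter values up to the top of the
		 * u64, subtract, and shift back down; a single wrap then
		 * falls out of the modular subtraction. For example,
		 * val = 5, prev = 0xfffffe yields 7.
		 */
		return ((val << shift) - (prev << shift)) >> shift;
	}

  As with the open-coded version, multiple wraps within one read
  interval remain ambiguous, which is the point of the second question
  above. ]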