2016-04-23 00:27:56

by Shivappa Vikas

Subject: [PATCH V1 0/4] Urgent fixes for Intel CQM/MBM counting

Sending some urgent fixes for MBM (memory b/w monitoring), which was
upstreamed in 4.6-rc1. Patches apply on 4.6-rc1.

CQM and MBM counters reported incorrect counts in several scenarios,
such as interval mode or when multiple perf instances count the same
task. Patches 1/4, 2/4 and 3/4 address these issues.

The last patch changes the cqm driver to only support task events, as
support for cgroup events was broken - it now reports a 'not supported'
error to perf instead of pretending to support them and sending
incorrect counts.

[PATCH 1/4] perf/x86/cqm,mbm: Store cqm,mbm count for all events when
[PATCH 2/4] perf/x86/mbm: Store bytes counted for mbm during recycle
[PATCH 3/4] perf/x86/mbm: Fix mbm counting when RMIDs are reused
[PATCH 4/4] perf/x86/cqm: Support cqm/mbm only for perf events


2016-04-23 00:28:00

by Shivappa Vikas

Subject: [PATCH 1/4] perf/x86/cqm,mbm: Store cqm,mbm count for all events when RMID is recycled

During RMID recycling, when an event loses its RMID, we saved the counter
for the group leader but it was not being saved for all the events in an
event group. This would lead to a situation where, if two perf instances
are counting the same PID, one of them would not see the updated count
that the other perf instance is seeing. This patch fixes the issue
by saving the count for all the events in the same event group.

Signed-off-by: Vikas Shivappa <[email protected]>
---
arch/x86/events/intel/cqm.c | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 7b5fd81..8dfba39 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -14,6 +14,14 @@
#define MSR_IA32_QM_EVTSEL 0x0c8d

#define MBM_CNTR_WIDTH 24
+
+#define __init_rr(old_rmid, config, val) \
+((struct rmid_read) { \
+ .rmid = old_rmid, \
+ .evt_type = config, \
+ .value = ATOMIC64_INIT(val), \
+})
+
/*
* Guaranteed time in ms as per SDM where MBM counters will not overflow.
*/
@@ -478,7 +486,8 @@ static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
{
struct perf_event *event;
struct list_head *head = &group->hw.cqm_group_entry;
- u32 old_rmid = group->hw.cqm_rmid;
+ u32 old_rmid = group->hw.cqm_rmid, evttype;
+ struct rmid_read rr;

lockdep_assert_held(&cache_mutex);

@@ -486,14 +495,21 @@ static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
* If our RMID is being deallocated, perform a read now.
*/
if (__rmid_valid(old_rmid) && !__rmid_valid(rmid)) {
- struct rmid_read rr = {
- .rmid = old_rmid,
- .evt_type = group->attr.config,
- .value = ATOMIC64_INIT(0),
- };

+ rr = __init_rr(old_rmid, group->attr.config, 0);
cqm_mask_call(&rr);
local64_set(&group->count, atomic64_read(&rr.value));
+ list_for_each_entry(event, head, hw.cqm_group_entry) {
+ if (event->hw.is_group_event) {
+
+ evttype = event->attr.config;
+ rr = __init_rr(old_rmid, evttype, 0);
+
+ cqm_mask_call(&rr);
+ local64_set(&event->count,
+ atomic64_read(&rr.value));
+ }
+ }
}

raw_spin_lock_irq(&cache_lock);
@@ -983,11 +999,7 @@ static void __intel_mbm_event_init(void *info)

static void init_mbm_sample(u32 rmid, u32 evt_type)
{
- struct rmid_read rr = {
- .rmid = rmid,
- .evt_type = evt_type,
- .value = ATOMIC64_INIT(0),
- };
+ struct rmid_read rr = __init_rr(rmid, evt_type, 0);

/* on each socket, init sample */
on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
@@ -1181,10 +1193,7 @@ static void mbm_hrtimer_init(void)
static u64 intel_cqm_event_count(struct perf_event *event)
{
unsigned long flags;
- struct rmid_read rr = {
- .evt_type = event->attr.config,
- .value = ATOMIC64_INIT(0),
- };
+ struct rmid_read rr = __init_rr(-1, event->attr.config, 0);

/*
* We only need to worry about task events. System-wide events
--
1.9.1

2016-04-23 00:27:58

by Shivappa Vikas

Subject: [PATCH 3/4] perf/x86/mbm: Fix mbm counting when RMIDs are reused

When multiple instances of perf reuse RMID, then we need to start
counting for each instance rather than reporting the current RMID count.
This patch adds a st_count(start count) per event to track the same.

Signed-off-by: Vikas Shivappa <[email protected]>
---
arch/x86/events/intel/cqm.c | 71 ++++++++++++++++++++++++++++++++++++++++++---
include/linux/perf_event.h | 1 +
2 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index e679c39..7328b73 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -484,8 +484,18 @@ static inline void mbm_set_rccount(
{
u64 tmpval;

- tmpval = local64_read(&event->hw.rc_count) + atomic64_read(&rr->value);
+ tmpval = local64_read(&event->hw.rc_count) + atomic64_read(&rr->value) -
+ local64_read(&event->hw.st_count);
+
local64_set(&event->hw.rc_count, tmpval);
+
+ /*
+ * The st_count(start count) is meant to store the starting bytes
+ * for an event which is reusing an RMID which already
+ * had bytes measured. Once we start using the rc_count
+ * to keep the history bytes, reset the start bytes.
+ */
+ local64_set(&event->hw.st_count, 0UL);
local64_set(&event->count, tmpval);
}

@@ -1025,6 +1035,58 @@ static void init_mbm_sample(u32 rmid, u32 evt_type)
on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
}

+static inline bool first_event_ingroup(struct perf_event *group,
+ struct perf_event *event)
+{
+ struct list_head *head = &group->hw.cqm_group_entry;
+ u32 evt_type = event->attr.config;
+
+ if (evt_type == group->attr.config)
+ return false;
+ list_for_each_entry(event, head, hw.cqm_group_entry) {
+ if (evt_type == event->attr.config)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * mbm_setup_event - Does mbm specific count initialization
+ * when multiple events share RMID.
+ *
+ * If this is the first mbm event using the RMID, then initialize
+ * the total_bytes in the RMID and prev_count.
+ * else only initialize the start count of the event which is the current
+ * count of the RMID.
+ * In other words if the RMID has say counted 100MB till now because
+ * other event was already using it, we start
+ * from zero for our new event. Because after 1s if user checks the count,
+ * we need to report for the 1s duration and not the entire duration the
+ * RMID was being counted.
+*/
+static inline void mbm_setup_event(u32 rmid, struct perf_event *group,
+ struct perf_event *event)
+{
+ u32 evt_type = event->attr.config;
+ struct rmid_read rr;
+
+ if (first_event_ingroup(group, event)) {
+ init_mbm_sample(rmid, evt_type);
+ } else {
+ rr = __init_rr(rmid, evt_type, 0);
+ cqm_mask_call(&rr);
+ local64_set(&event->hw.st_count, atomic64_read(&rr.value));
+ }
+}
+
+static inline void mbm_setup_event_init(struct perf_event *event)
+{
+ event->hw.is_group_event = false;
+ local64_set(&event->hw.rc_count, 0UL);
+ local64_set(&event->hw.st_count, 0UL);
+}
+
/*
* Find a group and setup RMID.
*
@@ -1037,7 +1099,7 @@ static void intel_cqm_setup_event(struct perf_event *event,
bool conflict = false;
u32 rmid;

- event->hw.is_group_event = false;
+ mbm_setup_event_init(event);
list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) {
rmid = iter->hw.cqm_rmid;

@@ -1046,7 +1108,7 @@ static void intel_cqm_setup_event(struct perf_event *event,
event->hw.cqm_rmid = rmid;
*group = iter;
if (is_mbm_event(event->attr.config) && __rmid_valid(rmid))
- init_mbm_sample(rmid, event->attr.config);
+ mbm_setup_event(rmid, iter, event);
return;
}

@@ -1273,7 +1335,8 @@ static u64 intel_cqm_event_count(struct perf_event *event)
if (event->hw.cqm_rmid == rr.rmid) {
if (is_mbm_event(event->attr.config)) {
tmpval = atomic64_read(&rr.value) +
- local64_read(&event->hw.rc_count);
+ local64_read(&event->hw.rc_count) -
+ local64_read(&event->hw.st_count);

local64_set(&event->count, tmpval);
} else {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ec7772a..44a7f0c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -123,6 +123,7 @@ struct hw_perf_event {
u32 cqm_rmid;
int is_group_event;
local64_t rc_count;
+ local64_t st_count;
struct list_head cqm_events_entry;
struct list_head cqm_groups_entry;
struct list_head cqm_group_entry;
--
1.9.1

2016-04-23 00:28:31

by Shivappa Vikas

Subject: [PATCH 2/4] perf/x86/mbm: Store bytes counted for mbm during recycle

For MBM, since we report total bytes for the duration perf counts,
we need to keep the total bytes counted every time we lose an RMID.
Introduce an rc_count (recycle count) per event to
keep this history count (all bytes counted before the current RMID).

If we do not keep this count separately, then we may end up sending a
count that is less than the previous count with the -I perf stat
option, which leads to negative numbers being reported by perf. This
happens, say, when we counted a greater amount with RMID1 and then
counted less with RMID2, and the user checks counts in interval mode
after RMID1 and then again after RMID2.

Signed-off-by: Vikas Shivappa <[email protected]>
---
arch/x86/events/intel/cqm.c | 40 +++++++++++++++++++++++++++++++++++++---
include/linux/perf_event.h | 1 +
2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 8dfba39..e679c39 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -479,6 +479,16 @@ static void cqm_mask_call(struct rmid_read *rr)
on_each_cpu_mask(&cqm_cpumask, __intel_cqm_event_count, rr, 1);
}

+static inline void mbm_set_rccount(
+ struct perf_event *event, struct rmid_read *rr)
+{
+ u64 tmpval;
+
+ tmpval = local64_read(&event->hw.rc_count) + atomic64_read(&rr->value);
+ local64_set(&event->hw.rc_count, tmpval);
+ local64_set(&event->count, tmpval);
+}
+
/*
* Exchange the RMID of a group of events.
*/
@@ -493,12 +503,19 @@ static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)

/*
* If our RMID is being deallocated, perform a read now.
+ * For mbm, we need to store the bytes that were counted till now
+ * separately.
*/
if (__rmid_valid(old_rmid) && !__rmid_valid(rmid)) {

rr = __init_rr(old_rmid, group->attr.config, 0);
cqm_mask_call(&rr);
- local64_set(&group->count, atomic64_read(&rr.value));
+
+ if (is_mbm_event(group->attr.config))
+ mbm_set_rccount(group, &rr);
+ else
+ local64_set(&group->count, atomic64_read(&rr.value));
+
list_for_each_entry(event, head, hw.cqm_group_entry) {
if (event->hw.is_group_event) {

@@ -506,6 +523,9 @@ static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
rr = __init_rr(old_rmid, evttype, 0);

cqm_mask_call(&rr);
+ if (is_mbm_event(event->attr.config))
+ mbm_set_rccount(event, &rr);
+ else
local64_set(&event->count,
atomic64_read(&rr.value));
}
@@ -1194,6 +1214,7 @@ static u64 intel_cqm_event_count(struct perf_event *event)
{
unsigned long flags;
struct rmid_read rr = __init_rr(-1, event->attr.config, 0);
+ u64 tmpval;

/*
* We only need to worry about task events. System-wide events
@@ -1235,6 +1256,11 @@ static u64 intel_cqm_event_count(struct perf_event *event)
* busying performing the IPI calls. It's therefore necessary to
* check @event's RMID afterwards, and if it has changed,
* discard the result of the read.
+ *
+ * For MBM events, we are reading the total bytes and not
+ * a snapshot. Hence if the RMID was recycled for the duration
+ * we will be adding the rc_count which keeps the historical count
+ * of old RMIDs that were used.
*/
rr.rmid = ACCESS_ONCE(event->hw.cqm_rmid);

@@ -1244,8 +1270,16 @@ static u64 intel_cqm_event_count(struct perf_event *event)
cqm_mask_call(&rr);

raw_spin_lock_irqsave(&cache_lock, flags);
- if (event->hw.cqm_rmid == rr.rmid)
- local64_set(&event->count, atomic64_read(&rr.value));
+ if (event->hw.cqm_rmid == rr.rmid) {
+ if (is_mbm_event(event->attr.config)) {
+ tmpval = atomic64_read(&rr.value) +
+ local64_read(&event->hw.rc_count);
+
+ local64_set(&event->count, tmpval);
+ } else {
+ local64_set(&event->count, atomic64_read(&rr.value));
+ }
+ }
raw_spin_unlock_irqrestore(&cache_lock, flags);
out:
return __perf_event_count(event);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f291275..ec7772a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -122,6 +122,7 @@ struct hw_perf_event {
int cqm_state;
u32 cqm_rmid;
int is_group_event;
+ local64_t rc_count;
struct list_head cqm_events_entry;
struct list_head cqm_groups_entry;
struct list_head cqm_group_entry;
--
1.9.1

2016-04-23 00:28:30

by Shivappa Vikas

Subject: [PATCH 4/4] perf/x86/cqm: Support cqm/mbm only for perf events

The cgroup support for cqm is broken. Instead of mapping an RMID to a
cgroup, it is currently mapped to the task, and hence when the task
moves cgroups we get an incorrect count.

Also, the conflict handling code, which is meant to handle the case of
co-existing cgroup and task events, is broken. It reports very
confusing numbers, intermittently zero and some occupancy, when perf is
run with cgroup and task events.

Hence removing support for the parts which are broken rather than
pretending to support it and giving incorrect data.

Signed-off-by: Vikas Shivappa <[email protected]>
---
arch/x86/events/intel/cqm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 7328b73..4633fb3 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -1479,7 +1479,8 @@ static int intel_cqm_event_init(struct perf_event *event)
event->attr.exclude_idle ||
event->attr.exclude_host ||
event->attr.exclude_guest ||
- event->attr.sample_period) /* no sampling */
+ event->attr.sample_period || /* no sampling */
+ !(event->attach_state & PERF_ATTACH_TASK))
return -EINVAL;

INIT_LIST_HEAD(&event->hw.cqm_group_entry);
--
1.9.1

2016-04-25 09:13:17

by Peter Zijlstra

Subject: Re: [PATCH 2/4] perf/x86/mbm: Store bytes counted for mbm during recycle

On Fri, Apr 22, 2016 at 05:27:19PM -0700, Vikas Shivappa wrote:
> +static inline void mbm_set_rccount(
> + struct perf_event *event, struct rmid_read *rr)

That's horrible style, the 'normal' style is something like:

static inline
void mbm_set_rccount(struct perf_event *event, struct rmid_read *rr)
{
}

> @@ -1244,8 +1270,16 @@ static u64 intel_cqm_event_count(struct perf_event *event)
> cqm_mask_call(&rr);
>
> raw_spin_lock_irqsave(&cache_lock, flags);
> - if (event->hw.cqm_rmid == rr.rmid)
> - local64_set(&event->count, atomic64_read(&rr.value));
> + if (event->hw.cqm_rmid == rr.rmid) {
> + if (is_mbm_event(event->attr.config)) {
> + tmpval = atomic64_read(&rr.value) +
> + local64_read(&event->hw.rc_count);
> +
> + local64_set(&event->count, tmpval);
> + } else {
> + local64_set(&event->count, atomic64_read(&rr.value));
> + }
> + }
> raw_spin_unlock_irqrestore(&cache_lock, flags);
> out:
> return __perf_event_count(event);

This is a 'creative' solution; why don't you do the normal thing, which
is:

start:
prev_count = read_hw_counter();

read:
do {
prev = prev_count;
cur_val = read_hw_counter();
delta = cur_val - prev;
} while (local_cmpxchg(&prev_count, prev, cur_val) != prev);
count += delta;
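
(For reference, a minimal C sketch of this standard delta-accumulation
pattern, modelled on x86_perf_event_update(); read_hw_counter() and the
exact field usage here are placeholders, not the actual cqm driver code:)

	static u64 mbm_read_and_accumulate(struct perf_event *event)
	{
		u64 prev, cur, delta;

		/* start: prev_count was primed with read_hw_counter() at init */
		do {
			prev  = local64_read(&event->hw.prev_count);
			cur   = read_hw_counter(event);	/* current RMID reading */
			delta = cur - prev;	/* real code must also handle the
						 * 24-bit MBM counter wrap */
		} while (local64_cmpxchg(&event->hw.prev_count, prev, cur) != prev);

		local64_add(delta, &event->count);	/* count += delta */
		return local64_read(&event->count);
	}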


2016-04-25 09:17:01

by Peter Zijlstra

Subject: Re: [PATCH 3/4] perf/x86/mbm: Fix mbm counting when RMIDs are reused

On Fri, Apr 22, 2016 at 05:27:20PM -0700, Vikas Shivappa wrote:
> When multiple instances of perf reuse RMID, then we need to start
> counting for each instance rather than reporting the current RMID count.
> This patch adds a st_count(start count) per event to track the same.

what?

2016-04-25 09:18:11

by Peter Zijlstra

Subject: Re: [PATCH 4/4] perf/x86/cqm: Support cqm/mbm only for perf events

On Fri, Apr 22, 2016 at 05:27:21PM -0700, Vikas Shivappa wrote:
> The cgroup support for cqm is broken. Instead of mapping an RMID to a
> cgroup, it is currently mapped to the task, and hence when the task
> moves cgroups we get an incorrect count.
>
> Also, the conflict handling code, which is meant to handle the case of
> co-existing cgroup and task events, is broken. It reports very
> confusing numbers, intermittently zero and some occupancy, when perf is
> run with cgroup and task events.
>
> Hence removing support for the parts which are broken rather than
> pretending to support it and giving incorrect data.

Uh what, how about attempt to fix it?

2016-04-25 09:20:41

by Peter Zijlstra

Subject: Re: [PATCH 1/4] perf/x86/cqm,mbm: Store cqm,mbm count for all events when RMID is recycled

On Fri, Apr 22, 2016 at 05:27:18PM -0700, Vikas Shivappa wrote:
> During RMID recycling, when an event loses its RMID, we saved the counter
> for the group leader but it was not being saved for all the events in an
> event group. This would lead to a situation where, if two perf instances
> are counting the same PID, one of them would not see the updated count
> that the other perf instance is seeing. This patch fixes the issue
> by saving the count for all the events in the same event group.


> @@ -486,14 +495,21 @@ static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
> * If our RMID is being deallocated, perform a read now.
> */
> if (__rmid_valid(old_rmid) && !__rmid_valid(rmid)) {
>
> + rr = __init_rr(old_rmid, group->attr.config, 0);
> cqm_mask_call(&rr);
> local64_set(&group->count, atomic64_read(&rr.value));
> + list_for_each_entry(event, head, hw.cqm_group_entry) {
> + if (event->hw.is_group_event) {
> +
> + evttype = event->attr.config;
> + rr = __init_rr(old_rmid, evttype, 0);
> +
> + cqm_mask_call(&rr);
> + local64_set(&event->count,
> + atomic64_read(&rr.value));

Randomly indent much?

> + }
> + }
> }

2016-04-25 16:24:02

by Luck, Tony

Subject: RE: [PATCH 4/4] perf/x86/cqm: Support cqm/mbm only for perf events

>> Hence removing support for the parts which are broken rather than
>> pretending to support it and giving incorrect data.
>
> Uh what, how about attempt to fix it?

No hope to do that by 4.6 release ... so I suggested to Vikas that it would be better to disable
the feature now so users wouldn't be confused by the random numbers that they'll see if
they try to do this.

-Tony

2016-04-25 16:26:56

by Shivappa Vikas

Subject: Re: [PATCH 1/4] perf/x86/cqm,mbm: Store cqm,mbm count for all events when RMID is recycled



On Mon, 25 Apr 2016, Peter Zijlstra wrote:

> On Fri, Apr 22, 2016 at 05:27:18PM -0700, Vikas Shivappa wrote:
>> During RMID recycling, when an event loses its RMID, we saved the counter
>> for the group leader but it was not being saved for all the events in an
>> event group. This would lead to a situation where, if two perf instances
>> are counting the same PID, one of them would not see the updated count
>> that the other perf instance is seeing. This patch fixes the issue
>> by saving the count for all the events in the same event group.
>
>
>> @@ -486,14 +495,21 @@ static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
>> * If our RMID is being deallocated, perform a read now.
>> */
>> if (__rmid_valid(old_rmid) && !__rmid_valid(rmid)) {
>>
>> + rr = __init_rr(old_rmid, group->attr.config, 0);
>> cqm_mask_call(&rr);
>> local64_set(&group->count, atomic64_read(&rr.value));
>> + list_for_each_entry(event, head, hw.cqm_group_entry) {
>> + if (event->hw.is_group_event) {
>> +
>> + evttype = event->attr.config;
>> + rr = __init_rr(old_rmid, evttype, 0);
>> +
>> + cqm_mask_call(&rr);
>> + local64_set(&event->count,
>> + atomic64_read(&rr.value));
>
> Randomly indent much?

Will fix. It was added by mistake, in advance of the next patch.

Thanks,
Vikas

>
>> + }
>> + }
>> }
>

2016-04-25 16:45:32

by Shivappa Vikas

Subject: Re: [PATCH 3/4] perf/x86/mbm: Fix mbm counting when RMIDs are reused



On Mon, 25 Apr 2016, Peter Zijlstra wrote:

> On Fri, Apr 22, 2016 at 05:27:20PM -0700, Vikas Shivappa wrote:
>> When multiple instances of perf reuse RMID, then we need to start
>> counting for each instance rather than reporting the current RMID count.
>> This patch adds a st_count(start count) per event to track the same.
>
> what?
>

Will fix the commit log:

When multiple instances of perf reuse RMID for the same PID, then we need to
start counting from zero for each new event, rather than reporting the current
RMID. This patch adds a st_count(start count) per event to track the same.

For ex:
1.RMID1's total_bytes is 100MB for event1(PID1)
2.another perf instance starts measuring the same PID1 with event2. We reuse
RMID1 as the PID1 is already counted.
3.event2 stores st_count as 100MB.
4.After some time, when user wants to count event2 and say RMID1's current
total_bytes 110MB, we report 110MB - 100MB = 10MB
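
(A minimal sketch of that start-count bookkeeping; rmid_read_bytes() and
the variable names are hypothetical stand-ins, not the exact patch code:)

	u64 count;

	/* steps 2/3: event2 reuses RMID1; remember its current reading */
	local64_set(&event2->hw.st_count, rmid_read_bytes(rmid1));	/* 100MB */

	/* step 4: on read, report only what accumulated since event2 started */
	count = rmid_read_bytes(rmid1) - local64_read(&event2->hw.st_count);
	/* 110MB - 100MB = 10MB */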

Thanks,
Vikas

2016-04-25 18:04:55

by Shivappa Vikas

Subject: Re: [PATCH 2/4] perf/x86/mbm: Store bytes counted for mbm during recycle



On Mon, 25 Apr 2016, Peter Zijlstra wrote:

> On Fri, Apr 22, 2016 at 05:27:19PM -0700, Vikas Shivappa wrote:
>> +static inline void mbm_set_rccount(
>> + struct perf_event *event, struct rmid_read *rr)
>
> That's horrible style, the 'normal' style is something like:
>
> static inline
> void mbm_set_rccount(struct perf_event *event, struct rmid_read *rr)
> {
> }

Will fix.. Thanks for pointing out.

>
>> @@ -1244,8 +1270,16 @@ static u64 intel_cqm_event_count(struct perf_event *event)
>> cqm_mask_call(&rr);
>>
>> raw_spin_lock_irqsave(&cache_lock, flags);
>> - if (event->hw.cqm_rmid == rr.rmid)
>> - local64_set(&event->count, atomic64_read(&rr.value));
>> + if (event->hw.cqm_rmid == rr.rmid) {
>> + if (is_mbm_event(event->attr.config)) {
>> + tmpval = atomic64_read(&rr.value) +
>> + local64_read(&event->hw.rc_count);
>> +
>> + local64_set(&event->count, tmpval);
>> + } else {
>> + local64_set(&event->count, atomic64_read(&rr.value));
>> + }
>> + }
>> raw_spin_unlock_irqrestore(&cache_lock, flags);
>> out:
>> return __perf_event_count(event);
>
> This is a 'creative' solution; why don't you do the normal thing, which
> is:
>
> start:
> prev_count = read_hw_counter();
>
> read:
> do {
> prev = prev_count;
> cur_val = read_hw_counter();
> delta = cur_val - prev;
> } while (local_cmpxchg(&prev_count, prev, cur_val) != prev);
> count += delta;


I may need to update the comment.

rc_count stores the total bytes for RMIDs that were used for this
event except for the count of current RMID.
Say an event used RMID(1) .. RMID(k) from init to read and it had
RMID(k) when read was called, the rc_count stores the values read
from RMID1 .. RMID(k-1).

For MBM the patch is trying to do:
count
= total_bytes of RMID(1)
+ ... +total_bytes of RMID(k-1) + total_bytes of RMID(k))
= rc_count + total_bytes of RMID(k).

1. event1 init. rc_count = 0. event1 gets RMID1.
2. event1 loses RMID1 due to recycling. Current total_bytes for RMID1 is 50MB.
3. rc_count += 50MB.
4. event1 gets RMID2. total_bytes for RMID2 is set to zero.. basically do the
prev_count = read_hw_counter()..
5. event1 loses RMID2 due to recycling. Current total_bytes for RMID2 is 30MB.
6. rc_count += 30MB.
7. event1 gets RMID3..
8. event1 read is called. total_bytes is 10MB (read from RMID3)..
9. count = rc_count(80MB) + 10MB (read from RMID3..)
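
(The same arithmetic as a rough C-style sketch; rmid_total_bytes() is a
hypothetical helper, not the exact patch code:)

	u64 count;

	/* on each recycle, when intel_cqm_xchg_rmid() takes the RMID away: */
	local64_add(rmid_total_bytes(old_rmid), &event->hw.rc_count);	/* +50MB, later +30MB */

	/* on read (intel_cqm_event_count()): */
	count = local64_read(&event->hw.rc_count)	/* 80MB */
	      + rmid_total_bytes(cur_rmid);		/* +10MB => 90MB */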

Thanks,
Vikas

>
>
>

2016-04-25 20:02:27

by Peter Zijlstra

Subject: Re: [PATCH 2/4] perf/x86/mbm: Store bytes counted for mbm during recycle

On Mon, Apr 25, 2016 at 11:04:38AM -0700, Vikas Shivappa wrote:
> >This is a 'creative' solution; why don't you do the normal thing, which
> >is:
> >
> >start:
> > prev_count = read_hw_counter();
> >
> >read:
> > do {
> > prev = prev_count;
> > cur_val = read_hw_counter();
> > delta = cur_val - prev;
> > } while (local_cmpxchg(&prev_count, prev, cur_val) != prev);
> > count += delta;
>
>
> I may need to update the comment.
>
> rc_count stores the total bytes for RMIDs that were used for this event
> except for the count of current RMID.

Yeah, I got that, eventually.

> Say an event used RMID(1) .. RMID(k) from init to read and it had RMID(k)
> when read was called, the rc_count stores the values read from RMID1 ..
> RMID(k-1).
>
> For MBM the patch is trying to do:
> count
> = total_bytes of RMID(1) + ... +total_bytes of RMID(k-1) + total_bytes of
> RMID(k))
> = rc_count + total_bytes of RMID(k).

How is the regular counting scheme as outlined above not dealing with
this properly?

2016-04-25 20:05:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/4] perf/x86/mbm: Fix mbm counting when RMIDs are reused

On Mon, Apr 25, 2016 at 09:44:53AM -0700, Vikas Shivappa wrote:
>
>
> On Mon, 25 Apr 2016, Peter Zijlstra wrote:
>
> >On Fri, Apr 22, 2016 at 05:27:20PM -0700, Vikas Shivappa wrote:
> >>When multiple instances of perf reuse RMID, then we need to start
> >>counting for each instance rather than reporting the current RMID count.
> >>This patch adds a st_count(start count) per event to track the same.
> >
> >what?
> >
>
> Will fix the commit log:
>
> When multiple instances of perf reuse RMID for the same PID, then we need to
> start counting from zero for each new event, rather than reporting the
> current RMID. This patch adds a st_count(start count) per event to track the
> same.
>
> For ex:
> 1.RMID1's total_bytes is 100MB for event1(PID1)
> 2.another perf instance starts measuring the same PID1 with event2. We reuse
> RMID1 as the PID1 is already counted.
> 3.event2 stores st_count as 100MB.
> 4.After some time, when user wants to count event2 and say RMID1's current
> total_bytes 110MB, we report 110MB - 100MB = 10MB

This is naturally handled by the scheme I outlined in the other patch.

2016-04-25 20:08:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/4] perf/x86/cqm: Support cqm/mbm only for perf events

On Mon, Apr 25, 2016 at 04:23:58PM +0000, Luck, Tony wrote:
> >> Hence removing support for the parts which are broken rather than
> >> pretending to support it and giving incorrect data.
> >
> > Uh what, how about attempt to fix it?
>
> No hope to do that by 4.6 release ... so I suggested to Vikas that it would be better to disable
> the feature now so users wouldn't be confused by the random numbers that they'll see if
> they try to do this.

Of course, that might have been useful information to have in the
changelog to begin with.

But seeing that it's been broken for a number of releases, I don't see how
this is urgent to stuff into this one.

I'd instead be more inclined to hurry with Stephane's patches..

2016-04-25 21:12:26

by Shivappa Vikas

[permalink] [raw]
Subject: Re: [PATCH 2/4] perf/x86/mbm: Store bytes counted for mbm during recycle



On Mon, 25 Apr 2016, Peter Zijlstra wrote:

> On Mon, Apr 25, 2016 at 11:04:38AM -0700, Vikas Shivappa wrote:
>>> This is a 'creative' solution; why don't you do the normal thing, which
>>> is:
>>>
>>> start:
>>> prev_count = read_hw_counter();
>>>
>>> read:
>>> do {
>>> prev = prev_count;
>>> cur_val = read_hw_counter();
>>> delta = cur_val - prev;
>>> } while (local_cmpxchg(&prev_count, prev, cur_val) != prev);
>>> count += delta;
>>
>>
>> I may need to update the comment.
>>
>> rc_count stores the total bytes for RMIDs that were used for this event
>> except for the count of current RMID.
>
> Yeah, I got that, eventually.
>
>> Say an event used RMID(1) .. RMID(k) from init to read and it had RMID(k)
>> when read was called, the rc_count stores the values read from RMID1 ..
>> RMID(k-1).
>>
>> For MBM the patch is trying to do:
>> count
>> = total_bytes of RMID(1) + ... +total_bytes of RMID(k-1) + total_bytes of
>> RMID(k))
>> = rc_count + total_bytes of RMID(k).
>
> How is the regular counting scheme as outlined above not dealing with
> this properly?
>
>

By regular if you mean the current upstream code
local64_set(&event->count, atomic64_read(&rr.value));

then note that rr.value is just the current RMID's total_bytes, so we lose
the old values. So if RMID(1) counted 100MB, then RMID(2) counted 10MB and
there was a read (which is actually a count call for cqm) after RMID(2), then
it returns 10MB and not 110MB, which is the real total_bytes..

if you mean the below -

>
> start:
> prev_count = read_hw_counter();

I am assuming this means we keep the prev_count when the event is initialized.
This is done in mbm_init, which calls update_sample with the first parameter
set to true..



>
> read:
> do {
> prev = prev_count;
> cur_val = read_hw_counter();
> delta = cur_val - prev;
> } while (local_cmpxchg(&prev_count, prev, cur_val) != prev);
> count += delta;

update_sample does the work of computing the delta and adding the delta to
total_bytes.. it has all the code except for the while loop.

So we miss the counter values of RMIDs which may
be used by somebody else now.. they are stored during recycling just
before we lose the RMID.
If you are trying to count multiple RMIDs (that were used by the event) with
the while loop(?), that's something we do, but it's done in the xchng when we
lose the RMID.. as those counts are probably not there in those respective
RMIDs anymore.

2016-04-25 21:44:08

by Shivappa Vikas

[permalink] [raw]
Subject: Re: [PATCH 3/4] perf/x86/mbm: Fix mbm counting when RMIDs are reused



On Mon, 25 Apr 2016, Peter Zijlstra wrote:

> On Mon, Apr 25, 2016 at 09:44:53AM -0700, Vikas Shivappa wrote:
>>
>>
>> On Mon, 25 Apr 2016, Peter Zijlstra wrote:
>>
>>> On Fri, Apr 22, 2016 at 05:27:20PM -0700, Vikas Shivappa wrote:
>>>> When multiple instances of perf reuse RMID, then we need to start
>>>> counting for each instance rather than reporting the current RMID count.
>>>> This patch adds a st_count(start count) per event to track the same.
>>>
>>> what?
>>>
>>
>> Will fix the commit log:
>>
>> When multiple instances of perf reuse RMID for the same PID, then we need to
>> start counting from zero for each new event, rather than reporting the
>> current RMID. This patch adds a st_count(start count) per event to track the
>> same.
>>
>> For ex:
>> 1.RMID1's total_bytes is 100MB for event1(PID1)
>> 2.another perf instance starts measuring the same PID1 with event2. We reuse
>> RMID1 as the PID1 is already counted.
>> 3.event2 stores st_count as 100MB.
>> 4.After some time, when user wants to count event2 and say RMID1's current
>> total_bytes 110MB, we report 110MB - 100MB = 10MB
>
> This is naturally handled by the scheme I outlined in the other patch.

Something similar although there is one per rmid and one per event..

u64 read_sample(rmid...) // for each rmid
{
...

start:
'per rmid' prev = read_hw_counter();

count:
cur_count = read_hw_counter();
delta = cur_count - prev;
prev = cur_count;
total_bytes += delta;

return total_bytes;
}

when we lose the rmid -

xchng(event, rmid=-1)
{
..

'per event' rc_count = read_sample(event->rmid) - per event start count;
'per event' start_count = 0;

}


for each event -

start:
if rmid is reused
'per event' prev = read_sample(rmid);
else
prev = 0;

count: // we use count instead of read
count = read_sample(rmid) + rc_count - prev;


>

2016-04-25 21:49:48

by Shivappa Vikas

[permalink] [raw]
Subject: Re: [PATCH 3/4] perf/x86/mbm: Fix mbm counting when RMIDs are reused



>
> 'per event' rc_count = read_sample(event->rmid) - per event start count;

'per event' rc_count += read_sample(event->rmid) - per event start count;
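
(Putting that correction together, a rough sketch of the per-event
bookkeeping being proposed; read_sample() is the per-RMID helper from the
pseudocode above, and the field names are placeholders, not driver code:)

	/* when the event loses its RMID (xchng with an invalid rmid): */
	rc_count += read_sample(event->rmid) - st_count;
	st_count  = 0;

	/* when the event starts on an already-counting (reused) RMID: */
	st_count = read_sample(rmid);

	/* on count/read: */
	count = read_sample(rmid) + rc_count - st_count;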