2015-08-24 14:20:45

by Alexey Brodkin

Subject: [PATCH v3 0/6] ARCv2 port to Linux - (C) perf

Hi Peter,

This mini-series adds perf support for ARCv2 based cores, which brings in
overflow interrupts and SMP. Additionally, raw events are now supported as well.

Please review!

Compared to v2 this series has:
[1] Removed patch with raw-events support.
It needs some rework; let's discuss it separately.
Still I plan to send it shortly.
[2] Merged "set usable max period as a half of real max period" into
"implement event_set_period".
[3] Fixed arc_pmu_event_set_period() with regard to incorrect
"hwc->period_left" setup.
[4] Moved interrupts enabling from arc_pmu_add() to arc_pmu_start()

Compared to v1 this series has:
[1] Addressed review comments
[2] More verbose commit messages and comments in sources
[3] Minor cosmetics

Thanks,
Alexey

Alexey Brodkin (4):
ARCv2: perf: implement "event_set_period"
ARCv2: perf: Support sampling events using overflow interrupts
ARCv2: perf: implement exclusion of event counting in user or kernel
mode
ARCv2: perf: SMP support

Vineet Gupta (2):
ARC: perf: cap the number of counters to hardware max of 32
ARCv2: perf: Finally introduce HS perf unit

.../devicetree/bindings/arc/archs-pct.txt | 17 ++
MAINTAINERS | 2 +-
arch/arc/include/asm/perf_event.h | 21 +-
arch/arc/kernel/perf_event.c | 271 ++++++++++++++++++---
4 files changed, 275 insertions(+), 36 deletions(-)
create mode 100644 Documentation/devicetree/bindings/arc/archs-pct.txt

--
2.4.3


2015-08-24 14:20:46

by Alexey Brodkin

Subject: [PATCH v3 1/6] ARC: perf: cap the number of counters to hardware max of 32

From: Vineet Gupta <[email protected]>

The number of counters in PCT can never be more than 32 (while
countable conditions could be 100+) for both ARCompact and ARCv2.

And while at it, update copyright dates.

Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
Signed-off-by: Alexey Brodkin <[email protected]>
---

Compared to v2:
[1] Updated copyright date in arch/arc/kernel/perf_event.c

No changes since v1.

arch/arc/include/asm/perf_event.h | 5 +++--
arch/arc/kernel/perf_event.c | 6 +++---
2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index 2b8880e..e7b16c2 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -1,6 +1,7 @@
/*
* Linux performance counter support for ARC
*
+ * Copyright (C) 2014-2015 Synopsys, Inc. (http://www.synopsys.com)
* Copyright (C) 2011-2013 Synopsys, Inc. (http://www.synopsys.com)
*
* This program is free software; you can redistribute it and/or modify
@@ -12,8 +13,8 @@
#ifndef __ASM_PERF_EVENT_H
#define __ASM_PERF_EVENT_H

-/* real maximum varies per CPU, this is the maximum supported by the driver */
-#define ARC_PMU_MAX_HWEVENTS 64
+/* Max number of counters that PCT block may ever have */
+#define ARC_PERF_MAX_COUNTERS 32

#define ARC_REG_CC_BUILD 0xF6
#define ARC_REG_CC_INDEX 0x240
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 1287388..d7ee5b2 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -1,7 +1,7 @@
/*
* Linux performance counter support for ARC700 series
*
- * Copyright (C) 2013 Synopsys, Inc. (http://www.synopsys.com)
+ * Copyright (C) 2013-2015 Synopsys, Inc. (http://www.synopsys.com)
*
* This code is inspired by the perf support of various other architectures.
*
@@ -22,7 +22,7 @@ struct arc_pmu {
struct pmu pmu;
int counter_size; /* in bits */
int n_counters;
- unsigned long used_mask[BITS_TO_LONGS(ARC_PMU_MAX_HWEVENTS)];
+ unsigned long used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
int ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
};

@@ -284,7 +284,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
pr_err("This core does not have performance counters!\n");
return -ENODEV;
}
- BUG_ON(pct_bcr.c > ARC_PMU_MAX_HWEVENTS);
+ BUG_ON(pct_bcr.c > ARC_PERF_MAX_COUNTERS);

READ_BCR(ARC_REG_CC_BUILD, cc_bcr);
BUG_ON(!cc_bcr.v); /* Counters exist but No countable conditions ? */
--
2.4.3

2015-08-24 14:22:04

by Alexey Brodkin

Subject: [PATCH v3 2/6] ARCv2: perf: implement "event_set_period"

This generalization prepares for support of overflow interrupts.

Hardware event counters on ARC work as follows:
Each counter counts from a programmed start value (set in
ARC_REG_PCT_COUNT) to a limit value (set in ARC_REG_PCT_INT_CNT) and
once the limit value is reached the counter generates an interrupt.

Even though this hardware implementation allows for more flexibility,
in the Linux kernel we decided to mimic the behavior of other
architectures as follows:

[1] Set the limit value to half of the counter's max value (to allow
the counter to keep running after reaching its limit, see below for
more explanation):
---------->8-----------
arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;
---------->8-----------

[2] Set the start value to "arc_pmu->max_period - sample_period" and
then count up to the limit

Our event counters don't stop on reaching the limit value (the one we
set in ARC_REG_PCT_INT_CNT) but continue to count until the kernel
explicitly stops each of them.

Setting the limit to half of the counter's capacity allows us to
capture events that occur between the moment the interrupt is raised
and the moment we actually process it in the PMU interrupt handler.
That way we're more precise.

For example, if we count CPU cycles, we keep track of cycles spent
while running through the generic IRQ handling code:

[1] We set the counter period to, say, 100_000 events of type "crun"
[2] The counter reaches that limit and raises its interrupt
[3] Once we get into the PMU IRQ handler we read the current counter
value from ARC_REG_PCT_SNAP and see something like 105_000.

If counters stopped on reaching the limit value, we would miss the
additional 5_000 cycles.
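
To make the arithmetic concrete, below is a minimal standalone sketch
(an illustration only, not part of the patch) of how the start value
would be derived for a hypothetical 32-bit counter:
---------->8-----------
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* counter_size = 32, i.e. pct_bcr.s == 0 */
	uint64_t max_period = (1ULL << 32) / 2 - 1ULL;	/* 0x7fffffff */
	uint64_t sample_period = 100000;	/* e.g. "crun" events */

	/* program the counter to start here; limit stays at max_period */
	uint64_t start = max_period - sample_period;

	/*
	 * The IRQ fires after exactly sample_period events, and the
	 * counter keeps counting past the limit, so the extra events
	 * (the 5_000 cycles above) are still accounted for.
	 */
	printf("start = 0x%llx\n", (unsigned long long)start);

	return 0;
}
---------->8-----------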

Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
Signed-off-by: Alexey Brodkin <[email protected]>
---

Compared to v2:
[1] "ARCv2: perf: set usable max period as a half of real max period"
was merged in this one so we may have complete and valid commit message
that covers basics of ARC PCTs.
[2] Fixed arc_pmu_event_set_period() in regard of incorrect
"hwc->period_left" setup.

Compared to v1:
[1] Added verbose commit message with explanation of how PCT HW works on ARC
[2] Simplified arc_perf_event_update()
[3] Removed check for is_sampling_event() because we already set
PERF_PMU_CAP_NO_INTERRUPT in probe()
[4] Minor cosmetics

arch/arc/kernel/perf_event.c | 79 +++++++++++++++++++++++++++++++++++---------
1 file changed, 63 insertions(+), 16 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index d7ee5b2..db53af7 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -20,9 +20,9 @@

struct arc_pmu {
struct pmu pmu;
- int counter_size; /* in bits */
int n_counters;
unsigned long used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
+ u64 max_period;
int ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
};

@@ -88,18 +88,15 @@ static uint64_t arc_pmu_read_counter(int idx)
static void arc_perf_event_update(struct perf_event *event,
struct hw_perf_event *hwc, int idx)
{
- uint64_t prev_raw_count, new_raw_count;
- int64_t delta;
-
- do {
- prev_raw_count = local64_read(&hwc->prev_count);
- new_raw_count = arc_pmu_read_counter(idx);
- } while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
- new_raw_count) != prev_raw_count);
-
- delta = (new_raw_count - prev_raw_count) &
- ((1ULL << arc_pmu->counter_size) - 1ULL);
+ uint64_t prev_raw_count = local64_read(&hwc->prev_count);
+ uint64_t new_raw_count = arc_pmu_read_counter(idx);
+ int64_t delta = new_raw_count - prev_raw_count;

+ /*
+ * We aren't afraid of hwc->prev_count changing beneath our feet
+ * because there's no way for this function to be re-entered.
+ */
+ local64_set(&hwc->prev_count, new_raw_count);
local64_add(delta, &event->count);
local64_sub(delta, &hwc->period_left);
}
@@ -142,6 +139,10 @@ static int arc_pmu_event_init(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
int ret;

+ hwc->sample_period = arc_pmu->max_period;
+ hwc->last_period = hwc->sample_period;
+ local64_set(&hwc->period_left, hwc->sample_period);
+
switch (event->attr.type) {
case PERF_TYPE_HARDWARE:
if (event->attr.config >= PERF_COUNT_HW_MAX)
@@ -153,6 +154,7 @@ static int arc_pmu_event_init(struct perf_event *event)
(int) event->attr.config, (int) hwc->config,
arc_pmu_ev_hw_map[event->attr.config]);
return 0;
+
case PERF_TYPE_HW_CACHE:
ret = arc_pmu_cache_event(event->attr.config);
if (ret < 0)
@@ -180,6 +182,47 @@ static void arc_pmu_disable(struct pmu *pmu)
write_aux_reg(ARC_REG_PCT_CONTROL, (tmp & 0xffff0000) | 0x0);
}

+static int arc_pmu_event_set_period(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ s64 left = local64_read(&hwc->period_left);
+ s64 period = hwc->sample_period;
+ int idx = hwc->idx;
+ int overflow = 0;
+ u64 value;
+
+ if (unlikely(left <= -period)) {
+ /* left underflowed by more than period. */
+ left = period;
+ local64_set(&hwc->period_left, left);
+ hwc->last_period = period;
+ overflow = 1;
+ } else if (unlikely(left <= 0)) {
+ /* left underflowed by less than period. */
+ left += period;
+ local64_set(&hwc->period_left, left);
+ hwc->last_period = period;
+ overflow = 1;
+ }
+
+ if (left > arc_pmu->max_period)
+ left = arc_pmu->max_period;
+
+ value = arc_pmu->max_period - left;
+ local64_set(&hwc->prev_count, value);
+
+ /* Select counter */
+ write_aux_reg(ARC_REG_PCT_INDEX, idx);
+
+ /* Write value */
+ write_aux_reg(ARC_REG_PCT_COUNTL, (u32)value);
+ write_aux_reg(ARC_REG_PCT_COUNTH, (value >> 32));
+
+ perf_event_update_userpage(event);
+
+ return overflow;
+}
+
/*
* Assigns hardware counter to hardware condition.
* Note that there is no separate start/stop mechanism;
@@ -194,9 +237,11 @@ static void arc_pmu_start(struct perf_event *event, int flags)
return;

if (flags & PERF_EF_RELOAD)
- WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+ WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+
+ hwc->state = 0;

- event->hw.state = 0;
+ arc_pmu_event_set_period(event);

/* enable ARC pmu here */
write_aux_reg(ARC_REG_PCT_INDEX, idx);
@@ -269,6 +314,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
struct arc_reg_pct_build pct_bcr;
struct arc_reg_cc_build cc_bcr;
int i, j;
+ int counter_size; /* in bits */

union cc_name {
struct {
@@ -294,10 +340,11 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
return -ENOMEM;

arc_pmu->n_counters = pct_bcr.c;
- arc_pmu->counter_size = 32 + (pct_bcr.s << 4);
+ counter_size = 32 + (pct_bcr.s << 4);
+ arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;

pr_info("ARC perf\t: %d counters (%d bits), %d countable conditions\n",
- arc_pmu->n_counters, arc_pmu->counter_size, cc_bcr.c);
+ arc_pmu->n_counters, counter_size, cc_bcr.c);

cc_name.str[8] = 0;
for (i = 0; i < PERF_COUNT_ARC_HW_MAX; i++)
--
2.4.3

2015-08-24 14:20:50

by Alexey Brodkin

Subject: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

In ARC700 times performance counters didn't have interrupt support,
and so for ARC we only supported non-sampling events.

Put simply, only "perf stat" was functional.

Now with ARC HS we have support for interrupts in performance
counters, which this change introduces support for.

ARC performance counters act in the following way with regard to
interrupt generation:
[1] A counter counts starting from the value set in the PCT_COUNT
register pair
[2] Once the counter reaches the value set in PCT_INT_CNT an interrupt
is raised

The basic setup looks like this (see the sketch below):
[1] PCT_COUNT = 0;
[2] PCT_INT_CNT = __limit_value__;
[3] Enable interrupts for that counter and let it run
[4] Let the counter reach its limit
[5] Handle the interrupt when it happens

Note that the PCT HW block is built into the CPU core and so its
interrupt line (which is basically the OR of all counters' IRQs) is
wired directly to the top-level IRQ controller. That means that to
de-assert the PCT interrupt it is required to reset the IRQs of all
counters that have reached their limit values.
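
Expressed with the register accessors used in this patch, steps [1]-[5]
boil down to roughly the following (a simplified sketch for a single
counter "idx", not the literal driver code):
---------->8-----------
/* [1] PCT_COUNT = 0 */
write_aux_reg(ARC_REG_PCT_INDEX, idx);
write_aux_reg(ARC_REG_PCT_COUNTL, 0);
write_aux_reg(ARC_REG_PCT_COUNTH, 0);

/* [2] PCT_INT_CNT = __limit_value__ */
write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)limit);
write_aux_reg(ARC_REG_PCT_INT_CNTH, limit >> 32);

/* [3] enable the overflow IRQ for this counter and let it run */
write_aux_reg(ARC_REG_PCT_INT_CTRL,
	      read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));

/* [4]-[5] later, in the IRQ handler: ack by writing 1 to the ACT bit;
 * without this the OR-ed PCT interrupt line stays asserted */
write_aux_reg(ARC_REG_PCT_INT_ACT, 1 << idx);
---------->8-----------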

Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Alexey Brodkin <[email protected]>
---

Compared to v2:
[1] Moved interrupts enabling from arc_pmu_add() to arc_pmu_start()

Compared to v1:
[1] Added commit message
[2] Removed check for is_sampling_event() because we already set
PERF_PMU_CAP_NO_INTERRUPT in probe()
[3] Minor cosmetics

arch/arc/include/asm/perf_event.h | 8 ++-
arch/arc/kernel/perf_event.c | 128 +++++++++++++++++++++++++++++++++++---
2 files changed, 126 insertions(+), 10 deletions(-)

diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index e7b16c2..9ed593e 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -29,15 +29,19 @@
#define ARC_REG_PCT_CONFIG 0x254
#define ARC_REG_PCT_CONTROL 0x255
#define ARC_REG_PCT_INDEX 0x256
+#define ARC_REG_PCT_INT_CNTL 0x25C
+#define ARC_REG_PCT_INT_CNTH 0x25D
+#define ARC_REG_PCT_INT_CTRL 0x25E
+#define ARC_REG_PCT_INT_ACT 0x25F

#define ARC_REG_PCT_CONTROL_CC (1 << 16) /* clear counts */
#define ARC_REG_PCT_CONTROL_SN (1 << 17) /* snapshot */

struct arc_reg_pct_build {
#ifdef CONFIG_CPU_BIG_ENDIAN
- unsigned int m:8, c:8, r:6, s:2, v:8;
+ unsigned int m:8, c:8, r:5, i:1, s:2, v:8;
#else
- unsigned int v:8, s:2, r:6, c:8, m:8;
+ unsigned int v:8, s:2, i:1, r:5, c:8, m:8;
#endif
};

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index db53af7..ce0fa60 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -11,6 +11,7 @@
*
*/
#include <linux/errno.h>
+#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/perf_event.h>
@@ -24,6 +25,7 @@ struct arc_pmu {
unsigned long used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
u64 max_period;
int ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
+ struct perf_event *act_counter[ARC_PERF_MAX_COUNTERS];
};

struct arc_callchain_trace {
@@ -139,9 +141,11 @@ static int arc_pmu_event_init(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
int ret;

- hwc->sample_period = arc_pmu->max_period;
- hwc->last_period = hwc->sample_period;
- local64_set(&hwc->period_left, hwc->sample_period);
+ if (!is_sampling_event(event)) {
+ hwc->sample_period = arc_pmu->max_period;
+ hwc->last_period = hwc->sample_period;
+ local64_set(&hwc->period_left, hwc->sample_period);
+ }

switch (event->attr.type) {
case PERF_TYPE_HARDWARE:
@@ -243,6 +247,11 @@ static void arc_pmu_start(struct perf_event *event, int flags)

arc_pmu_event_set_period(event);

+ /* Enable interrupt for this counter */
+ if (is_sampling_event(event))
+ write_aux_reg(ARC_REG_PCT_INT_CTRL,
+ read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
+
/* enable ARC pmu here */
write_aux_reg(ARC_REG_PCT_INDEX, idx);
write_aux_reg(ARC_REG_PCT_CONFIG, hwc->config);
@@ -253,6 +262,17 @@ static void arc_pmu_stop(struct perf_event *event, int flags)
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;

+ /* Disable interrupt for this counter */
+ if (is_sampling_event(event)) {
+ /*
+ * Reset the interrupt flag by writing 1. This is required
+ * to make sure no pending interrupt is left behind.
+ */
+ write_aux_reg(ARC_REG_PCT_INT_ACT, 1 << idx);
+ write_aux_reg(ARC_REG_PCT_INT_CTRL,
+ read_aux_reg(ARC_REG_PCT_INT_CTRL) & ~(1 << idx));
+ }
+
if (!(event->hw.state & PERF_HES_STOPPED)) {
/* stop ARC pmu here */
write_aux_reg(ARC_REG_PCT_INDEX, idx);
@@ -275,6 +295,8 @@ static void arc_pmu_del(struct perf_event *event, int flags)
arc_pmu_stop(event, PERF_EF_UPDATE);
__clear_bit(event->hw.idx, arc_pmu->used_mask);

+ arc_pmu->act_counter[event->hw.idx] = 0;
+
perf_event_update_userpage(event);
}

@@ -295,6 +317,16 @@ static int arc_pmu_add(struct perf_event *event, int flags)
}

write_aux_reg(ARC_REG_PCT_INDEX, idx);
+
+ arc_pmu->act_counter[idx] = event;
+
+ if (is_sampling_event(event)) {
+ /* Mimic full counter overflow as other arches do */
+ write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
+ write_aux_reg(ARC_REG_PCT_INT_CNTH,
+ (arc_pmu->max_period >> 32));
+ }
+
write_aux_reg(ARC_REG_PCT_CONFIG, 0);
write_aux_reg(ARC_REG_PCT_COUNTL, 0);
write_aux_reg(ARC_REG_PCT_COUNTH, 0);
@@ -309,11 +341,70 @@ static int arc_pmu_add(struct perf_event *event, int flags)
return 0;
}

+#ifdef CONFIG_ISA_ARCV2
+static irqreturn_t arc_pmu_intr(int irq, void *dev)
+{
+ struct perf_sample_data data;
+ struct arc_pmu *arc_pmu = (struct arc_pmu *)dev;
+ struct pt_regs *regs;
+ int active_ints;
+ int idx;
+
+ arc_pmu_disable(&arc_pmu->pmu);
+
+ active_ints = read_aux_reg(ARC_REG_PCT_INT_ACT);
+
+ regs = get_irq_regs();
+
+ for (idx = 0; idx < arc_pmu->n_counters; idx++) {
+ struct perf_event *event = arc_pmu->act_counter[idx];
+ struct hw_perf_event *hwc;
+
+ if (!(active_ints & (1 << idx)))
+ continue;
+
+ /* Reset the interrupt flag by writing 1 */
+ write_aux_reg(ARC_REG_PCT_INT_ACT, 1 << idx);
+
+ /*
+ * On reset of "interrupt active" bit corresponding
+ * "interrupt enable" bit gets automatically reset as well.
+ * Now we need to re-enable interrupt for the counter.
+ */
+ write_aux_reg(ARC_REG_PCT_INT_CTRL,
+ read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
+
+ hwc = &event->hw;
+
+ WARN_ON_ONCE(hwc->idx != idx);
+
+ arc_perf_event_update(event, &event->hw, event->hw.idx);
+ perf_sample_data_init(&data, 0, hwc->last_period);
+ if (!arc_pmu_event_set_period(event))
+ continue;
+
+ if (perf_event_overflow(event, &data, regs))
+ arc_pmu_stop(event, 0);
+ }
+
+ arc_pmu_enable(&arc_pmu->pmu);
+
+ return IRQ_HANDLED;
+}
+#else
+
+static irqreturn_t arc_pmu_intr(int irq, void *dev)
+{
+ return IRQ_NONE;
+}
+
+#endif /* CONFIG_ISA_ARCV2 */
+
static int arc_pmu_device_probe(struct platform_device *pdev)
{
struct arc_reg_pct_build pct_bcr;
struct arc_reg_cc_build cc_bcr;
- int i, j;
+ int i, j, has_interrupts;
int counter_size; /* in bits */

union cc_name {
@@ -339,12 +430,16 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
if (!arc_pmu)
return -ENOMEM;

+ has_interrupts = is_isa_arcv2() ? pct_bcr.i : 0;
+
arc_pmu->n_counters = pct_bcr.c;
counter_size = 32 + (pct_bcr.s << 4);
+
arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;

- pr_info("ARC perf\t: %d counters (%d bits), %d countable conditions\n",
- arc_pmu->n_counters, counter_size, cc_bcr.c);
+ pr_info("ARC perf\t: %d counters (%d bits), %d conditions%s\n",
+ arc_pmu->n_counters, counter_size, cc_bcr.c,
+ has_interrupts ? ", [overflow IRQ support]":"");

cc_name.str[8] = 0;
for (i = 0; i < PERF_COUNT_ARC_HW_MAX; i++)
@@ -379,8 +474,25 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
.read = arc_pmu_read,
};

- /* ARC 700 PMU does not support sampling events */
- arc_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
+ if (has_interrupts) {
+ int irq = platform_get_irq(pdev, 0);
+
+ if (irq < 0) {
+ pr_err("Cannot get IRQ number for the platform\n");
+ return -ENODEV;
+ }
+
+ ret = devm_request_irq(&pdev->dev, irq, arc_pmu_intr, 0,
+ "arc-pmu", arc_pmu);
+ if (ret) {
+ pr_err("could not allocate PMU IRQ\n");
+ return ret;
+ }
+
+ /* Clear all pending interrupt flags */
+ write_aux_reg(ARC_REG_PCT_INT_ACT, 0xffffffff);
+ } else
+ arc_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;

return perf_pmu_register(&arc_pmu->pmu, pdev->name, PERF_TYPE_RAW);
}
--
2.4.3

2015-08-24 14:20:53

by Alexey Brodkin

Subject: [PATCH v3 4/6] ARCv2: perf: implement exclusion of event counting in user or kernel mode

Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Alexey Brodkin <[email protected]>
---

No changes since v2.

No changes since v1.

arch/arc/include/asm/perf_event.h | 3 +++
arch/arc/kernel/perf_event.c | 16 ++++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index 9ed593e..876e216 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -34,6 +34,9 @@
#define ARC_REG_PCT_INT_CTRL 0x25E
#define ARC_REG_PCT_INT_ACT 0x25F

+#define ARC_REG_PCT_CONFIG_USER (1 << 18) /* count in user mode */
+#define ARC_REG_PCT_CONFIG_KERN (1 << 19) /* count in kernel mode */
+
#define ARC_REG_PCT_CONTROL_CC (1 << 16) /* clear counts */
#define ARC_REG_PCT_CONTROL_SN (1 << 17) /* snapshot */

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index ce0fa60..997ccbd 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -147,13 +147,25 @@ static int arc_pmu_event_init(struct perf_event *event)
local64_set(&hwc->period_left, hwc->sample_period);
}

+ hwc->config = 0;
+
+ if (is_isa_arcv2()) {
+ /* "exclude user" means "count only kernel" */
+ if (event->attr.exclude_user)
+ hwc->config |= ARC_REG_PCT_CONFIG_KERN;
+
+ /* "exclude kernel" means "count only user" */
+ if (event->attr.exclude_kernel)
+ hwc->config |= ARC_REG_PCT_CONFIG_USER;
+ }
+
switch (event->attr.type) {
case PERF_TYPE_HARDWARE:
if (event->attr.config >= PERF_COUNT_HW_MAX)
return -ENOENT;
if (arc_pmu->ev_hw_idx[event->attr.config] < 0)
return -ENOENT;
- hwc->config = arc_pmu->ev_hw_idx[event->attr.config];
+ hwc->config |= arc_pmu->ev_hw_idx[event->attr.config];
pr_debug("init event %d with h/w %d \'%s\'\n",
(int) event->attr.config, (int) hwc->config,
arc_pmu_ev_hw_map[event->attr.config]);
@@ -163,7 +175,7 @@ static int arc_pmu_event_init(struct perf_event *event)
ret = arc_pmu_cache_event(event->attr.config);
if (ret < 0)
return ret;
- hwc->config = arc_pmu->ev_hw_idx[ret];
+ hwc->config |= arc_pmu->ev_hw_idx[ret];
return 0;
default:
return -ENOENT;
--
2.4.3

2015-08-24 14:20:56

by Alexey Brodkin

Subject: [PATCH v3 5/6] ARCv2: perf: SMP support

* split off pmu info into singleton and per-cpu bits
* setup PMU on all cores

Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Alexey Brodkin <[email protected]>
---

No changes since v2.

Compared to v1:
[1] Rebased on top of previous patches, hence changes in the patch itself
[2] Cosmetics

arch/arc/kernel/perf_event.c | 69 ++++++++++++++++++++++++++++++++++----------
1 file changed, 54 insertions(+), 15 deletions(-)

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 997ccbd..80f5a85 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -21,10 +21,22 @@

struct arc_pmu {
struct pmu pmu;
+ unsigned int irq;
int n_counters;
- unsigned long used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
u64 max_period;
int ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
+};
+
+struct arc_pmu_cpu {
+ /*
+ * A 1 bit for an index indicates that the counter is being used for
+ * an event. A 0 means that the counter can be used.
+ */
+ unsigned long used_mask[BITS_TO_LONGS(ARC_PERF_MAX_COUNTERS)];
+
+ /*
+ * The events that are active on the PMU for the given index.
+ */
struct perf_event *act_counter[ARC_PERF_MAX_COUNTERS];
};

@@ -67,6 +79,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
}

static struct arc_pmu *arc_pmu;
+static DEFINE_PER_CPU(struct arc_pmu_cpu, arc_pmu_cpu);

/* read counter #idx; note that counter# != event# on ARC! */
static uint64_t arc_pmu_read_counter(int idx)
@@ -304,10 +317,12 @@ static void arc_pmu_stop(struct perf_event *event, int flags)

static void arc_pmu_del(struct perf_event *event, int flags)
{
+ struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
+
arc_pmu_stop(event, PERF_EF_UPDATE);
- __clear_bit(event->hw.idx, arc_pmu->used_mask);
+ __clear_bit(event->hw.idx, pmu_cpu->used_mask);

- arc_pmu->act_counter[event->hw.idx] = 0;
+ pmu_cpu->act_counter[event->hw.idx] = 0;

perf_event_update_userpage(event);
}
@@ -315,22 +330,23 @@ static void arc_pmu_del(struct perf_event *event, int flags)
/* allocate hardware counter and optionally start counting */
static int arc_pmu_add(struct perf_event *event, int flags)
{
+ struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;

- if (__test_and_set_bit(idx, arc_pmu->used_mask)) {
- idx = find_first_zero_bit(arc_pmu->used_mask,
+ if (__test_and_set_bit(idx, pmu_cpu->used_mask)) {
+ idx = find_first_zero_bit(pmu_cpu->used_mask,
arc_pmu->n_counters);
if (idx == arc_pmu->n_counters)
return -EAGAIN;

- __set_bit(idx, arc_pmu->used_mask);
+ __set_bit(idx, pmu_cpu->used_mask);
hwc->idx = idx;
}

write_aux_reg(ARC_REG_PCT_INDEX, idx);

- arc_pmu->act_counter[idx] = event;
+ pmu_cpu->act_counter[idx] = event;

if (is_sampling_event(event)) {
/* Mimic full counter overflow as other arches do */
@@ -357,7 +373,7 @@ static int arc_pmu_add(struct perf_event *event, int flags)
static irqreturn_t arc_pmu_intr(int irq, void *dev)
{
struct perf_sample_data data;
- struct arc_pmu *arc_pmu = (struct arc_pmu *)dev;
+ struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
struct pt_regs *regs;
int active_ints;
int idx;
@@ -369,7 +385,7 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)
regs = get_irq_regs();

for (idx = 0; idx < arc_pmu->n_counters; idx++) {
- struct perf_event *event = arc_pmu->act_counter[idx];
+ struct perf_event *event = pmu_cpu->act_counter[idx];
struct hw_perf_event *hwc;

if (!(active_ints & (1 << idx)))
@@ -412,6 +428,17 @@ static irqreturn_t arc_pmu_intr(int irq, void *dev)

#endif /* CONFIG_ISA_ARCV2 */

+void arc_cpu_pmu_irq_init(void)
+{
+ struct arc_pmu_cpu *pmu_cpu = this_cpu_ptr(&arc_pmu_cpu);
+
+ arc_request_percpu_irq(arc_pmu->irq, smp_processor_id(), arc_pmu_intr,
+ "ARC perf counters", pmu_cpu);
+
+ /* Clear all pending interrupt flags */
+ write_aux_reg(ARC_REG_PCT_INT_ACT, 0xffffffff);
+}
+
static int arc_pmu_device_probe(struct platform_device *pdev)
{
struct arc_reg_pct_build pct_bcr;
@@ -488,18 +515,30 @@ static int arc_pmu_device_probe(struct platform_device *pdev)

if (has_interrupts) {
int irq = platform_get_irq(pdev, 0);
+ unsigned long flags;

if (irq < 0) {
pr_err("Cannot get IRQ number for the platform\n");
return -ENODEV;
}

- ret = devm_request_irq(&pdev->dev, irq, arc_pmu_intr, 0,
- "arc-pmu", arc_pmu);
- if (ret) {
- pr_err("could not allocate PMU IRQ\n");
- return ret;
- }
+ arc_pmu->irq = irq;
+
+ /*
+ * arc_cpu_pmu_irq_init() needs to be called on all cores for
+ * their respective local PMU.
+ * However we use an open-coded on_each_cpu() to ensure it is called
+ * on core0 first, so that arc_request_percpu_irq() sets up
+ * AUTOEN etc. Otherwise enable_percpu_irq() fails to enable
+ * the perf IRQ on non-master cores.
+ * See arc_request_percpu_irq().
+ */
+ preempt_disable();
+ local_irq_save(flags);
+ arc_cpu_pmu_irq_init();
+ local_irq_restore(flags);
+ smp_call_function((smp_call_func_t)arc_cpu_pmu_irq_init, 0, 1);
+ preempt_enable();

/* Clear all pending interrupt flags */
write_aux_reg(ARC_REG_PCT_INT_ACT, 0xffffffff);
--
2.4.3

2015-08-24 14:21:00

by Alexey Brodkin

Subject: [PATCH v3 6/6] ARCv2: perf: Finally introduce HS perf unit

From: Vineet Gupta <[email protected]>

With all features in place, the ARC HS pct block can now be effectively
allowed to be probed/used

Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>
Signed-off-by: Alexey Brodkin <[email protected]>
---

No changes since v2.

Compared to v1:
[1] MAINTAINERS file updated to cover new file.

Documentation/devicetree/bindings/arc/archs-pct.txt | 17 +++++++++++++++++
MAINTAINERS | 2 +-
arch/arc/include/asm/perf_event.h | 5 ++++-
arch/arc/kernel/perf_event.c | 3 ++-
4 files changed, 24 insertions(+), 3 deletions(-)
create mode 100644 Documentation/devicetree/bindings/arc/archs-pct.txt

diff --git a/Documentation/devicetree/bindings/arc/archs-pct.txt b/Documentation/devicetree/bindings/arc/archs-pct.txt
new file mode 100644
index 0000000..1ae98b8
--- /dev/null
+++ b/Documentation/devicetree/bindings/arc/archs-pct.txt
@@ -0,0 +1,17 @@
+* ARC HS Performance Counters
+
+The ARC HS can be configured with a pipeline performance monitor for counting
+CPU and cache events like cache misses and hits. Like conventional PCT there
+are 100+ hardware conditions dynamically mapped to up to 32 counters.
+It also supports overflow interrupts.
+
+Required properties:
+
+- compatible : should contain
+ "snps,archs-pct"
+
+Example:
+
+pmu {
+ compatible = "snps,archs-pct";
+};
diff --git a/MAINTAINERS b/MAINTAINERS
index 569568f..3ab63b6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9883,7 +9883,7 @@ SYNOPSYS ARC ARCHITECTURE
M: Vineet Gupta <[email protected]>
S: Supported
F: arch/arc/
-F: Documentation/devicetree/bindings/arc/
+F: Documentation/devicetree/bindings/arc/*
F: drivers/tty/serial/arc_uart.c

SYNOPSYS ARC SDP platform support
diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index 876e216..848de94 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -105,8 +105,11 @@ static const char * const arc_pmu_ev_hw_map[] = {
[PERF_COUNT_HW_INSTRUCTIONS] = "iall",
[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmp",
[PERF_COUNT_ARC_BPOK] = "bpok", /* NP-NT, PT-T, PNT-NT */
+#ifdef CONFIG_ISA_ARCV2
+ [PERF_COUNT_HW_BRANCH_MISSES] = "bpmp",
+#else
[PERF_COUNT_HW_BRANCH_MISSES] = "bpfail", /* NP-T, PT-NT, PNT-T */
-
+#endif
[PERF_COUNT_ARC_LDC] = "imemrdc", /* Instr: mem read cached */
[PERF_COUNT_ARC_STC] = "imemwrc", /* Instr: mem write cached */

diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 80f5a85..19d6115 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -551,6 +551,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
#ifdef CONFIG_OF
static const struct of_device_id arc_pmu_match[] = {
{ .compatible = "snps,arc700-pct" },
+ { .compatible = "snps,archs-pct" },
{},
};
MODULE_DEVICE_TABLE(of, arc_pmu_match);
@@ -558,7 +559,7 @@ MODULE_DEVICE_TABLE(of, arc_pmu_match);

static struct platform_driver arc_pmu_driver = {
.driver = {
- .name = "arc700-pct",
+ .name = "arc-pct",
.of_match_table = of_match_ptr(arc_pmu_match),
},
.probe = arc_pmu_device_probe,
--
2.4.3

2015-08-24 14:32:56

by Vineet Gupta

Subject: Re: [PATCH v3 4/6] ARCv2: perf: implement exclusion of event counting in user or kernel mode

On Monday 24 August 2015 07:50 PM, Alexey Brodkin wrote:
> Cc: Peter Zijlstra <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Signed-off-by: Alexey Brodkin <[email protected]>
> ---
>
> No changes since v2.
>
> No changes since v1.
>
> ....
> }
>
> + hwc->config = 0;
> +
> + if (is_isa_arcv2()) {
> + /* "exclude user" means "count only kernel" */
> + if (event->attr.exclude_user)
> + hwc->config |= ARC_REG_PCT_CONFIG_KERN;
> +
> + /* "exclude kernel" means "count only user" */
> + if (event->attr.exclude_kernel)
> + hwc->config |= ARC_REG_PCT_CONFIG_USER;
> + }
> +
> switch (event->attr.type) {
> case PERF_TYPE_HARDWARE:
> if (event->attr.config >= PERF_COUNT_HW_MAX)
> return -ENOENT;
> if (arc_pmu->ev_hw_idx[event->attr.config] < 0)
> return -ENOENT;
> - hwc->config = arc_pmu->ev_hw_idx[event->attr.config];
> + hwc->config |= arc_pmu->ev_hw_idx[event->attr.config];

With raw events patch dropped - this hunk need not be present.

> pr_debug("init event %d with h/w %d \'%s\'\n",
> (int) event->attr.config, (int) hwc->config,
> arc_pmu_ev_hw_map[event->attr.config]);
> @@ -163,7 +175,7 @@ static int arc_pmu_event_init(struct perf_event *event)
> ret = arc_pmu_cache_event(event->attr.config);
> if (ret < 0)
> return ret;
> - hwc->config = arc_pmu->ev_hw_idx[ret];
> + hwc->config |= arc_pmu->ev_hw_idx[ret];
> return 0;
> default:
> return -ENOENT;

2015-08-24 16:38:32

by Vineet Gupta

Subject: Re: [PATCH v3 4/6] ARCv2: perf: implement exclusion of event counting in user or kernel mode

On Monday 24 August 2015 08:00 PM, Vineet Gupta wrote:
> On Monday 24 August 2015 07:50 PM, Alexey Brodkin wrote:
>> > Cc: Peter Zijlstra <[email protected]>
>> > Cc: Arnaldo Carvalho de Melo <[email protected]>
>> > Signed-off-by: Alexey Brodkin <[email protected]>
>> > ---
>> >
>> > No changes since v2.
>> >
>> > No changes since v1.
>> >
>> > ....
>> > }
>> >
>> > + hwc->config = 0;
>> > +
>> > + if (is_isa_arcv2()) {
>> > + /* "exclude user" means "count only kernel" */
>> > + if (event->attr.exclude_user)
>> > + hwc->config |= ARC_REG_PCT_CONFIG_KERN;
>> > +
>> > + /* "exclude kernel" means "count only user" */
>> > + if (event->attr.exclude_kernel)
>> > + hwc->config |= ARC_REG_PCT_CONFIG_USER;
>> > + }
>> > +
>> > switch (event->attr.type) {
>> > case PERF_TYPE_HARDWARE:
>> > if (event->attr.config >= PERF_COUNT_HW_MAX)
>> > return -ENOENT;
>> > if (arc_pmu->ev_hw_idx[event->attr.config] < 0)
>> > return -ENOENT;
>> > - hwc->config = arc_pmu->ev_hw_idx[event->attr.config];
>> > + hwc->config |= arc_pmu->ev_hw_idx[event->attr.config];
> With raw events patch dropped - this hunk need not be present.

Please ignore this stupid comment - this was written when I was presumably
smoking pot!

>
>> > pr_debug("init event %d with h/w %d \'%s\'\n",
>> > (int) event->attr.config, (int) hwc->config,
>> > arc_pmu_ev_hw_map[event->attr.config]);
>> > @@ -163,7 +175,7 @@ static int arc_pmu_event_init(struct perf_event *event)
>> > ret = arc_pmu_cache_event(event->attr.config);
>> > if (ret < 0)
>> > return ret;
>> > - hwc->config = arc_pmu->ev_hw_idx[ret];
>> > + hwc->config |= arc_pmu->ev_hw_idx[ret];
>> > return 0;
>> > default:
>> > return -ENOENT;

2015-08-26 12:07:13

by Alexey Brodkin

Subject: Re: [PATCH v3 0/6] ARCv2 port to Linux - (C) perf

Hi Peter,

On Mon, 2015-08-24 at 17:20 +0300, Alexey Brodkin wrote:
> Hi Peter,
>
> This mini-series adds perf support for ARCv2 based cores, which brings in
> overflow interrupts and SMP. Additionally, raw events are now supported as well.
>
> Please review!
>
> Compared to v2 this series has:
> [1] Removed patch with raw-events support.
> It needs some rework; let's discuss it separately.
> Still I plan to send it shortly.
> [2] Merged "set usable max period as a half of real max period" into
> "implement event_set_period".
> [3] Fixed arc_pmu_event_set_period() with regard to incorrect
> "hwc->period_left" setup.
> [4] Moved interrupts enabling from arc_pmu_add() to arc_pmu_start()
>
> Compared to v1 this series has:
> [1] Addressed review comments
> [2] More verbose commit messages and comments in sources
> [3] Minor cosmetics
>
> Thanks,
> Alexey
>
> Alexey Brodkin (4):
> ARCv2: perf: implement "event_set_period"
> ARCv2: perf: Support sampling events using overflow interrupts
> ARCv2: perf: implement exclusion of event counting in user or kernel
> mode
> ARCv2: perf: SMP support
>
> Vineet Gupta (2):
> ARC: perf: cap the number of counters to hardware max of 32
> ARCv2: perf: Finally introduce HS perf unit
>
> .../devicetree/bindings/arc/archs-pct.txt | 17 ++
> MAINTAINERS | 2 +-
> arch/arc/include/asm/perf_event.h | 21 +-
> arch/arc/kernel/perf_event.c | 271 ++++++++++++++++++---
> 4 files changed, 275 insertions(+), 36 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/arc/archs-pct.txt

Any chance for this re-spin to be reviewed sometime soon?

-Alexey-

2015-08-26 13:08:01

by Peter Zijlstra

Subject: Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> @@ -139,9 +141,11 @@ static int arc_pmu_event_init(struct perf_event *event)
> struct hw_perf_event *hwc = &event->hw;
> int ret;
>
> - hwc->sample_period = arc_pmu->max_period;
> - hwc->last_period = hwc->sample_period;
> - local64_set(&hwc->period_left, hwc->sample_period);
> + if (!is_sampling_event(event)) {
> + hwc->sample_period = arc_pmu->max_period;
> + hwc->last_period = hwc->sample_period;
> + local64_set(&hwc->period_left, hwc->sample_period);
> + }

So here we set a max_period sample period for !sampling events such that
we can properly deal with (short) counter overflow and accumulate into a
64bit value.

> switch (event->attr.type) {
> case PERF_TYPE_HARDWARE:
> @@ -243,6 +247,11 @@ static void arc_pmu_start(struct perf_event *event, int flags)
>
> arc_pmu_event_set_period(event);
>
> + /* Enable interrupt for this counter */
> + if (is_sampling_event(event))
> + write_aux_reg(ARC_REG_PCT_INT_CTRL,
> + read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
> +

Yet here you fail to actually enable the interrupt for the non-sampling
events, which makes the above not work.

> /* enable ARC pmu here */
> write_aux_reg(ARC_REG_PCT_INDEX, idx);
> write_aux_reg(ARC_REG_PCT_CONFIG, hwc->config);

2015-08-26 13:12:37

by Peter Zijlstra

Subject: Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> @@ -295,6 +317,16 @@ static int arc_pmu_add(struct perf_event *event, int flags)
> }
>
> write_aux_reg(ARC_REG_PCT_INDEX, idx);
> +
> + arc_pmu->act_counter[idx] = event;
> +
> + if (is_sampling_event(event)) {
> + /* Mimic full counter overflow as other arches do */
> + write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
> + write_aux_reg(ARC_REG_PCT_INT_CNTH,
> + (arc_pmu->max_period >> 32));
> + }
> +

pmu::add should call pmu::start when PERF_EF_START, without that it
should not start the counter, only schedule it.

(although currently all pmu::add() calls will have EF_START set)

> write_aux_reg(ARC_REG_PCT_CONFIG, 0);
> write_aux_reg(ARC_REG_PCT_COUNTL, 0);
> write_aux_reg(ARC_REG_PCT_COUNTH, 0);

2015-08-26 13:17:31

by Alexey Brodkin

Subject: Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

Hi Peter,

On Wed, 2015-08-26 at 15:07 +0200, Peter Zijlstra wrote:
> On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> > @@ -139,9 +141,11 @@ static int arc_pmu_event_init(struct perf_event *event)
> > struct hw_perf_event *hwc = &event->hw;
> > int ret;
> >
> > - hwc->sample_period = arc_pmu->max_period;
> > - hwc->last_period = hwc->sample_period;
> > - local64_set(&hwc->period_left, hwc->sample_period);
> > + if (!is_sampling_event(event)) {
> > + hwc->sample_period = arc_pmu->max_period;
> > + hwc->last_period = hwc->sample_period;
> > + local64_set(&hwc->period_left, hwc->sample_period);
> > + }
>
> So here we set a max_period sample period for !sampling events such that
> we can properly deal with (short) counter overflow and accumulate into a
> 64bit value.
>
> > switch (event->attr.type) {
> > case PERF_TYPE_HARDWARE:
> > @@ -243,6 +247,11 @@ static void arc_pmu_start(struct perf_event *event, int flags)
> >
> > arc_pmu_event_set_period(event);
> >
> > + /* Enable interrupt for this counter */
> > + if (is_sampling_event(event))
> > + write_aux_reg(ARC_REG_PCT_INT_CTRL,
> > + read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
> > +
>
> Yet here you fail to actually enable the interrupt for the non-sampling
> events, which makes the above not work.

Indeed, we intentionally leave interrupts disabled for non-sampling events:
[1] We have quite large counters so we don't expect them to overflow normally.
[2] We may re-use the same code for hardware that lacks IRQ support in PCT.
See, we check whether IRQs are available and, if not, set
PERF_PMU_CAP_NO_INTERRUPT, which guarantees we won't get sampling events;
and for non-sampling events we won't use IRQs.

-Alexey-

2015-08-26 13:21:14

by Alexey Brodkin

Subject: Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

Hi Peter,

On Wed, 2015-08-26 at 15:12 +0200, Peter Zijlstra wrote:
> On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> > @@ -295,6 +317,16 @@ static int arc_pmu_add(struct perf_event *event, int flags)
> > }
> >
> > write_aux_reg(ARC_REG_PCT_INDEX, idx);
> > +
> > + arc_pmu->act_counter[idx] = event;
> > +
> > + if (is_sampling_event(event)) {
> > + /* Mimic full counter overflow as other arches do */
> > + write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
> > + write_aux_reg(ARC_REG_PCT_INT_CNTH,
> > + (arc_pmu->max_period >> 32));
> > + }
> > +
>
> pmu::add should call pmu::start when PERF_EF_START, without that it
> should not start the counter, only schedule it.
>
> (although currently all pmu::add() calls will have EF_START set)

And that's what we do, don't we?
----------------------->8-----------------------
if (flags & PERF_EF_START)
	arc_pmu_start(event, PERF_EF_RELOAD);
----------------------->8-----------------------

-Alexey-

2015-08-26 14:25:53

by Peter Zijlstra

Subject: Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

On Wed, Aug 26, 2015 at 03:12:25PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> > @@ -295,6 +317,16 @@ static int arc_pmu_add(struct perf_event *event, int flags)
> > }
> >
> > write_aux_reg(ARC_REG_PCT_INDEX, idx);
> > +
> > + arc_pmu->act_counter[idx] = event;
> > +
> > + if (is_sampling_event(event)) {
> > + /* Mimic full counter overflow as other arches do */
> > + write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
> > + write_aux_reg(ARC_REG_PCT_INT_CNTH,
> > + (arc_pmu->max_period >> 32));
> > + }
> > +
>
> pmu::add should call pmu::start when PERF_EF_START, without that it
> should not start the counter, only schedule it.
>
> (although currently all pmu::add() calls will have EF_START set)
>
> > write_aux_reg(ARC_REG_PCT_CONFIG, 0);
> > write_aux_reg(ARC_REG_PCT_COUNTL, 0);
> > write_aux_reg(ARC_REG_PCT_COUNTH, 0);

Does the below clarify things a bit? If there's still some uncertainty
please say what/where and I'll try and expand.



---
include/linux/perf_event.h | 100 +++++++++++++++++++++++++++++++++++++++------
1 file changed, 87 insertions(+), 13 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2027809433b3..8f78a0b7bfe5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -140,27 +140,60 @@ struct hw_perf_event {
};
#endif
};
+ /*
+ * If the event is a per task event, this will point to the task in
+ * question. See the comment in perf_event_alloc().
+ */
struct task_struct *target;
+
+/*
+ * hw_perf_event::state flags; used to track the PERF_EF_* state.
+ */
+#define PERF_HES_STOPPED 0x01 /* the counter is stopped */
+#define PERF_HES_UPTODATE 0x02 /* event->count up-to-date */
+#define PERF_HES_ARCH 0x04
+
int state;
+
+ /*
+ * The last observed hardware counter value, updated with a
+ * local64_cmpxchg() such that pmu::read() can be called nested.
+ */
local64_t prev_count;
+
+ /*
+ * The period to start the next sample with.
+ */
u64 sample_period;
+
+ /*
+ * The period we started this sample with.
+ */
u64 last_period;
+
+ /*
+ * However much is left of the current period; note that this is
+ * a full 64bit value and allows for generation of periods longer
+ * than hardware might allow.
+ */
local64_t period_left;
+
+ /*
+ * State for throttling the event, see __perf_event_overflow() and
+ * perf_adjust_freq_unthr_context().
+ */
u64 interrupts_seq;
u64 interrupts;

+ /*
+ * State for freq target events, see __perf_event_overflow() and
+ * perf_adjust_freq_unthr_context().
+ */
u64 freq_time_stamp;
u64 freq_count_stamp;
#endif
};

-/*
- * hw_perf_event::state flags
- */
-#define PERF_HES_STOPPED 0x01 /* the counter is stopped */
-#define PERF_HES_UPTODATE 0x02 /* event->count up-to-date */
-#define PERF_HES_ARCH 0x04
-
struct perf_event;

/*
@@ -210,7 +243,19 @@ struct pmu {

/*
* Try and initialize the event for this PMU.
- * Should return -ENOENT when the @event doesn't match this PMU.
+ *
+ * Returns:
+ * -ENOENT -- @event is not for this PMU
+ *
+ * -ENODEV -- @event is for this PMU but PMU not present
+ * -EBUSY -- @event is for this PMU but PMU temporarily unavailable
+ * -EINVAL -- @event is for this PMU but @event is not valid
+ * -EOPNOTSUPP -- @event is for this PMU, @event is valid, but not supported
+ * -EACCES -- @event is for this PMU, @event is valid, but no privileges
+ *
+ * 0 -- @event is for this PMU and valid
+ *
+ * Other error return values are allowed.
*/
int (*event_init) (struct perf_event *event);

@@ -221,27 +266,56 @@ struct pmu {
void (*event_mapped) (struct perf_event *event); /*optional*/
void (*event_unmapped) (struct perf_event *event); /*optional*/

+ /*
+ * Flags for ->add()/->del()/ ->start()/->stop(). There are
+ * matching hw_perf_event::state flags.
+ */
#define PERF_EF_START 0x01 /* start the counter when adding */
#define PERF_EF_RELOAD 0x02 /* reload the counter when starting */
#define PERF_EF_UPDATE 0x04 /* update the counter when stopping */

/*
- * Adds/Removes a counter to/from the PMU, can be done inside
- * a transaction, see the ->*_txn() methods.
+ * Adds/Removes a counter to/from the PMU, can be done inside a
+ * transaction, see the ->*_txn() methods.
+ *
+ * The add/del callbacks will reserve all hardware resources required
+ * to service the event, this includes any counter constraint
+ * scheduling etc.
+ *
+ * Called with IRQs disabled and the PMU disabled.
+ *
+ * ->add() called without PERF_EF_START should result in the same state
+ * as ->add() followed by ->stop().
+ *
+ * ->del() must always PERF_EF_UPDATE stop an event. If it calls
+ * ->stop() that must deal with already being stopped without
+ * PERF_EF_UPDATE.
*/
int (*add) (struct perf_event *event, int flags);
void (*del) (struct perf_event *event, int flags);

/*
- * Starts/Stops a counter present on the PMU. The PMI handler
- * should stop the counter when perf_event_overflow() returns
- * !0. ->start() will be used to continue.
+ * Starts/Stops a counter present on the PMU.
+ *
+ * The PMI handler should stop the counter when perf_event_overflow()
+ * returns !0. ->start() will be used to continue.
+ *
+ * Also used to change the sample period.
+ *
+ * ->stop() with PERF_EF_UPDATE will read the counter and update
+ * period/count values like ->read() would.
+ *
+ * ->start() with PERF_EF_RELOAD will reprogram the counter
+ * value, must be preceded by a ->stop() with PERF_EF_UPDATE.
*/
void (*start) (struct perf_event *event, int flags);
void (*stop) (struct perf_event *event, int flags);

/*
* Updates the counter value of the event.
+ *
+ * For sampling capable PMUs this will also update the software period
+ * hw_perf_event::period_left field.
*/
void (*read) (struct perf_event *event);

2015-08-26 14:32:49

by Peter Zijlstra

Subject: Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

On Wed, Aug 26, 2015 at 01:21:08PM +0000, Alexey Brodkin wrote:
> Hi Peter,
>
> On Wed, 2015-08-26 at 15:12 +0200, Peter Zijlstra wrote:
> > On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> > > @@ -295,6 +317,16 @@ static int arc_pmu_add(struct perf_event *event, int flags)
> > > }
> > >
> > > write_aux_reg(ARC_REG_PCT_INDEX, idx);
> > > +
> > > + arc_pmu->act_counter[idx] = event;
> > > +
> > > + if (is_sampling_event(event)) {
> > > + /* Mimic full counter overflow as other arches do */
> > > + write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
> > > + write_aux_reg(ARC_REG_PCT_INT_CNTH,
> > > + (arc_pmu->max_period >> 32));
> > > + }
> > > +
> >
> > pmu::add should call pmu::start when PERF_EF_START, without that it
> > should not start the counter, only schedule it.
> >
> > (although currently all pmu::add() calls will have EF_START set)
>
> And that's what we do, don't we?
> ----------------------->8-----------------------
> if (flags & PERF_EF_START)
> arc_pmu_start(event, PERF_EF_RELOAD);
> ----------------------->8-----------------------
>

D'uh indeed! I read that above as enabling it, while what it really does
is simply program the interrupt thresholds.

2015-08-26 14:35:27

by Peter Zijlstra

Subject: Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

On Wed, Aug 26, 2015 at 01:17:20PM +0000, Alexey Brodkin wrote:
> Hi Peter,
>
> On Wed, 2015-08-26 at 15:07 +0200, Peter Zijlstra wrote:
> > On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> > > @@ -139,9 +141,11 @@ static int arc_pmu_event_init(struct perf_event *event)
> > > struct hw_perf_event *hwc = &event->hw;
> > > int ret;
> > >
> > > - hwc->sample_period = arc_pmu->max_period;
> > > - hwc->last_period = hwc->sample_period;
> > > - local64_set(&hwc->period_left, hwc->sample_period);
> > > + if (!is_sampling_event(event)) {
> > > + hwc->sample_period = arc_pmu->max_period;
> > > + hwc->last_period = hwc->sample_period;
> > > + local64_set(&hwc->period_left, hwc->sample_period);
> > > + }
> >
> > So here we set a max_period sample period for !sampling events such that
> > we can properly deal with (short) counter overflow and accumulate into a
> > 64bit value.
> >
> > > switch (event->attr.type) {
> > > case PERF_TYPE_HARDWARE:
> > > @@ -243,6 +247,11 @@ static void arc_pmu_start(struct perf_event *event, int flags)
> > >
> > > arc_pmu_event_set_period(event);
> > >
> > > + /* Enable interrupt for this counter */
> > > + if (is_sampling_event(event))
> > > + write_aux_reg(ARC_REG_PCT_INT_CTRL,
> > > + read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
> > > +
> >
> > Yet here you fail to actually enable the interrupt for the non sampling
> > events, which makes the above not work.
>
> Indeed we intentionally leave interrupts disabled for non-sampling events.
> [1] We have quite large counters so we don't expect to overflow normally
> [2] We may re-use the same code for hardware that lacks support of IRQs in PCT.
> See we check if IRQs are available and if not set PERF_PMU_CAP_NO_INTERRUPT
> that will guarantee we won't get sampling event and for non-sampling events
> we won't use IRQs.

Tricky, I was seeing is_isa_arcv2() calls elsewhere, so I figured you'd
make it conditional on that.

But sure, if you think you can live with 1 this'll work.

2015-08-26 14:36:08

by Alexey Brodkin

Subject: Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

Hi Peter,

On Wed, 2015-08-26 at 16:32 +0200, Peter Zijlstra wrote:
> On Wed, Aug 26, 2015 at 01:21:08PM +0000, Alexey Brodkin wrote:
> > Hi Peter,
> >
> > On Wed, 2015-08-26 at 15:12 +0200, Peter Zijlstra wrote:
> > > On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> > > > @@ -295,6 +317,16 @@ static int arc_pmu_add(struct perf_event *event, int flags)
> > > > }
> > > >
> > > > write_aux_reg(ARC_REG_PCT_INDEX, idx);
> > > > +
> > > > + arc_pmu->act_counter[idx] = event;
> > > > +
> > > > + if (is_sampling_event(event)) {
> > > > + /* Mimic full counter overflow as other arches do */
> > > > + write_aux_reg(ARC_REG_PCT_INT_CNTL, (u32)arc_pmu->max_period);
> > > > + write_aux_reg(ARC_REG_PCT_INT_CNTH,
> > > > + (arc_pmu->max_period >> 32));
> > > > + }
> > > > +
> > >
> > > pmu::add should call pmu::start when PERF_EF_START, without that it
> > > should not start the counter, only schedule it.
> > >
> > > (although currently all pmu::add() calls will have EF_START set)
> >
> > And that's what we do, don't we?
> > ----------------------->8-----------------------
> > if (flags & PERF_EF_START)
> > 	arc_pmu_start(event, PERF_EF_RELOAD);
> > ----------------------->8-----------------------
> >
>
> D'uh indeed! I read that above as enabling it, while what it really does
> is simply program the interrupt thresholds.

That's OK.
So do I need to do anything now, or are both of your earlier comments no
longer valid?

-Alexey-

2015-08-26 14:42:58

by Alexey Brodkin

Subject: Re: [arc-linux-dev] Re: [PATCH v3 3/6] ARCv2: perf: Support sampling events using overflow interrupts

Hi Peter,

On Wed, 2015-08-26 at 16:35 +0200, Peter Zijlstra wrote:
> On Wed, Aug 26, 2015 at 01:17:20PM +0000, Alexey Brodkin wrote:
> > Hi Peter,
> >
> > On Wed, 2015-08-26 at 15:07 +0200, Peter Zijlstra wrote:
> > > On Mon, Aug 24, 2015 at 05:20:20PM +0300, Alexey Brodkin wrote:
> > > > @@ -139,9 +141,11 @@ static int arc_pmu_event_init(struct perf_event *event)
> > > > struct hw_perf_event *hwc = &event->hw;
> > > > int ret;
> > > >
> > > > - hwc->sample_period = arc_pmu->max_period;
> > > > - hwc->last_period = hwc->sample_period;
> > > > - local64_set(&hwc->period_left, hwc->sample_period);
> > > > + if (!is_sampling_event(event)) {
> > > > + hwc->sample_period = arc_pmu->max_period;
> > > > + hwc->last_period = hwc->sample_period;
> > > > + local64_set(&hwc->period_left, hwc->sample_period);
> > > > + }
> > >
> > > So here we set a max_period sample period for !sampling events such that
> > > we can properly deal with (short) counter overflow and accumulate into a
> > > 64bit value.
> > >
> > > > switch (event->attr.type) {
> > > > case PERF_TYPE_HARDWARE:
> > > > @@ -243,6 +247,11 @@ static void arc_pmu_start(struct perf_event *event, int flags)
> > > >
> > > > arc_pmu_event_set_period(event);
> > > >
> > > > + /* Enable interrupt for this counter */
> > > > + if (is_sampling_event(event))
> > > > + write_aux_reg(ARC_REG_PCT_INT_CTRL,
> > > > + read_aux_reg(ARC_REG_PCT_INT_CTRL) | (1 << idx));
> > > > +
> > >
> > > Yet here you fail to actually enable the interrupt for the non-sampling
> > > events, which makes the above not work.
> >
> > Indeed, we intentionally leave interrupts disabled for non-sampling events:
> > [1] We have quite large counters so we don't expect them to overflow normally.
> > [2] We may re-use the same code for hardware that lacks IRQ support in PCT.
> > See, we check whether IRQs are available and, if not, set
> > PERF_PMU_CAP_NO_INTERRUPT, which guarantees we won't get sampling events;
> > and for non-sampling events we won't use IRQs.
>
> Tricky, I was seeing is_isa_arcv2() calls elsewhere, so I figured you'd
> make it conditional on that.
>
> But sure, if you think you can live with 1 this'll work.

Well, indeed there's always room for improvement.
But from our current experience the existing implementation works pretty well.

Moreover, now that we have PCT IRQs we mostly use sampling events, which
allow real profiling.

-Alexey-

2015-08-27 06:58:14

by Alexey Brodkin

Subject: Re: [arc-linux-dev] Re: [PATCH v3 0/6] ARCv2 port to Linux - (C) perf

Hi Peter,

On Wed, 2015-08-26 at 12:07 +0000, Alexey Brodkin wrote:
> Hi Peter,
>
> On Mon, 2015-08-24 at 17:20 +0300, Alexey Brodkin wrote:
> > Hi Peter,
> >
> > This mini-series adds perf support for ARCv2 based cores, which brings in
> > overflow interrupts and SMP. Additionally, raw events are now supported as well.
> >
> > Please review!
> >
> > Compared to v2 this series has:
> > [1] Removed patch with raw-events support.
> > It needs some rework; let's discuss it separately.
> > Still I plan to send it shortly.
> > [2] Merged "set usable max period as a half of real max period" into
> > "implement event_set_period".
> > [3] Fixed arc_pmu_event_set_period() with regard to incorrect
> > "hwc->period_left" setup.
> > [4] Moved interrupts enabling from arc_pmu_add() to arc_pmu_start()
> >
> > Compared to v1 this series has:
> > [1] Addressed review comments
> > [2] More verbose commit messages and comments in sources
> > [3] Minor cosmetics
> >
> > Thanks,
> > Alexey

I'm wondering if you have any other comments on this series;
otherwise please consider it for applying.

-Alexey-