2018-12-14 09:14:07

by Anju T Sudhakar

[permalink] [raw]
Subject: [PATCH v2 0/5] powerpc/perf: IMC trace-mode support

IMC (In-Memory Collection Counters) is a hardware monitoring facility
that collects a large number of hardware performance events.
POWER9 supports two modes for IMC: Accumulation mode and Trace mode.
In Accumulation mode, event counts are accumulated in system memory,
and the hypervisor then reads the posted counts periodically or when
requested. In IMC Trace mode, the event counted is fixed to cycles; on
each overflow, hardware snapshots the program counter along with other
details and writes them into the memory pointed to by LDBAR (a ring
buffer that the hardware wraps around). LDBAR has a bit which indicates
IMC trace-mode.

Trace-IMC Implementation:
--------------------------
To enable trace-imc, we need to:

* Add a trace node in the DTS file for POWER9, so that the new trace node
can be discovered by the kernel.

The information included in the DTS file is as follows (a snippet from
the ima-catalog):

TRACE_IMC: trace-events {
#address-cells = <0x1>;
#size-cells = <0x1>;
event@10200000 {
event-name = "cycles" ;
reg = <0x10200000 0x8>;
desc = "Reference cycles" ;
};
};
trace@0 {
compatible = "ibm,imc-counters";
events-prefix = "trace_";
reg = <0x0 0x8>;
events = < &TRACE_IMC >;
type = <0x2>;
size = <0x40000>;
};

The OP-BUILD change needed to include the "trace node" has already been
pulled into the ima-catalog repo:

https://github.com/open-power/op-build/commit/d3e75dc26d1283d7d5eb444bff1ec9e40d5dfc07

* Enhance the opal_imc_counters_* calls to support this new trace mode
in IMC, and add support to initialize the trace-mode SCOM.

TRACE_IMC_SCOM bit representation:

0:1 : SAMPSEL
2:33 : CPMC_LOAD
34:40 : CPMC1SEL
41:47 : CPMC2SEL
48:50 : BUFFERSIZE
51:63 : RESERVED

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMC*SEL determine
the event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with other details, updates the
memory, and reloads the CPMC_LOAD value for the next sampling duration.
The IMC hardware does not raise exceptions, so it quietly wraps around if
the memory buffer reaches the end.

Link to the skiboot patches to enhance the opal_imc_counters_* calls:
https://lists.ozlabs.org/pipermail/skiboot/2018-December/012878.html
https://lists.ozlabs.org/pipermail/skiboot/2018-December/012879.html
https://lists.ozlabs.org/pipermail/skiboot/2018-December/012882.html
https://lists.ozlabs.org/pipermail/skiboot/2018-December/012880.html
https://lists.ozlabs.org/pipermail/skiboot/2018-December/012881.html
https://lists.ozlabs.org/pipermail/skiboot/2018-December/012883.html

* Set the LDBAR SPR to enable imc trace-mode.

LDBAR Layout:

0 : Enable/Disable
1 : 0 -> Accumulation Mode
1 -> Trace Mode
2:3 : Reserved
4:6 : PB scope
7 : Reserved
8:50 : Counter Address
51:63 : Reserved

----------------

The key benefit of imc trace-mode is that each sample record contains the
instruction pointer along with other information, so we can profile the
IP without interrupting the application.

Performance data using 'perf top' with and without trace-imc event:

When the application is monitored with the trace-imc event, we don't take
any PMI interrupts.

PMI interrupt count when `perf top` is executed without the trace-imc event:

# perf top
12.53% [kernel] [k] arch_cpu_idle
11.32% [kernel] [k] rcu_idle_enter
10.76% [kernel] [k] __next_timer_interrupt
9.49% [kernel] [k] find_next_bit
8.06% [kernel] [k] rcu_dynticks_eqs_exit
7.82% [kernel] [k] do_idle
5.71% [kernel] [k] tick_nohz_idle_stop_tic
[-----------------------]
# cat /proc/interrupts (a snippet from the output)
9944 1072 804 804 1644 804 1306
804 804 804 804 804 804 804
804 804 1961 1602 804 804 1258
[-----------------------------------------------------------------]
803 803 803 803 803 803 803
803 803 803 803 804 804 804
804 804 804 804 804 804 803
803 803 803 803 803 1306 803
803 Performance monitoring interrupts


`perf top` with the trace-imc event (run right after `perf top` without it):

# perf top -e trace_imc/trace_cycles/
12.50% [kernel] [k] arch_cpu_idle
11.81% [kernel] [k] __next_timer_interrupt
11.22% [kernel] [k] rcu_idle_enter
10.25% [kernel] [k] find_next_bit
7.91% [kernel] [k] do_idle
7.69% [kernel] [k] rcu_dynticks_eqs_exit
5.20% [kernel] [k] tick_nohz_idle_stop_tick
[-----------------------]

# cat /proc/interrupts (a snippet from the output)

9944 1072 804 804 1644 804 1306
804 804 804 804 804 804 804
804 804 1961 1602 804 804 1258
[-----------------------------------------------------------------]
803 803 803 803 803 803 803
803 803 803 804 804 804 804
804 804 804 804 804 804 803
803 803 803 803 803 1306 803
803 Performance monitoring interrupts

The PMI interrupts count remains the same.

Changelog:

From v1 -> v2
--------------

* Added privileged access check for thread-imc and trace-imc

Suggestions/comments are welcome.

Anju T Sudhakar (4):
powerpc/include: Add data structures and macros for IMC trace mode
powerpc/perf: Rearrange setting of ldbar for thread-imc
powerpc/perf: Trace imc events detection and cpuhotplug
powerpc/perf: Trace imc PMU functions

Madhavan Srinivasan (1):
powerpc/perf: Add privileged access check for thread_imc

arch/powerpc/include/asm/imc-pmu.h | 39 +++
arch/powerpc/include/asm/opal-api.h | 1 +
arch/powerpc/perf/imc-pmu.c | 297 +++++++++++++++++++++-
arch/powerpc/platforms/powernv/opal-imc.c | 3 +
include/linux/cpuhotplug.h | 1 +
5 files changed, 330 insertions(+), 11 deletions(-)
--
2.17.1



2018-12-14 09:13:07

by Anju T Sudhakar

[permalink] [raw]
Subject: [PATCH v2 1/5] powerpc/include: Add data structures and macros for IMC trace mode

Add the macros needed for IMC (In-Memory Collection Counters) trace-mode
and a data structure to hold the trace-imc record data.
Also, add the new type "OPAL_IMC_COUNTERS_TRACE" in 'opal-api.h', since
a new switch case is added in the opal calls for IMC.

Signed-off-by: Anju T Sudhakar <[email protected]>
---
arch/powerpc/include/asm/imc-pmu.h | 39 +++++++++++++++++++++++++++++
arch/powerpc/include/asm/opal-api.h | 1 +
2 files changed, 40 insertions(+)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 69f516ecb2fd..7c2ef0e42661 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -33,6 +33,7 @@
*/
#define THREAD_IMC_LDBAR_MASK 0x0003ffffffffe000ULL
#define THREAD_IMC_ENABLE 0x8000000000000000ULL
+#define TRACE_IMC_ENABLE 0x4000000000000000ULL

/*
* For debugfs interface for imc-mode and imc-command
@@ -59,6 +60,34 @@ struct imc_events {
char *scale;
};

+/*
+ * Trace IMC hardware updates a 64-byte record on
+ * Core Performance Monitoring Counter (CPMC)
+ * overflow. Here is the layout for the trace imc record
+ *
+ * DW 0 : Timebase
+ * DW 1 : Program Counter
+ * DW 2 : PIDR information
+ * DW 3 : CPMC1
+ * DW 4 : CPMC2
+ * DW 5 : CPMC3
+ * DW 6 : CPMC4
+ * DW 7 : Timebase
+ * .....
+ *
+ * The following is the data structure to hold trace imc data.
+ */
+struct trace_imc_data {
+ u64 tb1;
+ u64 ip;
+ u64 val;
+ u64 cpmc1;
+ u64 cpmc2;
+ u64 cpmc3;
+ u64 cpmc4;
+ u64 tb2;
+};
+
/* Event attribute array index */
#define IMC_FORMAT_ATTR 0
#define IMC_EVENT_ATTR 1
@@ -68,6 +97,13 @@ struct imc_events {
/* PMU Format attribute macros */
#define IMC_EVENT_OFFSET_MASK 0xffffffffULL

+/*
+ * Macro to mask bits 0:21 of first double word(which is the timebase) to
+ * compare with 8th double word (timebase) of trace imc record data.
+ */
+#define IMC_TRACE_RECORD_TB1_MASK 0x3ffffffffffULL
+
+
/*
* Device tree parser code detects IMC pmu support and
* registers new IMC pmus. This structure will hold the
@@ -113,6 +149,7 @@ struct imc_pmu_ref {

enum {
IMC_TYPE_THREAD = 0x1,
+ IMC_TYPE_TRACE = 0x2,
IMC_TYPE_CORE = 0x4,
IMC_TYPE_CHIP = 0x10,
};
@@ -123,6 +160,8 @@ enum {
#define IMC_DOMAIN_NEST 1
#define IMC_DOMAIN_CORE 2
#define IMC_DOMAIN_THREAD 3
+/* For trace-imc the domain is still thread but it operates in trace-mode */
+#define IMC_DOMAIN_TRACE 4

extern int init_imc_pmu(struct device_node *parent,
struct imc_pmu *pmu_ptr, int pmu_id);
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 870fb7b239ea..a4130b21b159 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -1118,6 +1118,7 @@ enum {
enum {
OPAL_IMC_COUNTERS_NEST = 1,
OPAL_IMC_COUNTERS_CORE = 2,
+ OPAL_IMC_COUNTERS_TRACE = 3,
};


--
2.17.1


2018-12-14 09:13:23

by Anju T Sudhakar

[permalink] [raw]
Subject: [PATCH v2 3/5] powerpc/perf: Add privileged access check for thread_imc

From: Madhavan Srinivasan <[email protected]>

Add code to restrict user access to the thread_imc pmu, since
some events report privilege-level information.

Fixes: f74c89bd80fb3 ('powerpc/perf: Add thread IMC PMU support')
Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Anju T Sudhakar <[email protected]>
---
arch/powerpc/perf/imc-pmu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 3bef46f8417d..5ca80545a849 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -877,6 +877,9 @@ static int thread_imc_event_init(struct perf_event *event)
if (event->attr.type != event->pmu->type)
return -ENOENT;

+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
/* Sampling not supported */
if (event->hw.sample_period)
return -EINVAL;
--
2.17.1


2018-12-14 09:14:14

by Anju T Sudhakar

[permalink] [raw]
Subject: [PATCH v2 5/5] powerpc/perf: Trace imc PMU functions

Add PMU functions to support trace-imc.

Signed-off-by: Anju T Sudhakar <[email protected]>
---
arch/powerpc/perf/imc-pmu.c | 175 ++++++++++++++++++++++++++++++++++++
1 file changed, 175 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 1f09265c8fb0..32ff0e449fca 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1120,6 +1120,173 @@ static int trace_imc_cpu_init(void)
ppc_trace_imc_cpu_offline);
}

+static u64 get_trace_imc_event_base_addr(void)
+{
+ return (u64)per_cpu(trace_imc_mem, smp_processor_id());
+}
+
+/*
+ * Function to parse trace-imc data obtained
+ * and to prepare the perf sample.
+ */
+static int trace_imc_prepare_sample(struct trace_imc_data *mem,
+ struct perf_sample_data *data,
+ u64 *prev_tb,
+ struct perf_event_header *header,
+ struct perf_event *event)
+{
+ /* Sanity checks for a valid record */
+ if (be64_to_cpu(READ_ONCE(mem->tb1)) > *prev_tb)
+ *prev_tb = be64_to_cpu(READ_ONCE(mem->tb1));
+ else
+ return -EINVAL;
+
+ if ((be64_to_cpu(READ_ONCE(mem->tb1)) & IMC_TRACE_RECORD_TB1_MASK) !=
+ be64_to_cpu(READ_ONCE(mem->tb2)))
+ return -EINVAL;
+
+ /* Prepare perf sample */
+ data->ip = be64_to_cpu(READ_ONCE(mem->ip));
+ data->period = event->hw.last_period;
+
+ header->type = PERF_RECORD_SAMPLE;
+ header->size = sizeof(*header) + event->header_size;
+ header->misc = 0;
+
+ if (is_kernel_addr(data->ip))
+ header->misc |= PERF_RECORD_MISC_KERNEL;
+ else
+ header->misc |= PERF_RECORD_MISC_USER;
+
+ perf_event_header__init_id(header, data, event);
+
+ return 0;
+}
+
+static void dump_trace_imc_data(struct perf_event *event)
+{
+ struct trace_imc_data *mem;
+ int i, ret;
+ u64 prev_tb = 0;
+
+ mem = (struct trace_imc_data *)get_trace_imc_event_base_addr();
+ for (i = 0; i < (trace_imc_mem_size / sizeof(struct trace_imc_data));
+ i++, mem++) {
+ struct perf_sample_data data;
+ struct perf_event_header header;
+
+ ret = trace_imc_prepare_sample(mem, &data, &prev_tb, &header, event);
+ if (ret) /* Exit, if not a valid record */
+ break;
+ else {
+ /* If this is a valid record, create the sample */
+ struct perf_output_handle handle;
+
+ if (perf_output_begin(&handle, event, header.size))
+ return;
+
+ perf_output_sample(&handle, &header, &data, event);
+ perf_output_end(&handle);
+ }
+ }
+}
+
+static int trace_imc_event_add(struct perf_event *event, int flags)
+{
+ /* Enable the sched_task to start the engine */
+ perf_sched_cb_inc(event->ctx->pmu);
+ return 0;
+}
+
+static void trace_imc_event_read(struct perf_event *event)
+{
+ dump_trace_imc_data(event);
+}
+
+static void trace_imc_event_stop(struct perf_event *event, int flags)
+{
+ trace_imc_event_read(event);
+}
+
+static void trace_imc_event_start(struct perf_event *event, int flags)
+{
+ return;
+}
+
+static void trace_imc_event_del(struct perf_event *event, int flags)
+{
+ perf_sched_cb_dec(event->ctx->pmu);
+}
+
+void trace_imc_pmu_sched_task(struct perf_event_context *ctx,
+ bool sched_in)
+{
+ int core_id = smp_processor_id() / threads_per_core;
+ struct imc_pmu_ref *ref;
+ u64 local_mem, ldbar_value;
+
+ /* Set trace-imc bit in ldbar and load ldbar with per-thread memory address */
+ local_mem = get_trace_imc_event_base_addr();
+ ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | TRACE_IMC_ENABLE;
+
+ ref = &core_imc_refc[core_id];
+ if (!ref)
+ return;
+
+ if (sched_in) {
+ mtspr(SPRN_LDBAR, ldbar_value);
+ mutex_lock(&ref->lock);
+ if (ref->refc == 0) {
+ if (opal_imc_counters_start(OPAL_IMC_COUNTERS_TRACE,
+ get_hard_smp_processor_id(smp_processor_id()))) {
+ mutex_unlock(&ref->lock);
+ pr_err("trace-imc: Unable to start the counters for core %d\n", core_id);
+ mtspr(SPRN_LDBAR, 0);
+ return;
+ }
+ }
+ ++ref->refc;
+ mutex_unlock(&ref->lock);
+ } else {
+ mtspr(SPRN_LDBAR, 0);
+ mutex_lock(&ref->lock);
+ ref->refc--;
+ if (ref->refc == 0) {
+ if (opal_imc_counters_stop(OPAL_IMC_COUNTERS_TRACE,
+ get_hard_smp_processor_id(smp_processor_id()))) {
+ mutex_unlock(&ref->lock);
+ pr_err("trace-imc: Unable to stop the counters for core %d\n", core_id);
+ return;
+ }
+ } else if (ref->refc < 0) {
+ ref->refc = 0;
+ }
+ mutex_unlock(&ref->lock);
+ }
+ return;
+}
+
+static int trace_imc_event_init(struct perf_event *event)
+{
+ struct task_struct *target;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
+ /* Return if this is a counting event */
+ if (event->attr.sample_period == 0)
+ return -ENOENT;
+
+ event->hw.idx = -1;
+ target = event->hw.target;
+
+ event->pmu->task_ctx_nr = perf_hw_context;
+ return 0;
+}
+
/* update_pmu_ops : Populate the appropriate operations for "pmu" */
static int update_pmu_ops(struct imc_pmu *pmu)
{
@@ -1149,6 +1316,14 @@ static int update_pmu_ops(struct imc_pmu *pmu)
pmu->pmu.cancel_txn = thread_imc_pmu_cancel_txn;
pmu->pmu.commit_txn = thread_imc_pmu_commit_txn;
break;
+ case IMC_DOMAIN_TRACE:
+ pmu->pmu.event_init = trace_imc_event_init;
+ pmu->pmu.add = trace_imc_event_add;
+ pmu->pmu.del = trace_imc_event_del;
+ pmu->pmu.start = trace_imc_event_start;
+ pmu->pmu.stop = trace_imc_event_stop;
+ pmu->pmu.read = trace_imc_event_read;
+ pmu->pmu.sched_task = trace_imc_pmu_sched_task;
+ break;
default:
break;
}
--
2.17.1


2018-12-14 09:14:19

by Anju T Sudhakar

[permalink] [raw]
Subject: [PATCH v2 2/5] powerpc/perf: Rearrange setting of ldbar for thread-imc

LDBAR holds the memory address allocated for each cpu. For thread-imc,
the mode bit (i.e. bit 1) of LDBAR is set to accumulation.
Currently, LDBAR is loaded with the per-cpu memory address, with the
mode set to accumulation, at boot time.

To enable trace-imc, the mode bit of LDBAR should be set to 'trace'. So,
to accommodate the trace-mode of IMC, reposition the setting of LDBAR for
thread-imc to thread_imc_event_add(). Also reset LDBAR in
thread_imc_event_del().

Signed-off-by: Anju T Sudhakar <[email protected]>
---
arch/powerpc/perf/imc-pmu.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f292a3f284f1..3bef46f8417d 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -806,8 +806,11 @@ static int core_imc_event_init(struct perf_event *event)
}

/*
- * Allocates a page of memory for each of the online cpus, and write the
- * physical base address of that page to the LDBAR for that cpu.
+ * Allocates a page of memory for each of the online cpus, and load
+ * LDBAR with 0.
+ * The physical base address of the page allocated for a cpu will be
+ * written to the LDBAR for that cpu, when the thread-imc event
+ * is added.
*
* LDBAR Register Layout:
*
@@ -825,7 +828,7 @@ static int core_imc_event_init(struct perf_event *event)
*/
static int thread_imc_mem_alloc(int cpu_id, int size)
{
- u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, cpu_id);
+ u64 *local_mem = per_cpu(thread_imc_mem, cpu_id);
int nid = cpu_to_node(cpu_id);

if (!local_mem) {
@@ -842,9 +845,7 @@ static int thread_imc_mem_alloc(int cpu_id, int size)
per_cpu(thread_imc_mem, cpu_id) = local_mem;
}

- ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
-
- mtspr(SPRN_LDBAR, ldbar_value);
+ mtspr(SPRN_LDBAR, 0);
return 0;
}

@@ -995,6 +996,7 @@ static int thread_imc_event_add(struct perf_event *event, int flags)
{
int core_id;
struct imc_pmu_ref *ref;
+ u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, smp_processor_id());

if (flags & PERF_EF_START)
imc_event_start(event, flags);
@@ -1003,6 +1005,9 @@ static int thread_imc_event_add(struct perf_event *event, int flags)
return -EINVAL;

core_id = smp_processor_id() / threads_per_core;
+ ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
+ mtspr(SPRN_LDBAR, ldbar_value);
+
/*
* imc pmus are enabled only when it is used.
* See if this is triggered for the first time.
@@ -1034,11 +1039,7 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
int core_id;
struct imc_pmu_ref *ref;

- /*
- * Take a snapshot and calculate the delta and update
- * the event counter values.
- */
- imc_event_update(event);
+ mtspr(SPRN_LDBAR, 0);

core_id = smp_processor_id() / threads_per_core;
ref = &core_imc_refc[core_id];
@@ -1057,6 +1058,11 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
ref->refc = 0;
}
mutex_unlock(&ref->lock);
+ /*
+ * Take a snapshot and calculate the delta and update
+ * the event counter values.
+ */
+ imc_event_update(event);
}

/* update_pmu_ops : Populate the appropriate operations for "pmu" */
--
2.17.1


2018-12-14 09:14:19

by Anju T Sudhakar

[permalink] [raw]
Subject: [PATCH v2 4/5] powerpc/perf: Trace imc events detection and cpuhotplug

This patch detects trace-imc events, does the memory initialization for
each online cpu, and registers the cpuhotplug callbacks.

Signed-off-by: Anju T Sudhakar <[email protected]>
---
arch/powerpc/perf/imc-pmu.c | 91 +++++++++++++++++++++++
arch/powerpc/platforms/powernv/opal-imc.c | 3 +
include/linux/cpuhotplug.h | 1 +
3 files changed, 95 insertions(+)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 5ca80545a849..1f09265c8fb0 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -43,6 +43,10 @@ static DEFINE_PER_CPU(u64 *, thread_imc_mem);
static struct imc_pmu *thread_imc_pmu;
static int thread_imc_mem_size;

+/* Trace IMC data structures */
+static DEFINE_PER_CPU(u64 *, trace_imc_mem);
+static int trace_imc_mem_size;
+
static struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
{
return container_of(event->pmu, struct imc_pmu, pmu);
@@ -1068,6 +1072,54 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
imc_event_update(event);
}

+/*
+ * Allocate a page of memory for each cpu, and load LDBAR with 0.
+ */
+static int trace_imc_mem_alloc(int cpu_id, int size)
+{
+ u64 *local_mem = per_cpu(trace_imc_mem, cpu_id);
+ int phys_id = cpu_to_node(cpu_id), rc = 0;
+
+ if (!local_mem) {
+ local_mem = page_address(alloc_pages_node(phys_id,
+ GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE |
+ __GFP_NOWARN, get_order(size)));
+ if (!local_mem)
+ return -ENOMEM;
+ per_cpu(trace_imc_mem, cpu_id) = local_mem;
+
+ /* Initialise the counters for trace mode */
+ rc = opal_imc_counters_init(OPAL_IMC_COUNTERS_TRACE, __pa((void *)local_mem),
+ get_hard_smp_processor_id(cpu_id));
+ if (rc) {
+ pr_info("IMC:opal init failed for trace imc\n");
+ return rc;
+ }
+ }
+
+ mtspr(SPRN_LDBAR, 0);
+ return 0;
+}
+
+static int ppc_trace_imc_cpu_online(unsigned int cpu)
+{
+ return trace_imc_mem_alloc(cpu, trace_imc_mem_size);
+}
+
+static int ppc_trace_imc_cpu_offline(unsigned int cpu)
+{
+ mtspr(SPRN_LDBAR, 0);
+ return 0;
+}
+
+static int trace_imc_cpu_init(void)
+{
+ return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
+ "perf/powerpc/imc_trace:online",
+ ppc_trace_imc_cpu_online,
+ ppc_trace_imc_cpu_offline);
+}
+
/* update_pmu_ops : Populate the appropriate operations for "pmu" */
static int update_pmu_ops(struct imc_pmu *pmu)
{
@@ -1189,6 +1241,17 @@ static void cleanup_all_thread_imc_memory(void)
}
}

+static void cleanup_all_trace_imc_memory(void)
+{
+ int i, order = get_order(trace_imc_mem_size);
+
+ for_each_online_cpu(i) {
+ if (per_cpu(trace_imc_mem, i))
+ free_pages((u64)per_cpu(trace_imc_mem, i), order);
+
+ }
+}
+
/* Function to free the attr_groups which are dynamically allocated */
static void imc_common_mem_free(struct imc_pmu *pmu_ptr)
{
@@ -1230,6 +1293,11 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr)
cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE);
cleanup_all_thread_imc_memory();
}
+
+ if (pmu_ptr->domain == IMC_DOMAIN_TRACE) {
+ cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE);
+ cleanup_all_trace_imc_memory();
+ }
}

/*
@@ -1312,6 +1380,21 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,

thread_imc_pmu = pmu_ptr;
break;
+ case IMC_DOMAIN_TRACE:
+ /* Update the pmu name */
+ pmu_ptr->pmu.name = kasprintf(GFP_KERNEL, "%s%s", s, "_imc");
+ if (!pmu_ptr->pmu.name)
+ return -ENOMEM;
+
+ trace_imc_mem_size = pmu_ptr->counter_mem_size;
+ for_each_online_cpu(cpu) {
+ res = trace_imc_mem_alloc(cpu, trace_imc_mem_size);
+ if (res) {
+ cleanup_all_trace_imc_memory();
+ goto err;
+ }
+ }
+ break;
default:
return -EINVAL;
}
@@ -1384,6 +1467,14 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id
goto err_free_mem;
}

+ break;
+ case IMC_DOMAIN_TRACE:
+ ret = trace_imc_cpu_init();
+ if (ret) {
+ cleanup_all_trace_imc_memory();
+ goto err_free_mem;
+ }
+
break;
default:
return -EINVAL; /* Unknown domain */
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 58a07948c76e..dedc9ae22662 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -284,6 +284,9 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
case IMC_TYPE_THREAD:
domain = IMC_DOMAIN_THREAD;
break;
+ case IMC_TYPE_TRACE:
+ domain = IMC_DOMAIN_TRACE;
+ break;
default:
pr_warn("IMC Unknown Device type \n");
domain = -1;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index e0cd2baa8380..c471f2de878b 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -167,6 +167,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
+ CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
CPUHP_AP_WATCHDOG_ONLINE,
CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RCUTREE_ONLINE,
--
2.17.1


2018-12-19 06:06:25

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] powerpc/include: Add data structures and macros for IMC trace mode


On 14/12/18 2:41 PM, Anju T Sudhakar wrote:
> Add the macros needed for IMC (In-Memory Collection Counters) trace-mode
> and data structure to hold the trace-imc record data.
> Also, add the new type "OPAL_IMC_COUNTERS_TRACE" in 'opal-api.h', since
> there is a new switch case added in the opal-calls for IMC.

Reviewed-by: Madhavan Srinivasan <[email protected]>

> Signed-off-by: Anju T Sudhakar <[email protected]>
> ---
> arch/powerpc/include/asm/imc-pmu.h | 39 +++++++++++++++++++++++++++++
> arch/powerpc/include/asm/opal-api.h | 1 +
> 2 files changed, 40 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
> index 69f516ecb2fd..7c2ef0e42661 100644
> --- a/arch/powerpc/include/asm/imc-pmu.h
> +++ b/arch/powerpc/include/asm/imc-pmu.h
> @@ -33,6 +33,7 @@
> */
> #define THREAD_IMC_LDBAR_MASK 0x0003ffffffffe000ULL
> #define THREAD_IMC_ENABLE 0x8000000000000000ULL
> +#define TRACE_IMC_ENABLE 0x4000000000000000ULL
>
> /*
> * For debugfs interface for imc-mode and imc-command
> @@ -59,6 +60,34 @@ struct imc_events {
> char *scale;
> };
>
> +/*
> + * Trace IMC hardware updates a 64bytes record on
> + * Core Performance Monitoring Counter (CPMC)
> + * overflow. Here is the layout for the trace imc record
> + *
> + * DW 0 : Timebase
> + * DW 1 : Program Counter
> + * DW 2 : PIDR information
> + * DW 3 : CPMC1
> + * DW 4 : CPMC2
> + * DW 5 : CPMC3
> + * Dw 6 : CPMC4
> + * DW 7 : Timebase
> + * .....
> + *
> + * The following is the data structure to hold trace imc data.
> + */
> +struct trace_imc_data {
> + u64 tb1;
> + u64 ip;
> + u64 val;
> + u64 cpmc1;
> + u64 cpmc2;
> + u64 cpmc3;
> + u64 cpmc4;
> + u64 tb2;
> +};
> +
> /* Event attribute array index */
> #define IMC_FORMAT_ATTR 0
> #define IMC_EVENT_ATTR 1
> @@ -68,6 +97,13 @@ struct imc_events {
> /* PMU Format attribute macros */
> #define IMC_EVENT_OFFSET_MASK 0xffffffffULL
>
> +/*
> + * Macro to mask bits 0:21 of first double word(which is the timebase) to
> + * compare with 8th double word (timebase) of trace imc record data.
> + */
> +#define IMC_TRACE_RECORD_TB1_MASK 0x3ffffffffffULL
> +
> +
> /*
> * Device tree parser code detects IMC pmu support and
> * registers new IMC pmus. This structure will hold the
> @@ -113,6 +149,7 @@ struct imc_pmu_ref {
>
> enum {
> IMC_TYPE_THREAD = 0x1,
> + IMC_TYPE_TRACE = 0x2,
> IMC_TYPE_CORE = 0x4,
> IMC_TYPE_CHIP = 0x10,
> };
> @@ -123,6 +160,8 @@ enum {
> #define IMC_DOMAIN_NEST 1
> #define IMC_DOMAIN_CORE 2
> #define IMC_DOMAIN_THREAD 3
> +/* For trace-imc the domain is still thread but it operates in trace-mode */
> +#define IMC_DOMAIN_TRACE 4
>
> extern int init_imc_pmu(struct device_node *parent,
> struct imc_pmu *pmu_ptr, int pmu_id);
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index 870fb7b239ea..a4130b21b159 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -1118,6 +1118,7 @@ enum {
> enum {
> OPAL_IMC_COUNTERS_NEST = 1,
> OPAL_IMC_COUNTERS_CORE = 2,
> + OPAL_IMC_COUNTERS_TRACE = 3,
> };
>
>


2018-12-19 06:09:02

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] powerpc/perf: Rearrange setting of ldbar for thread-imc


On 14/12/18 2:41 PM, Anju T Sudhakar wrote:
> LDBAR holds the memory address allocated for each cpu. For thread-imc
> the mode bit (i.e bit 1) of LDBAR is set to accumulation.
> Currently, ldbar is loaded with per cpu memory address and mode set to
> accumulation at boot time.
>
> To enable trace-imc, the mode bit of ldbar should be set to 'trace'. So to
> accommodate trace-mode of IMC, reposition setting of ldbar for thread-imc
> to thread_imc_event_add(). Also reset ldbar at thread_imc_event_del().

Changes looks fine to me.

Reviewed-by: Madhavan Srinivasan <[email protected]>

> Signed-off-by: Anju T Sudhakar <[email protected]>
> ---
> arch/powerpc/perf/imc-pmu.c | 28 +++++++++++++++++-----------
> 1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index f292a3f284f1..3bef46f8417d 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -806,8 +806,11 @@ static int core_imc_event_init(struct perf_event *event)
> }
>
> /*
> - * Allocates a page of memory for each of the online cpus, and write the
> - * physical base address of that page to the LDBAR for that cpu.
> + * Allocates a page of memory for each of the online cpus, and load
> + * LDBAR with 0.
> + * The physical base address of the page allocated for a cpu will be
> + * written to the LDBAR for that cpu, when the thread-imc event
> + * is added.
> *
> * LDBAR Register Layout:
> *
> @@ -825,7 +828,7 @@ static int core_imc_event_init(struct perf_event *event)
> */
> static int thread_imc_mem_alloc(int cpu_id, int size)
> {
> - u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, cpu_id);
> + u64 *local_mem = per_cpu(thread_imc_mem, cpu_id);
> int nid = cpu_to_node(cpu_id);
>
> if (!local_mem) {
> @@ -842,9 +845,7 @@ static int thread_imc_mem_alloc(int cpu_id, int size)
> per_cpu(thread_imc_mem, cpu_id) = local_mem;
> }
>
> - ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
> -
> - mtspr(SPRN_LDBAR, ldbar_value);
> + mtspr(SPRN_LDBAR, 0);
> return 0;
> }
>
> @@ -995,6 +996,7 @@ static int thread_imc_event_add(struct perf_event *event, int flags)
> {
> int core_id;
> struct imc_pmu_ref *ref;
> + u64 ldbar_value, *local_mem = per_cpu(thread_imc_mem, smp_processor_id());
>
> if (flags & PERF_EF_START)
> imc_event_start(event, flags);
> @@ -1003,6 +1005,9 @@ static int thread_imc_event_add(struct perf_event *event, int flags)
> return -EINVAL;
>
> core_id = smp_processor_id() / threads_per_core;
> + ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
> + mtspr(SPRN_LDBAR, ldbar_value);
> +
> /*
> * imc pmus are enabled only when it is used.
> * See if this is triggered for the first time.
> @@ -1034,11 +1039,7 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
> int core_id;
> struct imc_pmu_ref *ref;
>
> - /*
> - * Take a snapshot and calculate the delta and update
> - * the event counter values.
> - */
> - imc_event_update(event);
> + mtspr(SPRN_LDBAR, 0);
>
> core_id = smp_processor_id() / threads_per_core;
> ref = &core_imc_refc[core_id];
> @@ -1057,6 +1058,11 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
> ref->refc = 0;
> }
> mutex_unlock(&ref->lock);
> + /*
> + * Take a snapshot and calculate the delta and update
> + * the event counter values.
> + */
> + imc_event_update(event);
> }
>
> /* update_pmu_ops : Populate the appropriate operations for "pmu" */


2018-12-19 06:10:48

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v2 4/5] powerpc/perf: Trace imc events detection and cpuhotplug


On 14/12/18 2:41 PM, Anju T Sudhakar wrote:
> Patch detects trace-imc events, does memory initilizations for each online
> cpu, and registers cpuhotplug call-backs.

Reviewed-by: Madhavan Srinivasan <[email protected]>

> Signed-off-by: Anju T Sudhakar <[email protected]>
> ---
> arch/powerpc/perf/imc-pmu.c | 91 +++++++++++++++++++++++
> arch/powerpc/platforms/powernv/opal-imc.c | 3 +
> include/linux/cpuhotplug.h | 1 +
> 3 files changed, 95 insertions(+)
>
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index 5ca80545a849..1f09265c8fb0 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -43,6 +43,10 @@ static DEFINE_PER_CPU(u64 *, thread_imc_mem);
> static struct imc_pmu *thread_imc_pmu;
> static int thread_imc_mem_size;
>
> +/* Trace IMC data structures */
> +static DEFINE_PER_CPU(u64 *, trace_imc_mem);
> +static int trace_imc_mem_size;
> +
> static struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
> {
> return container_of(event->pmu, struct imc_pmu, pmu);
> @@ -1068,6 +1072,54 @@ static void thread_imc_event_del(struct perf_event *event, int flags)
> imc_event_update(event);
> }
>
> +/*
> + * Allocate a page of memory for each cpu, and load LDBAR with 0.
> + */
> +static int trace_imc_mem_alloc(int cpu_id, int size)
> +{
> + u64 *local_mem = per_cpu(trace_imc_mem, cpu_id);
> + int phys_id = cpu_to_node(cpu_id), rc = 0;
> +
> + if (!local_mem) {
> + local_mem = page_address(alloc_pages_node(phys_id,
> + GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE |
> + __GFP_NOWARN, get_order(size)));
> + if (!local_mem)
> + return -ENOMEM;
> + per_cpu(trace_imc_mem, cpu_id) = local_mem;
> +
> + /* Initialise the counters for trace mode */
> + rc = opal_imc_counters_init(OPAL_IMC_COUNTERS_TRACE, __pa((void *)local_mem),
> + get_hard_smp_processor_id(cpu_id));
> + if (rc) {
> + pr_info("IMC:opal init failed for trace imc\n");
> + return rc;
> + }
> + }
> +
> + mtspr(SPRN_LDBAR, 0);
> + return 0;
> +}
> +
> +static int ppc_trace_imc_cpu_online(unsigned int cpu)
> +{
> + return trace_imc_mem_alloc(cpu, trace_imc_mem_size);
> +}
> +
> +static int ppc_trace_imc_cpu_offline(unsigned int cpu)
> +{
> + mtspr(SPRN_LDBAR, 0);
> + return 0;
> +}
> +
> +static int trace_imc_cpu_init(void)
> +{
> + return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
> + "perf/powerpc/imc_trace:online",
> + ppc_trace_imc_cpu_online,
> + ppc_trace_imc_cpu_offline);
> +}
> +
> /* update_pmu_ops : Populate the appropriate operations for "pmu" */
> static int update_pmu_ops(struct imc_pmu *pmu)
> {
> @@ -1189,6 +1241,17 @@ static void cleanup_all_thread_imc_memory(void)
> }
> }
>
> +static void cleanup_all_trace_imc_memory(void)
> +{
> + int i, order = get_order(trace_imc_mem_size);
> +
> + for_each_online_cpu(i) {
> + if (per_cpu(trace_imc_mem, i))
> + free_pages((u64)per_cpu(trace_imc_mem, i), order);
> +
> + }
> +}
> +
> /* Function to free the attr_groups which are dynamically allocated */
> static void imc_common_mem_free(struct imc_pmu *pmu_ptr)
> {
> @@ -1230,6 +1293,11 @@ static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr)
> cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE);
> cleanup_all_thread_imc_memory();
> }
> +
> + if (pmu_ptr->domain == IMC_DOMAIN_TRACE) {
> + cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE);
> + cleanup_all_trace_imc_memory();
> + }
> }
>
> /*
> @@ -1312,6 +1380,21 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
>
> thread_imc_pmu = pmu_ptr;
> break;
> + case IMC_DOMAIN_TRACE:
> + /* Update the pmu name */
> + pmu_ptr->pmu.name = kasprintf(GFP_KERNEL, "%s%s", s, "_imc");
> + if (!pmu_ptr->pmu.name)
> + return -ENOMEM;
> +
> + trace_imc_mem_size = pmu_ptr->counter_mem_size;
> + for_each_online_cpu(cpu) {
> + res = trace_imc_mem_alloc(cpu, trace_imc_mem_size);
> + if (res) {
> + cleanup_all_trace_imc_memory();
> + goto err;
> + }
> + }
> + break;
> default:
> return -EINVAL;
> }
> @@ -1384,6 +1467,14 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id
> goto err_free_mem;
> }
>
> + break;
> + case IMC_DOMAIN_TRACE:
> + ret = trace_imc_cpu_init();
> + if (ret) {
> + cleanup_all_trace_imc_memory();
> + goto err_free_mem;
> + }
> +
> break;
> default:
> return -EINVAL; /* Unknown domain */
> diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
> index 58a07948c76e..dedc9ae22662 100644
> --- a/arch/powerpc/platforms/powernv/opal-imc.c
> +++ b/arch/powerpc/platforms/powernv/opal-imc.c
> @@ -284,6 +284,9 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
> case IMC_TYPE_THREAD:
> domain = IMC_DOMAIN_THREAD;
> break;
> + case IMC_TYPE_TRACE:
> + domain = IMC_DOMAIN_TRACE;
> + break;
> default:
> pr_warn("IMC Unknown Device type \n");
> domain = -1;
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index e0cd2baa8380..c471f2de878b 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -167,6 +167,7 @@ enum cpuhp_state {
> CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
> CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
> CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
> + CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
> CPUHP_AP_WATCHDOG_ONLINE,
> CPUHP_AP_WORKQUEUE_ONLINE,
> CPUHP_AP_RCUTREE_ONLINE,


2018-12-19 06:11:15

by Madhavan Srinivasan

Subject: Re: [PATCH v2 5/5] powerpc/perf: Trace imc PMU functions


On 14/12/18 2:41 PM, Anju T Sudhakar wrote:
> Add PMU functions to support trace-imc.

Reviewed-by: Madhavan Srinivasan <[email protected]>

>
> Signed-off-by: Anju T Sudhakar <[email protected]>
> ---
> arch/powerpc/perf/imc-pmu.c | 175 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 175 insertions(+)
>
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index 1f09265c8fb0..32ff0e449fca 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -1120,6 +1120,173 @@ static int trace_imc_cpu_init(void)
> ppc_trace_imc_cpu_offline);
> }
>
> +static u64 get_trace_imc_event_base_addr(void)
> +{
> + return (u64)per_cpu(trace_imc_mem, smp_processor_id());
> +}
> +
> +/*
> + * Function to parse trace-imc data obtained
> + * and to prepare the perf sample.
> + */
> +static int trace_imc_prepare_sample(struct trace_imc_data *mem,
> + struct perf_sample_data *data,
> + u64 *prev_tb,
> + struct perf_event_header *header,
> + struct perf_event *event)
> +{
> + /* Sanity checks for a valid record */
> + if (be64_to_cpu(READ_ONCE(mem->tb1)) > *prev_tb)
> + *prev_tb = be64_to_cpu(READ_ONCE(mem->tb1));
> + else
> + return -EINVAL;
> +
> + if ((be64_to_cpu(READ_ONCE(mem->tb1)) & IMC_TRACE_RECORD_TB1_MASK) !=
> + be64_to_cpu(READ_ONCE(mem->tb2)))
> + return -EINVAL;
> +
> + /* Prepare perf sample */
> + data->ip = be64_to_cpu(READ_ONCE(mem->ip));
> + data->period = event->hw.last_period;
> +
> + header->type = PERF_RECORD_SAMPLE;
> + header->size = sizeof(*header) + event->header_size;
> + header->misc = 0;
> +
> + if (is_kernel_addr(data->ip))
> + header->misc |= PERF_RECORD_MISC_KERNEL;
> + else
> + header->misc |= PERF_RECORD_MISC_USER;
> +
> + perf_event_header__init_id(header, data, event);
> +
> + return 0;
> +}
> +
> +static void dump_trace_imc_data(struct perf_event *event)
> +{
> + struct trace_imc_data *mem;
> + int i, ret;
> + u64 prev_tb = 0;
> +
> + mem = (struct trace_imc_data *)get_trace_imc_event_base_addr();
> + for (i = 0; i < (trace_imc_mem_size / sizeof(struct trace_imc_data));
> + i++, mem++) {
> + struct perf_sample_data data;
> + struct perf_event_header header;
> +
> + ret = trace_imc_prepare_sample(mem, &data, &prev_tb, &header, event);
> + if (ret) /* Exit, if not a valid record */
> + break;
> + else {
> + /* If this is a valid record, create the sample */
> + struct perf_output_handle handle;
> +
> + if (perf_output_begin(&handle, event, header.size))
> + return;
> +
> + perf_output_sample(&handle, &header, &data, event);
> + perf_output_end(&handle);
> + }
> + }
> +}
> +
> +static int trace_imc_event_add(struct perf_event *event, int flags)
> +{
> + /* Enable the sched_task to start the engine */
> + perf_sched_cb_inc(event->ctx->pmu);
> + return 0;
> +}
> +
> +static void trace_imc_event_read(struct perf_event *event)
> +{
> + dump_trace_imc_data(event);
> +}
> +
> +static void trace_imc_event_stop(struct perf_event *event, int flags)
> +{
> + trace_imc_event_read(event);
> +}
> +
> +static void trace_imc_event_start(struct perf_event *event, int flags)
> +{
> + return;
> +}
> +
> +static void trace_imc_event_del(struct perf_event *event, int flags)
> +{
> + perf_sched_cb_dec(event->ctx->pmu);
> +}
> +
> +void trace_imc_pmu_sched_task(struct perf_event_context *ctx,
> + bool sched_in)
> +{
> + int core_id = smp_processor_id() / threads_per_core;
> + struct imc_pmu_ref *ref;
> + u64 local_mem, ldbar_value;
> +
> + /* Set trace-imc bit in ldbar and load ldbar with per-thread memory address */
> + local_mem = get_trace_imc_event_base_addr();
> + ldbar_value = ((u64)local_mem & THREAD_IMC_LDBAR_MASK) | TRACE_IMC_ENABLE;
> +
> + ref = &core_imc_refc[core_id];
> + if (!ref)
> + return;
> +
> + if (sched_in) {
> + mtspr(SPRN_LDBAR, ldbar_value);
> + mutex_lock(&ref->lock);
> + if (ref->refc == 0) {
> + if (opal_imc_counters_start(OPAL_IMC_COUNTERS_TRACE,
> + get_hard_smp_processor_id(smp_processor_id()))) {
> + mutex_unlock(&ref->lock);
> + pr_err("trace-imc: Unable to start the counters for core %d\n", core_id);
> + mtspr(SPRN_LDBAR, 0);
> + return;
> + }
> + }
> + ++ref->refc;
> + mutex_unlock(&ref->lock);
> + } else {
> + mtspr(SPRN_LDBAR, 0);
> + mutex_lock(&ref->lock);
> + ref->refc--;
> + if (ref->refc == 0) {
> + if (opal_imc_counters_stop(OPAL_IMC_COUNTERS_TRACE,
> + get_hard_smp_processor_id(smp_processor_id()))) {
> + mutex_unlock(&ref->lock);
> + pr_err("trace-imc: Unable to stop the counters for core %d\n", core_id);
> + return;
> + }
> + } else if (ref->refc < 0) {
> + ref->refc = 0;
> + }
> + mutex_unlock(&ref->lock);
> + }
> + return;
> +}
> +
> +static int trace_imc_event_init(struct perf_event *event)
> +{
> + struct task_struct *target;
> +
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EACCES;
> +
> + /* Return if this is a counting event */
> + if (event->attr.sample_period == 0)
> + return -ENOENT;
> +
> + event->hw.idx = -1;
> + target = event->hw.target;
> +
> + event->pmu->task_ctx_nr = perf_hw_context;
> + return 0;
> +}
> +
> /* update_pmu_ops : Populate the appropriate operations for "pmu" */
> static int update_pmu_ops(struct imc_pmu *pmu)
> {
> @@ -1149,6 +1316,14 @@ static int update_pmu_ops(struct imc_pmu *pmu)
> pmu->pmu.cancel_txn = thread_imc_pmu_cancel_txn;
> pmu->pmu.commit_txn = thread_imc_pmu_commit_txn;
> break;
> + case IMC_DOMAIN_TRACE:
> + pmu->pmu.event_init = trace_imc_event_init;
> + pmu->pmu.add = trace_imc_event_add;
> + pmu->pmu.del = trace_imc_event_del;
> + pmu->pmu.start = trace_imc_event_start;
> + pmu->pmu.stop = trace_imc_event_stop;
> + pmu->pmu.read = trace_imc_event_read;
> + pmu->pmu.sched_task = trace_imc_pmu_sched_task;
> default:
> break;
> }