The current method for allocating trace source ID values to sources is
to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10).
The STM is allocated ID 0x1.
This fixed algorithm is used in both the CoreSight driver code, and by
perf when writing the trace metadata in the AUXTRACE_INFO record.
The method needs replacing because currently:
1. It uses the available IDs inefficiently.
2. It does not scale to larger systems with many cores, and the algorithm
has no limits, so it will generate invalid trace IDs for cpu number > 47.
Additionally, some systems have been seen to require the allocation of
additional system IDs.
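To make the scaling limit concrete, the legacy mapping can be modelled in a few lines of user-space C. This is only a sketch: the function names are hypothetical, and the valid range (0x01 to 0x6F, with 0x00 and 0x70 to 0x7F reserved) is taken from the reserved-ID definitions in the allocator patch of this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Legacy fixed mapping described above: trace ID = cpu * 2 + 0x10. */
static int legacy_cpu_trace_id(int cpu)
{
	assert(cpu >= 0);
	return cpu * 2 + 0x10;
}

/* Valid IDs are 0x01..0x6F; 0x00 and 0x70..0x7F are reserved. */
static bool trace_id_is_valid(int id)
{
	return id > 0x00 && id < 0x70;
}

/* Lowest CPU number whose legacy ID falls outside the valid range. */
static int legacy_first_invalid_cpu(void)
{
	int cpu = 0;

	while (trace_id_is_valid(legacy_cpu_trace_id(cpu)))
		cpu++;
	return cpu;
}
```

Calling legacy_first_invalid_cpu() returns the lowest CPU number for which the fixed formula yields a reserved ID. Note also that the formula only ever produces even IDs from 0x10 upwards, which is where the inefficiency in point 1 comes from.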
This patch set introduces an API that allows the allocation of trace IDs
in a dynamic manner.
Architecturally reserved IDs are never allocated, and the system is
limited to allocating only valid IDs.
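The allocation scheme can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the driver code: the type and function names are hypothetical, locking is omitted, and a plain byte array stands in for the kernel bitmap helpers. Reserved IDs are marked as in use at initialisation so that the search never returns them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TRACE_IDS_MAX   128   /* architectural ID space */
#define TRACE_ID_RES_LO 0x70  /* 0x70..0x7F are reserved */
#define TRACE_ID_RES_HI 0x7F

/* Hypothetical model of the ID map: bit = 1 means reserved or in use. */
struct id_map_model {
	uint8_t used[TRACE_IDS_MAX / 8];
};

static bool id_test(const struct id_map_model *m, int id)
{
	return (m->used[id / 8] >> (id % 8)) & 1u;
}

static void id_set(struct id_map_model *m, int id)
{
	m->used[id / 8] |= (uint8_t)(1u << (id % 8));
}

static void id_clear(struct id_map_model *m, int id)
{
	m->used[id / 8] &= (uint8_t)~(1u << (id % 8));
}

/* Mark ID 0 and the upper reserved range as permanently in use. */
static void id_map_init(struct id_map_model *m)
{
	int id;

	memset(m, 0, sizeof(*m));
	id_set(m, 0);
	for (id = TRACE_ID_RES_LO; id <= TRACE_ID_RES_HI; id++)
		id_set(m, id);
}

/* Return the lowest free ID and mark it in use, or -1 if none is left. */
static int id_alloc(struct id_map_model *m)
{
	int id;

	for (id = 0; id < TRACE_IDS_MAX; id++) {
		if (!id_test(m, id)) {
			id_set(m, id);
			return id;
		}
	}
	return -1;
}

/* Release an ID; reserved and out-of-range values are ignored. */
static void id_free(struct id_map_model *m, int id)
{
	if (id > 0 && id < TRACE_ID_RES_LO)
		id_clear(m, id);
}
```

Because the reserved bits are never cleared, the model can only hand out IDs in the range 0x01 to 0x6F, and it reports failure once all of them are taken.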
Each of the current trace sources ETM3.x, ETM4.x and STM is updated to use
the new API.
For the ETMx.x devices, IDs are allocated on certain events:
a) When using sysfs, an ID is allocated on hardware enable or on a read of
the sysfs TRCTRACEID register, and freed when the sysfs reset is written.
b) When using perf, an ID is allocated on hardware enable and freed on
hardware disable. IDs are communicated using the AUX_OUTPUT_HW_ID packet.
The ID allocator is notified when perf sessions start and stop
so CPU based IDs are kept constant throughout any perf session.
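The effect of the session notification can be modelled in user space as follows. This is a simplified sketch, not the driver implementation: all names are hypothetical, the CPU count is fixed at four for brevity, and locking is left out. A put during an active perf session only marks the ID as pending release, so the same CPU gets the same ID back until the last session stops.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS  4     /* arbitrary size for the model */
#define MODEL_ID_LIMIT 0x70  /* IDs 0x01..0x6F are allocatable */

struct cpu_slot {
	int id;         /* 0 means no ID assigned */
	bool pend_rel;  /* released while a perf session was active */
};

struct alloc_model {
	bool used[MODEL_ID_LIMIT];
	struct cpu_slot cpu[MODEL_NR_CPUS];
	int perf_sessions;  /* count of active perf sessions */
};

/* Return the ID already bound to this CPU, or bind the lowest free one. */
static int model_get_cpu_id(struct alloc_model *m, int cpu)
{
	int id;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (m->cpu[cpu].id)
		return m->cpu[cpu].id;

	for (id = 1; id < MODEL_ID_LIMIT; id++) {
		if (!m->used[id]) {
			m->used[id] = true;
			m->cpu[cpu].id = id;
			m->cpu[cpu].pend_rel = false;
			return id;
		}
	}
	return -1;
}

/* Free the ID now, or defer the release while perf is active. */
static void model_put_cpu_id(struct alloc_model *m, int cpu)
{
	int id;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	id = m->cpu[cpu].id;
	if (!id)
		return;

	if (m->perf_sessions > 0) {
		m->cpu[cpu].pend_rel = true;
	} else {
		m->used[id] = false;
		m->cpu[cpu].id = 0;
	}
}

static void model_perf_start(struct alloc_model *m)
{
	m->perf_sessions++;
}

/* When the last session stops, release every ID that was left pending. */
static void model_perf_stop(struct alloc_model *m)
{
	int cpu;

	assert(m->perf_sessions > 0);
	if (--m->perf_sessions)
		return;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (m->cpu[cpu].pend_rel) {
			m->used[m->cpu[cpu].id] = false;
			m->cpu[cpu].id = 0;
			m->cpu[cpu].pend_rel = false;
		}
	}
}
```

In this model, a get / put / get sequence on one CPU inside a session returns the same ID both times, while another CPU requesting an ID in between is given a different one.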
Note: This patchset breaks backward compatibility for perf record and
perf report.
Because the method for generating the AUXTRACE_INFO metadata has
changed, using an older perf record will result in metadata that
does not match the trace IDs used in the recorded trace data.
This mismatch will cause subsequent decode to fail.
The version of the AUXTRACE_INFO has been updated to reflect the fact that
the trace source IDs are no longer present in the metadata. This will
mean older versions of perf report cannot decode the file.
Applies to coresight/next [c06475910b52]
Tested on DB410c
Changes since v1:
(after feedback & discussion with Mathieu & Suzuki).
1) API has changed. The global trace ID map is managed internally, so it
is no longer passed in to the API functions.
2) perf record does not use sysfs to find the trace IDs. These are now
output as AUX_OUTPUT_HW_ID events. The drivers, perf record, and perf report
have been updated accordingly to generate and handle these events.
Mike Leach (13):
coresight: trace-id: Add API to dynamically assign Trace ID values
coresight: trace-id: update CoreSight core to use Trace ID API
coresight: stm: Update STM driver to use Trace ID API
coresight: etm4x: Update ETM4 driver to use Trace ID API
coresight: etm3x: Update ETM3 driver to use Trace ID API
coresight: etmX.X: stm: Remove unused legacy source Trace ID ops
coresight: perf: traceid: Add perf notifiers for Trace ID
perf: cs-etm: Move mapping of Trace ID and cpu into helper function
perf: cs-etm: Update record event to use new Trace ID protocol
kernel: events: Export perf_report_aux_output_id()
perf: cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
coresight: trace-id: Add debug & test macros to Trace ID allocation
drivers/hwtracing/coresight/Makefile | 2 +-
drivers/hwtracing/coresight/coresight-core.c | 49 +---
.../hwtracing/coresight/coresight-etm-perf.c | 17 ++
drivers/hwtracing/coresight/coresight-etm.h | 3 +-
.../coresight/coresight-etm3x-core.c | 85 +++---
.../coresight/coresight-etm3x-sysfs.c | 28 +-
.../coresight/coresight-etm4x-core.c | 65 ++++-
.../coresight/coresight-etm4x-sysfs.c | 32 ++-
drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
drivers/hwtracing/coresight/coresight-stm.c | 49 +---
.../hwtracing/coresight/coresight-trace-id.c | 263 ++++++++++++++++++
.../hwtracing/coresight/coresight-trace-id.h | 65 +++++
include/linux/coresight-pmu.h | 31 ++-
include/linux/coresight.h | 3 -
kernel/events/core.c | 1 +
tools/include/linux/coresight-pmu.h | 31 ++-
tools/perf/arch/arm/util/cs-etm.c | 21 +-
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 9 +
tools/perf/util/cs-etm.c | 220 +++++++++++++--
tools/perf/util/cs-etm.h | 14 +-
20 files changed, 784 insertions(+), 207 deletions(-)
create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
--
2.17.1
The existing mechanism to assign Trace ID values to sources is limited
and does not scale for larger multicore / multi trace source systems.
The API introduces functions that reserve IDs based on availability
represented by a coresight_trace_id_map structure. This records the
used and free IDs in a bitmap.
CPU bound sources such as ETMs use the coresight_trace_id_get_cpu_id /
coresight_trace_id_put_cpu_id pair of functions. The API will record
the ID associated with the CPU. This ensures that the same ID will be
re-used while perf events are active on the CPU. The put_cpu_id function
will pend release of the ID until all perf cs_etm sessions are complete.
Non-CPU sources, such as the STM, can use coresight_trace_id_get_system_id /
coresight_trace_id_put_system_id.
Signed-off-by: Mike Leach <[email protected]>
---
drivers/hwtracing/coresight/Makefile | 2 +-
.../hwtracing/coresight/coresight-trace-id.c | 230 ++++++++++++++++++
.../hwtracing/coresight/coresight-trace-id.h | 65 +++++
3 files changed, 296 insertions(+), 1 deletion(-)
create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index b6c4a48140ec..329a0c704b87 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_CORESIGHT) += coresight.o
coresight-y := coresight-core.o coresight-etm-perf.o coresight-platform.o \
coresight-sysfs.o coresight-syscfg.o coresight-config.o \
coresight-cfg-preload.o coresight-cfg-afdo.o \
- coresight-syscfg-configfs.o
+ coresight-syscfg-configfs.o coresight-trace-id.o
obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
coresight-tmc-etr.o
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.c b/drivers/hwtracing/coresight/coresight-trace-id.c
new file mode 100644
index 000000000000..dac9c89ae00d
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trace-id.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022, Linaro Limited, All rights reserved.
+ * Author: Mike Leach <[email protected]>
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+
+#include "coresight-trace-id.h"
+
+/* need to keep data on ids & association with cpus. */
+struct cpu_id_info {
+ int id;
+ bool pend_rel;
+};
+
+/* default trace ID map. Used for systems that do not require per sink mappings */
+static struct coresight_trace_id_map id_map_default;
+
+/* maintain a record of the current mapping of cpu IDs */
+static DEFINE_PER_CPU(struct cpu_id_info, cpu_ids);
+
+/* perf session active flag */
+static int perf_cs_etm_session_active;
+
+/* lock to protect id_map and cpu data */
+static DEFINE_SPINLOCK(id_map_lock);
+
+/* ID 0 is reserved */
+#define CORESIGHT_TRACE_ID_RES_0 0
+
+/* ID 0x70 onwards are reserved */
+#define CORESIGHT_TRACE_ID_RES_RANGE_LO 0x70
+#define CORESIGHT_TRACE_ID_RES_RANGE_HI 0x7F
+
+#define IS_VALID_ID(id) \
+ ((id > CORESIGHT_TRACE_ID_RES_0) && (id < CORESIGHT_TRACE_ID_RES_RANGE_LO))
+
+static void coresight_trace_id_set_inuse(int id, struct coresight_trace_id_map *id_map)
+{
+ if (IS_VALID_ID(id))
+ set_bit(id, id_map->avail_ids);
+}
+
+static void coresight_trace_id_clear_inuse(int id, struct coresight_trace_id_map *id_map)
+{
+ if (IS_VALID_ID(id))
+ clear_bit(id, id_map->avail_ids);
+}
+
+static void coresight_trace_id_set_pend_rel(int id, struct coresight_trace_id_map *id_map)
+{
+ if (IS_VALID_ID(id))
+ set_bit(id, id_map->pend_rel_ids);
+}
+
+static void coresight_trace_id_clear_pend_rel(int id, struct coresight_trace_id_map *id_map)
+{
+ if (IS_VALID_ID(id))
+ clear_bit(id, id_map->pend_rel_ids);
+}
+
+static int coresight_trace_id_find_new_id(struct coresight_trace_id_map *id_map)
+{
+ int id;
+
+ id = find_first_zero_bit(id_map->avail_ids, CORESIGHT_TRACE_IDS_MAX);
+ if (id >= CORESIGHT_TRACE_IDS_MAX)
+ id = -EINVAL;
+ return id;
+}
+
+/* release all pending IDs for all current maps & clear CPU associations */
+static void coresight_trace_id_release_all_pending(void)
+{
+ struct coresight_trace_id_map *id_map = &id_map_default;
+ int cpu, bit;
+
+ for_each_set_bit(bit, id_map->pend_rel_ids, CORESIGHT_TRACE_IDS_MAX) {
+ clear_bit(bit, id_map->avail_ids);
+ clear_bit(bit, id_map->pend_rel_ids);
+ }
+
+ for_each_possible_cpu(cpu) {
+ if (per_cpu(cpu_ids, cpu).pend_rel) {
+ per_cpu(cpu_ids, cpu).pend_rel = false;
+ per_cpu(cpu_ids, cpu).id = 0;
+ }
+ }
+}
+
+static void coresight_trace_id_init_id_map(struct coresight_trace_id_map *id_map)
+{
+ int bit;
+
+ /* set all reserved bits as in-use */
+ set_bit(CORESIGHT_TRACE_ID_RES_0, id_map->avail_ids);
+ for (bit = CORESIGHT_TRACE_ID_RES_RANGE_LO;
+ bit <= CORESIGHT_TRACE_ID_RES_RANGE_HI; bit++)
+ set_bit(bit, id_map->avail_ids);
+}
+
+static int coresight_trace_id_map_get_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
+{
+ unsigned long flags;
+ int id;
+
+ spin_lock_irqsave(&id_map_lock, flags);
+
+ /* check for existing allocation for this CPU */
+ id = per_cpu(cpu_ids, cpu).id;
+ if (id)
+ goto get_cpu_id_out;
+
+ /* find a new ID */
+ id = coresight_trace_id_find_new_id(id_map);
+ if (id < 0)
+ goto get_cpu_id_out;
+
+ /* got a valid new ID - save details */
+ per_cpu(cpu_ids, cpu).id = id;
+ per_cpu(cpu_ids, cpu).pend_rel = false;
+ coresight_trace_id_set_inuse(id, id_map);
+ coresight_trace_id_clear_pend_rel(id, id_map);
+
+get_cpu_id_out:
+ spin_unlock_irqrestore(&id_map_lock, flags);
+ return id;
+}
+
+static void coresight_trace_id_map_put_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
+{
+ unsigned long flags;
+ int id;
+
+ spin_lock_irqsave(&id_map_lock, flags);
+ id = per_cpu(cpu_ids, cpu).id;
+ if (!id)
+ goto put_cpu_id_out;
+
+ if (perf_cs_etm_session_active) {
+ /* set release at pending if perf still active */
+ coresight_trace_id_set_pend_rel(id, id_map);
+ per_cpu(cpu_ids, cpu).pend_rel = true;
+ } else {
+ /* otherwise clear id */
+ coresight_trace_id_clear_inuse(id, id_map);
+ per_cpu(cpu_ids, cpu).id = 0;
+ }
+
+put_cpu_id_out:
+ spin_unlock_irqrestore(&id_map_lock, flags);
+}
+
+static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *id_map)
+{
+ unsigned long flags;
+ int id;
+
+ spin_lock_irqsave(&id_map_lock, flags);
+ id = coresight_trace_id_find_new_id(id_map);
+ if (id > 0)
+ coresight_trace_id_set_inuse(id, id_map);
+ spin_unlock_irqrestore(&id_map_lock, flags);
+
+ return id;
+}
+
+static void coresight_trace_id_map_put_system_id(struct coresight_trace_id_map *id_map, int id)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&id_map_lock, flags);
+ coresight_trace_id_clear_inuse(id, id_map);
+ spin_unlock_irqrestore(&id_map_lock, flags);
+}
+
+/* API functions */
+int coresight_trace_id_get_cpu_id(int cpu)
+{
+ return coresight_trace_id_map_get_cpu_id(cpu, &id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_get_cpu_id);
+
+void coresight_trace_id_put_cpu_id(int cpu)
+{
+ coresight_trace_id_map_put_cpu_id(cpu, &id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_put_cpu_id);
+
+int coresight_trace_id_get_system_id(void)
+{
+ return coresight_trace_id_map_get_system_id(&id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_get_system_id);
+
+void coresight_trace_id_put_system_id(int id)
+{
+ coresight_trace_id_map_put_system_id(&id_map_default, id);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_put_system_id);
+
+void coresight_trace_id_perf_start(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&id_map_lock, flags);
+ perf_cs_etm_session_active++;
+ spin_unlock_irqrestore(&id_map_lock, flags);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_perf_start);
+
+void coresight_trace_id_perf_stop(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&id_map_lock, flags);
+ perf_cs_etm_session_active--;
+ if (!perf_cs_etm_session_active)
+ coresight_trace_id_release_all_pending();
+ spin_unlock_irqrestore(&id_map_lock, flags);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_perf_stop);
+
+void coresight_trace_id_init_default_map(void)
+{
+ coresight_trace_id_init_id_map(&id_map_default);
+}
+EXPORT_SYMBOL_GPL(coresight_trace_id_init_default_map);
diff --git a/drivers/hwtracing/coresight/coresight-trace-id.h b/drivers/hwtracing/coresight/coresight-trace-id.h
new file mode 100644
index 000000000000..63950087edf6
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-trace-id.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright(C) 2022 Linaro Limited. All rights reserved.
+ * Author: Mike Leach <[email protected]>
+ */
+
+#ifndef _CORESIGHT_TRACE_ID_H
+#define _CORESIGHT_TRACE_ID_H
+
+/*
+ * Coresight trace ID allocation API
+ *
+ * With multi-CPU systems and additional trace sources, a scalable
+ * trace ID reservation system is required.
+ *
+ * The system will allocate Ids on a demand basis, and allow them to be
+ * released when done.
+ *
+ * In order to ensure that a consistent cpu / ID matching is maintained
+ * throughout a perf cs_etm event session - a session in progress flag will
+ * be maintained, and released IDs not cleared until the perf session is
+ * complete. This allows the same CPU to be re-allocated its prior ID.
+ *
+ *
+ * Trace ID maps will be created and initialised to prevent architecturally
+ * reserved IDs from being allocated.
+ *
+ * API permits multiple maps to be maintained - for large systems where
+ * different sets of cpus trace into different independent sinks.
+ */
+
+#include <linux/bitops.h>
+#include <linux/types.h>
+
+
+/* architecturally we have 128 IDs some of which are reserved */
+#define CORESIGHT_TRACE_IDS_MAX 128
+
+/**
+ * Trace ID map.
+ *
+ * @avail_ids: Bitmap to register available (bit = 0) and in use (bit = 1) IDs.
+ * Initialised so that the reserved IDs are permanently marked as in use.
+ * @pend_rel_ids: CPU IDs that have been released by the trace source but not yet marked
+ * as available, to allow re-allocation to the same CPU during a perf session.
+ */
+struct coresight_trace_id_map {
+ DECLARE_BITMAP(avail_ids, CORESIGHT_TRACE_IDS_MAX);
+ DECLARE_BITMAP(pend_rel_ids, CORESIGHT_TRACE_IDS_MAX);
+};
+
+/* Allocate and release IDs for a single default trace ID map */
+int coresight_trace_id_get_cpu_id(int cpu);
+int coresight_trace_id_get_system_id(void);
+void coresight_trace_id_put_cpu_id(int cpu);
+void coresight_trace_id_put_system_id(int id);
+
+/* notifiers for perf session start and stop */
+void coresight_trace_id_perf_start(void);
+void coresight_trace_id_perf_stop(void);
+
+/* initialise the default ID map */
+void coresight_trace_id_init_default_map(void);
+
+#endif /* _CORESIGHT_TRACE_ID_H */
--
2.17.1
Use the perf_report_aux_output_id() call to output the CoreSight trace ID
and associated CPU as a PERF_RECORD_AUX_OUTPUT_HW_ID record in the
perf.data file.
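The payload layout that this patch documents in coresight-pmu.h (version in bits [7:0], trace ID in bits [15:8], upper bits zero) can be sketched in portable user-space C. The macro and function names below are hypothetical stand-ins for the kernel's GENMASK_ULL() / FIELD_PREP() usage; only the bit positions come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout from the patch: [7:0] version, [15:8] trace ID, rest SBZ. */
#define HW_ID_VERSION_SHIFT   0
#define HW_ID_VERSION_MASK    (UINT64_C(0xff) << HW_ID_VERSION_SHIFT)
#define HW_ID_TRACE_ID_SHIFT  8
#define HW_ID_TRACE_ID_MASK   (UINT64_C(0xff) << HW_ID_TRACE_ID_SHIFT)

#define HW_ID_CURR_VERSION    0

/* Build the 64-bit payload from a version and a trace ID. */
static uint64_t hw_id_pack(unsigned int version, unsigned int trace_id)
{
	assert(version <= 0xff && trace_id <= 0xff);
	return ((uint64_t)version << HW_ID_VERSION_SHIFT) |
	       ((uint64_t)trace_id << HW_ID_TRACE_ID_SHIFT);
}

/* Extract the version field, as a decoder would. */
static unsigned int hw_id_version(uint64_t hw_id)
{
	return (unsigned int)((hw_id & HW_ID_VERSION_MASK) >> HW_ID_VERSION_SHIFT);
}

/* Extract the trace ID field, as a decoder would. */
static unsigned int hw_id_trace_id(uint64_t hw_id)
{
	return (unsigned int)((hw_id & HW_ID_TRACE_ID_MASK) >> HW_ID_TRACE_ID_SHIFT);
}
```

A decoder would be expected to check the version field before interpreting the trace ID, so that later layout changes can be detected.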
Signed-off-by: Mike Leach <[email protected]>
---
drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
include/linux/coresight-pmu.h | 14 ++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index ad3fdc07c60b..531f5d42272b 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -4,6 +4,7 @@
* Author: Mathieu Poirier <[email protected]>
*/
+#include <linux/bitfield.h>
#include <linux/coresight.h>
#include <linux/coresight-pmu.h>
#include <linux/cpumask.h>
@@ -437,6 +438,7 @@ static void etm_event_start(struct perf_event *event, int flags)
struct perf_output_handle *handle = &ctxt->handle;
struct coresight_device *sink, *csdev = per_cpu(csdev_src, cpu);
struct list_head *path;
+ u64 hw_id;
if (!csdev)
goto fail;
@@ -482,6 +484,11 @@ static void etm_event_start(struct perf_event *event, int flags)
if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
goto fail_disable_path;
+ /* output cpu / trace ID in perf record */
+ hw_id = FIELD_PREP(CS_AUX_HW_ID_VERSION_MASK, CS_AUX_HW_ID_CURR_VERSION) |
+ FIELD_PREP(CS_AUX_HW_ID_TRACE_ID_MASK, coresight_trace_id_get_cpu_id(cpu));
+ perf_report_aux_output_id(event, hw_id);
+
out:
/* Tell the perf core the event is alive */
event->hw.state = 0;
@@ -600,6 +607,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
/* Disabling the path make its elements available to other sessions */
coresight_disable_path(path);
+
+ /* release the trace ID we read on event start */
+ coresight_trace_id_put_cpu_id(cpu);
}
static int etm_event_add(struct perf_event *event, int mode)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
index 9f7ee380266b..5572d0e10822 100644
--- a/include/linux/coresight-pmu.h
+++ b/include/linux/coresight-pmu.h
@@ -7,6 +7,8 @@
#ifndef _LINUX_CORESIGHT_PMU_H
#define _LINUX_CORESIGHT_PMU_H
+#include <linux/bits.h>
+
#define CORESIGHT_ETM_PMU_NAME "cs_etm"
/*
@@ -38,4 +40,16 @@
#define ETM4_CFG_BIT_RETSTK 12
#define ETM4_CFG_BIT_VMID_OPT 15
+/*
+ * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
+ * Used to associate a CPU with the CoreSight Trace ID.
+ * [63:16] - unused SBZ
+ * [15:08] - Trace ID
+ * [07:00] - Version
+ */
+#define CS_AUX_HW_ID_VERSION_MASK GENMASK_ULL(7, 0)
+#define CS_AUX_HW_ID_TRACE_ID_MASK GENMASK_ULL(15, 8)
+
+#define CS_AUX_HW_ID_CURR_VERSION 0
+
#endif
--
2.17.1
CoreSight trace is being updated to use perf_report_aux_output_id()
in a similar way to intel-pt.
This function needs export visibility so that it can be called from
loadable kernel modules, which CoreSight may be configured to be built as.
Signed-off-by: Mike Leach <[email protected]>
---
kernel/events/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 80782cddb1da..f5835e5833cd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9117,6 +9117,7 @@ void perf_report_aux_output_id(struct perf_event *event, u64 hw_id)
perf_output_end(&handle);
}
+EXPORT_SYMBOL_GPL(perf_report_aux_output_id);
static int
__perf_event_account_interrupt(struct perf_event *event, int throttle)
--
2.17.1
Hi Mike,
Thanks for the patch, please find my comments inline.
On 04/07/2022 09:11, Mike Leach wrote:
> The existing mechanism to assign Trace ID values to sources is limited
> and does not scale for larger multicore / multi trace source systems.
>
> The API introduces functions that reserve IDs based on availabilty
> represented by a coresight_trace_id_map structure. This records the
> used and free IDs in a bitmap.
>
> CPU bound sources such as ETMs use the coresight_trace_id_get_cpu_id /
> coresight_trace_id_put_cpu_id pair of functions. The API will record
> the ID associated with the CPU. This ensures that the same ID will be
> re-used while perf events are active on the CPU. The put_cpu_id function
> will pend release of the ID until all perf cs_etm sessions are complete.
>
> Non-cpu sources, such as the STM can use coresight_trace_id_get_system_id /
> coresight_trace_id_put_system_id.
>
> Signed-off-by: Mike Leach <[email protected]>
> ---
> drivers/hwtracing/coresight/Makefile | 2 +-
> .../hwtracing/coresight/coresight-trace-id.c | 230 ++++++++++++++++++
> .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
> 3 files changed, 296 insertions(+), 1 deletion(-)
> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
>
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index b6c4a48140ec..329a0c704b87 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -6,7 +6,7 @@ obj-$(CONFIG_CORESIGHT) += coresight.o
> coresight-y := coresight-core.o coresight-etm-perf.o coresight-platform.o \
> coresight-sysfs.o coresight-syscfg.o coresight-config.o \
> coresight-cfg-preload.o coresight-cfg-afdo.o \
> - coresight-syscfg-configfs.o
> + coresight-syscfg-configfs.o coresight-trace-id.o
> obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
> coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
> coresight-tmc-etr.o
> diff --git a/drivers/hwtracing/coresight/coresight-trace-id.c b/drivers/hwtracing/coresight/coresight-trace-id.c
> new file mode 100644
> index 000000000000..dac9c89ae00d
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trace-id.c
> @@ -0,0 +1,230 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022, Linaro Limited, All rights reserved.
> + * Author: Mike Leach <[email protected]>
> + */
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +
> +#include "coresight-trace-id.h"
> +
> +/* need to keep data on ids & association with cpus. */
> +struct cpu_id_info {
> + int id;
> + bool pend_rel;
> +};
> +
> +/* default trace ID map. Used for systems that do not require per sink mappings */
> +static struct coresight_trace_id_map id_map_default;
> +
> +/* maintain a record of the current mapping of cpu IDs */
> +static DEFINE_PER_CPU(struct cpu_id_info, cpu_ids);
> +
> +/* perf session active flag */
> +static int perf_cs_etm_session_active;
> +
> +/* lock to protect id_map and cpu data */
> +static DEFINE_SPINLOCK(id_map_lock);
> +
> +/* ID 0 is reserved */
> +#define CORESIGHT_TRACE_ID_RES_0 0
> +
> +/* ID 0x70 onwards are reserved */
> +#define CORESIGHT_TRACE_ID_RES_RANGE_LO 0x70
> +#define CORESIGHT_TRACE_ID_RES_RANGE_HI 0x7F
Since this range is at the top end, we could clip the
MAX_IDS to 0x70 and skip all these unnecessary checks and reservations.
Also, by modifying the find_bit and for_each_bit slightly we could
do away with this reservation scheme and the IS_VALID(id) checks.
> +#define IS_VALID_ID(id) \
> + ((id > CORESIGHT_TRACE_ID_RES_0) && (id < CORESIGHT_TRACE_ID_RES_RANGE_LO))
> +
> +static void coresight_trace_id_set_inuse(int id, struct coresight_trace_id_map *id_map)
> +{
> + if (IS_VALID_ID(id))
> + set_bit(id, id_map->avail_ids);
> +}
Please see my comment around the definition of avail_ids.
> +
> +static void coresight_trace_id_clear_inuse(int id, struct coresight_trace_id_map *id_map)
> +{
> + if (IS_VALID_ID(id))
> + clear_bit(id, id_map->avail_ids);
> +}
This could be :
coresight_trace_id_free_id()
> +
> +static void coresight_trace_id_set_pend_rel(int id, struct coresight_trace_id_map *id_map)
> +{
> + if (IS_VALID_ID(id))
> + set_bit(id, id_map->pend_rel_ids);
> +}
> +
> +static void coresight_trace_id_clear_pend_rel(int id, struct coresight_trace_id_map *id_map)
> +{
> + if (IS_VALID_ID(id))
> + clear_bit(id, id_map->pend_rel_ids);
> +}
> +
> +static int coresight_trace_id_find_new_id(struct coresight_trace_id_map *id_map)
minor nit: Could we call this :
coresight_trace_id_alloc_new_id(id_map) and
> +{
> + int id;
> +
> + id = find_first_zero_bit(id_map->avail_ids, CORESIGHT_TRACE_IDS_MAX);
minor nit: You could also do the following, to explicitly skip 0:
id = find_next_zero_bit(id_map->avail_ids, 1, CORESIGHT_TRACE_IDS_MAX);
> + if (id >= CORESIGHT_TRACE_IDS_MAX)
> + id = -EINVAL;
Could we also mark the id as in use here itself ? All callers of this
function have to do that explicitly, anyways.
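Taken together, the suggestions above (clip the ID space at 0x70, start the search at 1, and mark the ID as in use inside the allocation helper) might look like the following user-space sketch. It is an illustration only and not code from the patch: the names are hypothetical, a bool array stands in for the kernel bitmap, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* ID space clipped below the reserved range 0x70..0x7F. */
#define TRACE_IDS_CLIPPED_MAX 0x70

/* Hypothetical map: used[id] is true while the ID is allocated. */
struct used_ids_model {
	bool used[TRACE_IDS_CLIPPED_MAX];
};

/*
 * Find the lowest free ID and mark it in use in one step.
 * The search starts at 1, so reserved ID 0 is never returned
 * and no reserved bits need to be set up in advance.
 */
static int trace_id_alloc_new_id(struct used_ids_model *map)
{
	int id;

	for (id = 1; id < TRACE_IDS_CLIPPED_MAX; id++) {
		if (!map->used[id]) {
			map->used[id] = true;
			return id;
		}
	}
	return -EINVAL;
}

/* Return an ID to the pool. */
static void trace_id_free_id(struct used_ids_model *map, int id)
{
	assert(id > 0 && id < TRACE_IDS_CLIPPED_MAX);
	map->used[id] = false;
}
```

With this shape a zero-initialised map is already valid, which is what would make a separate map initialisation step unnecessary.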
> + return id;
> +}
> +
> +/* release all pending IDs for all current maps & clear CPU associations */
> +static void coresight_trace_id_release_all_pending(void)
> +{
> + struct coresight_trace_id_map *id_map = &id_map_default;
> + int cpu, bit;
> +
int cpu, bit = 1;
> + for_each_set_bit(bit, id_map->pend_rel_ids, CORESIGHT_TRACE_IDS_MAX) {
for_each_set_bit_from(bit, id_map...)
> + clear_bit(bit, id_map->avail_ids);
> + clear_bit(bit, id_map->pend_rel_ids);
> + }
> +
> + for_each_possible_cpu(cpu) {
> + if (per_cpu(cpu_ids, cpu).pend_rel) {
> + per_cpu(cpu_ids, cpu).pend_rel = false;
> + per_cpu(cpu_ids, cpu).id = 0;
> + }
> + }
> +}
> +
> +static void coresight_trace_id_init_id_map(struct coresight_trace_id_map *id_map)
> +{
> + int bit;
> +
> + /* set all reserved bits as in-use */
> + set_bit(CORESIGHT_TRACE_ID_RES_0, id_map->avail_ids);
> + for (bit = CORESIGHT_TRACE_ID_RES_RANGE_LO;
> + bit <= CORESIGHT_TRACE_ID_RES_RANGE_HI; bit++)
> + set_bit(bit, id_map->avail_ids);
> +}
> +
> +static int coresight_trace_id_map_get_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
> +{
> + unsigned long flags;
> + int id;
> +
> + spin_lock_irqsave(&id_map_lock, flags);
> +
> + /* check for existing allocation for this CPU */
> + id = per_cpu(cpu_ids, cpu).id;
> + if (id)
> + goto get_cpu_id_out;
> +
> + /* find a new ID */
> + id = coresight_trace_id_find_new_id(id_map);
> + if (id < 0)
> + goto get_cpu_id_out;
> +
> + /* got a valid new ID - save details */
> + per_cpu(cpu_ids, cpu).id = id;
> + per_cpu(cpu_ids, cpu).pend_rel = false;
> + coresight_trace_id_set_inuse(id, id_map);
> + coresight_trace_id_clear_pend_rel(id, id_map);
> +
> +get_cpu_id_out:
> + spin_unlock_irqrestore(&id_map_lock, flags);
> + return id;
> +}
> +
> +static void coresight_trace_id_map_put_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
> +{
> + unsigned long flags;
> + int id;
> +
> + spin_lock_irqsave(&id_map_lock, flags);
> + id = per_cpu(cpu_ids, cpu).id;
> + if (!id)
> + goto put_cpu_id_out;
> +
> + if (perf_cs_etm_session_active) {
> + /* set release at pending if perf still active */
> + coresight_trace_id_set_pend_rel(id, id_map);
> + per_cpu(cpu_ids, cpu).pend_rel = true;
> + } else {
> + /* otherwise clear id */
> + coresight_trace_id_clear_inuse(id, id_map);
> + per_cpu(cpu_ids, cpu).id = 0;
> + }
> +
> + put_cpu_id_out:
> + spin_unlock_irqrestore(&id_map_lock, flags);
> +}
> +
> +static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *id_map)
> +{
> + unsigned long flags;
> + int id;
> +
> + spin_lock_irqsave(&id_map_lock, flags);
> + id = coresight_trace_id_find_new_id(id_map);
> + if (id > 0)
> + coresight_trace_id_set_inuse(id, id_map);
Please see my suggestion above on moving this to the place where we find
the bit.
> + spin_unlock_irqrestore(&id_map_lock, flags);
> +
> + return id;
> +}
> +
> +static void coresight_trace_id_map_put_system_id(struct coresight_trace_id_map *id_map, int id)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&id_map_lock, flags);
> + coresight_trace_id_clear_inuse(id, id_map);
> + spin_unlock_irqrestore(&id_map_lock, flags);
> +}
> +
> +/* API functions */
> +int coresight_trace_id_get_cpu_id(int cpu)
> +{
> + return coresight_trace_id_map_get_cpu_id(cpu, &id_map_default);
> +}
> +EXPORT_SYMBOL_GPL(coresight_trace_id_get_cpu_id);
> +
> +void coresight_trace_id_put_cpu_id(int cpu)
> +{
> + coresight_trace_id_map_put_cpu_id(cpu, &id_map_default);
> +}
> +EXPORT_SYMBOL_GPL(coresight_trace_id_put_cpu_id);
> +
> +int coresight_trace_id_get_system_id(void)
> +{
> + return coresight_trace_id_map_get_system_id(&id_map_default);
> +}
> +EXPORT_SYMBOL_GPL(coresight_trace_id_get_system_id);
> +
> +void coresight_trace_id_put_system_id(int id)
> +{
> + coresight_trace_id_map_put_system_id(&id_map_default, id);
> +}
> +EXPORT_SYMBOL_GPL(coresight_trace_id_put_system_id);
> +
> +void coresight_trace_id_perf_start(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&id_map_lock, flags);
> + perf_cs_etm_session_active++;
> + spin_unlock_irqrestore(&id_map_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(coresight_trace_id_perf_start);
> +
> +void coresight_trace_id_perf_stop(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&id_map_lock, flags);
> + perf_cs_etm_session_active--;
> + if (!perf_cs_etm_session_active)
> + coresight_trace_id_release_all_pending();
> + spin_unlock_irqrestore(&id_map_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(coresight_trace_id_perf_stop);
> +
> +void coresight_trace_id_init_default_map(void)
> +{
> + coresight_trace_id_init_id_map(&id_map_default);
> +}
> +EXPORT_SYMBOL_GPL(coresight_trace_id_init_default_map);
We may be able to get rid of this init. Otherwise we may convert this to
a module_initcall() in the worst case. No need to export this.
> diff --git a/drivers/hwtracing/coresight/coresight-trace-id.h b/drivers/hwtracing/coresight/coresight-trace-id.h
> new file mode 100644
> index 000000000000..63950087edf6
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-trace-id.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright(C) 2022 Linaro Limited. All rights reserved.
> + * Author: Mike Leach <[email protected]>
> + */
> +
> +#ifndef _CORESIGHT_TRACE_ID_H
> +#define _CORESIGHT_TRACE_ID_H
> +
> +/*
> + * Coresight trace ID allocation API
> + *
> + * With multi cpu systems, and more additional trace sources a scalable
> + * trace ID reservation system is required.
> + *
> + * The system will allocate Ids on a demand basis, and allow them to be
> + * released when done.
> + *
> + * In order to ensure that a consistent cpu / ID matching is maintained
> + * throughout a perf cs_etm event session - a session in progress flag will
> + * be maintained, and released IDs not cleared until the perf session is
> + * complete. This allows the same CPU to be re-allocated its prior ID.
> + *
> + *
> + * Trace ID maps will be created and initialised to prevent architecturally
> + * reserved IDs from being allocated.
> + *
> + * API permits multiple maps to be maintained - for large systems where
> + * different sets of cpus trace into different independent sinks.
> + */
Thanks for the detailed comment above.
> +
> +#include <linux/bitops.h>
> +#include <linux/types.h>
> +
> +
> +/* architecturally we have 128 IDs some of which are reserved */
> +#define CORESIGHT_TRACE_IDS_MAX 128
Could we restrict the CORESIGHT_TRACE_IDS_MAX to 0x70, clipping the
upper range of reserved ids ? That way, we could skip bothering about
checking it everywhere.
> +
> +/**
> + * Trace ID map.
> + *
> + * @avail_ids: Bitmap to register available (bit = 0) and in use (bit = 1) IDs.
> + * Initialised so that the reserved IDs are permanently marked as in use.
To be honest, this inverts the intuition. Could we instead name this
used_ids ?
i.e. BIT(i) = 1 => trace id is in use.
> + * @pend_rel_ids: CPU IDs that have been released by the trace source but not yet marked
> + * as available, to allow re-allocation to the same CPU during a perf session.
> + */
> +struct coresight_trace_id_map {
> + DECLARE_BITMAP(avail_ids, CORESIGHT_TRACE_IDS_MAX);
> + DECLARE_BITMAP(pend_rel_ids, CORESIGHT_TRACE_IDS_MAX);
> +};
Also, the definitions are split between the .c and .h. Could we keep all
of them in one place, preferably the .h ? Or if this is not needed at all
by the consumers of the API, we should keep all of this in the .c file.
I guess in the future, with the sink specific scheme, we may need to
expose the helpers which accept an id_map. So maybe even move it here.
Thanks
Suzuki
On 04/07/2022 09:11, Mike Leach wrote:
> CoreSight trace being updated to use the perf_report_aux_output_id()
> in a similar way to intel-pt.
>
> This function in needs export visibility to allow it to be called from
> kernel loadable modules, which CoreSight may configured to be built as.
>
> Signed-off-by: Mike Leach <[email protected]>
> ---
> kernel/events/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 80782cddb1da..f5835e5833cd 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9117,6 +9117,7 @@ void perf_report_aux_output_id(struct perf_event *event, u64 hw_id)
>
> perf_output_end(&handle);
> }
> +EXPORT_SYMBOL_GPL(perf_report_aux_output_id);
>
> static int
> __perf_event_account_interrupt(struct perf_event *event, int throttle)
Acked-by: Suzuki K Poulose <[email protected]>
On 04/07/2022 09:11, Mike Leach wrote:
> Use the perf_report_aux_output_id() call to output the CoreSight trace ID
> and associated CPU as a PERF_RECORD_AUX_OUTPUT_HW_ID record in the
> perf.data file.
>
> Signed-off-by: Mike Leach <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
> include/linux/coresight-pmu.h | 14 ++++++++++++++
> 2 files changed, 24 insertions(+)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index ad3fdc07c60b..531f5d42272b 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -4,6 +4,7 @@
> * Author: Mathieu Poirier <[email protected]>
> */
>
> +#include <linux/bitfield.h>
> #include <linux/coresight.h>
> #include <linux/coresight-pmu.h>
> #include <linux/cpumask.h>
> @@ -437,6 +438,7 @@ static void etm_event_start(struct perf_event *event, int flags)
> struct perf_output_handle *handle = &ctxt->handle;
> struct coresight_device *sink, *csdev = per_cpu(csdev_src, cpu);
> struct list_head *path;
> + u64 hw_id;
>
> if (!csdev)
> goto fail;
> @@ -482,6 +484,11 @@ static void etm_event_start(struct perf_event *event, int flags)
> if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
> goto fail_disable_path;
>
> + /* output cpu / trace ID in perf record */
> + hw_id = FIELD_PREP(CS_AUX_HW_ID_VERSION_MASK, CS_AUX_HW_ID_CURR_VERSION) |
> + FIELD_PREP(CS_AUX_HW_ID_TRACE_ID_MASK, coresight_trace_id_get_cpu_id(cpu));
> + perf_report_aux_output_id(event, hw_id);
> +
> out:
> /* Tell the perf core the event is alive */
> event->hw.state = 0;
> @@ -600,6 +607,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
>
> /* Disabling the path make its elements available to other sessions */
> coresight_disable_path(path);
> +
> + /* release the trace ID we read on event start */
> + coresight_trace_id_put_cpu_id(cpu);
> }
>
> static int etm_event_add(struct perf_event *event, int mode)
> diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
> index 9f7ee380266b..5572d0e10822 100644
> --- a/include/linux/coresight-pmu.h
> +++ b/include/linux/coresight-pmu.h
> @@ -7,6 +7,8 @@
> #ifndef _LINUX_CORESIGHT_PMU_H
> #define _LINUX_CORESIGHT_PMU_H
>
> +#include <linux/bits.h>
> +
> #define CORESIGHT_ETM_PMU_NAME "cs_etm"
>
> /*
> @@ -38,4 +40,16 @@
> #define ETM4_CFG_BIT_RETSTK 12
> #define ETM4_CFG_BIT_VMID_OPT 15
>
> +/*
> + * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
> + * Used to associate a CPU with the CoreSight Trace ID.
> + * [63:16] - unused SBZ
> + * [15:08] - Trace ID
> + * [07:00] - Version
Could we please re-arrange the fields so that it is easier to make out
the TraceID when looking at the raw trace dump? Also to
accommodate future changes.
e.g,
[15:00] - Trace ID /* For future expansion, if at all */
[59:16] - RES0
[63:60] - Trace_ID_Version
I think we *might* (not sure yet) end up adding "sinkid" when we have
sink specific allocation, so that we can associate the HW_ID of an event
to the "AUXTRACE" record (i.e., trace buffer).
So if we need to do that we could:
[15:00] - Trace ID /* For future expansion, if at all */
[47:16] - Trace Pool ID( == 0 if global, == sink_id if sink based)
[59:48] - RES0
[63:60] - Trace_ID_Version == 1
Or we could adopt the above straight away.
Thoughts ?
Suzuki
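As a rough illustration of the second layout suggested above, the packing and unpacking could look like the sketch below. The macro and function names are hypothetical and the field widths are only those of the proposal, not of the patch as posted:

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from the proposed layout, not from the patch. */
#define HW_ID_TRACE_ID_SHIFT	0
#define HW_ID_TRACE_ID_MASK	0xffffULL	/* [15:00] */
#define HW_ID_POOL_ID_SHIFT	16
#define HW_ID_POOL_ID_MASK	0xffffffffULL	/* [47:16] */
#define HW_ID_VERSION_SHIFT	60
#define HW_ID_VERSION_MASK	0xfULL		/* [63:60] */

/* Build a payload; bits [59:48] are left as zero (RES0). */
static uint64_t hw_id_pack(unsigned int version, uint32_t pool_id,
			   uint16_t trace_id)
{
	assert(version <= HW_ID_VERSION_MASK);
	return ((uint64_t)version << HW_ID_VERSION_SHIFT) |
	       ((uint64_t)pool_id << HW_ID_POOL_ID_SHIFT) |
	       ((uint64_t)trace_id << HW_ID_TRACE_ID_SHIFT);
}

static unsigned int hw_id_version(uint64_t hw_id)
{
	return (unsigned int)((hw_id >> HW_ID_VERSION_SHIFT) & HW_ID_VERSION_MASK);
}

static uint32_t hw_id_pool_id(uint64_t hw_id)
{
	return (uint32_t)((hw_id >> HW_ID_POOL_ID_SHIFT) & HW_ID_POOL_ID_MASK);
}

static uint16_t hw_id_trace_id(uint64_t hw_id)
{
	return (uint16_t)((hw_id >> HW_ID_TRACE_ID_SHIFT) & HW_ID_TRACE_ID_MASK);
}
```

With the trace ID in the low bits, a raw dump of a version 1 payload for trace ID 0x12 in the global pool would read 0x1000000000000012, which is the readability argument made above.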
> + */
> +#define CS_AUX_HW_ID_VERSION_MASK GENMASK_ULL(7, 0)
> +#define CS_AUX_HW_ID_TRACE_ID_MASK GENMASK_ULL(15, 8)
> +
> +#define CS_AUX_HW_ID_CURR_VERSION 0
> +
> #endif
Hi Suzuki,
On Wed, 20 Jul 2022 at 10:30, Suzuki K Poulose <[email protected]> wrote:
>
> On 04/07/2022 09:11, Mike Leach wrote:
> > Use the perf_report_aux_output_id() call to output the CoreSight trace ID
> > and associated CPU as a PERF_RECORD_AUX_OUTPUT_HW_ID record in the
> > perf.data file.
> >
> > Signed-off-by: Mike Leach <[email protected]>
> > ---
> > drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
> > include/linux/coresight-pmu.h | 14 ++++++++++++++
> > 2 files changed, 24 insertions(+)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > index ad3fdc07c60b..531f5d42272b 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> > +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > @@ -4,6 +4,7 @@
> > * Author: Mathieu Poirier <[email protected]>
> > */
> >
> > +#include <linux/bitfield.h>
> > #include <linux/coresight.h>
> > #include <linux/coresight-pmu.h>
> > #include <linux/cpumask.h>
> > @@ -437,6 +438,7 @@ static void etm_event_start(struct perf_event *event, int flags)
> > struct perf_output_handle *handle = &ctxt->handle;
> > struct coresight_device *sink, *csdev = per_cpu(csdev_src, cpu);
> > struct list_head *path;
> > + u64 hw_id;
> >
> > if (!csdev)
> > goto fail;
> > @@ -482,6 +484,11 @@ static void etm_event_start(struct perf_event *event, int flags)
> > if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
> > goto fail_disable_path;
> >
> > + /* output cpu / trace ID in perf record */
> > + hw_id = FIELD_PREP(CS_AUX_HW_ID_VERSION_MASK, CS_AUX_HW_ID_CURR_VERSION) |
> > + FIELD_PREP(CS_AUX_HW_ID_TRACE_ID_MASK, coresight_trace_id_get_cpu_id(cpu));
> > + perf_report_aux_output_id(event, hw_id);
> > +
> > out:
> > /* Tell the perf core the event is alive */
> > event->hw.state = 0;
> > @@ -600,6 +607,9 @@ static void etm_event_stop(struct perf_event *event, int mode)
> >
> > /* Disabling the path make its elements available to other sessions */
> > coresight_disable_path(path);
> > +
> > + /* release the trace ID we read on event start */
> > + coresight_trace_id_put_cpu_id(cpu);
> > }
> >
> > static int etm_event_add(struct perf_event *event, int mode)
> > diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
> > index 9f7ee380266b..5572d0e10822 100644
> > --- a/include/linux/coresight-pmu.h
> > +++ b/include/linux/coresight-pmu.h
> > @@ -7,6 +7,8 @@
> > #ifndef _LINUX_CORESIGHT_PMU_H
> > #define _LINUX_CORESIGHT_PMU_H
> >
> > +#include <linux/bits.h>
> > +
> > #define CORESIGHT_ETM_PMU_NAME "cs_etm"
> >
> > /*
> > @@ -38,4 +40,16 @@
> > #define ETM4_CFG_BIT_RETSTK 12
> > #define ETM4_CFG_BIT_VMID_OPT 15
> >
> > +/*
> > + * Interpretation of the PERF_RECORD_AUX_OUTPUT_HW_ID payload.
> > + * Used to associate a CPU with the CoreSight Trace ID.
> > + * [63:16] - unused SBZ
> > + * [15:08] - Trace ID
> > + * [07:00] - Version
>
> Could we please re-arrange the fields, such that it is easier to
> comprehend the TraceID looking at the raw trace dump ? Also to
> accommodate the future changes.
>
> e.g,
> [15:00] - Trace ID /* For future expansion, if at all */
> [59:16] - RES0
> [63:60] - Trace_ID_Version
>
> I think we *might* (not sure yet) end up adding "sinkid" when we have
> sink specific allocation, so that we can associate the HW_ID of an event
> to the "AUXTRACE" record (i.e., trace buffer).
>
If we go to per sink trace ID maps, then I can't see how we could
avoid needing some sort of ID in here, unless we can determine some
other method of specifying which CPUs traced into which trace buffer.
> So if we need to do that we could:
>
> [15:00] - Trace ID /* For future expansion, if at all */
> [47:16] - Trace Pool ID( == 0 if global, == sink_id if sink based)
> [59:48] - RES0
> [63:60] - Trace_ID_Version == 1
>
> Or we could adopt the above straight away.
>
I wouldn't want to commit to a size for the sink ID yet. And I would
leave the trace ID at what it is for now (8 bits).
Make the fields represent what exists now, then bump the version and
update the layout when changes are actually required.
I think this packet may be a candidate for delivering other trace
related info we may need in future - such as the timestamp source that
is being worked on?
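To make the "up-version when required" idea concrete, here is a small sketch of how a consumer might handle the layout as posted ([07:00] version, [15:08] trace ID). The helper names are hypothetical; the point is only that an unknown version is rejected rather than misread:

```c
#include <assert.h>
#include <stdint.h>

/* Layout as posted in the patch: [07:00] version, [15:08] trace ID. */
#define HW_ID_VERSION_MASK	0x00ffULL
#define HW_ID_TRACE_ID_MASK	0xff00ULL
#define HW_ID_TRACE_ID_SHIFT	8
#define HW_ID_CURR_VERSION	0ULL

/* Hypothetical encoder for a version 0 payload. */
static uint64_t hw_id_encode_v0(unsigned int trace_id)
{
	assert(trace_id <= 0xff);
	return ((uint64_t)trace_id << HW_ID_TRACE_ID_SHIFT) | HW_ID_CURR_VERSION;
}

/* Returns the trace ID, or -1 if the payload version is not understood. */
static int hw_id_decode_trace_id(uint64_t hw_id)
{
	if ((hw_id & HW_ID_VERSION_MASK) != HW_ID_CURR_VERSION)
		return -1;
	return (int)((hw_id & HW_ID_TRACE_ID_MASK) >> HW_ID_TRACE_ID_SHIFT);
}
```

A later layout change would add a new version value and a matching decode branch, leaving version 0 files readable.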
Mike
> Thoughts ?
>
> Suzuki
>
> > + */
> > +#define CS_AUX_HW_ID_VERSION_MASK GENMASK_ULL(7, 0)
> > +#define CS_AUX_HW_ID_TRACE_ID_MASK GENMASK_ULL(15, 8)
> > +
> > +#define CS_AUX_HW_ID_CURR_VERSION 0
>
>
> > +
> > #endif
>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
On 04/07/2022 09:11, Mike Leach wrote:
> The current method for allocating trace source ID values to sources is
> to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10).
> The STM is allocated ID 0x1.
>
> This fixed algorithm is used in both the CoreSight driver code, and by
> perf when writing the trace metadata in the AUXTRACE_INFO record.
>
> The method needs replacing as currently:-
> 1. It is inefficient in using available IDs.
> 2. Does not scale to larger systems with many cores and the algorithm
> has no limits so will generate invalid trace IDs for cpu number > 44.
>
> Additionally requirements to allocate additional system IDs on some
> systems have been seen.
>
> This patch set introduces an API that allows the allocation of trace IDs
> in a dynamic manner.
I've tested this with various commands, including per-thread mode and
attaching, and by running the tests and also Carsten's new tests. Apart
from the possible backwards compatibility issue and the minor code
comments it looks good to me.
>
> Architecturally reserved IDs are never allocated, and the system is
> limited to allocating only valid IDs.
>
> Each of the current trace sources ETM3.x, ETM4.x and STM is updated to use
> the new API.
>
> For the ETMx.x devices IDs are allocated on certain events
> a) When using sysfs, an ID will be allocated on hardware enable, or a read of
> sysfs TRCTRACEID register and freed when the sysfs reset is written.
>
> b) When using perf, ID is allocated on hardware enable, and freed on
> hardware disable. IDs are communicated using the AUX_OUTPUT_HW_ID packet.
> The ID allocator is notified when perf sessions start and stop
> so CPU based IDs are kept constant throughout any perf session.
>
>
> Note: This patchset breaks backward compatibility for perf record and
> perf report.
>
> Because the method for generating the AUXTRACE_INFO meta data has
> changed, using an older perf record will result in metadata that
> does not match the trace IDs used in the recorded trace data.
> This mismatch will cause subsequent decode to fail.
>
> The version of the AUXTRACE_INFO has been updated to reflect the fact that
> the trace source IDs are no longer present in the metadata. This will
> mean older versions of perf report cannot decode the file.
>
> Applies to coresight/next [c06475910b52]
> Tested on DB410c
>
> Changes since v1:
> (after feedback & discussion with Mathieu & Suzuki).
>
> 1) API has changed. The global trace ID map is managed internally, so it
> is no longer passed in to the API functions.
>
> 2) perf record does not use sysfs to find the trace IDs. These are now
> output as AUX_OUTPUT_HW_ID events. The drivers, perf record, and perf report
> have been updated accordingly to generate and handle these events.
>
> Mike Leach (13):
> coresight: trace-id: Add API to dynamically assign Trace ID values
> coresight: trace-id: update CoreSight core to use Trace ID API
> coresight: stm: Update STM driver to use Trace ID API
> coresight: etm4x: Update ETM4 driver to use Trace ID API
> coresight: etm3x: Update ETM3 driver to use Trace ID API
> coresight: etmX.X: stm: Remove unused legacy source Trace ID ops
> coresight: perf: traceid: Add perf notifiers for Trace ID
> perf: cs-etm: Move mapping of Trace ID and cpu into helper function
> perf: cs-etm: Update record event to use new Trace ID protocol
> kernel: events: Export perf_report_aux_output_id()
> perf: cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
> coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
> coresight: trace-id: Add debug & test macros to Trace ID allocation
>
> drivers/hwtracing/coresight/Makefile | 2 +-
> drivers/hwtracing/coresight/coresight-core.c | 49 +---
> .../hwtracing/coresight/coresight-etm-perf.c | 17 ++
> drivers/hwtracing/coresight/coresight-etm.h | 3 +-
> .../coresight/coresight-etm3x-core.c | 85 +++---
> .../coresight/coresight-etm3x-sysfs.c | 28 +-
> .../coresight/coresight-etm4x-core.c | 65 ++++-
> .../coresight/coresight-etm4x-sysfs.c | 32 ++-
> drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
> drivers/hwtracing/coresight/coresight-stm.c | 49 +---
> .../hwtracing/coresight/coresight-trace-id.c | 263 ++++++++++++++++++
> .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
> include/linux/coresight-pmu.h | 31 ++-
> include/linux/coresight.h | 3 -
> kernel/events/core.c | 1 +
> tools/include/linux/coresight-pmu.h | 31 ++-
> tools/perf/arch/arm/util/cs-etm.c | 21 +-
> .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 9 +
> tools/perf/util/cs-etm.c | 220 +++++++++++++--
> tools/perf/util/cs-etm.h | 14 +-
> 20 files changed, 784 insertions(+), 207 deletions(-)
> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
>
Hi James,
Thanks for looking at this.
On Thu, 21 Jul 2022 at 11:27, James Clark <[email protected]> wrote:
>
>
>
> On 04/07/2022 09:11, Mike Leach wrote:
> > The current method for allocating trace source ID values to sources is
> > to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10).
> > The STM is allocated ID 0x1.
> >
> > This fixed algorithm is used in both the CoreSight driver code, and by
> > perf when writing the trace metadata in the AUXTRACE_INFO record.
> >
> > The method needs replacing as currently:-
> > 1. It is inefficient in using available IDs.
> > 2. Does not scale to larger systems with many cores and the algorithm
> > has no limits so will generate invalid trace IDs for cpu number > 44.
> >
> > Additionally requirements to allocate additional system IDs on some
> > systems have been seen.
> >
> > This patch set introduces an API that allows the allocation of trace IDs
> > in a dynamic manner.
>
> I've tested this with various commands like with per-thread mode, attaching,
> running the tests and also Carsten's new tests. Apart from the possible
> backwards compatibility issue and the minor code comments it looks good to
> me.
>
I've looked at the backwards compatibility issue. At present with the
current set
(K = kernel drivers, P-rec = perf record, P-rep = perf report)
::
K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (fail)
K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (fail)
K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
So, with a P-rec generating v2 metadata, an old P-rep (v1) will cleanly
error out.
Where the kernel ID version and the perf record ID version differ,
even P-rep v2 will fail, because the IDs in the file differ from those
used by the actual drivers. These failures will simply look like no data
being present.
There are two possible fixes that improve this:-
A) If the v2 kernel uses a sysfs flag to indicate new ID usage, then
when this flag is missing the new perf record can fall back to using the
old algorithm to put IDs directly into the metadata, as it can assume it
is running on a v1 kernel.
This then fixes things for P-rep v2, which can look for this & we
know there will be no incoming ID packets.
B) P-rep v2 can look for the new packets irrespective of the incoming
metadata version and, if it sees them, use them to override the IDs from
the metadata.
Compatibility matrix then looks like::
K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (OK)
K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
There is no solution to using an old version of perf record on a new
kernel and getting the old version of perf report to correctly decode
the file.
Thoughts?
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
On 21/07/2022 14:54, Mike Leach wrote:
> Hi James,
>
> Thanks for looking at this.
>
> On Thu, 21 Jul 2022 at 11:27, James Clark <[email protected]> wrote:
>>
>>
>>
>> On 04/07/2022 09:11, Mike Leach wrote:
>>> The current method for allocating trace source ID values to sources is
>>> to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10).
>>> The STM is allocated ID 0x1.
>>>
>>> This fixed algorithm is used in both the CoreSight driver code, and by
>>> perf when writing the trace metadata in the AUXTRACE_INFO record.
>>>
>>> The method needs replacing as currently:-
>>> 1. It is inefficient in using available IDs.
>>> 2. Does not scale to larger systems with many cores and the algorithm
>>> has no limits so will generate invalid trace IDs for cpu number > 44.
>>>
>>> Additionally requirements to allocate additional system IDs on some
>>> systems have been seen.
>>>
>>> This patch set introduces an API that allows the allocation of trace IDs
>>> in a dynamic manner.
>>
>> I've tested this with various commands like with per-thread mode, attaching,
>> running the tests and also Carsten's new tests. Apart from the possible
>> backwards compatibility issue and the minor code comments it looks good to
>> me.
>>
>
> I've looked at the backwards compatibility issue. At present with the
> current set
> (K = kernel drivers, P-rec = perf record, P-rep = perf report)
> ::
>
> K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
> K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (fail)
> K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (fail)
> K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
>
> So, with a P-rec generating v2 metadata, P rep will cleanly error out.
> Where the Kernel ID version and the perf report ID version differ,
> even the P rep v2 will fail, due to the IDs being different in the
> file and actual drivers. These fails will simply look like no data
> present.
>
> There are two possible fixes that improve this:-
> A) if the v2 kernel uses a sysfs flag to indicate new ID usage, then
> if this is missing the new perf record can degrade to using the old
> algorithm to put IDs directly into metadata as it assumes it is
> running on a v1 kernel.
> This fixes things then for the P-rep v2 that can look for this & we
> know there will be no incoming ID packets.
> B) P-rep v2 can look for new packets irrespective of incoming metadata
> version, and if it sees them, override them
>
> Compatibility matrix then looks like::
> K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
> K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
> K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (OK)
> K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
>
> There is no solution to using an old version of perf record on a new
> kernel and getting the old version of perf report to correctly decode
> the file.
>
We had a discussion about this last point on the Friday AutoFDO call.
Do you think it's possible to keep the old static ID allocations if
num_possible_cpus() < Max Trace ID? This is especially important for
simpleperf, because Android doesn't even have the issue of more than
128 CPUs, so technically it shouldn't need any changes made to it.
Making the dynamic trace ID allocation use the same IDs as before
whenever possible should allow both old perf and simpleperf to open
the file as before and ignore the AUX_OUTPUT_HW_ID packets.
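Roughly, the idea could be sketched as below: try the legacy static ID for the CPU first and only fall back to any free ID when that is not possible. This is a user-space illustration with hypothetical names, not the allocator from the patch set, and it ignores locking and the per-session ID retention the real code needs:

```c
#include <assert.h>
#include <stdbool.h>

#define TRACE_ID_MIN 0x01
#define TRACE_ID_MAX 0x6f	/* 0x70-0x7F are architecturally reserved */

static bool used[TRACE_ID_MAX + 1];

/* The fixed algorithm described in the cover letter. */
static int legacy_id(int cpu)
{
	return cpu * 2 + 0x10;
}

/* Prefer the legacy ID; otherwise take the lowest free valid ID. */
static int alloc_cpu_trace_id(int cpu)
{
	int id;

	assert(cpu >= 0);
	id = legacy_id(cpu);
	if (id <= TRACE_ID_MAX && !used[id]) {
		used[id] = true;
		return id;
	}
	for (id = TRACE_ID_MIN; id <= TRACE_ID_MAX; id++) {
		if (!used[id]) {
			used[id] = true;
			return id;
		}
	}
	return -1;	/* no valid ID left */
}
```

On a small system every CPU would get the same ID as before, so a file written by an old perf record would still match the trace data.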
James
> Thoughts?
>
> Mike
>
Hi James
On Fri, 22 Jul 2022 at 13:10, James Clark <[email protected]> wrote:
>
>
>
> On 21/07/2022 14:54, Mike Leach wrote:
> > Hi James,
> >
> > Thanks for looking at this.
> >
> > On Thu, 21 Jul 2022 at 11:27, James Clark <[email protected]> wrote:
> >>
> >>
> >>
> >> On 04/07/2022 09:11, Mike Leach wrote:
> >>> The current method for allocating trace source ID values to sources is
> >>> to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10).
> >>> The STM is allocated ID 0x1.
> >>>
> >>> This fixed algorithm is used in both the CoreSight driver code, and by
> >>> perf when writing the trace metadata in the AUXTRACE_INFO record.
> >>>
> >>> The method needs replacing as currently:-
> >>> 1. It is inefficient in using available IDs.
> >>> 2. Does not scale to larger systems with many cores and the algorithm
> >>> has no limits so will generate invalid trace IDs for cpu number > 44.
> >>>
> >>> Additionally requirements to allocate additional system IDs on some
> >>> systems have been seen.
> >>>
> >>> This patch set introduces an API that allows the allocation of trace IDs
> >>> in a dynamic manner.
> >>
> >> I've tested this with various commands like with per-thread mode, attaching,
> >> running the tests and also Carsten's new tests. Apart from the possible
> >> backwards compatibility issue and the minor code comments it looks good to
> >> me.
> >>
> >
> > I've looked at the backwards compatibility issue. At present with the
> > current set
> > (K = kernel drivers, P-rec = perf record, P-rep = perf report)
> > ::
> >
> > K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
> > K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (fail)
> > K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (fail)
> > K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
> >
> > So, with a P-rec generating v2 metadata, P rep will cleanly error out.
> > Where the Kernel ID version and the perf report ID version differ,
> > even the P rep v2 will fail, due to the IDs being different in the
> > file and actual drivers. These fails will simply look like no data
> > present.
> >
> > There are two possible fixes that improve this:-
> > A) if the v2 kernel uses a sysfs flag to indicate new ID usage, then
> > if this is missing the new perf record can degrade to using the old
> > algorithm to put IDs directly into metadata as it assumes it is
> > running on a v1 kernel.
> > This fixes things then for the P-rep v2 that can look for this & we
> > know there will be no incoming ID packets.
> > B) P-rep v2 can look for new packets irrespective of incoming metadata
> > version, and if it sees them, override them
> >
> > Compatibility matrix then looks like::
> > K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
> > K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
> > K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (OK)
> > K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
> >
> > There is no solution to using an old version of perf record on a new
> > kernel and getting the old version of perf report to correctly decode
> > the file.
> >
>
> We had a discussion about this last point on the Friday AutoFDO call.
Sorry I missed that - I was on holiday.
> Do you think it's possible to keep the old static ID allocations if
> num_possible_cpus() < Max Trace ID? This is especially important for
> simple perf because Android doesn't even have the more than 128 CPUs
> issue, so technically shouldn't have to have any changes made to it.
>
If Android never runs on high core count hardware, then that could work.
The actual limit is in fact CPU number 47, after which point the static
algorithm fails.
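The limit follows from the fixed formula and the reserved range 0x70-0x7F. A short worked check, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>

/* The fixed algorithm: ID = cpu_num * 2 + 0x10. */
static int legacy_trace_id(int cpu)
{
	assert(cpu >= 0);
	return cpu * 2 + 0x10;
}

/* Valid IDs exclude 0x00 and the reserved range 0x70-0x7F. */
static bool trace_id_is_valid(int id)
{
	return id > 0x00 && id < 0x70;
}

/* Highest CPU number for which the fixed algorithm yields a valid ID. */
static int legacy_max_cpu(void)
{
	int cpu = 0;

	while (trace_id_is_valid(legacy_trace_id(cpu + 1)))
		cpu++;
	return cpu;
}
```

CPU 47 maps to 47 * 2 + 0x10 = 0x6E, which is still valid, while CPU 48 maps to 0x70, the first reserved value.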
The question then arises: what do the kernel drivers do?
The old perf record will not realise things are about to go wrong,
and will continue to blindly allocate incorrect trace IDs.
Realistically the new drivers will then switch to using the previously
unused IDs, which will not match the blindly allocated perf IDs, and
the old perf decode process will silently fail to decode any
data with IDs that do not match.
If we also removed the metadata version update that goes alongside the
ID changes, then old perf-reports would continue to try to decode
newer files - again silently failing once the static algorithm has
failed.
Legacy ID allocation support must be added as a kernel CONFIG option -
so that it is up front and obvious to users what is being selected. And
we can output appropriate error messages.
This would be a temporary solution at best as there are upcoming
issues that will need attention:-
1) We need to deal with the fact that customers are adding new CS
compatible hardware to their systems, some of which has
hardcoded trace IDs. These hardware allocations will become
reservations in the dynamic allocator, with no guarantee they will not
clash with the static algorithm.
2) There may be a point in the future where we need to use per Sink ID
allocation.
3) We have an outstanding perf issue with ETE + TRBE which never use
trace IDs - at present decode works here because all the ETE
capabilities on the current systems are identical. Once that changes,
perf will need updating to look at the trace metadata on a CPU number
basis, not on a trace ID basis.
4) Future architecture updates will render newer trace un-decodable by
old perf versions.
The question here is why would Android build an up to date kernel
version with the updated CoreSight drivers, but insist on using an
outdated perf / simpleperf version?
Regards
Mike
> Making the dynamic traceID allocation use the same IDs as before
> whenever possible should allow both old Perf and simpleperf to open
> the file as before and ignore the AUX_OUTPUT_HW_ID packets.
>
> James
>
> > Thoughts?
> >
> > Mike
> >
> >>>
> >>> Architecturally reserved IDs are never allocated, and the system is
> >>> limited to allocating only valid IDs.
> >>>
> >>> Each of the current trace sources ETM3.x, ETM4.x and STM is updated to use
> >>> the new API.
> >>>
> >>> For the ETMx.x devices IDs are allocated on certain events
> >>> a) When using sysfs, an ID will be allocated on hardware enable, or a read of
> >>> sysfs TRCTRACEID register and freed when the sysfs reset is written.
> >>>
> >>> b) When using perf, ID is allocated on hardware enable, and freed on
> >>> hardware disable. IDs are communicated using the AUX_OUTPUT_HW_ID packet.
> >>> The ID allocator is notified when perf sessions start and stop
> >>> so CPU based IDs are kept constant throughout any perf session.
> >>>
> >>>
> >>> Note: This patchset breaks backward compatibility for perf record and
> >>> perf report.
> >>>
> >>> Because the method for generating the AUXTRACE_INFO meta data has
> >>> changed, using an older perf record will result in metadata that
> >>> does not match the trace IDs used in the recorded trace data.
> >>> This mismatch will cause subsequent decode to fail.
> >>>
> >>> The version of the AUXTRACE_INFO has been updated to reflect the fact that
> >>> the trace source IDs are no longer present in the metadata. This will
> >>> mean older versions of perf report cannot decode the file.
> >>>
> >>> Applies to coresight/next [c06475910b52]
> >>> Tested on DB410c
> >>>
> >>> Changes since v1:
> >>> (after feedback & discussion with Mathieu & Suzuki).
> >>>
> >>> 1) API has changed. The global trace ID map is managed internally, so it
> >>> is no longer passed in to the API functions.
> >>>
> >>> 2) perf record does not use sysfs to find the trace IDs. These are now
> >>> output as AUX_OUTPUT_HW_ID events. The drivers, perf record, and perf report
> >>> have been updated accordingly to generate and handle these events.
> >>>
> >>> Mike Leach (13):
> >>> coresight: trace-id: Add API to dynamically assign Trace ID values
> >>> coresight: trace-id: update CoreSight core to use Trace ID API
> >>> coresight: stm: Update STM driver to use Trace ID API
> >>> coresight: etm4x: Update ETM4 driver to use Trace ID API
> >>> coresight: etm3x: Update ETM3 driver to use Trace ID API
> >>> coresight: etmX.X: stm: Remove unused legacy source Trace ID ops
> >>> coresight: perf: traceid: Add perf notifiers for Trace ID
> >>> perf: cs-etm: Move mapping of Trace ID and cpu into helper function
> >>> perf: cs-etm: Update record event to use new Trace ID protocol
> >>> kernel: events: Export perf_report_aux_output_id()
> >>> perf: cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
> >>> coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
> >>> coresight: trace-id: Add debug & test macros to Trace ID allocation
> >>>
> >>> drivers/hwtracing/coresight/Makefile | 2 +-
> >>> drivers/hwtracing/coresight/coresight-core.c | 49 +---
> >>> .../hwtracing/coresight/coresight-etm-perf.c | 17 ++
> >>> drivers/hwtracing/coresight/coresight-etm.h | 3 +-
> >>> .../coresight/coresight-etm3x-core.c | 85 +++---
> >>> .../coresight/coresight-etm3x-sysfs.c | 28 +-
> >>> .../coresight/coresight-etm4x-core.c | 65 ++++-
> >>> .../coresight/coresight-etm4x-sysfs.c | 32 ++-
> >>> drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
> >>> drivers/hwtracing/coresight/coresight-stm.c | 49 +---
> >>> .../hwtracing/coresight/coresight-trace-id.c | 263 ++++++++++++++++++
> >>> .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
> >>> include/linux/coresight-pmu.h | 31 ++-
> >>> include/linux/coresight.h | 3 -
> >>> kernel/events/core.c | 1 +
> >>> tools/include/linux/coresight-pmu.h | 31 ++-
> >>> tools/perf/arch/arm/util/cs-etm.c | 21 +-
> >>> .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 9 +
> >>> tools/perf/util/cs-etm.c | 220 +++++++++++++--
> >>> tools/perf/util/cs-etm.h | 14 +-
> >>> 20 files changed, 784 insertions(+), 207 deletions(-)
> >>> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
> >>> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
> >>>
> >
> >
> >
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
On 25/07/2022 09:19, Mike Leach wrote:
> Hi James
>
> On Fri, 22 Jul 2022 at 13:10, James Clark <[email protected]> wrote:
>>
>>
>>
>> On 21/07/2022 14:54, Mike Leach wrote:
>>> Hi James,
>>>
>>> Thanks for looking at this.
>>>
>>> On Thu, 21 Jul 2022 at 11:27, James Clark <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 04/07/2022 09:11, Mike Leach wrote:
>>>>> The current method for allocating trace source ID values to sources is
>>>>> to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10).
>>>>> The STM is allocated ID 0x1.
>>>>>
>>>>> This fixed algorithm is used in both the CoreSight driver code, and by
>>>>> perf when writing the trace metadata in the AUXTRACE_INFO record.
>>>>>
>>>>> The method needs replacing as currently:-
>>>>> 1. It is inefficient in using available IDs.
>>>>> 2. Does not scale to larger systems with many cores and the algorithm
>>>>> has no limits so will generate invalid trace IDs for cpu number > 44.
>>>>>
>>>>> Additionally requirements to allocate additional system IDs on some
>>>>> systems have been seen.
>>>>>
>>>>> This patch set introduces an API that allows the allocation of trace IDs
>>>>> in a dynamic manner.
>>>>
>>>> I've tested this with various commands like with per-thread mode, attaching,
>>>> running the tests and also Carsten's new tests. Apart from the possible
>>>> backwards compatibility issue and the minor code comments it looks good to
>>>> me.
>>>>
>>>
>>> I've looked at the backwards compatibility issue. At present with the
>>> current set
>>> (K = kernel drivers, P-rec = perf record, P-rep = perf report)
>>> ::
>>>
>>> K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
>>> K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (fail)
>>> K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (fail)
>>> K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
>>>
>>> So, with a P-rec generating v2 metadata, P rep will cleanly error out.
>>> Where the Kernel ID version and the perf report ID version differ,
>>> even the P rep v2 will fail, due to the IDs being different in the
>>> file and actual drivers. These fails will simply look like no data
>>> present.
>>>
>>> There are two possible fixes that improve this:-
>>> A) if the v2 kernel uses a sysfs flag to indicate new ID usage, then
>>> if this is missing the new perf record can degrade to using the old
>>> algorithm to put IDs directly into metadata as it assumes it is
>>> running on a v1 kernel.
>>> This fixes things then for the P-rep v2 that can look for this & we
>>> know there will be no incoming ID packets.
>>> B) P-rep v2 can look for new packets irrespective of incoming metadata
>>> version, and if it sees them, override them
>>>
>>> Compatibility matrix then looks like::
>>> K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
>>> K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
>>> K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (OK)
>>> K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
>>>
>>> There is no solution to using an old version of perf record on a new
>>> kernel and getting the old version of perf report to correctly decode
>>> the file.
>>>
>>
>> We had a discussion about this last point on the Friday AutoFDO call.
>
> Sorry I missed that - I was on holiday.
>
I think I didn't realise it at the time but I was thinking of two
separate requirements relating to this, rather than one. So I will
list them here first to avoid confusion:
1. New perfs fall back to the legacy ID mappings if they don't see
any HW_IDs.
This is to support the AutoFDO workflow when using new (fixed) perfs
on old kernels. This only affects the Perf side changes in this
patch, not any of the kernel changes.
2. Wherever possible (absent any reserved ID clashes or CPUs > 47),
the new driver continues to use the old static ID allocation.
This is to support not making any changes to simpleperf (or any
other tools if they exist) until there is an actual need to. As you
say, this is only a temporary measure. This requirement can also be
dropped if we make the simpleperf changes at the same time as these
driver updates. But it would buy some time. But we can't fix any
tools that we don't know about.
There is no requirement to support old perfs on new kernels as far as I
can see.
>> Do you think it's possible to keep the old static ID allocations if
>> num_possible_cpus() < Max Trace ID? This is especially important for
>> simple perf because Android doesn't even have the more than 128 CPUs
>> issue, so technically shouldn't have to have any changes made to it.
>>
>
> If android never runs high core count hardware, then that could work.
> The actual CPU limit is in fact 47, after which point the static
> algorithm fails.
>
> The question arises what do the kernel drivers do then?
>
> The old perf -record will not realise things are about to go wrong,
> and continue to blindly allocate incorrect trace IDs.
> Realistically the new drivers will then switch to use the previously
> unused IDs, whereby they will mismatch with the blindly allocated perf
> IDs and the old perf decode process will silently fail to decode any
> data with IDs that do not match.
>
Do you mean this situation occurs if there are more than 47 cores?
I think it's fine for things to go wrong in this case because it's
already broken, regardless of whether the perf and kernel versions
match.
The user would have to upgrade both parts in that case no matter what
we do.
> If we also removed the metadata version update that goes alongside the
> ID changes, then old perf-reports would continue to try to decode
> newer files - again silently failing once the static algorithm has
> failed.
That's true, but I don't think we need to drop the metadata version update.
There's no requirement for an old perf-report to open new files, so we
can still make this change.
> Legacy ID allocation support must be added as a kernel CONFIG option -
> so that it is up front an obvious to users what is being selected. And
> we can output appropriate error messages.
>
> This would be a temporary solution at best as there are upcoming
> issues that will need attention:-
> 1 ) We need to deal with the fact that customers are adding new CS
> compatible hardware to their systems, some of which they have
> hardcoded trace IDs. These hardware allocations will become
> reservations in the dynamic allocator, with no guarantee they will not
> clash with the static algorithm.
>
Maybe instead of the temporary solution we can just make the change to
simpleperf at the same time. The only reason to do this would be to buy
some time or make the transition period smoother.
But does it need to be a CONFIG option if it only happens when CPUs < 47
or if there is a clash? We can still output the AUX_OUTPUT_HW_ID, but
use the old ID allocation scheme. So it would appear to be the new
scheme for anyone looking for HW_IDs, but is also compatible with old
simpleperf until there is a clash.
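As a rough illustration of that idea (a sketch only, not the allocator in
this patch set), the driver could try the legacy ID first and fall back
to any free ID when the legacy one is invalid or already taken. The
struct, the function names and the 0x70 reserved boundary are all
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TRACE_ID_RES_TOP 0x70 /* assumption: 0x70..0x7F are reserved */

/* Hypothetical map: one bit per trace ID, set when allocated or reserved. */
struct sketch_id_map {
	uint8_t used[16]; /* 128 bits */
};

static bool sketch_id_in_use(const struct sketch_id_map *map, int id)
{
	return (map->used[id / 8] & (1u << (id % 8))) != 0;
}

static void sketch_id_mark(struct sketch_id_map *map, int id)
{
	map->used[id / 8] |= (uint8_t)(1u << (id % 8));
}

/*
 * Prefer the legacy ID (cpu * 2 + 0x10) when it is valid and free.
 * Otherwise hand out the lowest free valid ID, or -1 if none remain.
 */
static int sketch_get_cpu_id(struct sketch_id_map *map, int cpu)
{
	int id;

	/* Highest CPU number whose legacy ID stays below the reserved range. */
	if (cpu >= 0 && cpu <= (SKETCH_TRACE_ID_RES_TOP - 1 - 0x10) / 2) {
		id = cpu * 2 + 0x10;
		if (!sketch_id_in_use(map, id)) {
			sketch_id_mark(map, id);
			return id;
		}
	}

	/* Legacy ID unavailable: fall back to dynamic allocation. */
	for (id = 1; id < SKETCH_TRACE_ID_RES_TOP; id++) {
		if (!sketch_id_in_use(map, id)) {
			sketch_id_mark(map, id);
			return id;
		}
	}
	return -1;
}
```

With something like this an old simpleperf would keep working until the
first clash or out-of-range CPU, while anything reading the HW_ID
packets would see the real ID in every case.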
> 2) There may be a point in the future where we need to use per Sink ID
> allocation.
>
> 3) We have an outstanding perf issue with ETE + TRBE which never use
> trace IDs - at present decode works here because all the ETE
> capabilities on the current systems are identical. Once that changes,
> perf will need updating to look at the trace metadata on a CPU number
> basis, not on a trace ID basis.
That's true, I have this one on my list but didn't get to it yet.
>
> 4) Future architecture updates will render newer trace un-decodable by
> old perf versions.
>
> The question here is why would Android build an up to date kernel
> version with the updated CoreSight drivers, but insist on using an
> outdated perf / simpleperf version?
I suppose I was thinking that it might be convenient to not have to make
any changes to simpleperf because it will always run on low core counts.
But with the other issues about clashes, it looks like changing it is
unavoidable.
But for the opposite (old kernel, new perf), supporting that should be
pretty easy and the reason for using that combo is to get a perf with
decode fixes and run it somewhere that the kernel can't be easily updated.
James
>
> Regards
>
> Mike
>
>
>> Making the dynamic traceID allocation use the same IDs as before
>> whenever possible should allow both old Perf and simpleperf to open
>> the file as before and ignore the AUX_OUTPUT_HW_ID packets.
>>
>> James
>>
>>> Thoughts?
>>>
>>> Mike
>>>
>>>>>
>>>>> Architecturally reserved IDs are never allocated, and the system is
>>>>> limited to allocating only valid IDs.
>>>>>
>>>>> Each of the current trace sources ETM3.x, ETM4.x and STM is updated to use
>>>>> the new API.
>>>>>
>>>>> For the ETMx.x devices IDs are allocated on certain events
>>>>> a) When using sysfs, an ID will be allocated on hardware enable, or a read of
>>>>> sysfs TRCTRACEID register and freed when the sysfs reset is written.
>>>>>
>>>>> b) When using perf, ID is allocated on hardware enable, and freed on
>>>>> hardware disable. IDs are communicated using the AUX_OUTPUT_HW_ID packet.
>>>>> The ID allocator is notified when perf sessions start and stop
>>>>> so CPU based IDs are kept constant throughout any perf session.
>>>>>
>>>>>
>>>>> Note: This patchset breaks backward compatibility for perf record and
>>>>> perf report.
>>>>>
>>>>> Because the method for generating the AUXTRACE_INFO meta data has
>>>>> changed, using an older perf record will result in metadata that
>>>>> does not match the trace IDs used in the recorded trace data.
>>>>> This mismatch will cause subsequent decode to fail.
>>>>>
>>>>> The version of the AUXTRACE_INFO has been updated to reflect the fact that
>>>>> the trace source IDs are no longer present in the metadata. This will
>>>>> mean older versions of perf report cannot decode the file.
>>>>>
>>>>> Applies to coresight/next [c06475910b52]
>>>>> Tested on DB410c
>>>>>
>>>>> Changes since v1:
>>>>> (after feedback & discussion with Mathieu & Suzuki).
>>>>>
>>>>> 1) API has changed. The global trace ID map is managed internally, so it
>>>>> is no longer passed in to the API functions.
>>>>>
>>>>> 2) perf record does not use sysfs to find the trace IDs. These are now
>>>>> output as AUX_OUTPUT_HW_ID events. The drivers, perf record, and perf report
>>>>> have been updated accordingly to generate and handle these events.
>>>>>
>>>>> Mike Leach (13):
>>>>> coresight: trace-id: Add API to dynamically assign Trace ID values
>>>>> coresight: trace-id: update CoreSight core to use Trace ID API
>>>>> coresight: stm: Update STM driver to use Trace ID API
>>>>> coresight: etm4x: Update ETM4 driver to use Trace ID API
>>>>> coresight: etm3x: Update ETM3 driver to use Trace ID API
>>>>> coresight: etmX.X: stm: Remove unused legacy source Trace ID ops
>>>>> coresight: perf: traceid: Add perf notifiers for Trace ID
>>>>> perf: cs-etm: Move mapping of Trace ID and cpu into helper function
>>>>> perf: cs-etm: Update record event to use new Trace ID protocol
>>>>> kernel: events: Export perf_report_aux_output_id()
>>>>> perf: cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
>>>>> coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
>>>>> coresight: trace-id: Add debug & test macros to Trace ID allocation
>>>>>
>>>>> drivers/hwtracing/coresight/Makefile | 2 +-
>>>>> drivers/hwtracing/coresight/coresight-core.c | 49 +---
>>>>> .../hwtracing/coresight/coresight-etm-perf.c | 17 ++
>>>>> drivers/hwtracing/coresight/coresight-etm.h | 3 +-
>>>>> .../coresight/coresight-etm3x-core.c | 85 +++---
>>>>> .../coresight/coresight-etm3x-sysfs.c | 28 +-
>>>>> .../coresight/coresight-etm4x-core.c | 65 ++++-
>>>>> .../coresight/coresight-etm4x-sysfs.c | 32 ++-
>>>>> drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
>>>>> drivers/hwtracing/coresight/coresight-stm.c | 49 +---
>>>>> .../hwtracing/coresight/coresight-trace-id.c | 263 ++++++++++++++++++
>>>>> .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
>>>>> include/linux/coresight-pmu.h | 31 ++-
>>>>> include/linux/coresight.h | 3 -
>>>>> kernel/events/core.c | 1 +
>>>>> tools/include/linux/coresight-pmu.h | 31 ++-
>>>>> tools/perf/arch/arm/util/cs-etm.c | 21 +-
>>>>> .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 9 +
>>>>> tools/perf/util/cs-etm.c | 220 +++++++++++++--
>>>>> tools/perf/util/cs-etm.h | 14 +-
>>>>> 20 files changed, 784 insertions(+), 207 deletions(-)
>>>>> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
>>>>> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
>>>>>
>>>
>>>
>>>
>
>
>
Hi James,
On Tue, 26 Jul 2022 at 14:53, James Clark <[email protected]> wrote:
>
>
>
> On 25/07/2022 09:19, Mike Leach wrote:
> > Hi James
> >
> > On Fri, 22 Jul 2022 at 13:10, James Clark <[email protected]> wrote:
> >>
> >>
> >>
> >> On 21/07/2022 14:54, Mike Leach wrote:
> >>> Hi James,
> >>>
> >>> Thanks for looking at this.
> >>>
> >>> On Thu, 21 Jul 2022 at 11:27, James Clark <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 04/07/2022 09:11, Mike Leach wrote:
> >>>>> The current method for allocating trace source ID values to sources is
> >>>>> to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10).
> >>>>> The STM is allocated ID 0x1.
> >>>>>
> >>>>> This fixed algorithm is used in both the CoreSight driver code, and by
> >>>>> perf when writing the trace metadata in the AUXTRACE_INFO record.
> >>>>>
> >>>>> The method needs replacing as currently:-
> >>>>> 1. It is inefficient in using available IDs.
> >>>>> 2. Does not scale to larger systems with many cores and the algorithm
> >>>>> has no limits so will generate invalid trace IDs for cpu number > 44.
> >>>>>
> >>>>> Additionally requirements to allocate additional system IDs on some
> >>>>> systems have been seen.
> >>>>>
> >>>>> This patch set introduces an API that allows the allocation of trace IDs
> >>>>> in a dynamic manner.
> >>>>
> >>>> I've tested this with various commands like with per-thread mode, attaching,
> >>>> running the tests and also Carsten's new tests. Apart from the possible
> >>>> backwards compatibility issue and the minor code comments it looks good to
> >>>> me.
> >>>>
> >>>
> >>> I've looked at the backwards compatibility issue. At present with the
> >>> current set
> >>> (K = kernel drivers, P-rec = perf record, P-rep = perf report)
> >>> ::
> >>>
> >>> K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
> >>> K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (fail)
> >>> K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (fail)
> >>> K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
> >>>
> >>> So, with a P-rec generating v2 metadata, P rep will cleanly error out.
> >>> Where the Kernel ID version and the perf report ID version differ,
> >>> even the P rep v2 will fail, due to the IDs being different in the
> >>> file and actual drivers. These fails will simply look like no data
> >>> present.
> >>>
> >>> There are two possible fixes that improve this:-
> >>> A) if the v2 kernel uses a sysfs flag to indicate new ID usage, then
> >>> if this is missing the new perf record can degrade to using the old
> >>> algorithm to put IDs directly into metadata as it assumes it is
> >>> running on a v1 kernel.
> >>> This fixes things then for the P-rep v2 that can look for this & we
> >>> know there will be no incoming ID packets.
> >>> B) P-rep v2 can look for new packets irrespective of incoming metadata
> >>> version, and if it sees them, override them
> >>>
> >>> Compatibility matrix then looks like::
> >>> K-v1-ids + P-rec-v1-ids => P-rep-v1 (OK) P-rep-v2 (OK)
> >>> K-v1-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
> >>> K-v2-ids + P-rec-v1-ids => P-rep-v1 (fail) P-rep-v2 (OK)
> >>> K-v2-ids + P-rec-v2-ids => P-rep-v1 (error message) P-rep-v2 (OK)
> >>>
> >>> There is no solution to using an old version of perf record on a new
> >>> kernel and getting the old version of perf report to correctly decode
> >>> the file.
> >>>
> >>
> >> We had a discussion about this last point on the Friday AutoFDO call.
> >
> > Sorry I missed that - I was on holiday.
> >
>
> I think I didn't realise it at the time but I was thinking of two
> separate requirements relating to this, rather than one. So I will
> list them here first to avoid confusion:
>
> 1. New perfs fall back to the legacy ID mappings if they don't see
> any HW_IDs.
>
> This is to support the AutoFDO workflow when using new (fixed) perfs
> on old kernels. This only affects the Perf side changes in this
> patch, not any of the kernel changes.
>
Agreed - if the new perf record fills in the trace ID metadata as it
did before using the old static algorithm, then the file generated on
an old kernel can be correctly interpreted by the new perf report, as
the absence of the new HW_ID packets can trigger it to use the
metadata instead.
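A minimal sketch of that fallback on the perf report side, assuming a
hypothetical per-CPU table rather than the real cs-etm data structures:
an ID seen in a HW_ID packet wins, and the metadata ID is used only when
no packet was seen for that CPU.

```c
#define SKETCH_MAX_CPUS 8
#define SKETCH_NO_ID    (-1)

/* Hypothetical per-file state collected while reading the perf data. */
struct sketch_session {
	/* IDs taken from the AUXTRACE_INFO metadata, or SKETCH_NO_ID. */
	int metadata_id[SKETCH_MAX_CPUS];
	/* IDs taken from AUX_OUTPUT_HW_ID packets, or SKETCH_NO_ID. */
	int hw_id[SKETCH_MAX_CPUS];
};

/*
 * Pick the trace ID to decode with for a CPU: a HW_ID packet overrides
 * the metadata, and the metadata is the fallback for old kernels.
 */
static int sketch_resolve_trace_id(const struct sketch_session *s, int cpu)
{
	if (cpu < 0 || cpu >= SKETCH_MAX_CPUS)
		return SKETCH_NO_ID;
	if (s->hw_id[cpu] != SKETCH_NO_ID)
		return s->hw_id[cpu];
	return s->metadata_id[cpu];
}
```

The same lookup also covers the case where packets are present alongside
populated metadata, since the packet value takes priority.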
> 2. Wherever possible (absent of any reserved ID clashes or CPUs > 47),
> the new driver continues to use the old static ID allocation.
>
> This is to support not making any changes to simpleperf (or any
> other tools if they exist) until there is an actual need to. As you
> say, this is only a temporary measure. This requirement can also be
> dropped if we make the simpleperf changes at the same time as these
> driver updates. But it would buy some time. But we can't fix any
> tools that we don't know about.
>
> There is no requirement to support old perfs on new kernels as far as I
> can see.
>
The _only_ reason to get the ID allocator in the driver to mimic the
old allocation numbers is if you _are_ using an old perf to record and
then read the data generated on a new kernel.
The ID allocator is only visible to the drivers, not perf record. perf
record simply makes assumptions about what the ID values are when
filling in the file metadata. The old version uses a static
calculation on the cpu number, the new version assumes that
responsibility has been passed on to the HW_ID packets.
You state below that the version of the metadata should remain updated
(@2) so old versions of perf / simpleperf can never read a file
generated by new versions of perf.
You state above that old versions of perf do not need to be supported
on new kernels, so they will never run on a system that uses the new
allocation mechanism and thereby never generate an old version of the file
that mismatches the new ID allocation mechanism.
So I am confused about precisely what the requirements are here.
Regards
Mike
> >> Do you think it's possible to keep the old static ID allocations if
> >> num_possible_cpus() < Max Trace ID? This is especially important for
> >> simple perf because Android doesn't even have the more than 128 CPUs
> >> issue, so technically shouldn't have to have any changes made to it.
> >>
> >
> > If android never runs high core count hardware, then that could work.
> > The actual CPU limit is in fact 47, after which point the static
> > algorithm fails.
> >
> > The question arises what do the kernel drivers do then?
> >
> > The old perf -record will not realise things are about to go wrong,
> > and continue to blindly allocate incorrect trace IDs.
> > Realistically the new drivers will then switch to use the previously
> > unused IDs, whereby they will mismatch with the blindly allocated perf
> > IDs and the old perf decode process will silently fail to decode any
> > data with IDs that do not match.
> >
>
> Do you mean this situation occurs if there are more than 47 cores?
> I think it's fine for things to go wrong in this case because it's
> already broken. Regardless of whether the perf and kernel versions match
> or don't match.
>
> The user would have to upgrade both parts in that case no matter what
> we do.
>
> > If we also removed the metadata version update that goes alongside the
> > ID changes, then old perf-reports would continue to try to decode
> > newer files - again silently failing once the static algorithm has
> > failed.
>
> That's true, but I don't think we need to drop the metaversion update.
> There's no requirement for an old perf-report to open new files, so we
> can still make this change.
>
> > Legacy ID allocation support must be added as a kernel CONFIG option -
> > so that it is up front an obvious to users what is being selected. And
> > we can output appropriate error messages.
> >
> > This would be a temporary solution at best as there are upcoming
> > issues that will need attention:-
> > 1 ) We need to deal with the fact that customers are adding new CS
> > compatible hardware to their systems, some of which they have
> > hardcoded trace IDs. These hardware allocations will become
> > reservations in the dynamic allocator, with no guarantee they will not
> > clash with the static algorithm.
> >
>
> Maybe instead of the temporary solution we can just make the change to
> simpleperf at the same time. The only reason to do this would be to buy
> some time or make the transition period smoother.
>
> But does it need to be a CONFIG option if it only happens when CPUs < 47
> or if there is a clash? We can still output the AUX_OUTPUT_HW_ID, but
> use the old ID allocation scheme. So it would appear to be the new
> scheme for anyone looking for HW_IDs, but is also compatible with old
> simpleperf until there is a clash.
>
> > 2) There may be a point in the future where we need to use per Sink ID
> > allocation.
> >
> > 3) We have an outstanding perf issue with ETE + TRBE which never use
> > trace IDs - at present decode works here because all the ETE
> > capabilities on the current systems are identical. Once that changes,
> > perf will need updating to look at the trace metadata on a CPU number
> > basis, not on a trace ID basis.
>
> That's true, I have this one on my list but didn't get to it yet.
>
> >
> > 4) Future architecture updates will render newer trace un-decodable by
> > old perf versions.
> >
> > The question here is why would Android build an up to date kernel
> > version with the updated CoreSight drivers, but insist on using an
> > outdated perf / simpleperf version?
>
> I suppose I was thinking that it might be convenient to not have to make
> any changes to simpleperf because it will always run on low core counts.
> But with the other issues about clashes, it looks like changing it is
> unavoidable.
>
> But for the opposite (old kernel, new perf), supporting that should be
> pretty easy and the reason for using that combo is to get a perf with
> decode fixes and run it somewhere that the kernel can't be easily updated.
>
> James
>
> >
> > Regards
> >
> > Mike
> >
> >
> >> Making the dynamic traceID allocation use the same IDs as before
> >> whenever possible should allow both old Perf and simpleperf to open
> >> the file as before and ignore the AUX_OUTPUT_HW_ID packets.
> >>
> >> James
> >>
> >>> Thoughts?
> >>>
> >>> Mike
> >>>
> >>>>>
> >>>>> Architecturally reserved IDs are never allocated, and the system is
> >>>>> limited to allocating only valid IDs.
> >>>>>
> >>>>> Each of the current trace sources ETM3.x, ETM4.x and STM is updated to use
> >>>>> the new API.
> >>>>>
> >>>>> For the ETMx.x devices IDs are allocated on certain events
> >>>>> a) When using sysfs, an ID will be allocated on hardware enable, or a read of
> >>>>> sysfs TRCTRACEID register and freed when the sysfs reset is written.
> >>>>>
> >>>>> b) When using perf, ID is allocated on hardware enable, and freed on
> >>>>> hardware disable. IDs are communicated using the AUX_OUTPUT_HW_ID packet.
> >>>>> The ID allocator is notified when perf sessions start and stop
> >>>>> so CPU based IDs are kept constant throughout any perf session.
> >>>>>
> >>>>>
> >>>>> Note: This patchset breaks backward compatibility for perf record and
> >>>>> perf report.
> >>>>>
> >>>>> Because the method for generating the AUXTRACE_INFO meta data has
> >>>>> changed, using an older perf record will result in metadata that
> >>>>> does not match the trace IDs used in the recorded trace data.
> >>>>> This mismatch will cause subsequent decode to fail.
> >>>>>
> >>>>> The version of the AUXTRACE_INFO has been updated to reflect the fact that
> >>>>> the trace source IDs are no longer present in the metadata. This will
> >>>>> mean older versions of perf report cannot decode the file.
> >>>>>
> >>>>> Applies to coresight/next [c06475910b52]
> >>>>> Tested on DB410c
> >>>>>
> >>>>> Changes since v1:
> >>>>> (after feedback & discussion with Mathieu & Suzuki).
> >>>>>
> >>>>> 1) API has changed. The global trace ID map is managed internally, so it
> >>>>> is no longer passed in to the API functions.
> >>>>>
> >>>>> 2) perf record does not use sysfs to find the trace IDs. These are now
> >>>>> output as AUX_OUTPUT_HW_ID events. The drivers, perf record, and perf report
> >>>>> have been updated accordingly to generate and handle these events.
> >>>>>
> >>>>> Mike Leach (13):
> >>>>> coresight: trace-id: Add API to dynamically assign Trace ID values
> >>>>> coresight: trace-id: update CoreSight core to use Trace ID API
> >>>>> coresight: stm: Update STM driver to use Trace ID API
> >>>>> coresight: etm4x: Update ETM4 driver to use Trace ID API
> >>>>> coresight: etm3x: Update ETM3 driver to use Trace ID API
> >>>>> coresight: etmX.X: stm: Remove unused legacy source Trace ID ops
> >>>>> coresight: perf: traceid: Add perf notifiers for Trace ID
> >>>>> perf: cs-etm: Move mapping of Trace ID and cpu into helper function
> >>>>> perf: cs-etm: Update record event to use new Trace ID protocol
> >>>>> kernel: events: Export perf_report_aux_output_id()
> >>>>> perf: cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
> >>>>> coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
> >>>>> coresight: trace-id: Add debug & test macros to Trace ID allocation
> >>>>>
> >>>>> drivers/hwtracing/coresight/Makefile | 2 +-
> >>>>> drivers/hwtracing/coresight/coresight-core.c | 49 +---
> >>>>> .../hwtracing/coresight/coresight-etm-perf.c | 17 ++
> >>>>> drivers/hwtracing/coresight/coresight-etm.h | 3 +-
> >>>>> .../coresight/coresight-etm3x-core.c | 85 +++---
> >>>>> .../coresight/coresight-etm3x-sysfs.c | 28 +-
> >>>>> .../coresight/coresight-etm4x-core.c | 65 ++++-
> >>>>> .../coresight/coresight-etm4x-sysfs.c | 32 ++-
> >>>>> drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
> >>>>> drivers/hwtracing/coresight/coresight-stm.c | 49 +---
> >>>>> .../hwtracing/coresight/coresight-trace-id.c | 263 ++++++++++++++++++
> >>>>> .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
> >>>>> include/linux/coresight-pmu.h | 31 ++-
> >>>>> include/linux/coresight.h | 3 -
> >>>>> kernel/events/core.c | 1 +
> >>>>> tools/include/linux/coresight-pmu.h | 31 ++-
> >>>>> tools/perf/arch/arm/util/cs-etm.c | 21 +-
> >>>>> .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 9 +
> >>>>> tools/perf/util/cs-etm.c | 220 +++++++++++++--
> >>>>> tools/perf/util/cs-etm.h | 14 +-
> >>>>> 20 files changed, 784 insertions(+), 207 deletions(-)
> >>>>> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
> >>>>> create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
> >>>>>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Hi Suzuki
On Tue, 19 Jul 2022 at 18:30, Suzuki K Poulose <[email protected]> wrote:
>
>
> Hi Mike,
>
> Thanks for the patch, please find my comments inline.
>
>
> On 04/07/2022 09:11, Mike Leach wrote:
> > The existing mechanism to assign Trace ID values to sources is limited
> > and does not scale for larger multicore / multi trace source systems.
> >
> > The API introduces functions that reserve IDs based on availability
> > represented by a coresight_trace_id_map structure. This records the
> > used and free IDs in a bitmap.
> >
> > CPU bound sources such as ETMs use the coresight_trace_id_get_cpu_id /
> > coresight_trace_id_put_cpu_id pair of functions. The API will record
> > the ID associated with the CPU. This ensures that the same ID will be
> > re-used while perf events are active on the CPU. The put_cpu_id function
> > will pend release of the ID until all perf cs_etm sessions are complete.
> >
> > Non-cpu sources, such as the STM can use coresight_trace_id_get_system_id /
> > coresight_trace_id_put_system_id.
> >
> > Signed-off-by: Mike Leach <[email protected]>
> > ---
> > drivers/hwtracing/coresight/Makefile | 2 +-
> > .../hwtracing/coresight/coresight-trace-id.c | 230 ++++++++++++++++++
> > .../hwtracing/coresight/coresight-trace-id.h | 65 +++++
> > 3 files changed, 296 insertions(+), 1 deletion(-)
> > create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
> > create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
> >
> > diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> > index b6c4a48140ec..329a0c704b87 100644
> > --- a/drivers/hwtracing/coresight/Makefile
> > +++ b/drivers/hwtracing/coresight/Makefile
> > @@ -6,7 +6,7 @@ obj-$(CONFIG_CORESIGHT) += coresight.o
> > coresight-y := coresight-core.o coresight-etm-perf.o coresight-platform.o \
> > coresight-sysfs.o coresight-syscfg.o coresight-config.o \
> > coresight-cfg-preload.o coresight-cfg-afdo.o \
> > - coresight-syscfg-configfs.o
> > + coresight-syscfg-configfs.o coresight-trace-id.o
> > obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
> > coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
> > coresight-tmc-etr.o
> > diff --git a/drivers/hwtracing/coresight/coresight-trace-id.c b/drivers/hwtracing/coresight/coresight-trace-id.c
> > new file mode 100644
> > index 000000000000..dac9c89ae00d
> > --- /dev/null
> > +++ b/drivers/hwtracing/coresight/coresight-trace-id.c
> > @@ -0,0 +1,230 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2022, Linaro Limited, All rights reserved.
> > + * Author: Mike Leach <[email protected]>
> > + */
> > +#include <linux/kernel.h>
> > +#include <linux/types.h>
> > +#include <linux/spinlock.h>
> > +
> > +#include "coresight-trace-id.h"
> > +
> > +/* need to keep data on ids & association with cpus. */
> > +struct cpu_id_info {
> > + int id;
> > + bool pend_rel;
> > +};
> > +
> > +/* default trace ID map. Used for systems that do not require per sink mappings */
> > +static struct coresight_trace_id_map id_map_default;
> > +
> > +/* maintain a record of the current mapping of cpu IDs */
> > +static DEFINE_PER_CPU(struct cpu_id_info, cpu_ids);
> > +
> > +/* perf session active flag */
> > +static int perf_cs_etm_session_active;
> > +
> > +/* lock to protect id_map and cpu data */
> > +static DEFINE_SPINLOCK(id_map_lock);
> > +
> > +/* ID 0 is reserved */
> > +#define CORESIGHT_TRACE_ID_RES_0 0
> > +
> > +/* ID 0x70 onwards are reserved */
> > +#define CORESIGHT_TRACE_ID_RES_RANGE_LO 0x70
> > +#define CORESIGHT_TRACE_ID_RES_RANGE_HI 0x7F
>
> Since this range is at the top end of the ID space, we could clip
> MAX_IDS to 0x70 and skip all these unnecessary checks and reservations.
> Also, by modifying the find_bit and for_each_bit calls slightly, we could
> do away with this reservation scheme and the IS_VALID_ID(id) checks.
>
> > +#define IS_VALID_ID(id) \
> > + ((id > CORESIGHT_TRACE_ID_RES_0) && (id < CORESIGHT_TRACE_ID_RES_RANGE_LO))
> > +
> > +static void coresight_trace_id_set_inuse(int id, struct coresight_trace_id_map *id_map)
> > +{
> > + if (IS_VALID_ID(id))
> > + set_bit(id, id_map->avail_ids);
> > +}
>
> Please see my comment around the definition of avail_ids.
>
> > +
> > +static void coresight_trace_id_clear_inuse(int id, struct coresight_trace_id_map *id_map)
> > +{
> > + if (IS_VALID_ID(id))
> > + clear_bit(id, id_map->avail_ids);
> > +}
>
> This could be :
>
> coresight_trace_id_free_id()
>
> > +
> > +static void coresight_trace_id_set_pend_rel(int id, struct coresight_trace_id_map *id_map)
> > +{
> > + if (IS_VALID_ID(id))
> > + set_bit(id, id_map->pend_rel_ids);
> > +}
> > +
> > +static void coresight_trace_id_clear_pend_rel(int id, struct coresight_trace_id_map *id_map)
> > +{
> > + if (IS_VALID_ID(id))
> > + clear_bit(id, id_map->pend_rel_ids);
> > +}
> > +
>
>
> > +static int coresight_trace_id_find_new_id(struct coresight_trace_id_map *id_map)
>
> minor nit: Could we call this :
>
> coresight_trace_id_alloc_new_id(id_map) and
>
> > +{
> > + int id;
> > +
> > + id = find_first_zero_bit(id_map->avail_ids, CORESIGHT_TRACE_IDS_MAX);
>
> minor nit: You could also do, to explicitly skip 0.
>
> id = find_next_zero_bit(id_map->avail_ids, 1, CORESIGHT_TRACE_IDS_MAX);
>
>
> > + if (id >= CORESIGHT_TRACE_IDS_MAX)
> > + id = -EINVAL;
>
> Could we also mark the id as in use here itself ? All callers of this
> function have to do that explicitly, anyways.
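A minimal userspace sketch of the suggestions above, assuming the bitmap is renamed used_ids and the maximum is clipped to 0x70. The names sketch_id_map, sketch_alloc_new_id and sketch_free_id are hypothetical, and the open-coded bit handling is a plain C stand-in for the kernel's find_next_zero_bit() and set_bit(). It is not the code from the updated set, and locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical clipped limit: IDs 0x70-0x7F are reserved, so never searched. */
#define SKETCH_IDS_MAX	0x70
#define SKETCH_BPL	((int)(sizeof(unsigned long) * CHAR_BIT))
#define SKETCH_LONGS	((SKETCH_IDS_MAX + SKETCH_BPL - 1) / SKETCH_BPL)

struct sketch_id_map {
	unsigned long used_ids[SKETCH_LONGS];	/* bit = 1: ID is in use */
};

static int sketch_test_bit(const struct sketch_id_map *map, int id)
{
	return (map->used_ids[id / SKETCH_BPL] >> (id % SKETCH_BPL)) & 1UL;
}

/*
 * Search from bit 1 so reserved ID 0 is never returned, and mark the ID
 * as used before returning so callers do not have to do it themselves.
 */
static int sketch_alloc_new_id(struct sketch_id_map *map)
{
	int id;

	for (id = 1; id < SKETCH_IDS_MAX; id++) {
		if (!sketch_test_bit(map, id)) {
			map->used_ids[id / SKETCH_BPL] |= 1UL << (id % SKETCH_BPL);
			return id;
		}
	}
	return -EINVAL;
}

static void sketch_free_id(struct sketch_id_map *map, int id)
{
	assert(id > 0 && id < SKETCH_IDS_MAX);
	map->used_ids[id / SKETCH_BPL] &= ~(1UL << (id % SKETCH_BPL));
}
```

With this shape a zero-initialised map needs no init step, because the reserved IDs lie outside the range that is ever searched.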
>
> > + return id;
> > +}
> > +
> > +/* release all pending IDs for all current maps & clear CPU associations */
> > +static void coresight_trace_id_release_all_pending(void)
> > +{
> > + struct coresight_trace_id_map *id_map = &id_map_default;
> > + int cpu, bit;
> > +
> int cpu, bit = 1;
>
> > + for_each_set_bit(bit, id_map->pend_rel_ids, CORESIGHT_TRACE_IDS_MAX) {
>
> for_each_set_bit_from(bit, id_map...)
>
> > + clear_bit(bit, id_map->avail_ids);
> > + clear_bit(bit, id_map->pend_rel_ids);
> > + }
> > +
> > + for_each_possible_cpu(cpu) {
> > + if (per_cpu(cpu_ids, cpu).pend_rel) {
> > + per_cpu(cpu_ids, cpu).pend_rel = false;
> > + per_cpu(cpu_ids, cpu).id = 0;
> > + }
> > + }
> > +}
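As a rough illustration of the release walk with the loop starting at bit 1 (the effect of the suggested for_each_set_bit_from), here is a userspace sketch. All sketch_* names are hypothetical, the bit helpers are stand-ins for the kernel bitops, the limit of 0x70 assumes the clipped range discussed above, and the per-cpu bookkeeping is omitted.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_IDS_MAX	0x70	/* assumed clipped limit */
#define SKETCH_BPL	((int)(sizeof(unsigned long) * CHAR_BIT))
#define SKETCH_LONGS	((SKETCH_IDS_MAX + SKETCH_BPL - 1) / SKETCH_BPL)

struct sketch_id_map {
	unsigned long used_ids[SKETCH_LONGS];		/* bit = 1: in use */
	unsigned long pend_rel_ids[SKETCH_LONGS];	/* bit = 1: release pending */
};

static void sketch_set(unsigned long *bm, int bit)
{
	bm[bit / SKETCH_BPL] |= 1UL << (bit % SKETCH_BPL);
}

static void sketch_clear(unsigned long *bm, int bit)
{
	bm[bit / SKETCH_BPL] &= ~(1UL << (bit % SKETCH_BPL));
}

static int sketch_test(const unsigned long *bm, int bit)
{
	return (bm[bit / SKETCH_BPL] >> (bit % SKETCH_BPL)) & 1UL;
}

/* Free every ID whose release is pending; returns how many were freed. */
static int sketch_release_all_pending(struct sketch_id_map *map)
{
	int bit, released = 0;

	/* Start at 1: ID 0 is reserved and can never be pending. */
	for (bit = 1; bit < SKETCH_IDS_MAX; bit++) {
		if (!sketch_test(map->pend_rel_ids, bit))
			continue;
		sketch_clear(map->used_ids, bit);
		sketch_clear(map->pend_rel_ids, bit);
		released++;
	}
	return released;
}
```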
> > +
> > +static void coresight_trace_id_init_id_map(struct coresight_trace_id_map *id_map)
> > +{
> > + int bit;
> > +
> > + /* set all reserved bits as in-use */
> > + set_bit(CORESIGHT_TRACE_ID_RES_0, id_map->avail_ids);
>
> > + for (bit = CORESIGHT_TRACE_ID_RES_RANGE_LO;
> > + bit <= CORESIGHT_TRACE_ID_RES_RANGE_HI; bit++)
> > + set_bit(bit, id_map->avail_ids);
>
>
> > +}
> > +
> > +static int coresight_trace_id_map_get_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
> > +{
> > + unsigned long flags;
> > + int id;
> > +
> > + spin_lock_irqsave(&id_map_lock, flags);
> > +
> > + /* check for existing allocation for this CPU */
> > + id = per_cpu(cpu_ids, cpu).id;
> > + if (id)
> > + goto get_cpu_id_out;
> > +
> > + /* find a new ID */
> > + id = coresight_trace_id_find_new_id(id_map);
> > + if (id < 0)
> > + goto get_cpu_id_out;
> > +
> > + /* got a valid new ID - save details */
> > + per_cpu(cpu_ids, cpu).id = id;
> > + per_cpu(cpu_ids, cpu).pend_rel = false;
> > + coresight_trace_id_set_inuse(id, id_map);
> > + coresight_trace_id_clear_pend_rel(id, id_map);
> > +
> > +get_cpu_id_out:
> > + spin_unlock_irqrestore(&id_map_lock, flags);
> > + return id;
> > +}
> > +
> > +static void coresight_trace_id_map_put_cpu_id(int cpu, struct coresight_trace_id_map *id_map)
> > +{
> > + unsigned long flags;
> > + int id;
> > +
> > + spin_lock_irqsave(&id_map_lock, flags);
> > + id = per_cpu(cpu_ids, cpu).id;
> > + if (!id)
> > + goto put_cpu_id_out;
> > +
> > + if (perf_cs_etm_session_active) {
> > + /* set release at pending if perf still active */
> > + coresight_trace_id_set_pend_rel(id, id_map);
> > + per_cpu(cpu_ids, cpu).pend_rel = true;
> > + } else {
> > + /* otherwise clear id */
> > + coresight_trace_id_clear_inuse(id, id_map);
> > + per_cpu(cpu_ids, cpu).id = 0;
> > + }
> > +
> > + put_cpu_id_out:
> > + spin_unlock_irqrestore(&id_map_lock, flags);
> > +}
> > +
> > +static int coresight_trace_id_map_get_system_id(struct coresight_trace_id_map *id_map)
> > +{
> > + unsigned long flags;
> > + int id;
> > +
> > + spin_lock_irqsave(&id_map_lock, flags);
> > + id = coresight_trace_id_find_new_id(id_map);
> > + if (id > 0)
> > + coresight_trace_id_set_inuse(id, id_map);
>
> Please see my suggestion above on moving this to the place where we find
> the bit.
>
> > + spin_unlock_irqrestore(&id_map_lock, flags);
> > +
> > + return id;
> > +}
> > +
> > +static void coresight_trace_id_map_put_system_id(struct coresight_trace_id_map *id_map, int id)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&id_map_lock, flags);
> > + coresight_trace_id_clear_inuse(id, id_map);
> > + spin_unlock_irqrestore(&id_map_lock, flags);
> > +}
> > +
> > +/* API functions */
> > +int coresight_trace_id_get_cpu_id(int cpu)
> > +{
> > + return coresight_trace_id_map_get_cpu_id(cpu, &id_map_default);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_trace_id_get_cpu_id);
> > +
> > +void coresight_trace_id_put_cpu_id(int cpu)
> > +{
> > + coresight_trace_id_map_put_cpu_id(cpu, &id_map_default);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_trace_id_put_cpu_id);
> > +
> > +int coresight_trace_id_get_system_id(void)
> > +{
> > + return coresight_trace_id_map_get_system_id(&id_map_default);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_trace_id_get_system_id);
> > +
> > +void coresight_trace_id_put_system_id(int id)
> > +{
> > + coresight_trace_id_map_put_system_id(&id_map_default, id);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_trace_id_put_system_id);
> > +
> > +void coresight_trace_id_perf_start(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&id_map_lock, flags);
> > + perf_cs_etm_session_active++;
> > + spin_unlock_irqrestore(&id_map_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_trace_id_perf_start);
> > +
> > +void coresight_trace_id_perf_stop(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&id_map_lock, flags);
> > + perf_cs_etm_session_active--;
> > + if (!perf_cs_etm_session_active)
> > + coresight_trace_id_release_all_pending();
> > + spin_unlock_irqrestore(&id_map_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_trace_id_perf_stop);
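The interaction between the session counter and the CPU get/put pair may be easier to follow in a small userspace model. This is only a sketch: the sketch_* names and the 4-CPU limit are hypothetical, bool arrays stand in for the bitmaps and per-cpu data, and the spinlock is omitted. One assumption goes beyond the quoted code: a re-get of an ID that is pending release cancels the pending state.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IDS_MAX	0x70	/* assumed clipped limit */
#define SKETCH_NR_CPUS	4	/* hypothetical CPU count for the model */

struct sketch_cpu_id_info {
	int id;		/* 0 means no ID is associated with the CPU */
	bool pend_rel;	/* released by the source while perf was active */
};

static bool sketch_used[SKETCH_IDS_MAX];
static struct sketch_cpu_id_info sketch_cpu_ids[SKETCH_NR_CPUS];
static int sketch_perf_sessions;

static int sketch_get_cpu_id(int cpu)
{
	int id;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (sketch_cpu_ids[cpu].id) {
		/* Assumption of this sketch: re-use cancels a pending release. */
		sketch_cpu_ids[cpu].pend_rel = false;
		return sketch_cpu_ids[cpu].id;
	}
	for (id = 1; id < SKETCH_IDS_MAX; id++) {
		if (!sketch_used[id]) {
			sketch_used[id] = true;
			sketch_cpu_ids[cpu].id = id;
			return id;
		}
	}
	return -1;	/* no free ID */
}

static void sketch_put_cpu_id(int cpu)
{
	int id = sketch_cpu_ids[cpu].id;

	if (!id)
		return;
	if (sketch_perf_sessions > 0) {
		/* Keep the CPU/ID pairing until the last session ends. */
		sketch_cpu_ids[cpu].pend_rel = true;
	} else {
		sketch_used[id] = false;
		sketch_cpu_ids[cpu].id = 0;
	}
}

static void sketch_perf_start(void)
{
	sketch_perf_sessions++;
}

static void sketch_perf_stop(void)
{
	int cpu;

	assert(sketch_perf_sessions > 0);
	if (--sketch_perf_sessions)
		return;
	/* Last session finished: free everything that was left pending. */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!sketch_cpu_ids[cpu].pend_rel)
			continue;
		sketch_used[sketch_cpu_ids[cpu].id] = false;
		sketch_cpu_ids[cpu].id = 0;
		sketch_cpu_ids[cpu].pend_rel = false;
	}
}
```

In this model a CPU that puts and then re-gets its ID inside one session sees the same value, and the ID only returns to the pool once the last session has stopped.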
> > +
> > +void coresight_trace_id_init_default_map(void)
> > +{
> > + coresight_trace_id_init_id_map(&id_map_default);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_trace_id_init_default_map);
>
> We may be able to get rid of this init. Otherwise we may convert this to
> a module_initcall() in the worst case. No need to export this.
>
> > diff --git a/drivers/hwtracing/coresight/coresight-trace-id.h b/drivers/hwtracing/coresight/coresight-trace-id.h
> > new file mode 100644
> > index 000000000000..63950087edf6
> > --- /dev/null
> > +++ b/drivers/hwtracing/coresight/coresight-trace-id.h
> > @@ -0,0 +1,65 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright(C) 2022 Linaro Limited. All rights reserved.
> > + * Author: Mike Leach <[email protected]>
> > + */
> > +
> > +#ifndef _CORESIGHT_TRACE_ID_H
> > +#define _CORESIGHT_TRACE_ID_H
> > +
> > +/*
> > + * Coresight trace ID allocation API
> > + *
> > + * With multi cpu systems, and more additional trace sources a scalable
> > + * trace ID reservation system is required.
> > + *
> > + * The system will allocate Ids on a demand basis, and allow them to be
> > + * released when done.
> > + *
> > + * In order to ensure that a consistent cpu / ID matching is maintained
> > + * throughout a perf cs_etm event session - a session in progress flag will
> > + * be maintained, and released IDs not cleared until the perf session is
> > + * complete. This allows the same CPU to be re-allocated its prior ID.
> > + *
> > + *
> > + * Trace ID maps will be created and initialised to prevent architecturally
> > + * reserved IDs from being allocated.
> > + *
> > + * API permits multiple maps to be maintained - for large systems where
> > + * different sets of cpus trace into different independent sinks.
> > + */
>
> Thanks for the detailed comment above.
>
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/types.h>
> > +
> > +
> > +/* architecturally we have 128 IDs some of which are reserved */
> > +#define CORESIGHT_TRACE_IDS_MAX 128
>
> Could we restrict the CORESIGHT_TRACE_IDS_MAX to 0x70, clipping the
> upper range of reserved ids ? That way, we could skip bothering about
> checking it everywhere.
>
> > +
> > +/**
> > + * Trace ID map.
> > + *
> > + * @avail_ids: Bitmap to register available (bit = 0) and in use (bit = 1) IDs.
> > + * Initialised so that the reserved IDs are permanently marked as in use.
>
> To be honest, this inverts the intuition. Could we instead name this
> used_ids ?
>
> i.e. BIT(i) = 1 implies the trace ID is in use.
>
>
> > + * @pend_rel_ids: CPU IDs that have been released by the trace source but not yet marked
> > + * as available, to allow re-allocation to the same CPU during a perf session.
> > + */
> > +struct coresight_trace_id_map {
> > + DECLARE_BITMAP(avail_ids, CORESIGHT_TRACE_IDS_MAX);
> > + DECLARE_BITMAP(pend_rel_ids, CORESIGHT_TRACE_IDS_MAX);
> > +};
>
> Also, the definitions are split between the .c and .h. Could we keep all
> of them in one place, preferably the .h ? Or, if this is not needed at all
> by the consumers of the API, we should keep all of this in the .c file.
>
> I guess in the future, with the sink-specific scheme, we may need to
> expose the helpers which accept an id_map. So maybe even move it here.
>
I have updated the set pretty much along the lines you suggested.
However, there have been some changes to cope with issues thrown up by
lockdep, as ever, so the new set takes a slightly different approach
depending on whether perf or sysfs is in use.
Thanks for the review. New set to follow shortly.
Mike
>
> Thanks
> Suzuki