2022-01-13 09:11:08

by James Clark

[permalink] [raw]
Subject: [PATCH v2 0/6] coresight: Add config flag to enable branch broadcast

This allows enabling branch broadcast for Perf hosted sessions (the option
is currently only available for the sysfs interface). Hopefully this could
be useful for testing the decode in perf, for example does a determinisitic
run with branch broadcast enabled look the same as with it disabled? It
could also be used for scenarios like OpenJ9's support for JIT code:

java -Xjit:perfTool hello.java

Currently this is not working and you get the usual errors of a missing
DSO, but branch broadcast would have to be enabled anyway before working
through this next issue:

CS ETM Trace: Debug data not found for address 0xffff7b94b058 in /tmp/perf-29360.map

Address range support in Perf for branch broadcast has also not been added
here, but could be added later.

The documentation has been refactored slightly to allow updates to be made
that link the Perf format arguments with the existing documentation.

For Suzuki's comment, I will do it as a separate change that converts all
the other hard coded values to a more consistent sysreg.h style:

nit: While at this, please could you change the hard coded value
to ETM4_CFG_BIT_RETSTK ?

Changes since v1:

* Added Leo's reviewed by on patch 3
* Fix bracket styling
* Add documentation

Applies on top of coresight/next efa56eddf5d5c. But this docs fix is also
required to get the links to work:
https://marc.info/?l=linux-doc&m=164139331923986&w=2

Also available at: https://gitlab.arm.com/linux-arm/linux-jc/-/tree/james-branch-broadcast-v2

James Clark (6):
coresight: Add config flag to enable branch broadcast
coresight: Fail to open with return stacks if they are unavailable
perf cs-etm: Update deduction of TRCCONFIGR register for branch
broadcast
Documentation: coresight: Turn numbered subsections into real
subsections
Documentation: coresight: Link config options to existing
documentation
Documentation: coresight: Expand branch broadcast documentation

.../coresight/coresight-etm4x-reference.rst | 14 ++++-
Documentation/trace/coresight/coresight.rst | 56 +++++++++++++++++--
.../hwtracing/coresight/coresight-etm-perf.c | 2 +
.../coresight/coresight-etm4x-core.c | 23 ++++++--
include/linux/coresight-pmu.h | 2 +
tools/include/linux/coresight-pmu.h | 2 +
tools/perf/arch/arm/util/cs-etm.c | 3 +
7 files changed, 92 insertions(+), 10 deletions(-)

--
2.28.0



2022-01-13 09:11:14

by James Clark

[permalink] [raw]
Subject: [PATCH v2 1/6] coresight: Add config flag to enable branch broadcast

When enabled, all taken branch addresses are output, even if the branch
was because of a direct branch instruction. This enables reconstruction
of the program flow without having access to the memory image of the
code being executed.

Use bit 8 for the config option which would be the correct bit for
programming ETMv3. Although branch broadcast can't be enabled on ETMv3
because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the
correct bit might help prevent future collisions or allow it to be
enabled if needed.

Signed-off-by: James Clark <[email protected]>
---
drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++
include/linux/coresight-pmu.h | 2 ++
3 files changed, 14 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index c039b6ae206f..43bbd5dc3d3b 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
* The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
* now take them as general formats and apply on all ETMs.
*/
+PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST));
PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
/* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */
PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID));
@@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = {
&format_attr_sinkid.attr,
&format_attr_preset.attr,
&format_attr_configid.attr,
+ &format_attr_branch_broadcast.attr,
NULL,
};

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index bf18128cf5de..04669ecc0efa 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
}

+ /* branch broadcast - enable if selected and supported */
+ if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
+ if (!drvdata->trcbb) {
+ ret = -EINVAL;
+ goto out;
+ } else {
+ config->cfg |= BIT(ETM4_CFG_BIT_BB);
+ }
+ }
+
out:
return ret;
}
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
index 4ac5c081af93..6c2fd6cc5a98 100644
--- a/include/linux/coresight-pmu.h
+++ b/include/linux/coresight-pmu.h
@@ -18,6 +18,7 @@
* ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
* directly use below macros as config bits.
*/
+#define ETM_OPT_BRANCH_BROADCAST 8
#define ETM_OPT_CYCACC 12
#define ETM_OPT_CTXTID 14
#define ETM_OPT_CTXTID2 15
@@ -25,6 +26,7 @@
#define ETM_OPT_RETSTK 29

/* ETMv4 CONFIGR programming bits for the ETM OPTs */
+#define ETM4_CFG_BIT_BB 3
#define ETM4_CFG_BIT_CYCACC 4
#define ETM4_CFG_BIT_CTXTID 6
#define ETM4_CFG_BIT_VMID 7
--
2.28.0


2022-01-13 09:11:18

by James Clark

[permalink] [raw]
Subject: [PATCH v2 2/6] coresight: Fail to open with return stacks if they are unavailable

Maintain consistency with the other options by failing to open when they
aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly
added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are
requested but not supported by hardware.

The consequence of not doing this is that the user may not be
aware that they are not enabling the feature as it is silently disabled.

Signed-off-by: James Clark <[email protected]>
---
drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 04669ecc0efa..a93c1a5fe045 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
}

/* return stack - enable if selected and supported */
- if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
- /* bit[12], Return stack enable bit */
- config->cfg |= BIT(12);
-
+ if (attr->config & BIT(ETM_OPT_RETSTK)) {
+ if (!drvdata->retstack) {
+ ret = -EINVAL;
+ goto out;
+ } else {
+ /* bit[12], Return stack enable bit */
+ config->cfg |= BIT(12);
+ }
+ }
/*
* Set any selected configuration and preset.
*
--
2.28.0


2022-01-13 09:11:22

by James Clark

[permalink] [raw]
Subject: [PATCH v2 3/6] perf cs-etm: Update deduction of TRCCONFIGR register for branch broadcast

Now that a config flag for branch broadcast has been added, take it into
account when trying to deduce what the driver would have programmed the
TRCCONFIGR register to.

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
tools/include/linux/coresight-pmu.h | 2 ++
tools/perf/arch/arm/util/cs-etm.c | 3 +++
2 files changed, 5 insertions(+)

diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
index 4ac5c081af93..6c2fd6cc5a98 100644
--- a/tools/include/linux/coresight-pmu.h
+++ b/tools/include/linux/coresight-pmu.h
@@ -18,6 +18,7 @@
* ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
* directly use below macros as config bits.
*/
+#define ETM_OPT_BRANCH_BROADCAST 8
#define ETM_OPT_CYCACC 12
#define ETM_OPT_CTXTID 14
#define ETM_OPT_CTXTID2 15
@@ -25,6 +26,7 @@
#define ETM_OPT_RETSTK 29

/* ETMv4 CONFIGR programming bits for the ETM OPTs */
+#define ETM4_CFG_BIT_BB 3
#define ETM4_CFG_BIT_CYCACC 4
#define ETM4_CFG_BIT_CTXTID 6
#define ETM4_CFG_BIT_VMID 7
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 293a23bf8be3..c7ef4e9b4a3a 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -527,6 +527,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr)
if (config_opts & BIT(ETM_OPT_CTXTID2))
config |= BIT(ETM4_CFG_BIT_VMID) |
BIT(ETM4_CFG_BIT_VMID_OPT);
+ if (config_opts & BIT(ETM_OPT_BRANCH_BROADCAST))
+ config |= BIT(ETM4_CFG_BIT_BB);
+
return config;
}

--
2.28.0


2022-01-13 09:11:24

by James Clark

[permalink] [raw]
Subject: [PATCH v2 4/6] Documentation: coresight: Turn numbered subsections into real subsections

This is to allow them to be referenced in a later commit. There was
also a mistake where sysFS was introduced as section 2, but numbered
as section 1. And vice versa for 'Using perf framework'. This can't
happen with unnumbered sections.

Signed-off-by: James Clark <[email protected]>
---
Documentation/trace/coresight/coresight.rst | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst
index a15571d96cc8..db66ff45ff4c 100644
--- a/Documentation/trace/coresight/coresight.rst
+++ b/Documentation/trace/coresight/coresight.rst
@@ -339,7 +339,8 @@ Preference is given to the former as using the sysFS interface
requires a deep understanding of the Coresight HW. The following sections
provide details on using both methods.

-1) Using the sysFS interface:
+Using the sysFS interface
+~~~~~~~~~~~~~~~~~~~~~~~~~

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the amount of sinks (nor sources) that can be enabled at
@@ -446,7 +447,8 @@ wealth of possibilities that coresight provides.
Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc}
Timestamp Timestamp: 17107041535

-2) Using perf framework:
+Using perf framework
+~~~~~~~~~~~~~~~~~~~~

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
@@ -495,7 +497,11 @@ More information on the above and other example on how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub
repository [#third]_.

-2.1) AutoFDO analysis using the perf tools:
+Advanced perf framework usage
+-----------------------------
+
+AutoFDO analysis using the perf tools
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

perf can be used to record and analyze trace of programs.

@@ -513,7 +519,8 @@ The --itrace option controls the type and frequency of synthesized events
Note that only 64-bit programs are currently supported - further work is
required to support instruction decode of 32-bit Arm programs.

-2.2) Tracing PID
+Tracing PID
+~~~~~~~~~~~

The kernel can be built to write the PID value into the PE ContextID registers.
For a kernel running at EL1, the PID is stored in CONTEXTIDR_EL1. A PE may
@@ -547,7 +554,7 @@ wants to trace PIDs for both host and guest, the two configs "contextid1" and


Generating coverage files for Feedback Directed Optimization: AutoFDO
----------------------------------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

'perf inject' accepts the --itrace option in which case tracing data is
removed and replaced with the synthesized events. e.g.
--
2.28.0


2022-01-13 09:11:35

by James Clark

[permalink] [raw]
Subject: [PATCH v2 6/6] Documentation: coresight: Expand branch broadcast documentation

Now that there is a way of enabling branch broadcast via perf, mention
the possible use cases and known limitations.

Signed-off-by: James Clark <[email protected]>
---
.../trace/coresight/coresight-etm4x-reference.rst | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst
index 0439b4006227..ec336575919c 100644
--- a/Documentation/trace/coresight/coresight-etm4x-reference.rst
+++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst
@@ -656,7 +656,15 @@ Bit assignments shown below:-
ETM_MODE_BB

**description:**
- Set to enable branch broadcast if supported in hardware [IDR0].
+ Set to enable branch broadcast if supported in hardware [IDR0]. The primary use for this feature
+ is when code is patched dynamically at run time and the full program flow may not be able to be
+ reconstructed using only conditional branches.
+
+ Choosing this option will result in a significant increase in the amount of trace generated -
+ possible danger of overflows, or fewer instructions covered. Note, that this option also
+ overrides any setting of :ref:`ETM_MODE_RETURNSTACK <coresight-return-stack>`, so where a branch
+ broadcast range overlaps a return stack range, return stacks will not be available for that
+ range.

.. _coresight-cycle-accurate:

--
2.28.0


2022-01-13 09:11:34

by James Clark

[permalink] [raw]
Subject: [PATCH v2 5/6] Documentation: coresight: Link config options to existing documentation

In order to document the newly added branch_broadcast option, create a
table that links all of the config option formats to any existing docs.
That way when the branch broadcast docs are expanded they are accessible
from both places.

Signed-off-by: James Clark <[email protected]>
---
.../coresight/coresight-etm4x-reference.rst | 4 ++
Documentation/trace/coresight/coresight.rst | 39 +++++++++++++++++++
2 files changed, 43 insertions(+)

diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst
index d25dfe86af9b..0439b4006227 100644
--- a/Documentation/trace/coresight/coresight-etm4x-reference.rst
+++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst
@@ -650,6 +650,7 @@ Bit assignments shown below:-
parameter is set this value is applied to the currently indexed
address range.

+.. _coresight-branch-broadcast:

**bit (4):**
ETM_MODE_BB
@@ -657,6 +658,7 @@ Bit assignments shown below:-
**description:**
Set to enable branch broadcast if supported in hardware [IDR0].

+.. _coresight-cycle-accurate:

**bit (5):**
ETMv4_MODE_CYCACC
@@ -678,6 +680,7 @@ Bit assignments shown below:-
**description:**
Set to enable virtual machine ID tracing if supported [IDR2].

+.. _coresight-timestamp:

**bit (11):**
ETMv4_MODE_TIMESTAMP
@@ -685,6 +688,7 @@ Bit assignments shown below:-
**description:**
Set to enable timestamp generation if supported [IDR0].

+.. _coresight-return-stack:

**bit (12):**
ETM_MODE_RETURNSTACK
diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst
index db66ff45ff4c..803a224dbb0e 100644
--- a/Documentation/trace/coresight/coresight.rst
+++ b/Documentation/trace/coresight/coresight.rst
@@ -585,6 +585,45 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto
Bubble sorting array of 30000 elements
5806 ms

+Config option formats
+~~~~~~~~~~~~~~~~~~~~~
+
+The following strings can be provided between // on the perf command line to enable various options.
+They are also listed in the folder /sys/bus/event_source/devices/cs_etm/format/
+
+.. list-table::
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - branch_broadcast
+ - Session local version of the system wide setting:
+ :ref:`ETM_MODE_BB <coresight-branch-broadcast>`
+ * - contextid
+ - See `Tracing PID`_
+ * - contextid1
+ - See `Tracing PID`_
+ * - contextid2
+ - See `Tracing PID`_
+ * - configid
+ - Selection for a custom configuration. This is an implementation detail and not used directly,
+ see :ref:`trace/coresight/coresight-config:Using Configurations in perf`
+ * - preset
+ - Override for parameters in a custom configuration, see
+ :ref:`trace/coresight/coresight-config:Using Configurations in perf`
+ * - sinkid
+ - Hashed version of the string to select a sink, automatically set when using the @ notation.
+ This is an internal implementation detail and is not used directly, see `Using perf
+ framework`_.
+ * - cycacc
+ - Session local version of the system wide setting: :ref:`ETMv4_MODE_CYCACC
+ <coresight-cycle-accurate>`
+ * - retstack
+ - Session local version of the system wide setting: :ref:`ETM_MODE_RETURNSTACK
+ <coresight-return-stack>`
+ * - timestamp
+ - Session local version of the system wide setting: :ref:`ETMv4_MODE_TIMESTAMP
+ <coresight-timestamp>`

How to use the STM module
-------------------------
--
2.28.0


2022-01-22 01:10:20

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] coresight: Fail to open with return stacks if they are unavailable

Reviewed-by: Mike Leach <[email protected]>

On Thu, 13 Jan 2022 at 09:11, James Clark <[email protected]> wrote:
>
> Maintain consistency with the other options by failing to open when they
> aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly
> added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are
> requested but not supported by hardware.
>
> The consequence of not doing this is that the user may not be
> aware that they are not enabling the feature as it is silently disabled.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index 04669ecc0efa..a93c1a5fe045 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> }
>
> /* return stack - enable if selected and supported */
> - if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
> - /* bit[12], Return stack enable bit */
> - config->cfg |= BIT(12);
> -
> + if (attr->config & BIT(ETM_OPT_RETSTK)) {
> + if (!drvdata->retstack) {
> + ret = -EINVAL;
> + goto out;
> + } else {
> + /* bit[12], Return stack enable bit */
> + config->cfg |= BIT(12);
> + }
> + }
> /*
> * Set any selected configuration and preset.
> *
> --
> 2.28.0
>


--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-01-22 01:12:41

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] coresight: Add config flag to enable branch broadcast

On Thu, 13 Jan 2022 at 09:11, James Clark <[email protected]> wrote:
>
> When enabled, all taken branch addresses are output, even if the branch
> was because of a direct branch instruction. This enables reconstruction
> of the program flow without having access to the memory image of the
> code being executed.
>
> Use bit 8 for the config option which would be the correct bit for
> programming ETMv3. Although branch broadcast can't be enabled on ETMv3
> because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the
> correct bit might help prevent future collisions or allow it to be
> enabled if needed.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
> drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++
> include/linux/coresight-pmu.h | 2 ++
> 3 files changed, 14 insertions(+)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index c039b6ae206f..43bbd5dc3d3b 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
> * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
> * now take them as general formats and apply on all ETMs.
> */
> +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST));
> PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
> /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */
> PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID));
> @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = {
> &format_attr_sinkid.attr,
> &format_attr_preset.attr,
> &format_attr_configid.attr,
> + &format_attr_branch_broadcast.attr,
> NULL,
> };
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index bf18128cf5de..04669ecc0efa 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
> }
>
> + /* branch broadcast - enable if selected and supported */
> + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
> + if (!drvdata->trcbb) {
> + ret = -EINVAL;
> + goto out;
> + } else {
> + config->cfg |= BIT(ETM4_CFG_BIT_BB);
> + }
> + }
> +
> out:
> return ret;
> }
> diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
> index 4ac5c081af93..6c2fd6cc5a98 100644
> --- a/include/linux/coresight-pmu.h
> +++ b/include/linux/coresight-pmu.h
> @@ -18,6 +18,7 @@
> * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
> * directly use below macros as config bits.
> */
> +#define ETM_OPT_BRANCH_BROADCAST 8
> #define ETM_OPT_CYCACC 12
> #define ETM_OPT_CTXTID 14
> #define ETM_OPT_CTXTID2 15
> @@ -25,6 +26,7 @@
> #define ETM_OPT_RETSTK 29
>
> /* ETMv4 CONFIGR programming bits for the ETM OPTs */
> +#define ETM4_CFG_BIT_BB 3
> #define ETM4_CFG_BIT_CYCACC 4
> #define ETM4_CFG_BIT_CTXTID 6
> #define ETM4_CFG_BIT_VMID 7
> --
> 2.28.0
>
Reviewed-by: Mike Leach <[email protected]>

--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-01-22 01:13:54

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] perf cs-etm: Update deduction of TRCCONFIGR register for branch broadcast

On Thu, 13 Jan 2022 at 09:11, James Clark <[email protected]> wrote:
>
> Now that a config flag for branch broadcast has been added, take it into
> account when trying to deduce what the driver would have programmed the
> TRCCONFIGR register to.
>
> Reviewed-by: Leo Yan <[email protected]>
> Signed-off-by: James Clark <[email protected]>
> ---
> tools/include/linux/coresight-pmu.h | 2 ++
> tools/perf/arch/arm/util/cs-etm.c | 3 +++
> 2 files changed, 5 insertions(+)
>
> diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
> index 4ac5c081af93..6c2fd6cc5a98 100644
> --- a/tools/include/linux/coresight-pmu.h
> +++ b/tools/include/linux/coresight-pmu.h
> @@ -18,6 +18,7 @@
> * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
> * directly use below macros as config bits.
> */
> +#define ETM_OPT_BRANCH_BROADCAST 8
> #define ETM_OPT_CYCACC 12
> #define ETM_OPT_CTXTID 14
> #define ETM_OPT_CTXTID2 15
> @@ -25,6 +26,7 @@
> #define ETM_OPT_RETSTK 29
>
> /* ETMv4 CONFIGR programming bits for the ETM OPTs */
> +#define ETM4_CFG_BIT_BB 3
> #define ETM4_CFG_BIT_CYCACC 4
> #define ETM4_CFG_BIT_CTXTID 6
> #define ETM4_CFG_BIT_VMID 7
> diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> index 293a23bf8be3..c7ef4e9b4a3a 100644
> --- a/tools/perf/arch/arm/util/cs-etm.c
> +++ b/tools/perf/arch/arm/util/cs-etm.c
> @@ -527,6 +527,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr)
> if (config_opts & BIT(ETM_OPT_CTXTID2))
> config |= BIT(ETM4_CFG_BIT_VMID) |
> BIT(ETM4_CFG_BIT_VMID_OPT);
> + if (config_opts & BIT(ETM_OPT_BRANCH_BROADCAST))
> + config |= BIT(ETM4_CFG_BIT_BB);
> +
> return config;
> }
>
> --
> 2.28.0
>

Reviewed-by: Mike Leach <[email protected]>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-01-22 01:14:33

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] Documentation: coresight: Turn numbered subsections into real subsections

Reviewed-by: Mike Leach <[email protected]>

On Thu, 13 Jan 2022 at 09:11, James Clark <[email protected]> wrote:
>
> This is to allow them to be referenced in a later commit. There was
> also a mistake where sysFS was introduced as section 2, but numbered
> as section 1. And vice versa for 'Using perf framework'. This can't
> happen with unnumbered sections.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> Documentation/trace/coresight/coresight.rst | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst
> index a15571d96cc8..db66ff45ff4c 100644
> --- a/Documentation/trace/coresight/coresight.rst
> +++ b/Documentation/trace/coresight/coresight.rst
> @@ -339,7 +339,8 @@ Preference is given to the former as using the sysFS interface
> requires a deep understanding of the Coresight HW. The following sections
> provide details on using both methods.
>
> -1) Using the sysFS interface:
> +Using the sysFS interface
> +~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Before trace collection can start, a coresight sink needs to be identified.
> There is no limit on the amount of sinks (nor sources) that can be enabled at
> @@ -446,7 +447,8 @@ wealth of possibilities that coresight provides.
> Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc}
> Timestamp Timestamp: 17107041535
>
> -2) Using perf framework:
> +Using perf framework
> +~~~~~~~~~~~~~~~~~~~~
>
> Coresight tracers are represented using the Perf framework's Performance
> Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
> @@ -495,7 +497,11 @@ More information on the above and other example on how to use Coresight with
> the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub
> repository [#third]_.
>
> -2.1) AutoFDO analysis using the perf tools:
> +Advanced perf framework usage
> +-----------------------------
> +
> +AutoFDO analysis using the perf tools
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> perf can be used to record and analyze trace of programs.
>
> @@ -513,7 +519,8 @@ The --itrace option controls the type and frequency of synthesized events
> Note that only 64-bit programs are currently supported - further work is
> required to support instruction decode of 32-bit Arm programs.
>
> -2.2) Tracing PID
> +Tracing PID
> +~~~~~~~~~~~
>
> The kernel can be built to write the PID value into the PE ContextID registers.
> For a kernel running at EL1, the PID is stored in CONTEXTIDR_EL1. A PE may
> @@ -547,7 +554,7 @@ wants to trace PIDs for both host and guest, the two configs "contextid1" and
>
>
> Generating coverage files for Feedback Directed Optimization: AutoFDO
> ----------------------------------------------------------------------
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> 'perf inject' accepts the --itrace option in which case tracing data is
> removed and replaced with the synthesized events. e.g.
> --
> 2.28.0
>


--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-01-22 01:15:19

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] Documentation: coresight: Link config options to existing documentation

Reviewed-by: Mike Leach <[email protected]>

On Thu, 13 Jan 2022 at 09:11, James Clark <[email protected]> wrote:
>
> In order to document the newly added branch_broadcast option, create a
> table that links all of the config option formats to any existing docs.
> That way when the branch broadcast docs are expanded they are accessible
> from both places.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> .../coresight/coresight-etm4x-reference.rst | 4 ++
> Documentation/trace/coresight/coresight.rst | 39 +++++++++++++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst
> index d25dfe86af9b..0439b4006227 100644
> --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst
> +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst
> @@ -650,6 +650,7 @@ Bit assignments shown below:-
> parameter is set this value is applied to the currently indexed
> address range.
>
> +.. _coresight-branch-broadcast:
>
> **bit (4):**
> ETM_MODE_BB
> @@ -657,6 +658,7 @@ Bit assignments shown below:-
> **description:**
> Set to enable branch broadcast if supported in hardware [IDR0].
>
> +.. _coresight-cycle-accurate:
>
> **bit (5):**
> ETMv4_MODE_CYCACC
> @@ -678,6 +680,7 @@ Bit assignments shown below:-
> **description:**
> Set to enable virtual machine ID tracing if supported [IDR2].
>
> +.. _coresight-timestamp:
>
> **bit (11):**
> ETMv4_MODE_TIMESTAMP
> @@ -685,6 +688,7 @@ Bit assignments shown below:-
> **description:**
> Set to enable timestamp generation if supported [IDR0].
>
> +.. _coresight-return-stack:
>
> **bit (12):**
> ETM_MODE_RETURNSTACK
> diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst
> index db66ff45ff4c..803a224dbb0e 100644
> --- a/Documentation/trace/coresight/coresight.rst
> +++ b/Documentation/trace/coresight/coresight.rst
> @@ -585,6 +585,45 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto
> Bubble sorting array of 30000 elements
> 5806 ms
>
> +Config option formats
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +The following strings can be provided between // on the perf command line to enable various options.
> +They are also listed in the folder /sys/bus/event_source/devices/cs_etm/format/
> +
> +.. list-table::
> + :header-rows: 1
> +
> + * - Option
> + - Description
> + * - branch_broadcast
> + - Session local version of the system wide setting:
> + :ref:`ETM_MODE_BB <coresight-branch-broadcast>`
> + * - contextid
> + - See `Tracing PID`_
> + * - contextid1
> + - See `Tracing PID`_
> + * - contextid2
> + - See `Tracing PID`_
> + * - configid
> + - Selection for a custom configuration. This is an implementation detail and not used directly,
> + see :ref:`trace/coresight/coresight-config:Using Configurations in perf`
> + * - preset
> + - Override for parameters in a custom configuration, see
> + :ref:`trace/coresight/coresight-config:Using Configurations in perf`
> + * - sinkid
> + - Hashed version of the string to select a sink, automatically set when using the @ notation.
> + This is an internal implementation detail and is not used directly, see `Using perf
> + framework`_.
> + * - cycacc
> + - Session local version of the system wide setting: :ref:`ETMv4_MODE_CYCACC
> + <coresight-cycle-accurate>`
> + * - retstack
> + - Session local version of the system wide setting: :ref:`ETM_MODE_RETURNSTACK
> + <coresight-return-stack>`
> + * - timestamp
> + - Session local version of the system wide setting: :ref:`ETMv4_MODE_TIMESTAMP
> + <coresight-timestamp>`
>
> How to use the STM module
> -------------------------
> --
> 2.28.0
>


--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-01-22 01:17:31

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 6/6] Documentation: coresight: Expand branch broadcast documentation

On Thu, 13 Jan 2022 at 09:11, James Clark <[email protected]> wrote:
>
> Now that there is a way of enabling branch broadcast via perf, mention
> the possible use cases and known limitations.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> .../trace/coresight/coresight-etm4x-reference.rst | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst
> index 0439b4006227..ec336575919c 100644
> --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst
> +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst
> @@ -656,7 +656,15 @@ Bit assignments shown below:-
> ETM_MODE_BB
>
> **description:**
> - Set to enable branch broadcast if supported in hardware [IDR0].
> + Set to enable branch broadcast if supported in hardware [IDR0]. The primary use for this feature
> + is when code is patched dynamically at run time and the full program flow may not be able to be
> + reconstructed using only conditional branches.
> +
> + Choosing this option will result in a significant increase in the amount of trace generated -
> + possible danger of overflows, or fewer instructions covered. Note, that this option also
> + overrides any setting of :ref:`ETM_MODE_RETURNSTACK <coresight-return-stack>`, so where a branch
> + broadcast range overlaps a return stack range, return stacks will not be available for that
> + range.
>
> .. _coresight-cycle-accurate:
>
> --
> 2.28.0
>

Reviewed-by: Mike Leach <[email protected]>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-01-28 17:20:36

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] coresight: Add config flag to enable branch broadcast

Hi James,

I am seriously back-logged in my patch reviews and as such will not be
able to get to your work in the usual 14 days. At this time and if
everything goes well, I should be able to start reviewing this
patchset during the week of February 7th.

Thanks,
Mathieu

On Thu, 13 Jan 2022 at 02:11, James Clark <[email protected]> wrote:
>
> This allows enabling branch broadcast for Perf hosted sessions (the option
> is currently only available for the sysfs interface). Hopefully this could
> be useful for testing the decode in perf, for example does a determinisitic
> run with branch broadcast enabled look the same as with it disabled? It
> could also be used for scenarios like OpenJ9's support for JIT code:
>
> java -Xjit:perfTool hello.java
>
> Currently this is not working and you get the usual errors of a missing
> DSO, but branch broadcast would have to be enabled anyway before working
> through this next issue:
>
> CS ETM Trace: Debug data not found for address 0xffff7b94b058 in /tmp/perf-29360.map
>
> Address range support in Perf for branch broadcast has also not been added
> here, but could be added later.
>
> The documentation has been refactored slightly to allow updates to be made
> that link the Perf format arguments with the existing documentation.
>
> For Suzuki's comment, I will do it as a separate change that converts all
> the other hard coded values to a more consistent sysreg.h style:
>
> nit: While at this, please could you change the hard coded value
> to ETM4_CFG_BIT_RETSTK ?
>
> Changes since v1:
>
> * Added Leo's reviewed by on patch 3
> * Fix bracket styling
> * Add documentation
>
> Applies on top of coresight/next efa56eddf5d5c. But this docs fix is also
> required to get the links to work:
> https://marc.info/?l=linux-doc&m=164139331923986&w=2
>
> Also available at: https://gitlab.arm.com/linux-arm/linux-jc/-/tree/james-branch-broadcast-v2
>
> James Clark (6):
> coresight: Add config flag to enable branch broadcast
> coresight: Fail to open with return stacks if they are unavailable
> perf cs-etm: Update deduction of TRCCONFIGR register for branch
> broadcast
> Documentation: coresight: Turn numbered subsections into real
> subsections
> Documentation: coresight: Link config options to existing
> documentation
> Documentation: coresight: Expand branch broadcast documentation
>
> .../coresight/coresight-etm4x-reference.rst | 14 ++++-
> Documentation/trace/coresight/coresight.rst | 56 +++++++++++++++++--
> .../hwtracing/coresight/coresight-etm-perf.c | 2 +
> .../coresight/coresight-etm4x-core.c | 23 ++++++--
> include/linux/coresight-pmu.h | 2 +
> tools/include/linux/coresight-pmu.h | 2 +
> tools/perf/arch/arm/util/cs-etm.c | 3 +
> 7 files changed, 92 insertions(+), 10 deletions(-)
>
> --
> 2.28.0
>

2022-01-31 07:29:39

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] coresight: Add config flag to enable branch broadcast

On 13/01/2022 09:10, James Clark wrote:
> When enabled, all taken branch addresses are output, even if the branch
> was because of a direct branch instruction. This enables reconstruction
> of the program flow without having access to the memory image of the
> code being executed.
>
> Use bit 8 for the config option which would be the correct bit for
> programming ETMv3. Although branch broadcast can't be enabled on ETMv3
> because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the
> correct bit might help prevent future collisions or allow it to be
> enabled if needed.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
> drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++
> include/linux/coresight-pmu.h | 2 ++
> 3 files changed, 14 insertions(+)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index c039b6ae206f..43bbd5dc3d3b 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
> * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
> * now take them as general formats and apply on all ETMs.
> */
> +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST));
> PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
> /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */
> PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID));
> @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = {
> &format_attr_sinkid.attr,
> &format_attr_preset.attr,
> &format_attr_configid.attr,
> + &format_attr_branch_broadcast.attr,

Does it make sense to hide the option if the bb is not supported ? I
guess it will be tricky as we don't track the common feature set. So,
that said...

> NULL,
> };
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index bf18128cf5de..04669ecc0efa 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
> }
>
> + /* branch broadcast - enable if selected and supported */
> + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
> + if (!drvdata->trcbb) {
> + ret = -EINVAL;

Should we fail here ? We could simply ignore this and generate the trace
normally. This would work on a big.LITTLE system with one set missing
the branch broadcast, while the others support.

Mike,

Does this affect the trace decoding ? As such the OpenCSD should be able
to decode the packets as they appear in the stream. Correct ?

Suzuki


> + goto out;
> + } else {
> + config->cfg |= BIT(ETM4_CFG_BIT_BB);
> + }
> + }
> +
> out:
> return ret;
> }
> diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
> index 4ac5c081af93..6c2fd6cc5a98 100644
> --- a/include/linux/coresight-pmu.h
> +++ b/include/linux/coresight-pmu.h
> @@ -18,6 +18,7 @@
> * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
> * directly use below macros as config bits.
> */
> +#define ETM_OPT_BRANCH_BROADCAST 8
> #define ETM_OPT_CYCACC 12
> #define ETM_OPT_CTXTID 14
> #define ETM_OPT_CTXTID2 15
> @@ -25,6 +26,7 @@
> #define ETM_OPT_RETSTK 29
>
> /* ETMv4 CONFIGR programming bits for the ETM OPTs */
> +#define ETM4_CFG_BIT_BB 3
> #define ETM4_CFG_BIT_CYCACC 4
> #define ETM4_CFG_BIT_CTXTID 6
> #define ETM4_CFG_BIT_VMID 7

2022-01-31 08:17:05

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] coresight: Fail to open with return stacks if they are unavailable

Hi James

On 13/01/2022 09:10, James Clark wrote:
> Maintain consistency with the other options by failing to open when they
> aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly
> added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are
> requested but not supported by hardware.

Looking at this again (with similar comment to the Branch Broadcast),
won't it disable using retstack on all CPUs, even when some of them
support it ?

i.e., CPU0 - supports retstack, CPU1 - doesn't

A perf run with retstack will fail, as CPU1 doesn't support it (even
though we advertise it, unconditionally).

So, if we ignore the failure, this would still allow CPU0 to use
the feature and as long as the OpenCSD is able to decode the trace
we should ignore the failure ?

I think we may also need to tune the etm4x_enable_hw() to skip
updating the TRCCONFIGR with features not supported by the ETM

Suzuki


>
> The consequence of not doing this is that the user may not be
> aware that they are not enabling the feature as it is silently disabled.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index 04669ecc0efa..a93c1a5fe045 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> }
>
> /* return stack - enable if selected and supported */
> - if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
> - /* bit[12], Return stack enable bit */
> - config->cfg |= BIT(12);
> -
> + if (attr->config & BIT(ETM_OPT_RETSTK)) {
> + if (!drvdata->retstack) {
> + ret = -EINVAL;
> + goto out;
> + } else {
> + /* bit[12], Return stack enable bit */
> + config->cfg |= BIT(12);
> + }
> + }
> /*
> * Set any selected configuration and preset.
> *

2022-01-31 08:44:11

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] perf cs-etm: Update deduction of TRCCONFIGR register for branch broadcast

On 13/01/2022 09:10, James Clark wrote:
> Now that a config flag for branch broadcast has been added, take it into
> account when trying to deduce what the driver would have programmed the
> TRCCONFIGR register to.
>
> Reviewed-by: Leo Yan <[email protected]>
> Signed-off-by: James Clark <[email protected]>
> ---
> tools/include/linux/coresight-pmu.h | 2 ++
> tools/perf/arch/arm/util/cs-etm.c | 3 +++
> 2 files changed, 5 insertions(+)
>
> diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
> index 4ac5c081af93..6c2fd6cc5a98 100644
> --- a/tools/include/linux/coresight-pmu.h
> +++ b/tools/include/linux/coresight-pmu.h
> @@ -18,6 +18,7 @@
> * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
> * directly use below macros as config bits.
> */
> +#define ETM_OPT_BRANCH_BROADCAST 8
> #define ETM_OPT_CYCACC 12
> #define ETM_OPT_CTXTID 14
> #define ETM_OPT_CTXTID2 15
> @@ -25,6 +26,7 @@
> #define ETM_OPT_RETSTK 29
>
> /* ETMv4 CONFIGR programming bits for the ETM OPTs */
> +#define ETM4_CFG_BIT_BB 3
> #define ETM4_CFG_BIT_CYCACC 4
> #define ETM4_CFG_BIT_CTXTID 6
> #define ETM4_CFG_BIT_VMID 7
> diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> index 293a23bf8be3..c7ef4e9b4a3a 100644
> --- a/tools/perf/arch/arm/util/cs-etm.c
> +++ b/tools/perf/arch/arm/util/cs-etm.c
> @@ -527,6 +527,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr)
> if (config_opts & BIT(ETM_OPT_CTXTID2))
> config |= BIT(ETM4_CFG_BIT_VMID) |
> BIT(ETM4_CFG_BIT_VMID_OPT);
> + if (config_opts & BIT(ETM_OPT_BRANCH_BROADCAST))
> + config |= BIT(ETM4_CFG_BIT_BB);
> +
> return config;

Reviewed-by: Suzuki K Poulose <[email protected]>

> }
>

2022-01-31 08:44:11

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] Documentation: coresight: Turn numbered subsections into real subsections

On 13/01/2022 09:10, James Clark wrote:
> This is to allow them to be referenced in a later commit. There was
> also a mistake where sysFS was introduced as section 2, but numbered
> as section 1. And vice versa for 'Using perf framework'. This can't
> happen with unnumbered sections.
>
> Signed-off-by: James Clark <[email protected]>

Looks good to me

2022-02-03 19:39:34

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] coresight: Add config flag to enable branch broadcast

Hi James, Suzuki

On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose <[email protected]> wrote:
>
> On 13/01/2022 09:10, James Clark wrote:
> > When enabled, all taken branch addresses are output, even if the branch
> > was because of a direct branch instruction. This enables reconstruction
> > of the program flow without having access to the memory image of the
> > code being executed.
> >
> > Use bit 8 for the config option which would be the correct bit for
> > programming ETMv3. Although branch broadcast can't be enabled on ETMv3
> > because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the
> > correct bit might help prevent future collisions or allow it to be
> > enabled if needed.
> >
> > Signed-off-by: James Clark <[email protected]>
> > ---
> > drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
> > drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++
> > include/linux/coresight-pmu.h | 2 ++
> > 3 files changed, 14 insertions(+)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > index c039b6ae206f..43bbd5dc3d3b 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> > +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
> > * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
> > * now take them as general formats and apply on all ETMs.
> > */
> > +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST));
> > PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
> > /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */
> > PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID));
> > @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = {
> > &format_attr_sinkid.attr,
> > &format_attr_preset.attr,
> > &format_attr_configid.attr,
> > + &format_attr_branch_broadcast.attr,
>
> Does it make sense to hide the option if the bb is not supported ? I
> guess it will be tricky as we don't track the common feature set. So,
> that said...
>
> > NULL,
> > };
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > index bf18128cf5de..04669ecc0efa 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> > ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
> > }
> >
> > + /* branch broadcast - enable if selected and supported */
> > + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
> > + if (!drvdata->trcbb) {
> > + ret = -EINVAL;
>
> Should we fail here ? We could simply ignore this and generate the trace
> normally. This would work on a big.LITTLE system with one set missing
> the branch broadcast, while the others support.
>
> Mike,
>
> Does this affect the trace decoding ? As such the OpenCSD should be able
> to decode the packets as they appear in the stream. Correct ?
>

Depends on what you mean by affect the trace decoding!
From the simplest perspective - no - there is no issue as the packets
will be decode as seen. THE decoder does not know that BB exists - nor
if it is enabled.

However, the reason that a user may engage BB is that the code is in
some way self modifying - so that following the code static image and
calculating addresses will result in a different path taken.

e.g. imagine an opcode:-

B <address0>

Without BB, this will trace as a single E atom, the decoder will
calculate address0 from the opcode in the static image and continue
from there as the next trace address.

Now look at the case where this is changed on the fly to

B <address1>

With BB, This will trace to
E
TGT_ADDR<address1>

The decoder will initially extract address0 from the static image,
but the immediately following target address packet will alter the
next address traced to address1
This is why we have BB.

So if the user has a reason to engage BB - we should really fail if
it is not present - as the outcome of the trace can be affected.

> Suzuki
>
>
> > + goto out;
> > + } else {
> > + config->cfg |= BIT(ETM4_CFG_BIT_BB);
> > + }
> > + }
> > +
> > out:
> > return ret;
> > }
> > diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
> > index 4ac5c081af93..6c2fd6cc5a98 100644
> > --- a/include/linux/coresight-pmu.h
> > +++ b/include/linux/coresight-pmu.h
> > @@ -18,6 +18,7 @@
> > * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
> > * directly use below macros as config bits.
> > */
> > +#define ETM_OPT_BRANCH_BROADCAST 8
> > #define ETM_OPT_CYCACC 12
> > #define ETM_OPT_CTXTID 14
> > #define ETM_OPT_CTXTID2 15
> > @@ -25,6 +26,7 @@
> > #define ETM_OPT_RETSTK 29
> >
> > /* ETMv4 CONFIGR programming bits for the ETM OPTs */
> > +#define ETM4_CFG_BIT_BB 3
> > #define ETM4_CFG_BIT_CYCACC 4
> > #define ETM4_CFG_BIT_CTXTID 6
> > #define ETM4_CFG_BIT_VMID 7
>


--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-02-15 16:58:50

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] perf cs-etm: Update deduction of TRCCONFIGR register for branch broadcast

Em Fri, Jan 28, 2022 at 11:25:24AM +0000, Suzuki K Poulose escreveu:
> On 13/01/2022 09:10, James Clark wrote:
> > Now that a config flag for branch broadcast has been added, take it into
> > account when trying to deduce what the driver would have programmed the
> > TRCCONFIGR register to.

Thanks, applied this one, the tools/ part.

- Arnaldo


> > Reviewed-by: Leo Yan <[email protected]>
> > Signed-off-by: James Clark <[email protected]>
> > ---
> > tools/include/linux/coresight-pmu.h | 2 ++
> > tools/perf/arch/arm/util/cs-etm.c | 3 +++
> > 2 files changed, 5 insertions(+)
> >
> > diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
> > index 4ac5c081af93..6c2fd6cc5a98 100644
> > --- a/tools/include/linux/coresight-pmu.h
> > +++ b/tools/include/linux/coresight-pmu.h
> > @@ -18,6 +18,7 @@
> > * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
> > * directly use below macros as config bits.
> > */
> > +#define ETM_OPT_BRANCH_BROADCAST 8
> > #define ETM_OPT_CYCACC 12
> > #define ETM_OPT_CTXTID 14
> > #define ETM_OPT_CTXTID2 15
> > @@ -25,6 +26,7 @@
> > #define ETM_OPT_RETSTK 29
> > /* ETMv4 CONFIGR programming bits for the ETM OPTs */
> > +#define ETM4_CFG_BIT_BB 3
> > #define ETM4_CFG_BIT_CYCACC 4
> > #define ETM4_CFG_BIT_CTXTID 6
> > #define ETM4_CFG_BIT_VMID 7
> > diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> > index 293a23bf8be3..c7ef4e9b4a3a 100644
> > --- a/tools/perf/arch/arm/util/cs-etm.c
> > +++ b/tools/perf/arch/arm/util/cs-etm.c
> > @@ -527,6 +527,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr)
> > if (config_opts & BIT(ETM_OPT_CTXTID2))
> > config |= BIT(ETM4_CFG_BIT_VMID) |
> > BIT(ETM4_CFG_BIT_VMID_OPT);
> > + if (config_opts & BIT(ETM_OPT_BRANCH_BROADCAST))
> > + config |= BIT(ETM4_CFG_BIT_BB);
> > +
> > return config;
>
> Reviewed-by: Suzuki K Poulose <[email protected]>
>
> > }

--

- Arnaldo

2022-03-11 20:35:43

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] coresight: Add config flag to enable branch broadcast

Hi James,

On Fri, 11 Mar 2022 at 14:58, James Clark <[email protected]> wrote:
>
>
>
> On 02/02/2022 20:25, Mike Leach wrote:
> > Hi James, Suzuki
> >
> > On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose <[email protected]> wrote:
> >>
> >> On 13/01/2022 09:10, James Clark wrote:
> >>> When enabled, all taken branch addresses are output, even if the branch
> >>> was because of a direct branch instruction. This enables reconstruction
> >>> of the program flow without having access to the memory image of the
> >>> code being executed.
> >>>
> >>> Use bit 8 for the config option which would be the correct bit for
> >>> programming ETMv3. Although branch broadcast can't be enabled on ETMv3
> >>> because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the
> >>> correct bit might help prevent future collisions or allow it to be
> >>> enabled if needed.
> >>>
> >>> Signed-off-by: James Clark <[email protected]>
> >>> ---
> >>> drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
> >>> drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++
> >>> include/linux/coresight-pmu.h | 2 ++
> >>> 3 files changed, 14 insertions(+)
> >>>
> >>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> >>> index c039b6ae206f..43bbd5dc3d3b 100644
> >>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> >>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> >>> @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
> >>> * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
> >>> * now take them as general formats and apply on all ETMs.
> >>> */
> >>> +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST));
> >>> PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
> >>> /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */
> >>> PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID));
> >>> @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = {
> >>> &format_attr_sinkid.attr,
> >>> &format_attr_preset.attr,
> >>> &format_attr_configid.attr,
> >>> + &format_attr_branch_broadcast.attr,
> >>
> >> Does it make sense to hide the option if the bb is not supported ? I
> >> guess it will be tricky as we don't track the common feature set. So,
> >> that said...
> >>
> >>> NULL,
> >>> };
> >>>
> >>> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >>> index bf18128cf5de..04669ecc0efa 100644
> >>> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >>> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >>> @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> >>> ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
> >>> }
> >>>
> >>> + /* branch broadcast - enable if selected and supported */
> >>> + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
> >>> + if (!drvdata->trcbb) {
> >>> + ret = -EINVAL;
> >>
> >> Should we fail here ? We could simply ignore this and generate the trace
> >> normally. This would work on a big.LITTLE system with one set missing
> >> the branch broadcast, while the others support.
> >>
> >> Mike,
> >>
> >> Does this affect the trace decoding ? As such the OpenCSD should be able
> >> to decode the packets as they appear in the stream. Correct ?
> >>
> >
> > Depends on what you mean by affect the trace decoding!
> > From the simplest perspective - no - there is no issue as the packets
> > will be decode as seen. THE decoder does not know that BB exists - nor
> > if it is enabled.
> >
> > However, the reason that a user may engage BB is that the code is in
> > some way self modifying - so that following the code static image and
> > calculating addresses will result in a different path taken.
> >
> > e.g. imagine an opcode:-
> >
> > B <address0>
> >
> > Without BB, this will trace as a single E atom, the decoder will
> > calculate address0 from the opcode in the static image and continue
> > from there as the next trace address.
> >
> > Now look at the case where this is changed on the fly to
> >
> > B <address1>
> >
> > With BB, This will trace to
> > E
> > TGT_ADDR<address1>
> >
> > The decoder will initially extract address0 from the static image,
> > but the immediately following target address packet will alter the
> > next address traced to address1
> > This is why we have BB.
> >
> > So if the user has a reason to engage BB - we should really fail if
> > it is not present - as the outcome of the trace can be affected.
>
> Hi Mike,
>
> Now I'm starting to wonder if it's best to have some kind of non-binary
> image mode for BB where you'd just get a list of addresses output by
> perf script and it doesn't attempt to do any lookups.

Not at all sure what you mean by this

Mike

> Although I think
> that would require a change in OpenCSD if it's not aware of the mode?
>
> I also need to go back to my JVM profiling test and see what's
> going on again. But I think I understand your points a bit more now.
>
> Thanks
> James
>
> >
> >> Suzuki
> >>
> >>
> >>> + goto out;
> >>> + } else {
> >>> + config->cfg |= BIT(ETM4_CFG_BIT_BB);
> >>> + }
> >>> + }
> >>> +
> >>> out:
> >>> return ret;
> >>> }
> >>> diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
> >>> index 4ac5c081af93..6c2fd6cc5a98 100644
> >>> --- a/include/linux/coresight-pmu.h
> >>> +++ b/include/linux/coresight-pmu.h
> >>> @@ -18,6 +18,7 @@
> >>> * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
> >>> * directly use below macros as config bits.
> >>> */
> >>> +#define ETM_OPT_BRANCH_BROADCAST 8
> >>> #define ETM_OPT_CYCACC 12
> >>> #define ETM_OPT_CTXTID 14
> >>> #define ETM_OPT_CTXTID2 15
> >>> @@ -25,6 +26,7 @@
> >>> #define ETM_OPT_RETSTK 29
> >>>
> >>> /* ETMv4 CONFIGR programming bits for the ETM OPTs */
> >>> +#define ETM4_CFG_BIT_BB 3
> >>> #define ETM4_CFG_BIT_CYCACC 4
> >>> #define ETM4_CFG_BIT_CTXTID 6
> >>> #define ETM4_CFG_BIT_VMID 7
> >>
> >
> >



--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-03-11 21:24:51

by Mike Leach

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] coresight: Fail to open with return stacks if they are unavailable

Hi,


On Fri, 11 Mar 2022 at 14:52, James Clark <[email protected]> wrote:
>
>
>
> On 28/01/2022 11:24, Suzuki K Poulose wrote:
> > Hi James
> >
> > On 13/01/2022 09:10, James Clark wrote:
> >> Maintain consistency with the other options by failing to open when they
> >> aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly
> >> added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are
> >> requested but not supported by hardware.
> >
> > Looking at this again (with similar comment to the Branch Broadcast),
> > won't it disable using retstack on all CPUs, even when some of them
> > support it ?
> >
> > i.e., CPU0 - supports retstack, CPU1 - doesn't
> >
> > A perf run with retstack will fail, as CPU1 doesn't support it (even
> > though we advertise it, unconditionally).
> >
> > So, if we ignore the failure, this would still allow CPU0 to use
> > the feature and as long as the OpenCSD is able to decode the trace
> > we should ignore the failure ?
> >
> > I think we may also need to tune the etm4x_enable_hw() to skip
> > updating the TRCCONFIGR with features not supported by the ETM
> >
>
> Hi Suzuki,
>
> I'm picking up this branch broadcast change again after the haitus.
>
> For this point, do you think it would be worth distinguishing between "no
> known CPUs that support the feature" vs "not currently running on a
> CPU that supports it but there are others that do"?
>
> Also would we want to distinguish between per-CPU or per-process events?
> For the former it actually is possible to fail to open because all of
> the information is known.
>
> I'm just thinking of the case where someone asks for a load of flags
> and thinks that they're getting them but get no feedback that they won't.
> But I understand having some complicated solution like I'm suggesting
> might be even more surprising to users.
>
> Maybe the cleanest solution is to ask users to supply a config that
> can work on anywhere the event could possibly be scheduled. It doesn't
> really make sense to have retstack on a per-process event on big-little
> and then getting half of one type of data and half of another. It would
> make more sense to fail to open in that case and they have the choice of
> either doing per-CPU events or disabling retstacks altogether.
>

return stack has no effect on the decoder output whatsoever. The only
effect is to reduce the amount of traced addresses at the input
(leaving more space for other trace),
so it is irrelevant if CPU0 supports it but CPU1 doesn't.

sequence:

BL r0 (return stack is used only on link instructions)
...
RET

will output trace:-
ATOM E (BL r0)
...
ADDR_ELEM <ret addr>
ATOM E (RET)

for no return stack,

ATOM E (BL r0)
...
ATOM E (RET)

fior return stack.

In both cases the decoder will push the address after BL r0 onto its
return stack.

In the first case the decoder will use the supplied address, in the
second will pop the top of its return stack.

The decode output in both cases will be "branched to r0, ran code,
returned via link register"

The outcome is identical for the client. So the case for not tracing
on a core that does not have return stack if specified is weak.

Perhaps a warning will be sufficient?

Mike



> This seems like a similar problem to the issue causing the Coresight self
> test failure where a certain sink was picked that couldn't be reached and
> the test failed.
>
> In that case the change we made doesn't quite match up to my suggestion here:
>
> * Per-cpu but an unreachable sink -> fail
> * Per-process and potentially reachable sink in the future -> pass
>
> Maybe it would have been better to say that the sink always has to be
> reachable otherwise is the outcome predicatable?
>
> James
>
> > Suzuki
> >
> >
> >>
> >> The consequence of not doing this is that the user may not be
> >> aware that they are not enabling the feature as it is silently disabled.
> >>
> >> Signed-off-by: James Clark <[email protected]>
> >> ---
> >> drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++----
> >> 1 file changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >> index 04669ecc0efa..a93c1a5fe045 100644
> >> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >> @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> >> }
> >> /* return stack - enable if selected and supported */
> >> - if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
> >> - /* bit[12], Return stack enable bit */
> >> - config->cfg |= BIT(12);
> >> -
> >> + if (attr->config & BIT(ETM_OPT_RETSTK)) {
> >> + if (!drvdata->retstack) {
> >> + ret = -EINVAL;
> >> + goto out;
> >> + } else {
> >> + /* bit[12], Return stack enable bit */
> >> + config->cfg |= BIT(12);
> >> + }
> >> + }
> >> /*
> >> * Set any selected configuration and preset.
> >> *
> >



--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

2022-03-11 22:25:01

by James Clark

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] coresight: Fail to open with return stacks if they are unavailable



On 28/01/2022 11:24, Suzuki K Poulose wrote:
> Hi James
>
> On 13/01/2022 09:10, James Clark wrote:
>> Maintain consistency with the other options by failing to open when they
>> aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly
>> added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are
>> requested but not supported by hardware.
>
> Looking at this again (with similar comment to the Branch Broadcast),
> won't it disable using retstack on all CPUs, even when some of them
> support it ?
>
> i.e., CPU0 - supports retstack, CPU1 - doesn't
>
> A perf run with retstack will fail, as CPU1 doesn't support it (even
> though we advertise it, unconditionally).
>
> So, if we ignore the failure, this would still allow CPU0 to use
> the feature and as long as the OpenCSD is able to decode the trace
> we should ignore the failure ?
>
> I think we may also need to tune the etm4x_enable_hw() to skip
> updating the TRCCONFIGR with features not supported by the ETM
>

Hi Suzuki,

I'm picking up this branch broadcast change again after the haitus.

For this point, do you think it would be worth distinguishing between "no
known CPUs that support the feature" vs "not currently running on a
CPU that supports it but there are others that do"?

Also would we want to distinguish between per-CPU or per-process events?
For the former it actually is possible to fail to open because all of
the information is known.

I'm just thinking of the case where someone asks for a load of flags
and thinks that they're getting them but get no feedback that they won't.
But I understand having some complicated solution like I'm suggesting
might be even more surprising to users.

Maybe the cleanest solution is to ask users to supply a config that
can work on anywhere the event could possibly be scheduled. It doesn't
really make sense to have retstack on a per-process event on big-little
and then getting half of one type of data and half of another. It would
make more sense to fail to open in that case and they have the choice of
either doing per-CPU events or disabling retstacks altogether.

This seems like a similar problem to the issue causing the Coresight self
test failure where a certain sink was picked that couldn't be reached and
the test failed.

In that case the change we made doesn't quite match up to my suggestion here:

* Per-cpu but an unreachable sink -> fail
* Per-process and potentially reachable sink in the future -> pass

Maybe it would have been better to say that the sink always has to be
reachable otherwise is the outcome predicatable?

James

> Suzuki
>
>
>>
>> The consequence of not doing this is that the user may not be
>> aware that they are not enabling the feature as it is silently disabled.
>>
>> Signed-off-by: James Clark <[email protected]>
>> ---
>>   drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>> index 04669ecc0efa..a93c1a5fe045 100644
>> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
>> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>> @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
>>       }
>>         /* return stack - enable if selected and supported */
>> -    if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
>> -        /* bit[12], Return stack enable bit */
>> -        config->cfg |= BIT(12);
>> -
>> +    if (attr->config & BIT(ETM_OPT_RETSTK)) {
>> +        if (!drvdata->retstack) {
>> +            ret = -EINVAL;
>> +            goto out;
>> +        } else {
>> +            /* bit[12], Return stack enable bit */
>> +            config->cfg |= BIT(12);
>> +        }
>> +    }
>>       /*
>>        * Set any selected configuration and preset.
>>        *
>

2022-03-11 23:08:37

by James Clark

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] coresight: Add config flag to enable branch broadcast



On 02/02/2022 20:25, Mike Leach wrote:
> Hi James, Suzuki
>
> On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose <[email protected]> wrote:
>>
>> On 13/01/2022 09:10, James Clark wrote:
>>> When enabled, all taken branch addresses are output, even if the branch
>>> was because of a direct branch instruction. This enables reconstruction
>>> of the program flow without having access to the memory image of the
>>> code being executed.
>>>
>>> Use bit 8 for the config option which would be the correct bit for
>>> programming ETMv3. Although branch broadcast can't be enabled on ETMv3
>>> because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the
>>> correct bit might help prevent future collisions or allow it to be
>>> enabled if needed.
>>>
>>> Signed-off-by: James Clark <[email protected]>
>>> ---
>>> drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
>>> drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++
>>> include/linux/coresight-pmu.h | 2 ++
>>> 3 files changed, 14 insertions(+)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> index c039b6ae206f..43bbd5dc3d3b 100644
>>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
>>> * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
>>> * now take them as general formats and apply on all ETMs.
>>> */
>>> +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST));
>>> PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
>>> /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */
>>> PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID));
>>> @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = {
>>> &format_attr_sinkid.attr,
>>> &format_attr_preset.attr,
>>> &format_attr_configid.attr,
>>> + &format_attr_branch_broadcast.attr,
>>
>> Does it make sense to hide the option if the bb is not supported ? I
>> guess it will be tricky as we don't track the common feature set. So,
>> that said...
>>
>>> NULL,
>>> };
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>> index bf18128cf5de..04669ecc0efa 100644
>>> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>> @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
>>> ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
>>> }
>>>
>>> + /* branch broadcast - enable if selected and supported */
>>> + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
>>> + if (!drvdata->trcbb) {
>>> + ret = -EINVAL;
>>
>> Should we fail here ? We could simply ignore this and generate the trace
>> normally. This would work on a big.LITTLE system with one set missing
>> the branch broadcast, while the others support.
>>
>> Mike,
>>
>> Does this affect the trace decoding ? As such the OpenCSD should be able
>> to decode the packets as they appear in the stream. Correct ?
>>
>
> Depends on what you mean by affect the trace decoding!
> From the simplest perspective - no - there is no issue as the packets
> will be decode as seen. THE decoder does not know that BB exists - nor
> if it is enabled.
>
> However, the reason that a user may engage BB is that the code is in
> some way self modifying - so that following the code static image and
> calculating addresses will result in a different path taken.
>
> e.g. imagine an opcode:-
>
> B <address0>
>
> Without BB, this will trace as a single E atom, the decoder will
> calculate address0 from the opcode in the static image and continue
> from there as the next trace address.
>
> Now look at the case where this is changed on the fly to
>
> B <address1>
>
> With BB, This will trace to
> E
> TGT_ADDR<address1>
>
> The decoder will initially extract address0 from the static image,
> but the immediately following target address packet will alter the
> next address traced to address1
> This is why we have BB.
>
> So if the user has a reason to engage BB - we should really fail if
> it is not present - as the outcome of the trace can be affected.

Hi Mike,

Now I'm starting to wonder if it's best to have some kind of non-binary
image mode for BB where you'd just get a list of addresses output by
perf script and it doesn't attempt to do any lookups. Although I think
that would require a change in OpenCSD if it's not aware of the mode?

I also need to go back to my JVM profiling test and see what's
going on again. But I think I understand your points a bit more now.

Thanks
James

>
>> Suzuki
>>
>>
>>> + goto out;
>>> + } else {
>>> + config->cfg |= BIT(ETM4_CFG_BIT_BB);
>>> + }
>>> + }
>>> +
>>> out:
>>> return ret;
>>> }
>>> diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
>>> index 4ac5c081af93..6c2fd6cc5a98 100644
>>> --- a/include/linux/coresight-pmu.h
>>> +++ b/include/linux/coresight-pmu.h
>>> @@ -18,6 +18,7 @@
>>> * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
>>> * directly use below macros as config bits.
>>> */
>>> +#define ETM_OPT_BRANCH_BROADCAST 8
>>> #define ETM_OPT_CYCACC 12
>>> #define ETM_OPT_CTXTID 14
>>> #define ETM_OPT_CTXTID2 15
>>> @@ -25,6 +26,7 @@
>>> #define ETM_OPT_RETSTK 29
>>>
>>> /* ETMv4 CONFIGR programming bits for the ETM OPTs */
>>> +#define ETM4_CFG_BIT_BB 3
>>> #define ETM4_CFG_BIT_CYCACC 4
>>> #define ETM4_CFG_BIT_CTXTID 6
>>> #define ETM4_CFG_BIT_VMID 7
>>
>
>

2022-04-22 17:53:57

by James Clark

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] coresight: Add config flag to enable branch broadcast



On 11/03/2022 15:56, Mike Leach wrote:
> Hi James,
>
> On Fri, 11 Mar 2022 at 14:58, James Clark <[email protected]> wrote:
>>
>>
>>
>> On 02/02/2022 20:25, Mike Leach wrote:
>>> Hi James, Suzuki
>>>
>>> On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose <[email protected]> wrote:
>>>>
>>>> On 13/01/2022 09:10, James Clark wrote:
>>>>> When enabled, all taken branch addresses are output, even if the branch
>>>>> was because of a direct branch instruction. This enables reconstruction
>>>>> of the program flow without having access to the memory image of the
>>>>> code being executed.
>>>>>
>>>>> Use bit 8 for the config option which would be the correct bit for
>>>>> programming ETMv3. Although branch broadcast can't be enabled on ETMv3
>>>>> because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the
>>>>> correct bit might help prevent future collisions or allow it to be
>>>>> enabled if needed.
>>>>>
>>>>> Signed-off-by: James Clark <[email protected]>
>>>>> ---
>>>>> drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
>>>>> drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++
>>>>> include/linux/coresight-pmu.h | 2 ++
>>>>> 3 files changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>>>> index c039b6ae206f..43bbd5dc3d3b 100644
>>>>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>>>>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>>>> @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
>>>>> * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
>>>>> * now take them as general formats and apply on all ETMs.
>>>>> */
>>>>> +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST));
>>>>> PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
>>>>> /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */
>>>>> PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID));
>>>>> @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = {
>>>>> &format_attr_sinkid.attr,
>>>>> &format_attr_preset.attr,
>>>>> &format_attr_configid.attr,
>>>>> + &format_attr_branch_broadcast.attr,
>>>>
>>>> Does it make sense to hide the option if the bb is not supported ? I
>>>> guess it will be tricky as we don't track the common feature set. So,
>>>> that said...>>>>
>>>>> NULL,
>>>>> };
>>>>>
>>>>> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>>> index bf18128cf5de..04669ecc0efa 100644
>>>>> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>>> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>>> @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
>>>>> ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
>>>>> }
>>>>>
>>>>> + /* branch broadcast - enable if selected and supported */
>>>>> + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
>>>>> + if (!drvdata->trcbb) {
>>>>> + ret = -EINVAL;
>>>>
>>>> Should we fail here ? We could simply ignore this and generate the trace
>>>> normally. This would work on a big.LITTLE system with one set missing
>>>> the branch broadcast, while the others support.

Hi Suzuki,

I think if the user really needs branch broadcast because they've modified their
binaries then they would want feedback that one of the cores doesn't support it.
Otherwise their decode would silently go wrong and they could get stuck.

Perf already opens an event per core, so it wouldn't be much effort to manually
modify the command line to only select cores where it is supported.

I think for that scenario the current patch already works that way because the
feature is checked on the active core when the event is enabled?

James

>>>>
>>>> Mike,
>>>>
>>>> Does this affect the trace decoding ? As such the OpenCSD should be able
>>>> to decode the packets as they appear in the stream. Correct ?
>>>>
>>>
>>> Depends on what you mean by affect the trace decoding!
>>> From the simplest perspective - no - there is no issue as the packets
>>> will be decode as seen. THE decoder does not know that BB exists - nor
>>> if it is enabled.
>>>
>>> However, the reason that a user may engage BB is that the code is in
>>> some way self modifying - so that following the code static image and
>>> calculating addresses will result in a different path taken.
>>>
>>> e.g. imagine an opcode:-
>>>
>>> B <address0>
>>>
>>> Without BB, this will trace as a single E atom, the decoder will
>>> calculate address0 from the opcode in the static image and continue
>>> from there as the next trace address.
>>>
>>> Now look at the case where this is changed on the fly to
>>>
>>> B <address1>
>>>
>>> With BB, This will trace to
>>> E
>>> TGT_ADDR<address1>
>>>
>>> The decoder will initially extract address0 from the static image,
>>> but the immediately following target address packet will alter the
>>> next address traced to address1
>>> This is why we have BB.
>>>
>>> So if the user has a reason to engage BB - we should really fail if
>>> it is not present - as the outcome of the trace can be affected.
>>
>> Hi Mike,
>>
>> Now I'm starting to wonder if it's best to have some kind of non-binary
>> image mode for BB where you'd just get a list of addresses output by
>> perf script and it doesn't attempt to do any lookups.
>
> Not at all sure what you mean by this

Hi Mike,

I'm not sure if I really understood it either! Now I see what you mean by
the decoding issues in Perf and how it doesn't make sense if there is no
method to supply modified images.

Are you ok with adding this if I add to the new documentation that there is
currently no support in Perf and the behavior will be wrong if the binary
was modified? But maybe other users of perf_event_open might find it useful?

Thanks
James

>
> Mike
>
>> Although I think
>> that would require a change in OpenCSD if it's not aware of the mode?
>>
>> I also need to go back to my JVM profiling test and see what's
>> going on again. But I think I understand your points a bit more now.
>>
>> Thanks
>> James
>>
>>>
>>>> Suzuki
>>>>
>>>>
>>>>> + goto out;
>>>>> + } else {
>>>>> + config->cfg |= BIT(ETM4_CFG_BIT_BB);
>>>>> + }
>>>>> + }
>>>>> +
>>>>> out:
>>>>> return ret;
>>>>> }
>>>>> diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
>>>>> index 4ac5c081af93..6c2fd6cc5a98 100644
>>>>> --- a/include/linux/coresight-pmu.h
>>>>> +++ b/include/linux/coresight-pmu.h
>>>>> @@ -18,6 +18,7 @@
>>>>> * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and
>>>>> * directly use below macros as config bits.
>>>>> */
>>>>> +#define ETM_OPT_BRANCH_BROADCAST 8
>>>>> #define ETM_OPT_CYCACC 12
>>>>> #define ETM_OPT_CTXTID 14
>>>>> #define ETM_OPT_CTXTID2 15
>>>>> @@ -25,6 +26,7 @@
>>>>> #define ETM_OPT_RETSTK 29
>>>>>
>>>>> /* ETMv4 CONFIGR programming bits for the ETM OPTs */
>>>>> +#define ETM4_CFG_BIT_BB 3
>>>>> #define ETM4_CFG_BIT_CYCACC 4
>>>>> #define ETM4_CFG_BIT_CTXTID 6
>>>>> #define ETM4_CFG_BIT_VMID 7
>>>>
>>>
>>>
>
>
>

2022-04-22 21:18:32

by James Clark

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] coresight: Fail to open with return stacks if they are unavailable



On 11/03/2022 15:53, Mike Leach wrote:
> Hi,
>
>
> On Fri, 11 Mar 2022 at 14:52, James Clark <[email protected]> wrote:
>>
>>
>>
>> On 28/01/2022 11:24, Suzuki K Poulose wrote:
>>> Hi James
>>>
>>> On 13/01/2022 09:10, James Clark wrote:
>>>> Maintain consistency with the other options by failing to open when they
>>>> aren't supported. For example ETM_OPT_TS, ETM_OPT_CTXTID2 and the newly
>>>> added ETM_OPT_BRANCH_BROADCAST all return with -EINVAL if they are
>>>> requested but not supported by hardware.
>>>
>>> Looking at this again (with similar comment to the Branch Broadcast),
>>> won't it disable using retstack on all CPUs, even when some of them
>>> support it ?
>>>
>>> i.e., CPU0 - supports retstack, CPU1 - doesn't
>>>
>>> A perf run with retstack will fail, as CPU1 doesn't support it (even
>>> though we advertise it, unconditionally).
>>>
>>> So, if we ignore the failure, this would still allow CPU0 to use
>>> the feature and as long as the OpenCSD is able to decode the trace
>>> we should ignore the failure ?
>>>
>>> I think we may also need to tune the etm4x_enable_hw() to skip
>>> updating the TRCCONFIGR with features not supported by the ETM
>>>
>>
>> Hi Suzuki,
>>
>> I'm picking up this branch broadcast change again after the haitus.
>>
>> For this point, do you think it would be worth distinguishing between "no
>> known CPUs that support the feature" vs "not currently running on a
>> CPU that supports it but there are others that do"?
>>
>> Also would we want to distinguish between per-CPU or per-process events?
>> For the former it actually is possible to fail to open because all of
>> the information is known.
>>
>> I'm just thinking of the case where someone asks for a load of flags
>> and thinks that they're getting them but get no feedback that they won't.
>> But I understand having some complicated solution like I'm suggesting
>> might be even more surprising to users.
>>
>> Maybe the cleanest solution is to ask users to supply a config that
>> can work on anywhere the event could possibly be scheduled. It doesn't
>> really make sense to have retstack on a per-process event on big-little
>> and then getting half of one type of data and half of another. It would
>> make more sense to fail to open in that case and they have the choice of
>> either doing per-CPU events or disabling retstacks altogether.
>>
>
> return stack has no effect on the decoder output whatsoever. The only
> effect is to reduce the amount of traced addresses at the input
> (leaving more space for other trace),
> so it is irrelevant if CPU0 supports it but CPU1 doesn't.
>
> sequence:
>
> BL r0 (return stack is used only on link instructions)
> ...
> RET
>
> will output trace:-
> ATOM E (BL r0)
> ...
> ADDR_ELEM <ret addr>
> ATOM E (RET)
>
> for no return stack,
>
> ATOM E (BL r0)
> ...
> ATOM E (RET)
>
> fior return stack.
>
> In both cases the decoder will push the address after BL r0 onto its
> return stack.
>
> In the first case the decoder will use the supplied address, in the
> second will pop the top of its return stack.
>
> The decode output in both cases will be "branched to r0, ran code,
> returned via link register"
>
> The outcome is identical for the client. So the case for not tracing
> on a core that does not have return stack if specified is weak.
>
> Perhaps a warning will be sufficient?
>
> Mike
>
>
>
>> This seems like a similar problem to the issue causing the Coresight self
>> test failure where a certain sink was picked that couldn't be reached and
>> the test failed.
>>
>> In that case the change we made doesn't quite match up to my suggestion here:
>>
>> * Per-cpu but an unreachable sink -> fail
>> * Per-process and potentially reachable sink in the future -> pass
>>
>> Maybe it would have been better to say that the sink always has to be
>> reachable otherwise is the outcome predicatable?

Hi Mike,

If it has no effect on the output then it makes sense to me to just drop this
patch. I think even a warning would not add much and as far as I know they are
discouraged.

James

>>
>> James
>>
>>> Suzuki
>>>
>>>
>>>>
>>>> The consequence of not doing this is that the user may not be
>>>> aware that they are not enabling the feature as it is silently disabled.
>>>>
>>>> Signed-off-by: James Clark <[email protected]>
>>>> ---
>>>> drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++----
>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>> index 04669ecc0efa..a93c1a5fe045 100644
>>>> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>> @@ -674,10 +674,15 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
>>>> }
>>>> /* return stack - enable if selected and supported */
>>>> - if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
>>>> - /* bit[12], Return stack enable bit */
>>>> - config->cfg |= BIT(12);
>>>> -
>>>> + if (attr->config & BIT(ETM_OPT_RETSTK)) {
>>>> + if (!drvdata->retstack) {
>>>> + ret = -EINVAL;
>>>> + goto out;
>>>> + } else {
>>>> + /* bit[12], Return stack enable bit */
>>>> + config->cfg |= BIT(12);
>>>> + }
>>>> + }
>>>> /*
>>>> * Set any selected configuration and preset.
>>>> *
>>>
>
>
>

2022-05-04 20:09:25

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] coresight: Add config flag to enable branch broadcast

On 22/04/2022 11:18, James Clark wrote:
>
>
> On 11/03/2022 15:56, Mike Leach wrote:
>> Hi James,
>>
>> On Fri, 11 Mar 2022 at 14:58, James Clark <[email protected]> wrote:
>>>
>>>
>>>
>>> On 02/02/2022 20:25, Mike Leach wrote:
>>>> Hi James, Suzuki
>>>>
>>>> On Fri, 28 Jan 2022 at 11:19, Suzuki K Poulose <[email protected]> wrote:
>>>>>
>>>>> On 13/01/2022 09:10, James Clark wrote:
>>>>>> When enabled, all taken branch addresses are output, even if the branch
>>>>>> was because of a direct branch instruction. This enables reconstruction
>>>>>> of the program flow without having access to the memory image of the
>>>>>> code being executed.
>>>>>>
>>>>>> Use bit 8 for the config option which would be the correct bit for
>>>>>> programming ETMv3. Although branch broadcast can't be enabled on ETMv3
>>>>>> because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the
>>>>>> correct bit might help prevent future collisions or allow it to be
>>>>>> enabled if needed.
>>>>>>
>>>>>> Signed-off-by: James Clark <[email protected]>
>>>>>> ---
>>>>>> drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++
>>>>>> drivers/hwtracing/coresight/coresight-etm4x-core.c | 10 ++++++++++
>>>>>> include/linux/coresight-pmu.h | 2 ++
>>>>>> 3 files changed, 14 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>>>>> index c039b6ae206f..43bbd5dc3d3b 100644
>>>>>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>>>>>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>>>>> @@ -52,6 +52,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
>>>>>> * The PMU formats were orignally for ETMv3.5/PTM's ETMCR 'config';
>>>>>> * now take them as general formats and apply on all ETMs.
>>>>>> */
>>>>>> +PMU_FORMAT_ATTR(branch_broadcast, "config:"__stringify(ETM_OPT_BRANCH_BROADCAST));
>>>>>> PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
>>>>>> /* contextid1 enables tracing CONTEXTIDR_EL1 for ETMv4 */
>>>>>> PMU_FORMAT_ATTR(contextid1, "config:" __stringify(ETM_OPT_CTXTID));
>>>>>> @@ -97,6 +98,7 @@ static struct attribute *etm_config_formats_attr[] = {
>>>>>> &format_attr_sinkid.attr,
>>>>>> &format_attr_preset.attr,
>>>>>> &format_attr_configid.attr,
>>>>>> + &format_attr_branch_broadcast.attr,
>>>>>
>>>>> Does it make sense to hide the option if the bb is not supported ? I
>>>>> guess it will be tricky as we don't track the common feature set. So,
>>>>> that said...>>>>
>>>>>> NULL,
>>>>>> };
>>>>>>
>>>>>> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>>>> index bf18128cf5de..04669ecc0efa 100644
>>>>>> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>>>> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>>>>> @@ -692,6 +692,16 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
>>>>>> ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
>>>>>> }
>>>>>>
>>>>>> + /* branch broadcast - enable if selected and supported */
>>>>>> + if (attr->config & BIT(ETM_OPT_BRANCH_BROADCAST)) {
>>>>>> + if (!drvdata->trcbb) {
>>>>>> + ret = -EINVAL;
>>>>>
>>>>> Should we fail here ? We could simply ignore this and generate the trace
>>>>> normally. This would work on a big.LITTLE system with one set missing
>>>>> the branch broadcast, while the others support.
>
> Hi Suzuki,
>
> I think if the user really needs branch broadcast because they've modified their
> binaries then they would want feedback that one of the cores doesn't support it.
> Otherwise their decode would silently go wrong and they could get stuck.
>
> Perf already opens an event per core, so it wouldn't be much effort to manually
> modify the command line to only select cores where it is supported.
>
> I think for that scenario the current patch already works that way because the
> feature is checked on the active core when the event is enabled?

Fair enough. We could let the user pin the perf to the CPUs where this
is supported. This information is available to the user via sysfs.

So, let us proceed with the changes here.

Cheers
Suzuki