2021-09-21 13:45:41

by Suzuki K Poulose

Subject: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds

This series adds CPU erratum workarounds related to self-hosted
tracing. The errata handled in this series are:

* TRBE may overwrite trace in FILL mode
- Arm Neoverse-N2 #2139208
- Cortex-A710 #2119858

* A TSB instruction may not flush the trace completely when executed
in a trace prohibited region.

- Arm Neoverse-N2 #2067961
- Cortex-A710 #2054223

* TRBE may write to out-of-range address
- Arm Neoverse-N2 #2253138
- Cortex-A710 #2224489

The series applies on top of the self-hosted/trbe fixes posted here [0].
A tree containing both series is available here [1].

[0] https://lkml.kernel.org/r/[email protected]
[1] [email protected]:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v2

Changes since v1:
https://lkml.kernel.org/r/[email protected]
- Added a fix to the TRBE driver's handling of sink_specific data
- Added more description and ASCII art for the overwrite in FILL mode
workaround
- Added another TRBE erratum to the list:
"TRBE may write to out-of-range address"
(patches 12-17)
- Added a comment listing the expectations around the TSB erratum workaround.


Suzuki K Poulose (17):
coresight: trbe: Fix incorrect access of the sink specific data
coresight: trbe: Add infrastructure for Errata handling
coresight: trbe: Add a helper to calculate the trace generated
coresight: trbe: Add a helper to pad a given buffer area
coresight: trbe: Decouple buffer base from the hardware base
coresight: trbe: Allow driver to choose a different alignment
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
arm64: Add erratum detection for TRBE overwrite in FILL mode
coresight: trbe: Workaround TRBE errata overwrite in FILL mode
arm64: Enable workaround for TRBE overwrite in FILL mode
arm64: errata: Add workaround for TSB flush failures
coresight: trbe: Add a helper to fetch cpudata from perf handle
coresight: trbe: Add a helper to determine the minimum buffer size
coresight: trbe: Make sure we have enough space
arm64: Add erratum detection for TRBE write to out-of-range
coresight: trbe: Work around write to out of range
arm64: Advertise TRBE erratum workaround for write to out-of-range address

Documentation/arm64/silicon-errata.rst | 12 +
arch/arm64/Kconfig | 109 ++++++
arch/arm64/include/asm/barrier.h | 16 +-
arch/arm64/include/asm/cputype.h | 4 +
arch/arm64/kernel/cpu_errata.c | 64 ++++
arch/arm64/tools/cpucaps | 3 +
drivers/hwtracing/coresight/coresight-trbe.c | 339 +++++++++++++++++--
7 files changed, 510 insertions(+), 37 deletions(-)

--
2.24.1


2021-09-21 13:45:42

by Suzuki K Poulose

Subject: [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address

Add Kconfig entries for the errata workarounds for TRBE writing
to an out-of-range address.

Cc: Mathieu Poirier <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
Documentation/arm64/silicon-errata.rst | 4 +++
arch/arm64/Kconfig | 39 ++++++++++++++++++++++++++
2 files changed, 43 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 569a92411dcd..5342e895fb60 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -96,6 +96,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1349291 | N/A |
@@ -106,6 +108,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0764774e12bb..611ae02aabbd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -736,6 +736,45 @@ config ARM64_ERRATUM_2067961

If unsure, say Y.

+config ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+ bool
+
+config ARM64_ERRATUM_2253138
+ bool "Neoverse-N2: 2253138: workaround TRBE writing to address out-of-range"
+ depends on CORESIGHT_TRBE
+ default y
+ select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+ help
+ This option adds the workaround for ARM Neoverse-N2 erratum 2253138.
+
+ Affected Neoverse-N2 cores might write to an out-of-range address, not reserved
+ for TRBE. Under some conditions, the TRBE might generate a write to the next
+ virtually addressed page following the last page of the TRBE address space
+ (i.e, the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
+
+ We work around this in the driver by always making sure that there is a
+ page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
+
+ If unsure, say Y.
+
+config ARM64_ERRATUM_2224489
+ bool "Cortex-A710: 2224489: workaround TRBE writing to address out-of-range"
+ depends on CORESIGHT_TRBE
+ default y
+ select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+ help
+ This option adds the workaround for ARM Cortex-A710 erratum 2224489.
+
+ Affected Cortex-A710 cores might write to an out-of-range address, not reserved
+ for TRBE. Under some conditions, the TRBE might generate a write to the next
+ virtually addressed page following the last page of the TRBE address space
+ (i.e, the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
+
+ We work around this in the driver by always making sure that there is a
+ page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
--
2.24.1

2021-09-21 13:45:56

by Suzuki K Poulose

Subject: [PATCH v2 14/17] coresight: trbe: Make sure we have enough space

The TRBE driver makes sure that there is enough space for a meaningful
run; otherwise it pads the given space and restarts the offset calculation
once. But a single retry does not guarantee that we either find enough
space or hit "no space". Repeat the step until either:
- We have the minimum space
OR
- There is NO space at all.
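
The repeat-until-done behaviour can be sketched outside the kernel as a
small userspace model. Everything below is hypothetical and only for
illustration: MIN_SPACE, struct ring, next_limit() and pad() stand in for
trbe_min_trace_buf_size(), the perf handle, __trbe_normal_offset() and
trbe_pad_buf(), and the list of candidate limits is made up.

```c
#include <stddef.h>

/* Hypothetical minimum space for a meaningful run (not the driver's value). */
#define MIN_SPACE 64UL

/* Hypothetical stand-in for the perf handle and ring buffer state. */
struct ring {
	unsigned long head;           /* current write offset */
	unsigned long padded;         /* total bytes padded so far */
	const unsigned long *limits;  /* ascending candidate limits, 0-terminated */
};

/* Stand-in for __trbe_normal_offset(): first limit above head, 0 if none. */
static unsigned long next_limit(const struct ring *r)
{
	for (size_t i = 0; r->limits[i]; i++)
		if (r->limits[i] > r->head)
			return r->limits[i];
	return 0;
}

/* Stand-in for trbe_pad_buf(): padding advances the head. */
static void pad(struct ring *r, unsigned long len)
{
	r->head += len;
	r->padded += len;
}

/*
 * Keep padding and recomputing until there is either enough space
 * or no space at all. The head is re-read on every iteration because
 * it lives in the ring state that pad() updates.
 */
static unsigned long pick_limit(struct ring *r)
{
	unsigned long limit = next_limit(r);

	while (limit && (limit - r->head) < MIN_SPACE) {
		pad(r, limit - r->head);
		limit = next_limit(r);
	}
	return limit;
}
```

With a single "if" in place of the "while", a second limit that is again
too close to the head would be returned as-is. The loop keeps going until
one of the two terminating conditions listed above holds.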

Cc: Anshuman Khandual <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 3373f4e2183b..02f9e00e2091 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
* If the head is too close to the limit and we don't
* have space for a meaningful run, we rather pad it
* and start fresh.
+ *
+ * We might have to do this more than once to make sure
+ * we have enough required space.
*/
- if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
+ while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
trbe_pad_buf(handle, limit - head);
limit = __trbe_normal_offset(handle);
+ head = PERF_IDX2OFF(handle->head, buf);
}
return limit;
}
--
2.24.1

2021-09-21 13:46:03

by Suzuki K Poulose

Subject: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

We collect the trace from the TRBE on a FILL event from IRQ context,
and via update_buffer() when the event is stopped. Let us
consolidate how we calculate the trace generated into a helper.
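
As a rough illustration of the calculation being consolidated, here is a
userspace model. It is a sketch under assumptions: struct trbe_model is a
hypothetical snapshot of the base, write and limit pointers, and head_off
plays the role of PERF_IDX2OFF(handle->head, buf). It is not the driver
code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the TRBE pointers, for illustration only. */
struct trbe_model {
	uint64_t base;   /* start of the buffer */
	uint64_t write;  /* current write pointer */
	uint64_t limit;  /* end of the buffer */
};

/*
 * Trace generated since head_off (an offset from base).
 * On wrap the write pointer has gone back to the base, so the
 * limit is used as the end of the trace instead.
 */
static uint64_t trace_size(const struct trbe_model *t, uint64_t head_off,
			   bool wrap)
{
	uint64_t end = wrap ? t->limit : t->write;
	uint64_t end_off = end - t->base;

	/* An end before the head is inconsistent: report no trace. */
	if (end_off < head_off)
		return 0;
	return end_off - head_off;
}
```

For example, with base 0x1000, write 0x1800, limit 0x3000 and a head
offset of 0x400, the model gives 0x400 bytes without wrap and 0x1c00
bytes with wrap.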

Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 63f7edd5fd1f..063c4505a203 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
return TRBE_FAULT_ACT_SPURIOUS;
}

+static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
+ struct trbe_buf *buf,
+ bool wrap)
+{
+ u64 write;
+ u64 start_off, end_off;
+
+ /*
+ * If the TRBE has wrapped around the write pointer has
+ * wrapped and should be treated as limit.
+ */
+ if (wrap)
+ write = get_trbe_limit_pointer();
+ else
+ write = get_trbe_write_pointer();
+
+ end_off = write - buf->trbe_base;
+ start_off = PERF_IDX2OFF(handle->head, buf);
+
+ if (WARN_ON_ONCE(end_off < start_off))
+ return 0;
+ return (end_off - start_off);
+}
+
static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool snapshot)
@@ -588,9 +612,9 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
struct trbe_buf *buf = config;
enum trbe_fault_action act;
- unsigned long size, offset;
- unsigned long write, base, status;
+ unsigned long size, status;
unsigned long flags;
+ bool wrap = false;

WARN_ON(buf->cpudata != cpudata);
WARN_ON(cpudata->cpu != smp_processor_id());
@@ -630,8 +654,6 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
* handle gets freed in etm_event_stop().
*/
trbe_drain_and_disable_local();
- write = get_trbe_write_pointer();
- base = get_trbe_base_pointer();

/* Check if there is a pending interrupt and handle it here */
status = read_sysreg_s(SYS_TRBSR_EL1);
@@ -655,20 +677,11 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
goto done;
}

- /*
- * Otherwise, the buffer is full and the write pointer
- * has reached base. Adjust this back to the Limit pointer
- * for correct size. Also, mark the buffer truncated.
- */
- write = get_trbe_limit_pointer();
perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION);
+ wrap = true;
}

- offset = write - base;
- if (WARN_ON_ONCE(offset < PERF_IDX2OFF(handle->head, buf)))
- size = 0;
- else
- size = offset - PERF_IDX2OFF(handle->head, buf);
+ size = trbe_get_trace_size(handle, buf, wrap);

done:
local_irq_restore(flags);
@@ -749,11 +762,10 @@ static int trbe_handle_overflow(struct perf_output_handle *handle)
{
struct perf_event *event = handle->event;
struct trbe_buf *buf = etm_perf_sink_config(handle);
- unsigned long offset, size;
+ unsigned long size;
struct etm_event_data *event_data;

- offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
- size = offset - PERF_IDX2OFF(handle->head, buf);
+ size = trbe_get_trace_size(handle, buf, true);
if (buf->snapshot)
handle->head += size;

--
2.24.1

2021-09-21 13:46:03

by Suzuki K Poulose

Subject: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures

Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffer
from an erratum where a TSB (trace synchronization barrier)
fails to flush the trace data completely when executed from
a trace prohibited region. In Linux we always execute it
after we have moved the PE to a trace prohibited region. So
we can apply the workaround every time a TSB is executed.

The workaround is to issue two TSBs consecutively.

NOTE: This erratum is defined as LOCAL_CPU_ERRATUM, implying
that a late CPU could be blocked from booting if it is the
first CPU that requires the workaround. This is because we
do not allow setting a cpu_hwcaps after the SMP boot. The
alternative is to use "this_cpu_has_cap()" instead
of the faster system-wide check, which may add some
overhead, given we may have to do this in the nVHE KVM host
before a guest entry.

Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Marc Zyngier <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
Changes since v1:
- Switch to cpus_have_final_cap()
- Document the requirements on TSB.
---
Documentation/arm64/silicon-errata.rst | 4 ++++
arch/arm64/Kconfig | 31 ++++++++++++++++++++++++++
arch/arm64/include/asm/barrier.h | 16 ++++++++++++-
arch/arm64/kernel/cpu_errata.c | 19 ++++++++++++++++
arch/arm64/tools/cpucaps | 1 +
5 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 2f99229d993c..569a92411dcd 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -94,6 +94,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1349291 | N/A |
@@ -102,6 +104,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eac4030322df..0764774e12bb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208

If unsure, say Y.

+config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+ bool
+
+config ARM64_ERRATUM_2054223
+ bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
+ default y
+ help
+ Enable workaround for ARM Cortex-A710 erratum 2054223
+
+ Affected cores may fail to flush the trace data on a TSB instruction, when
+ the PE is in trace prohibited state. This will cause losing a few bytes
+ of the trace cached.
+
+ Workaround is to issue two TSB consecutively on affected cores.
+
+ If unsure, say Y.
+
+config ARM64_ERRATUM_2067961
+ bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
+ default y
+ help
+ Enable workaround for ARM Neoverse-N2 erratum 2067961
+
+ Affected cores may fail to flush the trace data on a TSB instruction, when
+ the PE is in trace prohibited state. This will cause losing a few bytes
+ of the trace cached.
+
+ Workaround is to issue two TSB consecutively on affected cores.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 451e11e5fd23..1c5a00598458 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -23,7 +23,7 @@
#define dsb(opt) asm volatile("dsb " #opt : : : "memory")

#define psb_csync() asm volatile("hint #17" : : : "memory")
-#define tsb_csync() asm volatile("hint #18" : : : "memory")
+#define __tsb_csync() asm volatile("hint #18" : : : "memory")
#define csdb() asm volatile("hint #20" : : : "memory")

#ifdef CONFIG_ARM64_PSEUDO_NMI
@@ -46,6 +46,20 @@
#define dma_rmb() dmb(oshld)
#define dma_wmb() dmb(oshst)

+
+#define tsb_csync() \
+ do { \
+ /* \
+ * CPUs affected by Arm Erratum 2054223 or 2067961 need \
+ * another TSB to ensure the trace is flushed. The barriers \
+ * don't have to be strictly back to back, as long as the \
+ * CPU is in trace prohibited state. \
+ */ \
+ if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE)) \
+ __tsb_csync(); \
+ __tsb_csync(); \
+ } while (0)
+
/*
* Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
* and 0 otherwise.
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ccd757373f36..bdbeac75ead6 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
};
#endif /* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */

+#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+static const struct midr_range tsb_flush_fail_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2067961
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2054223
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+ {},
+};
+#endif /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
+
const struct arm64_cpu_capabilities arm64_errata[] = {
#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
{
@@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+ {
+ .desc = "ARM erratum 2067961 or 2054223",
+ .capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
+ ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
+ },
#endif
{
}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1ccb92165bd8..2102e15af43d 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -54,6 +54,7 @@ WORKAROUND_1463225
WORKAROUND_1508412
WORKAROUND_1542419
WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+WORKAROUND_TSB_FLUSH_FAILURE
WORKAROUND_CAVIUM_23154
WORKAROUND_CAVIUM_27456
WORKAROUND_CAVIUM_30115
--
2.24.1

2021-09-21 13:46:52

by Suzuki K Poulose

Subject: [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range

Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where the
TRBE, under some circumstances, might write up to 64 bytes to an address after
the Limit as programmed by the TRBLIMITR_EL1.LIMIT. This might:

- Corrupt a page in the ring buffer, which may corrupt trace from a
previous session, consumed by userspace.
- Hit the guard page at the end of the vmalloc area and raise a fault.

To keep the handling simpler, we always leave the last page out of the
range that the TRBE is allowed to write. This can be achieved by ensuring
that we always have more than a PAGE worth of space in the range while
calculating the LIMIT for the TRBE. The LIMIT pointer can then be adjusted
to leave that PAGE (TRBLIMITR.LIMIT -= PAGE_SIZE) out of the TRBE range
while enabling it. This makes sure that the TRBE will only write to an area
within its allowed limit (i.e., [head, head + size]) and we do not have to handle
address faults within the driver.

Cc: Anshuman Khandual <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
arch/arm64/kernel/cpu_errata.c | 20 ++++++++++++++++++++
arch/arm64/tools/cpucaps | 1 +
2 files changed, 21 insertions(+)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index bdbeac75ead6..e2978b89d4b8 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -364,6 +364,18 @@ static const struct midr_range tsb_flush_fail_cpus[] = {
};
#endif /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */

+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+static struct midr_range trbe_write_out_of_range_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2253138
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2224489
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+ {},
+};
+#endif /* CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE */
+
const struct arm64_cpu_capabilities arm64_errata[] = {
#ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
{
@@ -577,6 +589,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
.capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+ {
+ .desc = "ARM erratum 2253138 or 2224489",
+ .capability = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
+ .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
+ CAP_MIDR_RANGE_LIST(trbe_write_out_of_range_cpus),
+ },
#endif
{
}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 2102e15af43d..90628638e0f9 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -55,6 +55,7 @@ WORKAROUND_1508412
WORKAROUND_1542419
WORKAROUND_TRBE_OVERWRITE_FILL_MODE
WORKAROUND_TSB_FLUSH_FAILURE
+WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
WORKAROUND_CAVIUM_23154
WORKAROUND_CAVIUM_27456
WORKAROUND_CAVIUM_30115
--
2.24.1

2021-09-21 13:47:06

by Suzuki K Poulose

Subject: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode

Now that we have the workaround implemented in the TRBE
driver, add the Kconfig entries and document the errata.

Cc: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
Documentation/arm64/silicon-errata.rst | 4 +++
arch/arm64/Kconfig | 39 ++++++++++++++++++++++++++
2 files changed, 43 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index d410a47ffa57..2f99229d993c 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -92,12 +92,16 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1349291 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 077f2ec4eeb2..eac4030322df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412

If unsure, say Y.

+config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+ bool
+
+config ARM64_ERRATUM_2119858
+ bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
+ default y
+ depends on CORESIGHT_TRBE
+ select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+ help
+ This option adds the workaround for ARM Cortex-A710 erratum 2119858.
+
+ Affected Cortex-A710 cores could overwrite up to 3 cache lines of trace
+ data at the base of the buffer (pointed to by TRBBASER_EL1) in FILL mode in
+ the event of a WRAP event.
+
+ Work around the issue by always making sure we move the TRBPTR_EL1 by
+ 256 bytes before enabling the buffer and filling the first 256 bytes of
+ the buffer with ETM ignore packets upon disabling.
+
+ If unsure, say Y.
+
+config ARM64_ERRATUM_2139208
+ bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
+ default y
+ depends on CORESIGHT_TRBE
+ select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+ help
+ This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
+
+ Affected Neoverse-N2 cores could overwrite up to 3 cache lines of trace
+ data at the base of the buffer (pointed to by TRBBASER_EL1) in FILL mode in
+ the event of a WRAP event.
+
+ Work around the issue by always making sure we move the TRBPTR_EL1 by
+ 256 bytes before enabling the buffer and filling the first 256 bytes of
+ the buffer with ETM ignore packets upon disabling.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
--
2.24.1

2021-09-21 13:47:13

by Suzuki K Poulose

Subject: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

Add a helper to get the CPU specific data for TRBE instance, from
a given perf handle. This also adds extra checks to make sure that
the event associated with the handle is "bound" to the CPU and is
active on the TRBE.

Cc: Anshuman Khandual <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 983dd5039e52..797d978f9fa7 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
return buf->nr_pages * PAGE_SIZE;
}

+static inline struct trbe_cpudata *
+trbe_handle_to_cpudata(struct perf_output_handle *handle)
+{
+ struct trbe_buf *buf = etm_perf_sink_config(handle);
+
+ BUG_ON(!buf || !buf->cpudata);
+ return buf->cpudata;
+}
+
/*
* TRBE Limit Calculation
*
@@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
{
int ec = get_trbe_ec(trbsr);
int bsc = get_trbe_bsc(trbsr);
- struct trbe_buf *buf = etm_perf_sink_config(handle);
- struct trbe_cpudata *cpudata = buf->cpudata;
+ struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);

WARN_ON(is_trbe_running(trbsr));
if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
--
2.24.1

2021-09-21 13:47:43

by Suzuki K Poulose

Subject: [PATCH v2 16/17] coresight: trbe: Work around write to out of range

TRBE implementations affected by Arm erratum 2253138 or 2224489 could
write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
to the TRBBASER. This implies that the TRBE could potentially corrupt:

- A page used by the rest of the kernel/user (if the LIMIT = end of
perf ring buffer)
- A page within the ring buffer, but outside the driver's range
[head, head + size]. This may contain some trace data that may be
consumed by userspace.

We work around this erratum by:
- Making sure that there is at least an extra PAGE of space left in the
TRBE's range, beyond what we normally assign. This is in addition to other
restrictions (e.g., the TRBE alignment for working around
TRBE_WORKAROUND_OVERWRITE_FILL_MODE, where there is a minimum of PAGE_SIZE;
thus we would have 2 * PAGE_SIZE).

- Adjusting the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
range (i.e., TRBBASER...TRBLIMITR.LIMIT), by:

TRBLIMITR.LIMIT -= PAGE_SIZE
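
The two steps can be modelled in userspace as follows. This is a sketch
under assumptions, not the driver code: PAGE_SZ (4096) and MIN_BUF (64)
are made-up values, and min_trace_buf_size() and adjust_limit() are
hypothetical names that mirror the intent of the changes in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL  /* assumed page size for this model */
#define MIN_BUF 64ULL    /* hypothetical minimum trace buffer size */

/* Step 1: ask for one extra page when the erratum applies. */
static uint64_t min_trace_buf_size(bool erratum)
{
	return MIN_BUF + (erratum ? PAGE_SZ : 0);
}

/*
 * Step 2: pull the limit down by a page before enabling, so that a
 * stray write past the limit still lands inside the range we own.
 * Returns 0 on success, -1 if the proposed range is unusable.
 */
static int adjust_limit(uint64_t write, uint64_t *limit, bool erratum)
{
	if (!erratum)
		return 0;
	/* Need strictly more than a page, and a page-aligned limit. */
	if (*limit <= write || *limit - write <= PAGE_SZ ||
	    (*limit % PAGE_SZ) != 0)
		return -1;
	*limit -= PAGE_SZ;
	return 0;
}
```

Requiring strictly more than one page of space means that at least some
usable space remains after the limit is moved down.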

Cc: Anshuman Khandual <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
---
drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 02f9e00e2091..ea907345354c 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -86,7 +86,8 @@ struct trbe_buf {
* affects the given instance of the TRBE.
*/
#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
-#define TRBE_ERRATA_MAX 1
+#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE 1
+#define TRBE_ERRATA_MAX 2

/*
* Safe limit for the number of bytes that may be overwritten
@@ -96,6 +97,7 @@ struct trbe_buf {

static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
+ [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
};

/*
@@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)

static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
{
- return TRBE_TRACE_MIN_BUF_SIZE;
+ u64 size = TRBE_TRACE_MIN_BUF_SIZE;
+ struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
+
+ /*
+ * When the TRBE is affected by an erratum that could make it
+ * write to the next "virtually addressed" page beyond the LIMIT.
+ * We need to make sure there is always a PAGE after the LIMIT,
+ * within the buffer. Thus we ensure there is at least an extra
+ * page than normal. With this we could then adjust the LIMIT
+ * pointer down by a PAGE later.
+ */
+ if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE))
+ size += PAGE_SIZE;
+ return size;
}

/*
@@ -585,6 +600,17 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
/*
* If the TRBE has wrapped around the write pointer has
* wrapped and should be treated as limit.
+ *
+ * When the TRBE is affected by TRBE_WORKAROUND_WRITE_OUT_OF_RANGE,
+ * it may write upto 64bytes beyond the "LIMIT". The driver already
+ * keeps a valid page next to the LIMIT and we could potentially
+ * consume the trace data that may have been collected there. But we
+ * cannot be really sure it is available, and the TRBPTR may not
+ * indicate the same. Also, affected cores are also affected by another
+ * erratum which forces the PAGE_SIZE alignment on the TRBPTR, and thus
+ * could potentially pad an entire PAGE_SIZE - 64bytes, to get those
+ * 64bytes. Thus we ignore the potential triggering of the erratum
+ * on WRAP and limit the data to LIMIT.
*/
if (wrap)
write = get_trbe_limit_pointer();
@@ -811,6 +837,35 @@ static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
}

+ /*
+ * TRBE_WORKAROUND_WRITE_OUT_OF_RANGE could cause the TRBE to write to
+ * the next page after the TRBLIMITR.LIMIT. For perf, the "next page"
+ * may be:
+ * - The page beyond the ring buffer. This could mean, TRBE could
+ * corrupt another entity (kernel / user)
+ * - A portion of the "ring buffer" consumed by the userspace.
+ * i.e, a page outside [head, head + size].
+ *
+ * We work around this by:
+ * - Making sure that we have at least one extra PAGE of space left
+ * in the ring buffer [head, head + size], compared to what we
+ * normally need without the erratum. See trbe_min_trace_buf_size().
+ *
+ * - Adjusting the TRBLIMITR.LIMIT to leave the extra PAGE outside
+ * the TRBE's range (i.e., [TRBBASER, TRBLIMITR.LIMIT]).
+ */
+ if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE)) {
+ s64 space = buf->trbe_limit - buf->trbe_write;
+ /*
+ * We must have more than a PAGE_SIZE worth space in the proposed
+ * range for the TRBE.
+ */
+ if (WARN_ON(space <= PAGE_SIZE ||
+ !IS_ALIGNED(buf->trbe_limit, PAGE_SIZE)))
+ return -EINVAL;
+ buf->trbe_limit -= PAGE_SIZE;
+ }
+
return 0;
}

--
2.24.1
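
The LIMIT adjustment in the hunk above can be modelled outside the kernel. The following is a minimal userspace sketch, not the driver code: `struct sketch_buf`, `sketch_reserve_guard_page()` and the 4K `SKETCH_PAGE_SIZE` are hypothetical names and values chosen for illustration, and the kernel's WARN_ON() is reduced to a plain error return.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Hypothetical stand-in for the two trbe_buf fields used by the check. */
struct sketch_buf {
	uint64_t trbe_write;	/* address where the TRBE starts writing */
	uint64_t trbe_limit;	/* end of the range handed to the TRBE */
};

/*
 * Pull the limit down by one page so that a stray write past the
 * programmed LIMIT still lands inside the range we own.
 * Returns 0 on success, -EINVAL if the range is too small or unaligned.
 */
int sketch_reserve_guard_page(struct sketch_buf *buf)
{
	int64_t space = (int64_t)(buf->trbe_limit - buf->trbe_write);

	/* We need strictly more than one page, and a page aligned limit. */
	if (space <= (int64_t)SKETCH_PAGE_SIZE ||
	    (buf->trbe_limit % SKETCH_PAGE_SIZE) != 0)
		return -EINVAL;

	buf->trbe_limit -= SKETCH_PAGE_SIZE;
	/* Some writable space must remain below the new limit. */
	assert(buf->trbe_limit > buf->trbe_write);
	return 0;
}
```

With a range of four pages the limit moves down by one page; with a range of exactly one page the call fails and the limit is left untouched, which mirrors the `space <= PAGE_SIZE` condition in the patch.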

2021-09-22 07:24:18

by Anshuman Khandual

Subject: Re: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Now that we have the work around implmented in the TRBE
> driver, add the Kconfig entries and document the errata.
>
> Cc: Mark Rutland <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> Documentation/arm64/silicon-errata.rst | 4 +++
> arch/arm64/Kconfig | 39 ++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index d410a47ffa57..2f99229d993c 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -92,12 +92,16 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 |
> +----------------+-----------------+-----------------+-----------------------------+
> +| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
> ++----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N1 | #1349291 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 |
> +----------------+-----------------+-----------------+-----------------------------+
> +| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 |
> ++----------------+-----------------+-----------------+-----------------------------+
> | ARM | MMU-500 | #841119,826419 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 077f2ec4eeb2..eac4030322df 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412
>
> If unsure, say Y.
>
> +config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> + bool
> +
> +config ARM64_ERRATUM_2119858
> + bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
> + default y
> + depends on CORESIGHT_TRBE
> + select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> + help
> + This option adds the workaround for ARM Cortex-A710 erratum 2119858.
> +
> + Affected Cortex-A710 cores could overwrite upto 3 cache lines of trace
> + data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
> + the event of a WRAP event.
> +
> + Work around the issue by always making sure we move the TRBPTR_EL1 by
> + 256bytes before enabling the buffer and filling the first 256bytes of
> + the buffer with ETM ignore packets upon disabling.
> +
> + If unsure, say Y.
> +
> +config ARM64_ERRATUM_2139208
> + bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
> + default y
> + depends on CORESIGHT_TRBE
> + select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> + help
> + This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
> +
> + Affected Neoverse-N2 cores could overwrite upto 3 cache lines of trace
> + data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in

s/ponited/pointed

> + the event of a WRAP event.
> +
> + Work around the issue by always making sure we move the TRBPTR_EL1 by
> + 256bytes before enabling the buffer and filling the first 256bytes of
> + the buffer with ETM ignore packets upon disabling.
> +
> + If unsure, say Y.
> +
> config CAVIUM_ERRATUM_22375
> bool "Cavium erratum 22375, 24313"
> default y
>

The underlying problem description for both these errata is exactly the
same. A more generalized description should instead be included for
ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, which abstracts out the
problem and the corresponding solution implemented in the driver.
This should also help reduce the current redundancy.

----------------------------------------------------------------------
Affected cores could overwrite up to 3 cache lines of trace data at the
base of the buffer (pointed to by TRBBASER_EL1) in FILL mode in the
event of a WRAP.

Work around the issue by always making sure we move the TRBPTR_EL1 by
256 bytes before enabling the buffer and filling the first 256 bytes of
the buffer with ETM ignore packets upon disabling.
----------------------------------------------------------------------

2021-09-22 07:40:22

by Anshuman Khandual

Subject: Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffers
> from errata, where a TSB (trace synchronization barrier)
> fails to flush the trace data completely, when executed from
> a trace prohibited region. In Linux we always execute it
> after we have moved the PE to trace prohibited region. So,
> we can apply the workaround everytime a TSB is executed.

s/everytime/every time

>
> The work around is to issue two TSB consecutively.
>
> NOTE: This errata is defined as LOCAL_CPU_ERRATUM, implying
> that a late CPU could be blocked from booting if it is the
> first CPU that requires the workaround. This is because we
> do not allow setting a cpu_hwcaps after the SMP boot. The
> other alternative is to use "this_cpu_has_cap()" instead
> of the faster system wide check, which may be a bit of an
> overhead, given we may have to do this in nvhe KVM host
> before a guest entry.
>
> Cc: Will Deacon <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> Changes since v1:
> - Switch to cpus_have_final_cap()
> - Document the requirements on TSB.
> ---
> Documentation/arm64/silicon-errata.rst | 4 ++++
> arch/arm64/Kconfig | 31 ++++++++++++++++++++++++++
> arch/arm64/include/asm/barrier.h | 16 ++++++++++++-
> arch/arm64/kernel/cpu_errata.c | 19 ++++++++++++++++
> arch/arm64/tools/cpucaps | 1 +
> 5 files changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index 2f99229d993c..569a92411dcd 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -94,6 +94,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
> +----------------+-----------------+-----------------+-----------------------------+
> +| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 |
> ++----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N1 | #1349291 | N/A |
> @@ -102,6 +104,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 |
> +----------------+-----------------+-----------------+-----------------------------+
> +| ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 |
> ++----------------+-----------------+-----------------+-----------------------------+
> | ARM | MMU-500 | #841119,826419 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index eac4030322df..0764774e12bb 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208
>
> If unsure, say Y.
>
> +config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
> + bool
> +
> +config ARM64_ERRATUM_2054223
> + bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
> + default y
> + help
> + Enable workaround for ARM Cortex-A710 erratum 2054223
> +
> + Affected cores may fail to flush the trace data on a TSB instruction, when
> + the PE is in trace prohibited state. This will cause losing a few bytes
> + of the trace cached.
> +
> + Workaround is to issue two TSB consecutively on affected cores.
> +
> + If unsure, say Y.
> +
> +config ARM64_ERRATUM_2067961
> + bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
> + default y
> + help
> + Enable workaround for ARM Neoverse-N2 erratum 2067961
> +
> + Affected cores may fail to flush the trace data on a TSB instruction, when
> + the PE is in trace prohibited state. This will cause losing a few bytes
> + of the trace cached.
> +
> + Workaround is to issue two TSB consecutively on affected cores.

As I mentioned for the previous patch, these descriptions could just
be factored out into ARM64_WORKAROUND_TSB_FLUSH_FAILURE instead.

> +
> + If unsure, say Y.
> +
> config CAVIUM_ERRATUM_22375
> bool "Cavium erratum 22375, 24313"
> default y
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index 451e11e5fd23..1c5a00598458 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -23,7 +23,7 @@
> #define dsb(opt) asm volatile("dsb " #opt : : : "memory")
>
> #define psb_csync() asm volatile("hint #17" : : : "memory")
> -#define tsb_csync() asm volatile("hint #18" : : : "memory")
> +#define __tsb_csync() asm volatile("hint #18" : : : "memory")
> #define csdb() asm volatile("hint #20" : : : "memory")
>
> #ifdef CONFIG_ARM64_PSEUDO_NMI
> @@ -46,6 +46,20 @@
> #define dma_rmb() dmb(oshld)
> #define dma_wmb() dmb(oshst)
>
> +
> +#define tsb_csync() \
> + do { \
> + /* \
> + * CPUs affected by Arm Erratum 2054223 or 2067961 needs \
> + * another TSB to ensure the trace is flushed. The barriers \
> + * don't have to be strictly back to back, as long as the \
> + * CPU is in trace prohibited state. \
> + */ \
> + if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE)) \
> + __tsb_csync(); \
> + __tsb_csync(); \
> + } while (0)
> +
> /*
> * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
> * and 0 otherwise.
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index ccd757373f36..bdbeac75ead6 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
> };
> #endif /* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
>
> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
> +static const struct midr_range tsb_flush_fail_cpus[] = {
> +#ifdef CONFIG_ARM64_ERRATUM_2067961
> + MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
> +#endif
> +#ifdef CONFIG_ARM64_ERRATUM_2054223
> + MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
> +#endif
> + {},
> +};
> +#endif /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
> +
> const struct arm64_cpu_capabilities arm64_errata[] = {
> #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
> {
> @@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
> .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
> CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
> },
> +#endif
> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
> + {
> + .desc = "ARM erratum 2067961 or 2054223",
> + .capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
> + ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
> + },
> #endif
> {
> }
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 1ccb92165bd8..2102e15af43d 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -54,6 +54,7 @@ WORKAROUND_1463225
> WORKAROUND_1508412
> WORKAROUND_1542419
> WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> +WORKAROUND_TSB_FLUSH_FAILURE
> WORKAROUND_CAVIUM_23154
> WORKAROUND_CAVIUM_27456
> WORKAROUND_CAVIUM_30115
>

This adds all the required bits for these errata in a single patch,
whereas the previous workaround split the required pieces across
multiple patches. Could we instead follow the same approach in
both places?
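
For readers following the tsb_csync() change quoted above, the control flow can be sketched in plain C. This is only an illustration under stated assumptions: `sketch_raw_tsb()` stands in for the real "hint #18" instruction, `sketch_cpu_affected` stands in for the cpus_have_final_cap() check, and the counter exists purely so the behaviour can be observed. None of these names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts barriers issued; exists only to make the sketch observable. */
unsigned int sketch_tsb_count;

/* Hypothetical stand-in for cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE). */
bool sketch_cpu_affected;

/* Stand-in for the raw TSB CSYNC ("hint #18") instruction. */
void sketch_raw_tsb(void)
{
	sketch_tsb_count++;
}

/*
 * Affected CPUs get one extra barrier before the regular one; everyone
 * else pays for a single barrier only.
 */
void sketch_tsb_csync(void)
{
	if (sketch_cpu_affected)
		sketch_raw_tsb();
	sketch_raw_tsb();
}
```

On an unaffected CPU one call issues a single barrier; on an affected CPU it issues two. As the comment in the patch notes, the two barriers do not need to be strictly back to back, as long as the CPU stays in the trace prohibited state.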

2021-09-22 08:00:53

by Anshuman Khandual

Subject: Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Add a helper to get the CPU specific data for TRBE instance, from
> a given perf handle. This also adds extra checks to make sure that
> the event associated with the handle is "bound" to the CPU and is
> active on the TRBE.
>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 983dd5039e52..797d978f9fa7 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
> return buf->nr_pages * PAGE_SIZE;
> }
>
> +static inline struct trbe_cpudata *
> +trbe_handle_to_cpudata(struct perf_output_handle *handle)

This is actually a perf handle, not a TRBE handle. Hence it
should be renamed to 'perf_handle_to_cpudata' instead.

> +{
> + struct trbe_buf *buf = etm_perf_sink_config(handle);
> +
> + BUG_ON(!buf || !buf->cpudata);
> + return buf->cpudata;
> +}
> +
> /*
> * TRBE Limit Calculation
> *
> @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
> {
> int ec = get_trbe_ec(trbsr);
> int bsc = get_trbe_bsc(trbsr);
> - struct trbe_buf *buf = etm_perf_sink_config(handle);
> - struct trbe_cpudata *cpudata = buf->cpudata;
> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>
> WARN_ON(is_trbe_running(trbsr));
> if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
>

2021-09-22 08:14:00

by Suzuki K Poulose

Subject: Re: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode

On 22/09/2021 08:23, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> Now that we have the work around implmented in the TRBE
>> driver, add the Kconfig entries and document the errata.
>>
>> Cc: Mark Rutland <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Mathieu Poirier <[email protected]>
>> Cc: Mike Leach <[email protected]>
>> Cc: Leo Yan <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>> Documentation/arm64/silicon-errata.rst | 4 +++
>> arch/arm64/Kconfig | 39 ++++++++++++++++++++++++++
>> 2 files changed, 43 insertions(+)
>>
>> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
>> index d410a47ffa57..2f99229d993c 100644
>> --- a/Documentation/arm64/silicon-errata.rst
>> +++ b/Documentation/arm64/silicon-errata.rst
>> @@ -92,12 +92,16 @@ stable kernels.
>> +----------------+-----------------+-----------------+-----------------------------+
>> | ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 |
>> +----------------+-----------------+-----------------+-----------------------------+
>> +| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
>> ++----------------+-----------------+-----------------+-----------------------------+
>> | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
>> +----------------+-----------------+-----------------+-----------------------------+
>> | ARM | Neoverse-N1 | #1349291 | N/A |
>> +----------------+-----------------+-----------------+-----------------------------+
>> | ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 |
>> +----------------+-----------------+-----------------+-----------------------------+
>> +| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 |
>> ++----------------+-----------------+-----------------+-----------------------------+
>> | ARM | MMU-500 | #841119,826419 | N/A |
>> +----------------+-----------------+-----------------+-----------------------------+
>> +----------------+-----------------+-----------------+-----------------------------+
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 077f2ec4eeb2..eac4030322df 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412
>>
>> If unsure, say Y.
>>
>> +config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> + bool
>> +
>> +config ARM64_ERRATUM_2119858
>> + bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
>> + default y
>> + depends on CORESIGHT_TRBE
>> + select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> + help
>> + This option adds the workaround for ARM Cortex-A710 erratum 2119858.
>> +
>> + Affected Cortex-A710 cores could overwrite upto 3 cache lines of trace
>> + data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
>> + the event of a WRAP event.
>> +
>> + Work around the issue by always making sure we move the TRBPTR_EL1 by
>> + 256bytes before enabling the buffer and filling the first 256bytes of
>> + the buffer with ETM ignore packets upon disabling.
>> +
>> + If unsure, say Y.
>> +
>> +config ARM64_ERRATUM_2139208
>> + bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
>> + default y
>> + depends on CORESIGHT_TRBE
>> + select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> + help
>> + This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
>> +
>> + Affected Neoverse-N2 cores could overwrite upto 3 cache lines of trace
>> + data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
>
> s/ponited/pointed
>
>> + the event of a WRAP event.
>> +
>> + Work around the issue by always making sure we move the TRBPTR_EL1 by
>> + 256bytes before enabling the buffer and filling the first 256bytes of
>> + the buffer with ETM ignore packets upon disabling.
>> +
>> + If unsure, say Y.
>> +
>> config CAVIUM_ERRATUM_22375
>> bool "Cavium erratum 22375, 24313"
>> default y
>>
>
> The real errata problem description for both these erratums are exactly
> the same. Rather a more generalized description should be included for
> the ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, which abstracts out the
> problem and a corresponding solution that is implemented in the driver.
> This should also help us reduce current redundancy.
>

The issue is what a user wants to see. A user who wants to configure the
kernel specifically for a given CPU (think embedded systems) would
want to hand pick the errata for that particular CPU. Moving the help
text to an implicitly selected Kconfig symbol would not help them. I
would rather keep this as it is, to keep it user friendly. This doesn't
affect the code size anyway.

The other option is to remove all the CPU specific Kconfig symbols and
update the "title" to reflect both the CPU/erratum numbers.

Kind regards
Suzuki

2021-09-22 09:59:19

by Anshuman Khandual

Subject: Re: [PATCH v2 14/17] coresight: trbe: Make sure we have enough space



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> The TRBE driver makes sure that there is enough space for a meaningful
> run, otherwise pads the given space and restarts the offset calculation
> once. But there is no guarantee that we may find space or hit "no space".

So what happens currently when it neither finds the required minimum buffer
space for a meaningful run nor hits the "no space" scenario?

> Make sure that we repeat the step until, either :
> - We have the minimum space
> OR
> - There is NO space at all.
>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 3373f4e2183b..02f9e00e2091 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
> * If the head is too close to the limit and we don't
> * have space for a meaningful run, we rather pad it
> * and start fresh.
> + *
> + * We might have to do this more than once to make sure
> + * we have enough required space.

OR no space at all, as explained in the commit message.
Hence this comment needs an update.

> */
> - if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
> + while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
> trbe_pad_buf(handle, limit - head);
> limit = __trbe_normal_offset(handle);
> + head = PERF_IDX2OFF(handle->head, buf);

Should the loop be bound with a retry limit as well ?

> }
> return limit;
> }
>

2021-09-22 10:19:50

by Suzuki K Poulose

Subject: Re: [PATCH v2 14/17] coresight: trbe: Make sure we have enough space

On 22/09/2021 10:58, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> The TRBE driver makes sure that there is enough space for a meaningful
>> run, otherwise pads the given space and restarts the offset calculation
>> once. But there is no guarantee that we may find space or hit "no space".
>
> So what happens currently when it neither finds the required minimum buffer
> space for a meaningful run nor does it hit the "no space" scenario ?

It tries once today and assumes that it will either hit :

- No space
OR
- Enough space

which is reasonable, given the minimum space needed is a few bytes.
But this may no longer be true with the other erratum workarounds.

>
>> Make sure that we repeat the step until, either :
>> - We have the minimum space
>> OR
>> - There is NO space at all.
>>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Mathieu Poirier <[email protected]>
>> Cc: Mike Leach <[email protected]>
>> Cc: Leo Yan <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 3373f4e2183b..02f9e00e2091 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>> * If the head is too close to the limit and we don't
>> * have space for a meaningful run, we rather pad it
>> * and start fresh.
>> + *
>> + * We might have to do this more than once to make sure
>> + * we have enough required space.
>
> OR no space at all, as explained in the commit message.
> Hence this comment needs an update.
>
>> */
>> - if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>> + while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>> trbe_pad_buf(handle, limit - head);
>> limit = __trbe_normal_offset(handle);
>> + head = PERF_IDX2OFF(handle->head, buf);
>
> Should the loop be bound with a retry limit as well ?

No. We will eventually hit No-space as we keep on padding
the buffer.

Suzuki
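
The termination argument above can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not the driver logic: the free space is modelled as an array of contiguous chunks, `sketch_find_run()` is a hypothetical name, and padding a chunk is assumed to consume it completely. Since every iteration consumes one chunk, the loop ends either on a chunk that is large enough or when nothing is left.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the "pad and retry" loop.
 *
 * chunks[i] plays the role of (limit - head) on the i-th attempt, and
 * running out of chunks plays the role of limit == 0 ("no space").
 * Returns the size of the first usable chunk, or 0 if there is none.
 * The total number of bytes padded away is stored in *padded.
 */
size_t sketch_find_run(const size_t *chunks, size_t nr_chunks,
		       size_t min_size, size_t *padded)
{
	size_t i = 0;

	*padded = 0;
	while (i < nr_chunks && chunks[i] < min_size) {
		*padded += chunks[i];	/* analogue of trbe_pad_buf() */
		i++;			/* analogue of recomputing limit and head */
	}
	/* Each iteration consumed one chunk, so i never exceeds nr_chunks. */
	assert(i <= nr_chunks);
	return i < nr_chunks ? chunks[i] : 0;
}
```

With chunks of 64, 128 and 8192 bytes and a minimum of 4096, the first two chunks are padded and the third is returned. With only the 64 and 128 byte chunks, both are padded and the function returns 0, which corresponds to the "no space" outcome.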

2021-09-22 10:59:57

by Anshuman Khandual

Subject: Re: [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where the
> trbe, under some circumstances, might write upto 64bytes to an address after
> the Limit as programmed by the TRBLIMITR_EL1.LIMIT. This might -
>
> - Corrupt a page in the ring buffer, which may corrupt trace from a
> previous session, consumed by userspace.
> - Hit the guard page at the end of the vmalloc area and raise a fault.
>
> To keep the handling simpler, we always leave the last page from the
> range, which TRBE is allowed to write. This can be achieved by ensuring
> that we always have more than a PAGE worth space in the range, while
> calculating the LIMIT for TRBE. And then the LIMIT pointer can be adjusted
> to leave the PAGE (TRBLIMITR.LIMIT -= PAGE_SIZE), out of the TRBE range
> while enabling it. This makes sure that the TRBE will only write to an area
> within its allowed limit (i.e, [head-head+size]) and we do not have to handle
> address faults within the driver.
>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

Reviewed-by: Anshuman Khandual <[email protected]>

> ---
> arch/arm64/kernel/cpu_errata.c | 20 ++++++++++++++++++++
> arch/arm64/tools/cpucaps | 1 +
> 2 files changed, 21 insertions(+)
>
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index bdbeac75ead6..e2978b89d4b8 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -364,6 +364,18 @@ static const struct midr_range tsb_flush_fail_cpus[] = {
> };
> #endif /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
>
> +#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> +static struct midr_range trbe_write_out_of_range_cpus[] = {
> +#ifdef CONFIG_ARM64_ERRATUM_2253138
> + MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
> +#endif
> +#ifdef CONFIG_ARM64_ERRATUM_2224489
> + MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
> +#endif
> + {},
> +};
> +#endif /* CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE */
> +
> const struct arm64_cpu_capabilities arm64_errata[] = {
> #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
> {
> @@ -577,6 +589,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
> .capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
> ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
> },
> +#endif
> +#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> + {
> + .desc = "ARM erratum 2253138 or 2224489",
> + .capability = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
> + .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
> + CAP_MIDR_RANGE_LIST(trbe_write_out_of_range_cpus),
> + },
> #endif
> {
> }
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 2102e15af43d..90628638e0f9 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -55,6 +55,7 @@ WORKAROUND_1508412
> WORKAROUND_1542419
> WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> WORKAROUND_TSB_FLUSH_FAILURE
> +WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> WORKAROUND_CAVIUM_23154
> WORKAROUND_CAVIUM_27456
> WORKAROUND_CAVIUM_30115
>

2021-09-22 11:05:48

by Anshuman Khandual

Subject: Re: [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> Add Kconfig entries for the errata workarounds for TRBE writing
> to an out-of-range address.
>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

Reviewed-by: Anshuman Khandual <[email protected]>

> ---
> Documentation/arm64/silicon-errata.rst | 4 +++
> arch/arm64/Kconfig | 39 ++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index 569a92411dcd..5342e895fb60 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -96,6 +96,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 |
> +----------------+-----------------+-----------------+-----------------------------+
> +| ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 |
> ++----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N1 | #1349291 | N/A |
> @@ -106,6 +108,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 |
> +----------------+-----------------+-----------------+-----------------------------+
> +| ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 |
> ++----------------+-----------------+-----------------+-----------------------------+
> | ARM | MMU-500 | #841119,826419 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0764774e12bb..611ae02aabbd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -736,6 +736,45 @@ config ARM64_ERRATUM_2067961
>
> If unsure, say Y.
>
> +config ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> + bool
> +
> +config ARM64_ERRATUM_2253138
> + bool "Neoverse-N2: 2253138: workaround TRBE writing to address out-of-range"
> + depends on CORESIGHT_TRBE
> + default y
> + select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> + help
> + This option adds the workaround for ARM Neoverse-N2 erratum 2253138.
> +
> + Affected Neoverse-N2 cores might write to an out-of-range address, not reserved
> + for TRBE. Under some conditions, the TRBE might generate a write to the next
> + virtually addressed page following the last page of the TRBE address space
> + (i.e, the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
> +
> + We work around this in the driver by, always making sure that there is a
> + page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
> +
> + If unsure, say Y.
> +
> +config ARM64_ERRATUM_2224489
> + bool "Cortex-A710: 2224489: workaround TRBE writing to address out-of-range"
> + depends on CORESIGHT_TRBE
> + default y
> + select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
> + help
> + This option adds the workaround for ARM Cortex-A710 erratum 2224489.
> +
> + Affected Cortex-A710 cores might write to an out-of-range address, not reserved
> + for TRBE. Under some conditions, the TRBE might generate a write to the next
> + virtually addressed page following the last page of the TRBE address space
> + (i.e, the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
> +
> + We work around this in the driver by, always making sure that there is a
> + page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
> +
> + If unsure, say Y.
> +
> config CAVIUM_ERRATUM_22375
> bool "Cavium erratum 22375, 24313"
> default y
>

2021-09-22 12:05:44

by Suzuki K Poulose

Subject: Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures

Hi Anshuman

On 22/09/2021 08:39, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffers
>> from errata, where a TSB (trace synchronization barrier)
>> fails to flush the trace data completely, when executed from
>> a trace prohibited region. In Linux we always execute it
>> after we have moved the PE to trace prohibited region. So,
>> we can apply the workaround everytime a TSB is executed.
>
> s/everytime/every time

Ack

>
>>
>> The work around is to issue two TSB consecutively.
>>
>> NOTE: This errata is defined as LOCAL_CPU_ERRATUM, implying
>> that a late CPU could be blocked from booting if it is the
>> first CPU that requires the workaround. This is because we
>> do not allow setting a cpu_hwcaps after the SMP boot. The
>> other alternative is to use "this_cpu_has_cap()" instead
>> of the faster system wide check, which may be a bit of an
>> overhead, given we may have to do this in nvhe KVM host
>> before a guest entry.
>>
>> Cc: Will Deacon <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Mathieu Poirier <[email protected]>
>> Cc: Mike Leach <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---

...

>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index eac4030322df..0764774e12bb 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208
>>
>> If unsure, say Y.
>>
>> +config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>> + bool
>> +
>> +config ARM64_ERRATUM_2054223
>> + bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
>> + default y
>> + help
>> + Enable workaround for ARM Cortex-A710 erratum 2054223
>> +
>> + Affected cores may fail to flush the trace data on a TSB instruction, when
>> + the PE is in trace prohibited state. This will cause losing a few bytes
>> + of the trace cached.
>> +
>> + Workaround is to issue two TSB consecutively on affected cores.
>> +
>> + If unsure, say Y.
>> +
>> +config ARM64_ERRATUM_2067961
>> + bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
>> + default y
>> + help
>> + Enable workaround for ARM Neoverse-N2 erratum 2067961
>> +
>> + Affected cores may fail to flush the trace data on a TSB instruction, when
>> + the PE is in trace prohibited state. This will cause losing a few bytes
>> + of the trace cached.
>> +
>> + Workaround is to issue two TSB consecutively on affected cores.
>
> Like I had mentioned in the previous patch, these descriptions here could
> be just factored out inside ARM64_WORKAROUND_TSB_FLUSH_FAILURE instead.

Please see my response there.

>
>> +
>> + If unsure, say Y.
>> +
>> config CAVIUM_ERRATUM_22375
>> bool "Cavium erratum 22375, 24313"
>> default y
>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>> index 451e11e5fd23..1c5a00598458 100644
>> --- a/arch/arm64/include/asm/barrier.h
>> +++ b/arch/arm64/include/asm/barrier.h
>> @@ -23,7 +23,7 @@
>> #define dsb(opt) asm volatile("dsb " #opt : : : "memory")
>>
>> #define psb_csync() asm volatile("hint #17" : : : "memory")
>> -#define tsb_csync() asm volatile("hint #18" : : : "memory")
>> +#define __tsb_csync() asm volatile("hint #18" : : : "memory")
>> #define csdb() asm volatile("hint #20" : : : "memory")
>>
>> #ifdef CONFIG_ARM64_PSEUDO_NMI
>> @@ -46,6 +46,20 @@
>> #define dma_rmb() dmb(oshld)
>> #define dma_wmb() dmb(oshst)
>>
>> +
>> +#define tsb_csync() \
>> + do { \
>> + /* \
>> + * CPUs affected by Arm Erratum 2054223 or 2067961 need \
>> + * another TSB to ensure the trace is flushed. The barriers \
>> + * don't have to be strictly back to back, as long as the \
>> + * CPU is in trace prohibited state. \
>> + */ \
>> + if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE)) \
>> + __tsb_csync(); \
>> + __tsb_csync(); \
>> + } while (0)
>> +
>> /*
>> * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
>> * and 0 otherwise.
>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>> index ccd757373f36..bdbeac75ead6 100644
>> --- a/arch/arm64/kernel/cpu_errata.c
>> +++ b/arch/arm64/kernel/cpu_errata.c
>> @@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
>> };
>> #endif /* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
>>
>> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>> +static const struct midr_range tsb_flush_fail_cpus[] = {
>> +#ifdef CONFIG_ARM64_ERRATUM_2067961
>> + MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
>> +#endif
>> +#ifdef CONFIG_ARM64_ERRATUM_2054223
>> + MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
>> +#endif
>> + {},
>> +};
>> +#endif /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
>> +
>> const struct arm64_cpu_capabilities arm64_errata[] = {
>> #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
>> {
>> @@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>> .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
>> CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
>> },
>> +#endif
>> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>> + {
>> + .desc = "ARM erratum 2067961 or 2054223",
>> + .capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
>> + ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
>> + },
>> #endif
>> {
>> }
>> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
>> index 1ccb92165bd8..2102e15af43d 100644
>> --- a/arch/arm64/tools/cpucaps
>> +++ b/arch/arm64/tools/cpucaps
>> @@ -54,6 +54,7 @@ WORKAROUND_1463225
>> WORKAROUND_1508412
>> WORKAROUND_1542419
>> WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> +WORKAROUND_TSB_FLUSH_FAILURE
>> WORKAROUND_CAVIUM_23154
>> WORKAROUND_CAVIUM_27456
>> WORKAROUND_CAVIUM_30115
>>
>
> This adds all the required bits for these errata in a single patch,
> whereas the previous work around had split all the required pieces
> into multiple patches. Could we instead follow the same standard in
> both the places ?

We could do this for this particular erratum as the work around is
within the arm64 kernel code, unlike the other ones - where the TRBE
driver needs a change.

So, there is a kind of dependency for the other two, which we don't
have in this particular case.

i.e., the TRBE driver needs a cpucap number to implement the work around ->
the arm64 kernel must define one, which we can't advertise until
we have a TRBE work around.

Thus, they follow a 3 step model.

- Define CPUCAP erratum
- TRBE driver work around
- Finally advertise to the user.

I don't think this one needs that.

Suzuki
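
For readers following the tsb_csync() change quoted above, here is a minimal
user-space sketch of its control flow. It is not the kernel macro: the counter
and the model_* names are hypothetical stand-ins for the real "hint #18"
instruction and for cpus_have_final_cap(), used only to show that affected
CPUs get one extra barrier while every CPU still gets the usual one.

```c
#include <stdbool.h>

/* Hypothetical counter standing in for the real TSB CSYNC ("hint #18"). */
static unsigned int tsb_issued;

/* Models a single __tsb_csync(): here it only counts the barrier. */
static void model_tsb_csync_once(void)
{
	tsb_issued++;
}

/*
 * Models the patched tsb_csync(): an extra barrier is issued first on
 * CPUs with the TSB flush erratum, then the normal barrier follows
 * unconditionally. Returns how many barriers were issued.
 */
static unsigned int model_tsb_csync(bool cpu_has_tsb_flush_erratum)
{
	tsb_issued = 0;
	if (cpu_has_tsb_flush_erratum)
		model_tsb_csync_once();
	model_tsb_csync_once();
	return tsb_issued;
}
```

In this model, model_tsb_csync(false) issues one barrier and
model_tsb_csync(true) issues two, matching "issue two TSB consecutively"
from the commit message.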


>

2021-09-23 03:16:34

by Anshuman Khandual

Subject: Re: [PATCH v2 16/17] coresight: trbe: Work around write to out of range



On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
> TRBE implementations affected by Arm erratum (2253138 or 2224489), could
> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
> to the TRBBASER. This implies that the TRBE could potentially corrupt :
>
> - A page used by the rest of the kernel/user (if the LIMIT = end of
> perf ring buffer)
> - A page within the ring buffer, but outside the driver's range.
> [head, head + size]. This may contain some trace data, may be
> consumed by the userspace.
>
> We workaround this erratum by :
> - Making sure that there is at least an extra PAGE space left in the
> TRBE's range than we normally assign. This will be additional to other
> restrictions (e.g, the TRBE alignment for working around
> TRBE_WORKAROUND_OVERWRITE_IN_FILL_MODE, where there is a minimum of PAGE_SIZE.
> Thus we would have 2 * PAGE_SIZE)
>
> - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
> range (i.e, TRBEBASER...TRBLIMITR.LIMIT), by :
>
> TRBLIMITR.LIMIT -= PAGE_SIZE
>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
> 1 file changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 02f9e00e2091..ea907345354c 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -86,7 +86,8 @@ struct trbe_buf {
> * affects the given instance of the TRBE.
> */
> #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
> -#define TRBE_ERRATA_MAX 1
> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE 1
> +#define TRBE_ERRATA_MAX 2
>
> /*
> * Safe limit for the number of bytes that may be overwritten
> @@ -96,6 +97,7 @@ struct trbe_buf {
>
> static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
> [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
> + [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
> };
>
> /*
> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>
> static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
> {
> - return TRBE_TRACE_MIN_BUF_SIZE;
> + u64 size = TRBE_TRACE_MIN_BUF_SIZE;
> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
> +
> + /*
> + * When the TRBE is affected by an erratum that could make it
> + * write to the next "virtually addressed" page beyond the LIMIT.

What if the next "virtually addressed" page is just blocked from future
usage in the kernel and never really gets mapped into a physical page ?
In that case it would be guaranteed that, a next "virtually addressed"
page would not even exist after the LIMIT pointer and hence the errata
would not be triggered. Something like there is a virtual mapping cliff
right after the LIMIT pointer from the MMU perspective.

Although it might be a bit tricky. Currently the entire ring buffer gets
mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
the above solution, each computation of the LIMIT pointer needs to be
followed by a temporary unmapping of next virtual page from existing
vmap() buffer. Subsequently it could be mapped back as trbe_buf->pages
always contains all the physical pages from the perf ring buffer.

> + * We need to make sure there is always a PAGE after the LIMIT,
> + * within the buffer. Thus we ensure there is at least an extra
> + * page than normal. With this we could then adjust the LIMIT
> + * pointer down by a PAGE later.
> + */
> + if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE))
> + size += PAGE_SIZE;
> + return size;
> }
>
> /*
> @@ -585,6 +600,17 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
> /*
> * If the TRBE has wrapped around the write pointer has
> * wrapped and should be treated as limit.
> + *
> + * When the TRBE is affected by TRBE_WORKAROUND_WRITE_OUT_OF_RANGE,
> + * it may write upto 64bytes beyond the "LIMIT". The driver already
> + * keeps a valid page next to the LIMIT and we could potentially
> + * consume the trace data that may have been collected there. But we
> + * cannot be really sure it is available, and the TRBPTR may not
> + * indicate the same. Also, affected cores are also affected by another
> + * erratum which forces the PAGE_SIZE alignment on the TRBPTR, and thus
> + * could potentially pad an entire PAGE_SIZE - 64bytes, to get those
> + * 64bytes. Thus we ignore the potential triggering of the erratum
> + * on WRAP and limit the data to LIMIT.
> */
> if (wrap)
> write = get_trbe_limit_pointer();
> @@ -811,6 +837,35 @@ static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
> buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
> }
>
> + /*
> + * TRBE_WORKAROUND_WRITE_OUT_OF_RANGE could cause the TRBE to write to
> + * the next page after the TRBLIMITR.LIMIT. For perf, the "next page"
> + * may be:
> + * - The page beyond the ring buffer. This could mean, TRBE could
> + * corrupt another entity (kernel / user)
> + * - A portion of the "ring buffer" consumed by the userspace.
> + * i.e, a page outside [head, head + size].
> + *
> + * We work around this by:
> + * - Making sure that we have at least an extra space of PAGE left
> + * in the ring buffer [head, head + size], than we normally do
> + * without the erratum. See trbe_min_trace_buf_size().
> + *
> + * - Adjust the TRBLIMITR.LIMIT to leave the extra PAGE outside
> + * the TRBE's range (i.e [TRBBASER, TRBLIMITR.LIMIT] ).
> + */
> + if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE)) {
> + s64 space = buf->trbe_limit - buf->trbe_write;
> + /*
> + * We must have more than a PAGE_SIZE worth space in the proposed
> + * range for the TRBE.
> + */
> + if (WARN_ON(space <= PAGE_SIZE ||
> + !IS_ALIGNED(buf->trbe_limit, PAGE_SIZE)))
> + return -EINVAL;
> + buf->trbe_limit -= PAGE_SIZE;
> + }
> +
> return 0;
> }
>
>

2021-09-28 10:34:23

by Suzuki K Poulose

Subject: Re: [PATCH v2 16/17] coresight: trbe: Work around write to out of range

On 23/09/2021 04:15, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> TRBE implementations affected by Arm erratum (2253138 or 2224489), could
>> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
>> to the TRBBASER. This implies that the TRBE could potentially corrupt :
>>
>> - A page used by the rest of the kernel/user (if the LIMIT = end of
>> perf ring buffer)
>> - A page within the ring buffer, but outside the driver's range.
>> [head, head + size]. This may contain some trace data, may be
>> consumed by the userspace.
>>
>> We workaround this erratum by :
>> - Making sure that there is at least an extra PAGE space left in the
>> TRBE's range than we normally assign. This will be additional to other
>> restrictions (e.g, the TRBE alignment for working around
>> TRBE_WORKAROUND_OVERWRITE_IN_FILL_MODE, where there is a minimum of PAGE_SIZE.
>> Thus we would have 2 * PAGE_SIZE)
>>
>> - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
>> range (i.e, TRBEBASER...TRBLIMITR.LIMIT), by :
>>
>> TRBLIMITR.LIMIT -= PAGE_SIZE
>>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Mathieu Poirier <[email protected]>
>> Cc: Mike Leach <[email protected]>
>> Cc: Leo Yan <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
>> 1 file changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 02f9e00e2091..ea907345354c 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -86,7 +86,8 @@ struct trbe_buf {
>> * affects the given instance of the TRBE.
>> */
>> #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
>> -#define TRBE_ERRATA_MAX 1
>> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE 1
>> +#define TRBE_ERRATA_MAX 2
>>
>> /*
>> * Safe limit for the number of bytes that may be overwritten
>> @@ -96,6 +97,7 @@ struct trbe_buf {
>>
>> static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>> + [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
>> };
>>
>> /*
>> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>
>> static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
>> {
>> - return TRBE_TRACE_MIN_BUF_SIZE;
>> + u64 size = TRBE_TRACE_MIN_BUF_SIZE;
>> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>> +
>> + /*
>> + * When the TRBE is affected by an erratum that could make it
>> + * write to the next "virtually addressed" page beyond the LIMIT.
>
> What if the next "virtually addressed" page is just blocked from future
> usage in the kernel and never really gets mapped into a physical page ?

That is already the case today for vmap(): the end of the vm_area has a
guard page. But that implies that, when the erratum is triggered, the TRBE
encounters a fault and we need to handle that in the driver. This works
for the "end" of the ring buffer, but not when the LIMIT is in the middle
of the ring buffer.

> In that case it would be guaranteed that, a next "virtually addressed"
> page would not even exist after the LIMIT pointer and hence the errata
> would not be triggered. Something like there is a virtual mapping cliff
> right after the LIMIT pointer from the MMU perspective.
>
> Although it might be bit tricky. Currently the entire ring buffer gets
> mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
> the above solution, each computation of the LIMIT pointer needs to be
> followed by a temporary unmapping of next virtual page from existing
> vmap() buffer. Subsequently it could be mapped back as trbe_buf->pages
> always contains all the physical pages from the perf ring buffer.

It is much easier to leave a page aside than to do this map/unmap
dance, which might even change the VA you get and thus
complicate the TRBE driver in general. I believe this is much
simpler and we can reason about the code better. And all faults
are still illegal for the driver, which helps us to detect any
other issues in the TRBE.

Suzuki
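
The "leave a page aside" approach discussed above can be sketched outside the
kernel as follows. This is a simplified model, not the driver code: the
model_* names and the 4K MODEL_PAGE_SIZE are assumptions made for
illustration, and -1 stands in for the -EINVAL the driver returns.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumption: 4K pages */

/* Hypothetical, reduced view of the driver's struct trbe_buf. */
struct model_trbe_buf {
	uint64_t trbe_write;	/* where the TRBE starts writing */
	uint64_t trbe_limit;	/* proposed limit for this run */
};

/*
 * On affected CPUs, pull the limit down by one page so that a stray
 * write past the LIMIT still lands inside the driver's own range.
 * Returns 0 on success, -1 when the proposed range has no room for
 * the spare page or the limit is not page aligned.
 */
static int model_reserve_page_after_limit(struct model_trbe_buf *buf,
					  bool has_erratum)
{
	int64_t space;

	if (!has_erratum)
		return 0;

	space = (int64_t)(buf->trbe_limit - buf->trbe_write);
	if (space <= (int64_t)MODEL_PAGE_SIZE ||
	    (buf->trbe_limit % MODEL_PAGE_SIZE) != 0)
		return -1;

	buf->trbe_limit -= MODEL_PAGE_SIZE;
	return 0;
}
```

With a three-page range the limit simply moves down by one page; with exactly
one page of space the check fails, which is why trbe_min_trace_buf_size()
asks for an extra page up front on affected CPUs.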

2021-09-30 17:59:19

by Mathieu Poirier

Subject: Re: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

Hi Suzuki,

On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:
> We collect the trace from the TRBE on FILL event from IRQ context
> and when via update_buffer(), when the event is stopped. Let us

s/"and when via"/"and via"

> consolidate how we calculate the trace generated into a helper.
>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Reviewed-by: Anshuman Khandual <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
> 1 file changed, 30 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 63f7edd5fd1f..063c4505a203 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
> return TRBE_FAULT_ACT_SPURIOUS;
> }
>
> +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
> + struct trbe_buf *buf,
> + bool wrap)

Stacking

> +{
> + u64 write;
> + u64 start_off, end_off;
> +
> + /*
> + * If the TRBE has wrapped around the write pointer has
> + * wrapped and should be treated as limit.
> + */
> + if (wrap)
> + write = get_trbe_limit_pointer();
> + else
> + write = get_trbe_write_pointer();
> +
> + end_off = write - buf->trbe_base;

In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is
acquired using get_trbe_base_pointer() but here it is referenced directly - any
reason for that? It certainly makes reviewing this simple patch quite
difficult because I keep wondering if I am missing something subtle...

> + start_off = PERF_IDX2OFF(handle->head, buf);
> +
> + if (WARN_ON_ONCE(end_off < start_off))
> + return 0;
> + return (end_off - start_off);
> +}
> +
> static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> struct perf_event *event, void **pages,
> int nr_pages, bool snapshot)
> @@ -588,9 +612,9 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
> struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> struct trbe_buf *buf = config;
> enum trbe_fault_action act;
> - unsigned long size, offset;
> - unsigned long write, base, status;
> + unsigned long size, status;
> unsigned long flags;
> + bool wrap = false;
>
> WARN_ON(buf->cpudata != cpudata);
> WARN_ON(cpudata->cpu != smp_processor_id());
> @@ -630,8 +654,6 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
> * handle gets freed in etm_event_stop().
> */
> trbe_drain_and_disable_local();
> - write = get_trbe_write_pointer();
> - base = get_trbe_base_pointer();
>
> /* Check if there is a pending interrupt and handle it here */
> status = read_sysreg_s(SYS_TRBSR_EL1);
> @@ -655,20 +677,11 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
> goto done;
> }
>
> - /*
> - * Otherwise, the buffer is full and the write pointer
> - * has reached base. Adjust this back to the Limit pointer
> - * for correct size. Also, mark the buffer truncated.
> - */
> - write = get_trbe_limit_pointer();
> perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION);
> + wrap = true;
> }
>
> - offset = write - base;
> - if (WARN_ON_ONCE(offset < PERF_IDX2OFF(handle->head, buf)))
> - size = 0;
> - else
> - size = offset - PERF_IDX2OFF(handle->head, buf);
> + size = trbe_get_trace_size(handle, buf, wrap);
>
> done:
> local_irq_restore(flags);
> @@ -749,11 +762,10 @@ static int trbe_handle_overflow(struct perf_output_handle *handle)
> {
> struct perf_event *event = handle->event;
> struct trbe_buf *buf = etm_perf_sink_config(handle);
> - unsigned long offset, size;
> + unsigned long size;
> struct etm_event_data *event_data;
>
> - offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
> - size = offset - PERF_IDX2OFF(handle->head, buf);
> + size = trbe_get_trace_size(handle, buf, true);
> if (buf->snapshot)
> handle->head += size;
>
> --
> 2.24.1
>
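
As a reading aid for the helper under review, the size calculation can be
modelled in plain C as below. This is a sketch under stated assumptions: the
struct and the model_* names are hypothetical, and the fields stand in for
buf->trbe_base, the TRBE write and limit pointers, and
PERF_IDX2OFF(handle->head, buf).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the values the helper reads. */
struct model_trbe_state {
	uint64_t base;		/* stands in for buf->trbe_base */
	uint64_t write;		/* stands in for the TRBE write pointer */
	uint64_t limit;		/* stands in for the TRBE limit pointer */
	uint64_t head_off;	/* stands in for PERF_IDX2OFF(handle->head, buf) */
};

/*
 * On wrap the write pointer has gone back to the base, so the limit is
 * used as the end of the trace instead. An end offset below the start
 * offset is treated as inconsistent and yields 0 bytes.
 */
static uint64_t model_trace_size(const struct model_trbe_state *s, bool wrap)
{
	uint64_t end = wrap ? s->limit : s->write;
	uint64_t end_off = end - s->base;

	if (end_off < s->head_off)
		return 0;
	return end_off - s->head_off;
}
```

Both callers in the patch then reduce to one call each: update_buffer()
passes the wrap flag it computed, and the overflow handler always passes true.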

2021-10-01 04:36:05

by Anshuman Khandual

Subject: Re: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode



On 9/22/21 1:41 PM, Suzuki K Poulose wrote:
> On 22/09/2021 08:23, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> Now that we have the work around implmented in the TRBE
>>> driver, add the Kconfig entries and document the errata.
>>>
>>> Cc: Mark Rutland <[email protected]>
>>> Cc: Will Deacon <[email protected]>
>>> Cc: Catalin Marinas <[email protected]>
>>> Cc: Anshuman Khandual <[email protected]>
>>> Cc: Mathieu Poirier <[email protected]>
>>> Cc: Mike Leach <[email protected]>
>>> Cc: Leo Yan <[email protected]>
>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>> ---
>>>   Documentation/arm64/silicon-errata.rst |  4 +++
>>>   arch/arm64/Kconfig                     | 39 ++++++++++++++++++++++++++
>>>   2 files changed, 43 insertions(+)
>>>
>>> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
>>> index d410a47ffa57..2f99229d993c 100644
>>> --- a/Documentation/arm64/silicon-errata.rst
>>> +++ b/Documentation/arm64/silicon-errata.rst
>>> @@ -92,12 +92,16 @@ stable kernels.
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | Cortex-A77      | #1508412        | ARM64_ERRATUM_1508412       |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>> +| ARM            | Cortex-A710     | #2119858        | ARM64_ERRATUM_2119858       |
>>> ++----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | Neoverse-N1     | #1349291        | N/A                         |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | Neoverse-N1     | #1542419        | ARM64_ERRATUM_1542419       |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>> +| ARM            | Neoverse-N2     | #2139208        | ARM64_ERRATUM_2139208       |
>>> ++----------------+-----------------+-----------------+-----------------------------+
>>>   | ARM            | MMU-500         | #841119,826419  | N/A                         |
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>>   +----------------+-----------------+-----------------+-----------------------------+
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 077f2ec4eeb2..eac4030322df 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412
>>>           If unsure, say Y.
>>>   +config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>>> +    bool
>>> +
>>> +config ARM64_ERRATUM_2119858
>>> +    bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
>>> +    default y
>>> +    depends on CORESIGHT_TRBE
>>> +    select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>>> +    help
>>> +      This option adds the workaround for ARM Cortex-A710 erratum 2119858.
>>> +
>>> +      Affected Cortex-A710 cores could overwrite upto 3 cache lines of trace
>>> +      data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
>>> +      the event of a WRAP event.
>>> +
>>> +      Work around the issue by always making sure we move the TRBPTR_EL1 by
>>> +      256bytes before enabling the buffer and filling the first 256bytes of
>>> +      the buffer with ETM ignore packets upon disabling.
>>> +
>>> +      If unsure, say Y.
>>> +
>>> +config ARM64_ERRATUM_2139208
>>> +    bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
>>> +    default y
>>> +    depends on CORESIGHT_TRBE
>>> +    select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>>> +    help
>>> +      This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
>>> +
>>> +      Affected Neoverse-N2 cores could overwrite upto 3 cache lines of trace
>>> +      data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
>>
>> s/ponited/pointed
>>
>>> +      the event of a WRAP event.
>>> +
>>> +      Work around the issue by always making sure we move the TRBPTR_EL1 by
>>> +      256bytes before enabling the buffer and filling the first 256bytes of
>>> +      the buffer with ETM ignore packets upon disabling.
>>> +
>>> +      If unsure, say Y.
>>> +
>>>   config CAVIUM_ERRATUM_22375
>>>       bool "Cavium erratum 22375, 24313"
>>>       default y
>>>
>>
>> The real errata problem description for both these erratums are exactly
>> the same. Rather a more generalized description should be included for
>> the ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, which abstracts out the
>> problem and a corresponding solution that is implemented in the driver.
>> This should also help us reduce current redundancy.
>>
>
> The issue is what a user wants to see. A user who wants to configure the
> kernel specifically for a given CPU (think embedded systems) would
> want to hand pick the errata for that particular CPU. So, moving the help
> text to an implicitly selected Kconfig symbol would not help them. I would
> rather keep this as it is to keep it user friendly. This doesn't affect
> the code size anyway.

Understood.

>
> The other option is to remove all the CPU specific Kconfig symbols and
> update the "title" to reflect both the CPU/erratum numbers.

Hmm, but I guess the current proposal is better.

2021-10-01 04:41:49

by Anshuman Khandual

Subject: Re: [PATCH v2 14/17] coresight: trbe: Make sure we have enough space



On 9/22/21 3:46 PM, Suzuki K Poulose wrote:
> On 22/09/2021 10:58, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> The TRBE driver makes sure that there is enough space for a meaningful
>>> run, otherwise pads the given space and restarts the offset calculation
>>> once. But there is no guarantee that we may find space or hit "no space".
>>
>> So what happens currently when it neither finds the required minimum buffer
>> space for a meaningful run nor does it hit the "no space" scenario ?
>
> It tries once today and assumes that it will either hit :
>
>  - No space
>    OR
>  - Enough space
>
> which is reasonable, given the minimum space needed is a few bytes.
> But this may no longer be true with other erratum workaround.

Okay.

>
>>
>>> Make sure that we repeat the step until, either :
>>>    - We have the minimum space
>>>     OR
>>>    - There is NO space at all.
>>>
>>> Cc: Anshuman Khandual <[email protected]>
>>> Cc: Mathieu Poirier <[email protected]>
>>> Cc: Mike Leach <[email protected]>
>>> Cc: Leo Yan <[email protected]>
>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>> ---
>>>   drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>> index 3373f4e2183b..02f9e00e2091 100644
>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>> @@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>>>        * If the head is too close to the limit and we don't
>>>        * have space for a meaningful run, we rather pad it
>>>        * and start fresh.
>>> +     *
>>> +     * We might have to do this more than once to make sure
>>> +     * we have enough required space.
>>
>> OR no space at all, as explained in the commit message.
>> Hence this comment needs an update.
>>
>>>        */
>>> -    if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>>> +    while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>>>           trbe_pad_buf(handle, limit - head);
>>>           limit = __trbe_normal_offset(handle);
>>> +        head = PERF_IDX2OFF(handle->head, buf);
>>
>> Should the loop be bound with a retry limit as well ?
>
> No. We will eventually hit No-space as we keep on padding
> the buffer.

Got it.
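
The termination argument above ("we will eventually hit No-space as we keep
on padding") can be illustrated with a toy ring buffer. This is only a model:
the model_* names and the free-space rule are assumptions, far simpler than
__trbe_normal_offset(), but the loop has the same shape as the patched one.

```c
#include <stdint.h>

/* Hypothetical ring buffer with monotonically increasing positions. */
struct model_ring {
	uint64_t size;		/* ring buffer size in bytes */
	uint64_t head;		/* producer position */
	uint64_t tail;		/* consumer position */
	unsigned int pads;	/* padding rounds performed so far */
};

/* Limit offset for the next run, or 0 when there is no space at all. */
static uint64_t model_limit(const struct model_ring *r)
{
	uint64_t free_bytes = r->size - (r->head - r->tail);
	uint64_t limit = (r->head % r->size) + free_bytes;

	if (!free_bytes)
		return 0;
	return limit > r->size ? r->size : limit;
}

/*
 * Pad and retry until the run has at least min_size bytes or the
 * buffer is full. Every round consumes free space, so the loop ends.
 */
static uint64_t model_normal_offset(struct model_ring *r, uint64_t min_size)
{
	uint64_t limit = model_limit(r);
	uint64_t head_off = r->head % r->size;

	while (limit && (limit - head_off) < min_size) {
		r->head += limit - head_off;	/* pad up to the limit */
		r->pads++;
		limit = model_limit(r);
		head_off = r->head % r->size;
	}
	return limit;
}
```

In this model a single retry is not always enough: with 768 free bytes split
across the end of the buffer, the loop pads twice before reporting no space,
which is the case the old "if" could not handle once the minimum size grows.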

2021-10-01 05:00:46

by Anshuman Khandual

Subject: Re: [PATCH v2 16/17] coresight: trbe: Work around write to out of range



On 9/28/21 4:02 PM, Suzuki K Poulose wrote:
> On 23/09/2021 04:15, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> TRBE implementations affected by Arm erratum (2253138 or 2224489), could
>>> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
>>> to the TRBBASER. This implies that the TRBE could potentially corrupt :
>>>
>>>    - A page used by the rest of the kernel/user (if the LIMIT = end of
>>>      perf ring buffer)
>>>    - A page within the ring buffer, but outside the driver's range.
>>>      [head, head + size]. This may contain some trace data, may be
>>>      consumed by the userspace.
>>>
>>> We workaround this erratum by :
>>>    - Making sure that there is at least an extra PAGE space left in the
>>>      TRBE's range than we normally assign. This will be additional to other
>>>      restrictions (e.g, the TRBE alignment for working around
>>>      TRBE_WORKAROUND_OVERWRITE_IN_FILL_MODE, where there is a minimum of PAGE_SIZE.
>>>      Thus we would have 2 * PAGE_SIZE)
>>>
>>>    - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
>>>      range (i.e, TRBEBASER...TRBLIMITR.LIMIT), by :
>>>
>>>          TRBLIMITR.LIMIT -= PAGE_SIZE
>>>
>>> Cc: Anshuman Khandual <[email protected]>
>>> Cc: Mathieu Poirier <[email protected]>
>>> Cc: Mike Leach <[email protected]>
>>> Cc: Leo Yan <[email protected]>
>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>> ---
>>>   drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
>>>   1 file changed, 57 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>> index 02f9e00e2091..ea907345354c 100644
>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>> @@ -86,7 +86,8 @@ struct trbe_buf {
>>>    * affects the given instance of the TRBE.
>>>    */
>>>   #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE    0
>>> -#define TRBE_ERRATA_MAX                1
>>> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE    1
>>> +#define TRBE_ERRATA_MAX                2
>>>     /*
>>>    * Safe limit for the number of bytes that may be overwritten
>>> @@ -96,6 +97,7 @@ struct trbe_buf {
>>>     static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>>>       [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>>> +    [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
>>>   };
>>>     /*
>>> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>>     static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
>>>   {
>>> -    return TRBE_TRACE_MIN_BUF_SIZE;
>>> +    u64 size = TRBE_TRACE_MIN_BUF_SIZE;
>>> +    struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>>> +
>>> +    /*
>>> +     * When the TRBE is affected by an erratum that could make it
>>> +     * write to the next "virtually addressed" page beyond the LIMIT.
>>
>> What if the next "virtually addressed" page is just blocked from future
>> usage in the kernel and never really gets mapped into a physical page ?
>
> That is the case today for vmap(), the end of the vm_area has a guard
> page. But that implies when the erratum is triggered, the TRBE
> encounters a fault and we need to handle that in the driver. This works
> for "end" of the ring buffer. But not when the LIMIT is in the middle
> of the ring buffer.
>
>> In that case it would be guaranteed that, a next "virtually addressed"
>> page would not even exist after the LIMIT pointer and hence the errata
>> would not be triggered. Something like there is a virtual mapping cliff
>> right after the LIMIT pointer from the MMU perspective.
>>
>> Although it might be bit tricky. Currently the entire ring buffer gets
>> mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
>> the above solution, each computation of the LIMIT pointer needs to be
>> followed by a temporary unmapping of next virtual page from existing
>> vmap() buffer. Subsequently it could be mapped back as trbe_buf->pages
>> always contains all the physical pages from the perf ring buffer.
>
> It is much easier to leave a page aside than to do this map, unmap
> dance, which might even change the VA address you get and thus it
> complicates the TRBE driver in general. I believe this is much
> simpler and we can reason about the code better. And all faults
> are still illegal for the driver, which helps us to detect any
> other issues in the TRBE.

Agreed. As I mentioned earlier, the map/unmap approach would have been
a bit complicated anyway. Keeping the virtual address of the entire
buffer unchanged and treating every fault inside the driver as illegal
makes the current implementation much simpler and easier to reason
about, so discarding those properties is probably not a good idea
after all.
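
As a rough illustration of the "leave a page aside" approach discussed
above, the two adjustments from the commit message can be written as
plain arithmetic. This is a user-space sketch under assumptions: the
sketch_ names are hypothetical, and the 4K page size and 64-byte base
minimum are assumed values chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE		4096ULL	/* assumed page size */
#define SKETCH_TRACE_MIN_BUF_SIZE	64ULL	/* assumed base minimum */

/*
 * Minimum space the driver needs before it programs the TRBE. An
 * affected TRBE needs one extra page on top of the normal minimum,
 * because that page is sacrificed as a landing zone.
 */
static uint64_t sketch_min_trace_buf_size(bool write_out_of_range)
{
	uint64_t size = SKETCH_TRACE_MIN_BUF_SIZE;

	if (write_out_of_range)
		size += SKETCH_PAGE_SIZE;
	return size;
}

/*
 * Value to program as the hardware limit. On an affected TRBE the
 * last page of the driver's range is left out, so a stray write
 * past the limit still lands inside memory the driver owns.
 */
static uint64_t sketch_hw_limit(uint64_t limit, bool write_out_of_range)
{
	if (write_out_of_range)
		limit -= SKETCH_PAGE_SIZE;
	return limit;
}
```

The buffer stays mapped throughout, so no fault is expected and any
fault can still be treated as a driver bug.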

2021-10-01 05:23:08

by Anshuman Khandual

Subject: Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures



On 9/22/21 5:33 PM, Suzuki K Poulose wrote:
> Hi Anshuman
>
> On 22/09/2021 08:39, Anshuman Khandual wrote:
>>
>>
>> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>>> Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffers
>>> from errata, where a TSB (trace synchronization barrier)
>>> fails to flush the trace data completely, when executed from
>>> a trace prohibited region. In Linux we always execute it
>>> after we have moved the PE to trace prohibited region. So,
>>> we can apply the workaround everytime a TSB is executed.
>>
>> s/everytime/every time
>
> Ack
>
>>
>>>
>>> The work around is to issue two TSB consecutively.
>>>
>>> NOTE: This errata is defined as LOCAL_CPU_ERRATUM, implying
>>> that a late CPU could be blocked from booting if it is the
>>> first CPU that requires the workaround. This is because we
>>> do not allow setting a cpu_hwcaps after the SMP boot. The
>>> other alternative is to use "this_cpu_has_cap()" instead
>>> of the faster system wide check, which may be a bit of an
>>> overhead, given we may have to do this in nvhe KVM host
>>> before a guest entry.
>>>
>>> Cc: Will Deacon <[email protected]>
>>> Cc: Catalin Marinas <[email protected]>
>>> Cc: Mathieu Poirier <[email protected]>
>>> Cc: Mike Leach <[email protected]>
>>> Cc: Mark Rutland <[email protected]>
>>> Cc: Anshuman Khandual <[email protected]>
>>> Cc: Marc Zyngier <[email protected]>
>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>> ---
>
> ...
>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index eac4030322df..0764774e12bb 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208
>>>           If unsure, say Y.
>>>   +config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>>> +    bool
>>> +
>>> +config ARM64_ERRATUM_2054223
>>> +    bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
>>> +    default y
>>> +    help
>>> +      Enable workaround for ARM Cortex-A710 erratum 2054223
>>> +
>>> +      Affected cores may fail to flush the trace data on a TSB instruction, when
>>> +      the PE is in trace prohibited state. This will cause losing a few bytes
>>> +      of the trace cached.
>>> +
>>> +      Workaround is to issue two TSB consecutively on affected cores.
>>> +
>>> +      If unsure, say Y.
>>> +
>>> +config ARM64_ERRATUM_2067961
>>> +    bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
>>> +    default y
>>> +    help
>>> +      Enable workaround for ARM Neoverse-N2 erratum 2067961
>>> +
>>> +      Affected cores may fail to flush the trace data on a TSB instruction, when
>>> +      the PE is in trace prohibited state. This will cause losing a few bytes
>>> +      of the trace cached.
>>> +
>>> +      Workaround is to issue two TSB consecutively on affected cores.
>>
>> Like I had mentioned in the previous patch, these descriptions here could
>> be just factored out inside ARM64_WORKAROUND_TSB_FLUSH_FAILURE instead.
>
> Please see my response there.
>
>>
>>> +
>>> +      If unsure, say Y.
>>> +
>>>   config CAVIUM_ERRATUM_22375
>>>       bool "Cavium erratum 22375, 24313"
>>>       default y
>>> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
>>> index 451e11e5fd23..1c5a00598458 100644
>>> --- a/arch/arm64/include/asm/barrier.h
>>> +++ b/arch/arm64/include/asm/barrier.h
>>> @@ -23,7 +23,7 @@
>>>   #define dsb(opt)    asm volatile("dsb " #opt : : : "memory")
>>>     #define psb_csync()    asm volatile("hint #17" : : : "memory")
>>> -#define tsb_csync()    asm volatile("hint #18" : : : "memory")
>>> +#define __tsb_csync()    asm volatile("hint #18" : : : "memory")
>>>   #define csdb()        asm volatile("hint #20" : : : "memory")
>>>     #ifdef CONFIG_ARM64_PSEUDO_NMI
>>> @@ -46,6 +46,20 @@
>>>   #define dma_rmb()    dmb(oshld)
>>>   #define dma_wmb()    dmb(oshst)
>>>   +
>>> +#define tsb_csync()                                \
>>> +    do {                                    \
>>> +        /*                                \
>>> +         * CPUs affected by Arm Erratum 2054223 or 2067961 needs    \
>>> +         * another TSB to ensure the trace is flushed. The barriers    \
>>> +         * don't have to be strictly back to back, as long as the    \
>>> +         * CPU is in trace prohibited state.                \
>>> +         */                                \
>>> +        if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE))    \
>>> +            __tsb_csync();                        \
>>> +        __tsb_csync();                            \
>>> +    } while (0)
>>> +
>>>   /*
>>>    * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
>>>    * and 0 otherwise.
>>> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
>>> index ccd757373f36..bdbeac75ead6 100644
>>> --- a/arch/arm64/kernel/cpu_errata.c
>>> +++ b/arch/arm64/kernel/cpu_errata.c
>>> @@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
>>>   };
>>>   #endif    /* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
>>>   +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>>> +static const struct midr_range tsb_flush_fail_cpus[] = {
>>> +#ifdef CONFIG_ARM64_ERRATUM_2067961
>>> +    MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
>>> +#endif
>>> +#ifdef CONFIG_ARM64_ERRATUM_2054223
>>> +    MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
>>> +#endif
>>> +    {},
>>> +};
>>> +#endif    /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
>>> +
>>>   const struct arm64_cpu_capabilities arm64_errata[] = {
>>>   #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
>>>       {
>>> @@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>>>           .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
>>>           CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
>>>       },
>>> +#endif
>>> +#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
>>> +    {
>>> +        .desc = "ARM erratum 2067961 or 2054223",
>>> +        .capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
>>> +        ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
>>> +    },
>>>   #endif
>>>       {
>>>       }
>>> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
>>> index 1ccb92165bd8..2102e15af43d 100644
>>> --- a/arch/arm64/tools/cpucaps
>>> +++ b/arch/arm64/tools/cpucaps
>>> @@ -54,6 +54,7 @@ WORKAROUND_1463225
>>>   WORKAROUND_1508412
>>>   WORKAROUND_1542419
>>>   WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>>> +WORKAROUND_TSB_FLUSH_FAILURE
>>>   WORKAROUND_CAVIUM_23154
>>>   WORKAROUND_CAVIUM_27456
>>>   WORKAROUND_CAVIUM_30115
>>>
>>
>> This adds all the required bits of these erratas in a single patch,
>> where as the previous work around had split all the required pieces
>> into multiple patches. Could we instead follow the same standard in
>> both the places ?
>
> We could do this for this particular erratum as the work around is
> within the arm64 kernel code, unlike the other ones - where the TRBE
> driver needs a change.
>
> So, there is a kind of dependency for the other two, which we don't
> in this particular case.
>
> i.e, TRBE driver needs a cpucap number to implement the work around ->
> The arm64 kernel must define one, which we cant advertise yet until
> we have a TRBE work around.
>
> Thus, they follow a 3 step model.
>
>  - Define CPUCAP erratum
>  - TRBE driver work around
>  - Finally advertise to the user.
>
> I don't think this one needs that.

Okay, understood.
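
For reference, the behaviour of the tsb_csync() wrapper quoted above
can be mimicked in user space. This is only a sketch: the real macro
issues "hint #18" and checks cpus_have_final_cap(); here both are
replaced by hypothetical sketch_ stand-ins so the number of barriers
issued can be counted.

```c
#include <stdbool.h>

/* Hypothetical state standing in for the CPU capability check. */
static bool sketch_cpu_affected;
/* Counts how many raw TSBs the wrapper has issued so far. */
static unsigned int sketch_tsb_issued;

/* Stand-in for __tsb_csync(); the real one executes "hint #18". */
static void sketch_raw_tsb(void)
{
	sketch_tsb_issued++;
}

/* Stand-in for cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE). */
static bool sketch_has_tsb_workaround(void)
{
	return sketch_cpu_affected;
}

/*
 * Same shape as the quoted macro: affected CPUs get one extra TSB
 * ahead of the usual one, unaffected CPUs get exactly one.
 */
#define sketch_tsb_csync()				\
	do {						\
		if (sketch_has_tsb_workaround())	\
			sketch_raw_tsb();		\
		sketch_raw_tsb();			\
	} while (0)
```

Because the whole work around lives in this one macro, nothing outside
the arm64 code has to change, which is why a single patch is enough for
this erratum.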

2021-10-01 08:56:47

by Suzuki K Poulose

Subject: Re: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

On 30/09/2021 18:54, Mathieu Poirier wrote:
> Hi Suzuki,
>
> On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:
>> We collect the trace from the TRBE on FILL event from IRQ context
>> and when via update_buffer(), when the event is stopped. Let us
>
> s/"and when via"/"and via"
>
>> consolidate how we calculate the trace generated into a helper.
>>
>> Cc: Mathieu Poirier <[email protected]>
>> Cc: Mike Leach <[email protected]>
>> Cc: Leo Yan <[email protected]>
>> Reviewed-by: Anshuman Khandual <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
>> 1 file changed, 30 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 63f7edd5fd1f..063c4505a203 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>> return TRBE_FAULT_ACT_SPURIOUS;
>> }
>>
>> +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>> + struct trbe_buf *buf,
>> + bool wrap)
>
> Stacking
>

Ack

>> +{
>> + u64 write;
>> + u64 start_off, end_off;
>> +
>> + /*
>> + * If the TRBE has wrapped around the write pointer has
>> + * wrapped and should be treated as limit.
>> + */
>> + if (wrap)
>> + write = get_trbe_limit_pointer();
>> + else
>> + write = get_trbe_write_pointer();
>> +
>> + end_off = write - buf->trbe_base;
>
> In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is
> acquired using get_trbe_base_pointer() but here it is referenced directly - any
> reason for that? It certainly makes reviewing this simple patch quite
> difficult because I keep wondering if I am missing something subtle...

Very good observation. So far, we have always programmed the TRBBASER
with the VA(ring_buffer[0]). And thus reading the BASER and using
buf->trbe_base are equivalent.

But going forward, we are going to use different values for the TRBBASER
to work around an erratum. Thus, when computing the "offsets" within the
ring buffer, it is always correct to use this field. I could
move this to the patch where the work around is introduced, and put in
a comment there.

Thanks for the review

Suzuki
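
A small user-space sketch of the point above, under assumptions: the
struct and function names are hypothetical, and trbe_hw_base is an
assumed field name for "whatever was programmed into TRBBASER". Once
the hardware base can differ from the start of the ring buffer, only
the ring buffer base gives a correct offset.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct trbe_buf. */
struct sketch_trbe_buf {
	uint64_t trbe_base;	/* VA of ring_buffer[0] */
	uint64_t trbe_hw_base;	/* value programmed into TRBBASER (assumed field) */
};

/*
 * Offset of a TRBE pointer within the ring buffer. This must be
 * relative to the start of the ring buffer, not to the hardware
 * base, since the two may no longer be the same address.
 */
static uint64_t sketch_ring_offset(const struct sketch_trbe_buf *buf,
				   uint64_t ptr)
{
	return ptr - buf->trbe_base;
}
```

With a ring buffer at 0x100000 and a hardware base two pages in, a write
pointer of 0x102040 is at ring offset 0x2040, whereas subtracting the
hardware base would give 0x40.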

2021-10-01 15:17:41

by Mathieu Poirier

Subject: Re: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

On Fri, Oct 01, 2021 at 09:36:24AM +0100, Suzuki K Poulose wrote:
> On 30/09/2021 18:54, Mathieu Poirier wrote:
> > Hi Suzuki,
> >
> > On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:
> > > We collect the trace from the TRBE on FILL event from IRQ context
> > > and when via update_buffer(), when the event is stopped. Let us
> >
> > s/"and when via"/"and via"
> >
> > > consolidate how we calculate the trace generated into a helper.
> > >
> > > Cc: Mathieu Poirier <[email protected]>
> > > Cc: Mike Leach <[email protected]>
> > > Cc: Leo Yan <[email protected]>
> > > Reviewed-by: Anshuman Khandual <[email protected]>
> > > Signed-off-by: Suzuki K Poulose <[email protected]>
> > > ---
> > > drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
> > > 1 file changed, 30 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> > > index 63f7edd5fd1f..063c4505a203 100644
> > > --- a/drivers/hwtracing/coresight/coresight-trbe.c
> > > +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> > > @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
> > > return TRBE_FAULT_ACT_SPURIOUS;
> > > }
> > > +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
> > > + struct trbe_buf *buf,
> > > + bool wrap)
> >
> > Stacking
> >
>
> Ack
>
> > > +{
> > > + u64 write;
> > > + u64 start_off, end_off;
> > > +
> > > + /*
> > > + * If the TRBE has wrapped around the write pointer has
> > > + * wrapped and should be treated as limit.
> > > + */
> > > + if (wrap)
> > > + write = get_trbe_limit_pointer();
> > > + else
> > > + write = get_trbe_write_pointer();
> > > +
> > > + end_off = write - buf->trbe_base;
> >
> > In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is
> > acquired using get_trbe_base_pointer() but here it is referenced directly - any
> > reason for that? It certainly makes reviewing this simple patch quite
> > difficult because I keep wondering if I am missing something subtle...
>
> Very good observation. So far, we always programmed the TRBBASER with the
> VA(ring_buffer[0]). And thus reading the BASER and using the
> buf->trbe_base is all fine.
>
> But going forward, we are going to use different values for the TRBBASER
> to work around erratum. Thus to make the computation of the "offsets"
> within the ring buffer, it is always correct to use this field. I could
> move this to the patch where the work around is introduced, and put in
> a comment there.

That will be greatly appreciated.

>
> Thanks for the review
>
> Suzuki
>

2021-10-01 15:25:26

by Suzuki K Poulose

Subject: Re: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

On 01/10/2021 16:15, Mathieu Poirier wrote:
> On Fri, Oct 01, 2021 at 09:36:24AM +0100, Suzuki K Poulose wrote:
>> On 30/09/2021 18:54, Mathieu Poirier wrote:
>>> Hi Suzuki,
>>>
>>> On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:
>>>> We collect the trace from the TRBE on FILL event from IRQ context
>>>> and when via update_buffer(), when the event is stopped. Let us
>>>
>>> s/"and when via"/"and via"
>>>
>>>> consolidate how we calculate the trace generated into a helper.
>>>>
>>>> Cc: Mathieu Poirier <[email protected]>
>>>> Cc: Mike Leach <[email protected]>
>>>> Cc: Leo Yan <[email protected]>
>>>> Reviewed-by: Anshuman Khandual <[email protected]>
>>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>>> ---
>>>> drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
>>>> 1 file changed, 30 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>>> index 63f7edd5fd1f..063c4505a203 100644
>>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>>> @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>>>> return TRBE_FAULT_ACT_SPURIOUS;
>>>> }
>>>> +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>>> + struct trbe_buf *buf,
>>>> + bool wrap)
>>>
>>> Stacking
>>>
>>
>> Ack
>>
>>>> +{
>>>> + u64 write;
>>>> + u64 start_off, end_off;
>>>> +
>>>> + /*
>>>> + * If the TRBE has wrapped around the write pointer has
>>>> + * wrapped and should be treated as limit.
>>>> + */
>>>> + if (wrap)
>>>> + write = get_trbe_limit_pointer();
>>>> + else
>>>> + write = get_trbe_write_pointer();
>>>> +
>>>> + end_off = write - buf->trbe_base;
>>>
>>> In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is
>>> acquired using get_trbe_base_pointer() but here it is referenced directly - any
>>> reason for that? It certainly makes reviewing this simple patch quite
>>> difficult because I keep wondering if I am missing something subtle...
>>
>> Very good observation. So far, we always programmed the TRBBASER with the
>> VA(ring_buffer[0]). And thus reading the BASER and using the
>> buf->trbe_base is all fine.
>>
>> But going forward, we are going to use different values for the TRBBASER
>> to work around erratum. Thus to make the computation of the "offsets"
>> within the ring buffer, it is always correct to use this field. I could
>> move this to the patch where the work around is introduced, and put in
>> a comment there.
>
> That will be greatly appreciated.

I have moved this to the patch, which introduces the concept of "TRBE
using" a different BASE address than the beginning of the ring buffer.

Thanks
Suzuki

2021-10-05 00:18:05

by Mathieu Poirier

Subject: Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:
> Add a helper to get the CPU specific data for TRBE instance, from
> a given perf handle. This also adds extra checks to make sure that
> the event associated with the handle is "bound" to the CPU and is
> active on the TRBE.
>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>
> ---
> drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 983dd5039e52..797d978f9fa7 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
> return buf->nr_pages * PAGE_SIZE;
> }
>
> +static inline struct trbe_cpudata *
> +trbe_handle_to_cpudata(struct perf_output_handle *handle)
> +{
> + struct trbe_buf *buf = etm_perf_sink_config(handle);
> +
> + BUG_ON(!buf || !buf->cpudata);
> + return buf->cpudata;
> +}
> +
> /*
> * TRBE Limit Calculation
> *
> @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
> {
> int ec = get_trbe_ec(trbsr);
> int bsc = get_trbe_bsc(trbsr);
> - struct trbe_buf *buf = etm_perf_sink_config(handle);
> - struct trbe_cpudata *cpudata = buf->cpudata;
> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);

There are two other places where this pattern is present: is_perf_trbe() and
__trbe_normal_offset().

I have to stop here for today. More comments tomorrow.

Thanks,
Mathieu

>
> WARN_ON(is_trbe_running(trbsr));
> if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
> --
> 2.24.1
>

2021-10-05 17:09:17

by Mathieu Poirier

Subject: Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds

On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
> This series adds CPU erratum work arounds related to the self-hosted
> tracing. The list of affected errata handled in this series are :
>
> * TRBE may overwrite trace in FILL mode
> - Arm Neoverse-N2 #2139208
> - Cortex-A710 #211985
>
> * A TSB instruction may not flush the trace completely when executed
> in trace prohibited region.
>
> - Arm Neoverse-N2 #2067961
> - Cortex-A710 #2054223
>
> * TRBE may write to out-of-range address
> - Arm Neoverse-N2 #2253138
> - Cortex-A710 #2224489
>
> The series applies on the self-hosted/trbe fixes posted here [0].
> A tree containing both the series is available here [1]
>
> [0] https://lkml.kernel.org/r/[email protected]
> [1] [email protected]:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v2
>
> Changes since v1:
> https://lkml.kernel.org/r/[email protected]
> - Added a fix to the TRBE driver handling of sink_specific data
> - Added more description and ASCII art for overwrite in FILL mode
> work around
> - Added another TRBE erratum to the list.
> "TRBE may write to out-of-range address"
> Patches from 12-17
> - Added comment to list the expectations around TSB erratum workaround.
>
>
> Suzuki K Poulose (17):
> coresight: trbe: Fix incorrect access of the sink specific data
> coresight: trbe: Add infrastructure for Errata handling
> coresight: trbe: Add a helper to calculate the trace generated
> coresight: trbe: Add a helper to pad a given buffer area
> coresight: trbe: Decouple buffer base from the hardware base
> coresight: trbe: Allow driver to choose a different alignment
> arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
> arm64: Add erratum detection for TRBE overwrite in FILL mode
> coresight: trbe: Workaround TRBE errata overwrite in FILL mode
> arm64: Enable workaround for TRBE overwrite in FILL mode
> arm64: errata: Add workaround for TSB flush failures
> coresight: trbe: Add a helper to fetch cpudata from perf handle
> coresight: trbe: Add a helper to determine the minimum buffer size
> coresight: trbe: Make sure we have enough space
> arm64: Add erratum detection for TRBE write to out-of-range
> coresight: trbe: Work around write to out of range
> arm64: Advertise TRBE erratum workaround for write to out-of-range address
>
> Documentation/arm64/silicon-errata.rst | 12 +
> arch/arm64/Kconfig | 109 ++++++
> arch/arm64/include/asm/barrier.h | 16 +-
> arch/arm64/include/asm/cputype.h | 4 +
> arch/arm64/kernel/cpu_errata.c | 64 ++++
> arch/arm64/tools/cpucaps | 3 +
> drivers/hwtracing/coresight/coresight-trbe.c | 339 +++++++++++++++++--
> 7 files changed, 510 insertions(+), 37 deletions(-)

Patches 04 to 11 and 13 to 17:

Reviewed-by: Mathieu Poirier <[email protected]>

I am done reviewing this set.

Thanks,
Mathieu

>
> --
> 2.24.1
>

2021-10-05 22:39:35

by Suzuki K Poulose

Subject: Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

Hi Mathieu

On 04/10/2021 18:42, Mathieu Poirier wrote:
> On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:
>> Add a helper to get the CPU specific data for TRBE instance, from
>> a given perf handle. This also adds extra checks to make sure that
>> the event associated with the handle is "bound" to the CPU and is
>> active on the TRBE.
>>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Mike Leach <[email protected]>
>> Cc: Mathieu Poirier <[email protected]>
>> Cc: Leo Yan <[email protected]>
>> Signed-off-by: Suzuki K Poulose <[email protected]>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 983dd5039e52..797d978f9fa7 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
>> return buf->nr_pages * PAGE_SIZE;
>> }
>>
>> +static inline struct trbe_cpudata *
>> +trbe_handle_to_cpudata(struct perf_output_handle *handle)
>> +{
>> + struct trbe_buf *buf = etm_perf_sink_config(handle);
>> +
>> + BUG_ON(!buf || !buf->cpudata);
>> + return buf->cpudata;
>> +}
>> +
>> /*
>> * TRBE Limit Calculation
>> *
>> @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
>> {
>> int ec = get_trbe_ec(trbsr);
>> int bsc = get_trbe_bsc(trbsr);
>> - struct trbe_buf *buf = etm_perf_sink_config(handle);
>> - struct trbe_cpudata *cpudata = buf->cpudata;
>> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>
> There are two other places where this pattern is present: is_perf_trbe() and
> __trbe_normal_offset().

I skipped them, as they need access to the "trbe_buf" anyway, so doing
it step by step made sense there. But I could replace them too, to make it
transparent.

What do you think ?

Suzuki


2021-10-06 17:20:37

by Mathieu Poirier

Subject: Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

On Tue, Oct 05, 2021 at 11:35:13PM +0100, Suzuki K Poulose wrote:
> Hi Mathieu
>
> On 04/10/2021 18:42, Mathieu Poirier wrote:
> > On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:
> > > Add a helper to get the CPU specific data for TRBE instance, from
> > > a given perf handle. This also adds extra checks to make sure that
> > > the event associated with the handle is "bound" to the CPU and is
> > > active on the TRBE.
> > >
> > > Cc: Anshuman Khandual <[email protected]>
> > > Cc: Mike Leach <[email protected]>
> > > Cc: Mathieu Poirier <[email protected]>
> > > Cc: Leo Yan <[email protected]>
> > > Signed-off-by: Suzuki K Poulose <[email protected]>
> > > ---
> > > drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
> > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> > > index 983dd5039e52..797d978f9fa7 100644
> > > --- a/drivers/hwtracing/coresight/coresight-trbe.c
> > > +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> > > @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
> > > return buf->nr_pages * PAGE_SIZE;
> > > }
> > > +static inline struct trbe_cpudata *
> > > +trbe_handle_to_cpudata(struct perf_output_handle *handle)
> > > +{
> > > + struct trbe_buf *buf = etm_perf_sink_config(handle);
> > > +
> > > + BUG_ON(!buf || !buf->cpudata);
> > > + return buf->cpudata;
> > > +}
> > > +
> > > /*
> > > * TRBE Limit Calculation
> > > *
> > > @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
> > > {
> > > int ec = get_trbe_ec(trbsr);
> > > int bsc = get_trbe_bsc(trbsr);
> > > - struct trbe_buf *buf = etm_perf_sink_config(handle);
> > > - struct trbe_cpudata *cpudata = buf->cpudata;
> > > + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
> >
> > There are two other places where this pattern is present: is_perf_trbe() and
> > __trbe_normal_offset().
>
> I skipped them, as they have to get access to the "trbe_buf" anyways.
> So the step by step, made sense. But I could replace them too to make it
> transparent.
>
> What do you think ?

Humm... I don't think there is a right way or a wrong way here. If we move
forward with this patchset we have two ways of getting to buf->cpudata. One
using trbe_handle_to_cpudata() and another one as laid out in is_perf_trbe() and
__trbe_normal_offset(), each with an equal number of occurrences (2 for each).

I am usually not fond of small functions like trbe_handle_to_cpudata() and to me
keeping the current heuristic in trbe_get_fault_act() would have been just fine.
I agree with the argument that trbe_handle_to_cpudata() provides more checks but
is it really worth it if they aren't done everywhere?

In short I would get rid of trbe_handle_to_cpudata() entirely and live without
the extra checks... But I'm not strongly opinionated on this either.

>
> Suzuki
>
>

2021-10-07 10:58:39

by Suzuki K Poulose

Subject: Re: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

On 06/10/2021 18:15, Mathieu Poirier wrote:
> On Tue, Oct 05, 2021 at 11:35:13PM +0100, Suzuki K Poulose wrote:
>> Hi Mathieu
>>
>> On 04/10/2021 18:42, Mathieu Poirier wrote:
>>> On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:
>>>> Add a helper to get the CPU specific data for TRBE instance, from
>>>> a given perf handle. This also adds extra checks to make sure that
>>>> the event associated with the handle is "bound" to the CPU and is
>>>> active on the TRBE.
>>>>
>>>> Cc: Anshuman Khandual <[email protected]>
>>>> Cc: Mike Leach <[email protected]>
>>>> Cc: Mathieu Poirier <[email protected]>
>>>> Cc: Leo Yan <[email protected]>
>>>> Signed-off-by: Suzuki K Poulose <[email protected]>
>>>> ---
>>>> drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++--
>>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>>> index 983dd5039e52..797d978f9fa7 100644
>>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>>> @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
>>>> return buf->nr_pages * PAGE_SIZE;
>>>> }
>>>> +static inline struct trbe_cpudata *
>>>> +trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>>> +{
>>>> + struct trbe_buf *buf = etm_perf_sink_config(handle);
>>>> +
>>>> + BUG_ON(!buf || !buf->cpudata);
>>>> + return buf->cpudata;
>>>> +}
>>>> +
>>>> /*
>>>> * TRBE Limit Calculation
>>>> *
>>>> @@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
>>>> {
>>>> int ec = get_trbe_ec(trbsr);
>>>> int bsc = get_trbe_bsc(trbsr);
>>>> - struct trbe_buf *buf = etm_perf_sink_config(handle);
>>>> - struct trbe_cpudata *cpudata = buf->cpudata;
>>>> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>>>
>>> There are two other places where this pattern is present: is_perf_trbe() and
>>> __trbe_normal_offset().
>>
>> I skipped them, as they have to get access to the "trbe_buf" anyway.
>> So going step by step made sense there. But I could replace them too, to
>> make it transparent.
>>
>> What do you think ?
>
> Humm... I don't think there is a right way or a wrong way here. If we move
> forward with this patchset we have two ways of getting to buf->cpudata. One
> using trbe_handle_to_cpudata() and another one as laid out in is_perf_trbe() and
> __trbe_normal_offset(), each with an equal number of occurrences (2 for each).
>
> I am usually not fond of small functions like trbe_handle_to_cpudata() and to me
> keeping the current heuristic in trbe_get_fault_act() would have been just fine.

There is another user introduced in the workaround patch. But yes, I
agree: we could open-code it rather than have it inconsistent across
the driver.

> I agree with the argument that trbe_handle_to_cpudata() provides more checks but
> is it really worth it if they aren't done everywhere?
>
> In short I would get rid of trbe_handle_to_cpudata() entirely and live without
> the extra checks... But I'm not strongly opinionated on this either.

Ok, I will remove this then. Thanks for the feedback.

Suzuki
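
For readers following the discussion, the two access patterns being compared
can be modelled in user space roughly as below. This is only a sketch: the
struct layouts are pared-down, hypothetical stand-ins for the driver's types,
sketch_sink_config() stands in for etm_perf_sink_config(), and the checked
variant returns NULL where the patch used BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the driver's types. */
struct trbe_cpudata { int cpu; };
struct trbe_buf { struct trbe_cpudata *cpudata; };
struct perf_output_handle { void *sink_config; };

/* Stand-in for etm_perf_sink_config(): returns the sink-specific data. */
static void *sketch_sink_config(struct perf_output_handle *handle)
{
	return handle->sink_config;
}

/* Open-coded form: fetch the buffer, then dereference it without checks. */
static struct trbe_cpudata *sketch_open_coded(struct perf_output_handle *handle)
{
	struct trbe_buf *buf = sketch_sink_config(handle);

	return buf->cpudata;
}

/* Helper form: same lookup, but validated first (NULL replaces BUG_ON here). */
static struct trbe_cpudata *sketch_checked(struct perf_output_handle *handle)
{
	struct trbe_buf *buf = sketch_sink_config(handle);

	if (!buf || !buf->cpudata)
		return NULL;
	return buf->cpudata;
}
```

Both forms return the same pointer for a valid handle; they differ only in
what happens when the sink data or its cpudata is missing, which is the
"extra checks" trade-off discussed above.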

2021-10-07 19:40:40

by Catalin Marinas

Subject: Re: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures

On Tue, Sep 21, 2021 at 02:41:15PM +0100, Suzuki K Poulose wrote:
> Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffer
> from an erratum where a TSB (trace synchronization barrier)
> fails to flush the trace data completely when executed from
> a trace prohibited region. In Linux we always execute it
> after we have moved the PE to a trace prohibited region. So,
> we can apply the workaround every time a TSB is executed.
>
> The workaround is to issue two TSBs consecutively.
>
> NOTE: This erratum is defined as LOCAL_CPU_ERRATUM, implying
> that a late CPU could be blocked from booting if it is the
> first CPU that requires the workaround. This is because we
> do not allow setting a cpu_hwcaps after the SMP boot. The
> other alternative is to use "this_cpu_has_cap()" instead
> of the faster system wide check, which may be a bit of an
> overhead, given we may have to do this in nvhe KVM host
> before a guest entry.
>
> Cc: Will Deacon <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

Acked-by: Catalin Marinas <[email protected]>
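
As an illustration of the "two TSBs" rule described in the commit message, a
minimal user-space model is sketched below. It is not the kernel's
implementation: the barrier is replaced by a counter so the logic can run
anywhere, and every sketch_* name, including the capability flag, is a
hypothetical stand-in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the system-wide capability check. */
static bool sketch_cpu_has_tsb_erratum;

/* Counts barriers issued; on arm64 the real operation is "TSB CSYNC". */
static unsigned int sketch_tsb_issued;

static void sketch_issue_tsb(void)
{
	sketch_tsb_issued++;
}

/* Issue one TSB normally, two back to back when the workaround applies. */
static void sketch_tsb_csync(void)
{
	if (sketch_cpu_has_tsb_erratum)
		sketch_issue_tsb();
	sketch_issue_tsb();
}

/* Returns how many barriers a single sketch_tsb_csync() call issues. */
static unsigned int sketch_tsb_count_for(bool affected)
{
	sketch_cpu_has_tsb_erratum = affected;
	sketch_tsb_issued = 0;
	sketch_tsb_csync();
	return sketch_tsb_issued;
}
```

In this model, assert(sketch_tsb_count_for(true) == 2) holds for an affected
CPU, while an unaffected one issues a single barrier.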

2021-10-07 19:40:41

by Catalin Marinas

Subject: Re: [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range

On Tue, Sep 21, 2021 at 02:41:19PM +0100, Suzuki K Poulose wrote:
> Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where the
> TRBE, under some circumstances, might write up to 64 bytes to an address after
> the Limit as programmed by the TRBLIMITR_EL1.LIMIT. This might -
>
> - Corrupt a page in the ring buffer, which may corrupt trace from a
> previous session, consumed by userspace.
> - Hit the guard page at the end of the vmalloc area and raise a fault.
>
> To keep the handling simpler, we always leave the last page out of the
> range which the TRBE is allowed to write. This can be achieved by ensuring
> that we always have more than a PAGE worth of space in the range while
> calculating the LIMIT for TRBE. The LIMIT pointer can then be adjusted
> to leave that PAGE (TRBLIMITR.LIMIT -= PAGE_SIZE) out of the TRBE range
> while enabling it. This makes sure that the TRBE will only write to an area
> within its allowed limit (i.e., [head, head+size]) and we do not have to handle
> address faults within the driver.
>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

Acked-by: Catalin Marinas <[email protected]>
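
The LIMIT adjustment described in the commit message can be sketched as plain
arithmetic. The snippet below is a hedged illustration rather than the driver
code: the 4K page size, the function name and the "return 0 when there is not
enough room" convention are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the example; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Given the write pointer (head) and the computed limit, return the value
 * to program as the hardware limit with the last page held back, or 0 if
 * the range does not hold more than one page of space.
 */
static uint64_t sketch_adjust_limit(uint64_t head, uint64_t limit)
{
	if (limit <= head || limit - head <= SKETCH_PAGE_SIZE)
		return 0;
	return limit - SKETCH_PAGE_SIZE;
}
```

With head = 0x10000 and limit = 0x14000 this yields 0x13000, so a stray write
of up to 64 bytes past the programmed limit still lands inside the page that
was held back rather than beyond the original limit.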

2021-10-07 19:40:43

by Catalin Marinas

Subject: Re: [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address

On Tue, Sep 21, 2021 at 02:41:21PM +0100, Suzuki K Poulose wrote:
> Add Kconfig entries for the errata workarounds for TRBE writing
> to an out-of-range address.
>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

Acked-by: Catalin Marinas <[email protected]>

2021-10-07 20:47:59

by Catalin Marinas

Subject: Re: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode

On Tue, Sep 21, 2021 at 02:41:14PM +0100, Suzuki K Poulose wrote:
> Now that we have the workaround implemented in the TRBE
> driver, add the Kconfig entries and document the errata.
>
> Cc: Mark Rutland <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Mathieu Poirier <[email protected]>
> Cc: Mike Leach <[email protected]>
> Cc: Leo Yan <[email protected]>
> Signed-off-by: Suzuki K Poulose <[email protected]>

Acked-by: Catalin Marinas <[email protected]>

2021-10-08 07:37:02

by Will Deacon

Subject: Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds

Hi Suzuki,

On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
> This series adds CPU erratum work arounds related to the self-hosted
> tracing. The list of affected errata handled in this series are :
>
> * TRBE may overwrite trace in FILL mode
> - Arm Neoverse-N2 #2139208
> - Cortex-A710 #211985
>
> * A TSB instruction may not flush the trace completely when executed
> in trace prohibited region.
>
> - Arm Neoverse-N2 #2067961
> - Cortex-A710 #2054223
>
> * TRBE may write to out-of-range address
> - Arm Neoverse-N2 #2253138
> - Cortex-A710 #2224489
>
> The series applies on the self-hosted/trbe fixes posted here [0].
> A tree containing both the series is available here [1]

Any chance you could put the arch/arm64/ bits at the start of the series,
please? That way, I can queue them on their own branch which can be shared
with the coresight tree.

Thanks,

Will

2021-10-08 09:29:28

by Suzuki K Poulose

Subject: Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds

Hi Will

On 08/10/2021 08:32, Will Deacon wrote:
> Hi Suzuki,
>
> On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
>> This series adds CPU erratum work arounds related to the self-hosted
>> tracing. The list of affected errata handled in this series are :
>>
>> * TRBE may overwrite trace in FILL mode
>> - Arm Neoverse-N2 #2139208
>> - Cortex-A710 #211985
>>
>> * A TSB instruction may not flush the trace completely when executed
>> in trace prohibited region.
>>
>> - Arm Neoverse-N2 #2067961
>> - Cortex-A710 #2054223
>>
>> * TRBE may write to out-of-range address
>> - Arm Neoverse-N2 #2253138
>> - Cortex-A710 #2224489
>>
>> The series applies on the self-hosted/trbe fixes posted here [0].
>> A tree containing both the series is available here [1]
>
> Any chance you could put the arch/arm64/ bits at the start of the series,
> please? That way, I can queue them on their own branch which can be shared
> with the coresight tree.

I could move the bits around. I have a question though.

Will, Catalin, Mathieu,

The workarounds for at least two of these errata are in the TRBE
driver patches. Are we happy with enabling the Kconfig entries in the
kernel without the CoreSight patches that implement the workarounds?

Suzuki

2021-10-08 09:54:54

by Will Deacon

Subject: Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds

On Fri, Oct 08, 2021 at 10:25:03AM +0100, Suzuki K Poulose wrote:
> Hi Will
>
> On 08/10/2021 08:32, Will Deacon wrote:
> > Hi Suzuki,
> >
> > On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
> > > This series adds CPU erratum work arounds related to the self-hosted
> > > tracing. The list of affected errata handled in this series are :
> > >
> > > * TRBE may overwrite trace in FILL mode
> > > - Arm Neoverse-N2 #2139208
> > > - Cortex-A710 #211985
> > >
> > > * A TSB instruction may not flush the trace completely when executed
> > > in trace prohibited region.
> > >
> > > - Arm Neoverse-N2 #2067961
> > > - Cortex-A710 #2054223
> > >
> > > * TRBE may write to out-of-range address
> > > - Arm Neoverse-N2 #2253138
> > > - Cortex-A710 #2224489
> > >
> > > The series applies on the self-hosted/trbe fixes posted here [0].
> > > A tree containing both the series is available here [1]
> >
> > Any chance you could put the arch/arm64/ bits at the start of the series,
> > please? That way, I can queue them on their own branch which can be shared
> > with the coresight tree.
>
> I could move the bits around. I have a question though.
>
> Will, Catalin, Mathieu,
>
> The workarounds for at least two of these errata are in the TRBE
> driver patches. Are we happy with enabling the Kconfig entries in the
> kernel without the CoreSight patches that implement the workarounds?

I suppose you could move all the Kconfig changes into their own patch and
stick it right at the end in the Coresight tree.

Will

2021-10-08 10:01:11

by Suzuki K Poulose

Subject: Re: [PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds

On 08/10/2021 10:52, Will Deacon wrote:
> On Fri, Oct 08, 2021 at 10:25:03AM +0100, Suzuki K Poulose wrote:
>> Hi Will
>>
>> On 08/10/2021 08:32, Will Deacon wrote:
>>> Hi Suzuki,
>>>
>>> On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:
>>>> This series adds CPU erratum work arounds related to the self-hosted
>>>> tracing. The list of affected errata handled in this series are :
>>>>
>>>> * TRBE may overwrite trace in FILL mode
>>>> - Arm Neoverse-N2 #2139208
>>>> - Cortex-A710 #211985
>>>>
>>>> * A TSB instruction may not flush the trace completely when executed
>>>> in trace prohibited region.
>>>>
>>>> - Arm Neoverse-N2 #2067961
>>>> - Cortex-A710 #2054223
>>>>
>>>> * TRBE may write to out-of-range address
>>>> - Arm Neoverse-N2 #2253138
>>>> - Cortex-A710 #2224489
>>>>
>>>> The series applies on the self-hosted/trbe fixes posted here [0].
>>>> A tree containing both the series is available here [1]
>>>
>>> Any chance you could put the arch/arm64/ bits at the start of the series,
>>> please? That way, I can queue them on their own branch which can be shared
>>> with the coresight tree.
>>
>> I could move the bits around. I have a question though.
>>
>> Will, Catalin, Mathieu,
>>
>> The workarounds for at least two of these errata are in the TRBE
>> driver patches. Are we happy with enabling the Kconfig entries in the
>> kernel without the CoreSight patches that implement the workarounds?
>
> I suppose you could move all the Kconfig changes into their own patch and
> stick it right at the end in the Coresight tree.

Cool, I will do that then. Thanks. I will send the updated series.

Suzuki