2022-04-28 07:53:51

by Sai Prakash Ranjan

Subject: [PATCHv11 0/6] lib/rwmmio/arm64: Add support to trace register reads/writes

Generic MMIO read/write accessors, i.e. __raw_{read,write}{b,l,w,q}, are
typically used to read from and write to memory-mapped registers. These
accesses can cause hangs or undefined behaviour in the following cases:

* If the access to the register space is unclocked, for example: if
there is an access to multimedia(MM) block registers without MM
clocks.

* If the register space is protected and not set to be accessible from
non-secure world, for example: only EL3 (EL: Exception level) access
is allowed and any EL2/EL1 access is forbidden.

* If xPU(memory/register protection units) is controlling access to
certain memory/register space for specific clients.

and more...

Such cases usually result in an instant reboot, SErrors, or NOC/interconnect
hangs. Tracing these register accesses can be very helpful for debugging
such issues, both during initial development and in later stages.

So use ftrace trace events to log such MMIO register accesses. Ftrace
provides a rich feature set, such as early enablement of trace events,
filtering, dumping ftrace logs to the console, and more.

Sample output:

rwmmio_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
rwmmio_post_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
rwmmio_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 addr=0xfffffbfffdbff610
rwmmio_post_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 val=0x0 addr=0xfffffbfffdbff610
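
To illustrate why each access shows up twice in the sample output, here is a
minimal userspace sketch of the "log, access, log again" pattern. It is not
the code from this series: traced_writel(), traced_readl(), sketch_emit() and
the record layout are hypothetical stand-ins, and a plain variable plays the
role of the register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these names are kernel API. */
enum sketch_event { EV_WRITE, EV_POST_WRITE, EV_READ, EV_POST_READ };

struct sketch_record {
	enum sketch_event ev;
	uint64_t val;		/* left as 0 for EV_READ: value not known yet */
	uint8_t width;		/* access width in bits */
	uintptr_t addr;		/* address of the mock register */
};

#define SKETCH_LOG_MAX 16
static struct sketch_record sketch_log[SKETCH_LOG_MAX];
static size_t sketch_log_len;

/* Append one record; the sketch simply asserts that there is room. */
static void sketch_emit(enum sketch_event ev, uint64_t val, uint8_t width,
			const volatile void *addr)
{
	assert(sketch_log_len < SKETCH_LOG_MAX);
	sketch_log[sketch_log_len++] = (struct sketch_record){
		.ev = ev, .val = val, .width = width,
		.addr = (uintptr_t)addr,
	};
}

/* Log the intent, perform the access, then log that it completed. */
static void traced_writel(uint32_t value, volatile uint32_t *addr)
{
	sketch_emit(EV_WRITE, value, 32, addr);
	*addr = value;
	sketch_emit(EV_POST_WRITE, value, 32, addr);
}

static uint32_t traced_readl(const volatile uint32_t *addr)
{
	uint32_t val;

	sketch_emit(EV_READ, 0, 32, addr);
	val = *addr;
	sketch_emit(EV_POST_READ, val, 32, addr);
	return val;
}
```

Under these assumptions, an access that never completes would leave a
"write" or "read" record with no matching "post" record, which is what makes
the pairing useful when hunting a hang.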

This series is a follow-up to series [1] and a more recent series [2], and
makes use of both.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/

Note: on v4, Arnd suggested benchmarking and comparing the size against a
callback-based implementation. See [3] for details; a brief comparison follows.


**Inline version with CONFIG_FTRACE=y and CONFIG_TRACE_MMIO_ACCESS=y**
$ size vmlinux
    text      data     bss       dec      hex  filename
23884219  14284468  532568  38701255  24e88c7  vmlinux

**Callback version with CONFIG_FTRACE=y and CONFIG_TRACE_MMIO_ACCESS=y**
$ size vmlinux
    text      data     bss       dec      hex  filename
24108179  14279596  532568  38920343  251e097  vmlinux

$ ./scripts/bloat-o-meter inline-vmlinux callback-vmlinux
add/remove: 8/3 grow/shrink: 4889/89 up/down: 242244/-11564 (230680)
Total: Before=25812612, After=26043292, chg +0.89%
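
As a quick sanity check of the figures above, the following sketch redoes
the arithmetic. The numbers are copied from the size and bloat-o-meter
output; the variable and function names are only illustrative.

```c
#include <assert.h>

/* Figures copied from the size and bloat-o-meter output above. */
static const long inline_dec = 38701255, callback_dec = 38920343;
static const long bloat_up = 242244, bloat_down = -11564;
static const long bloat_before = 25812612, bloat_after = 26043292;

/* Net growth reported by bloat-o-meter: bytes added plus bytes removed. */
static long net_growth(void)
{
	return bloat_up + bloat_down;
}

/* Relative growth in percent, measured against the "Before" total. */
static double growth_percent(void)
{
	return 100.0 * (double)net_growth() / (double)bloat_before;
}

/* Difference between the two "dec" columns printed by size. */
static long size_dec_delta(void)
{
	return callback_dec - inline_dec;
}

/* The reported totals are internally consistent. */
static void check_figures(void)
{
	assert(net_growth() == 230680);
	assert(bloat_before + net_growth() == bloat_after);
}
```

The relative growth works out to roughly 0.89%, matching the "chg" value
reported by bloat-o-meter.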

[3] https://lore.kernel.org/lkml/[email protected]/

Changes in v11:
* Use unsigned long for caller ip and current ip addr (Steven Rostedt).
* Include review tags from Arnd.

Changes in v10:
* Use GENMASK(31, 0) for -Woverflow warning in irqchip tegra driver (Marc).
* Convert ETM4x ARM64 driver to use asm-generic IO memory barriers (Catalin).
* Collect ack from Catalin for arm64 change.

Changes in v9:
* Use TRACE_EVENT_CLASS for rwmmio_write and post_write (Steven Rostedt).

Changes in v8:
* Fix build error reported by kernel test robot.

Changes in v7:
* Use lib/ instead of kernel/trace/ based on review comment by Steven Rostedt.

Changes in v6:
* Implemented suggestions by Arnd Bergmann:
- Use arch independent IO barriers in arm64/asm
- Add ARCH_HAVE_TRACE_MMIO_ACCESS
- Add post read and post write logging support
- Remove tracepoint_active check
* Fix build error reported by kernel test robot.

Changes in v5:
* Move arm64 to use asm-generic provided high level MMIO accessors (Arnd).
* Add inline logging for MMIO relaxed and non-relaxed accessors.
* Move nVHE KVM comment to makefile (Marc).
* Fix overflow warning due to switch to inline accessors instead of macro.
* Modify trace event field to include caller and parent details for more detailed logs.

Changes in v4:
* Drop dynamic debug based filter support since that will be developed later with
the help from Steven (Ftrace maintainer).
* Drop value passed to writel as it is causing hangs when tracing is enabled.
* Code cleanup for trace event as suggested by Steven for earlier version.
* Fixed some build errors reported by 0-day bot.

Changes in v3:
* Create a generic mmio header for instrumented version (Earlier suggested in [1]
by Will Deacon and recently [2] by Greg to have a generic version first).
* Add dynamic debug support to filter out traces which can be very useful for targeted
debugging specific to subsystems or drivers.
* Few modifications to the rwmmio trace event fields to include the mmio width and print
addresses in hex.
* Rewrote commit msg to explain some more about usecases.

Prasad Sodagudi (1):
lib: Add register read/write tracing support

Sai Prakash Ranjan (5):
arm64: io: Use asm-generic high level MMIO accessors
coresight: etm4x: Use asm-generic IO memory barriers
irqchip/tegra: Fix overflow implicit truncation warnings
drm/meson: Fix overflow implicit truncation warnings
asm-generic/io: Add logging support for MMIO accessors

arch/Kconfig | 3 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/io.h | 41 ++------
arch/arm64/kvm/hyp/nvhe/Makefile | 7 +-
drivers/gpu/drm/meson/meson_viu.c | 22 ++---
.../coresight/coresight-etm4x-core.c | 8 +-
drivers/hwtracing/coresight/coresight-etm4x.h | 8 +-
drivers/irqchip/irq-tegra.c | 10 +-
include/asm-generic/io.h | 82 +++++++++++++++-
include/trace/events/rwmmio.h | 97 +++++++++++++++++++
lib/Kconfig | 7 ++
lib/Makefile | 2 +
lib/trace_readwrite.c | 47 +++++++++
13 files changed, 273 insertions(+), 62 deletions(-)
create mode 100644 include/trace/events/rwmmio.h
create mode 100644 lib/trace_readwrite.c


base-commit: 53ab78cd6d5aba25575a7cfb95729336ba9497d8
--
2.33.1


2022-04-28 10:10:13

by Sai Prakash Ranjan

Subject: [PATCHv11 3/6] irqchip/tegra: Fix overflow implicit truncation warnings

Fix -Woverflow warnings in the tegra irqchip driver. Moving the arm64
custom MMIO accessor macros to the asm-generic function implementations
adds type checking as a bonus, which uncovers these overflow warnings.

drivers/irqchip/irq-tegra.c: In function ‘tegra_ictlr_suspend’:
drivers/irqchip/irq-tegra.c:151:18: warning: large integer implicitly truncated to unsigned type [-Woverflow]
writel_relaxed(~0ul, ictlr + ICTLR_COP_IER_CLR);
^
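
The truncation can be reproduced outside the kernel. In the sketch below,
SKETCH_GENMASK() is a userspace replica of the GENMASK(h, l) idea (bits h
down to l set) and sketch_writel_relaxed() is a hypothetical function with a
32-bit parameter; neither is the kernel implementation. Where unsigned long
is 64 bits wide, ~0ul does not fit in that parameter, while the 32-bit mask
does.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace replica of the GENMASK(h, l) idea: bits h..l set, rest clear. */
#define SKETCH_GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & \
	 (~0UL >> (SKETCH_BITS_PER_LONG - 1 - (h))))

static uint32_t sketch_last_written;

/*
 * A real function with a 32-bit parameter. Unlike a macro that casts its
 * argument, passing an out-of-range constant here is visible to the compiler.
 */
static inline void sketch_writel_relaxed(uint32_t value)
{
	sketch_last_written = value;
}

/* Nonzero only where unsigned long is no wider than 32 bits. */
static int all_ones_fits_in_u32(void)
{
	return ~0UL <= UINT32_MAX;
}
```

With a 64-bit unsigned long, all_ones_fits_in_u32() returns 0, which is the
case the compiler warns about; SKETCH_GENMASK(31, 0) always evaluates to
0xffffffff and can be passed without any implicit truncation.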

Cc: Marc Zyngier <[email protected]>
Suggested-by: Marc Zyngier <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
---
drivers/irqchip/irq-tegra.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-tegra.c b/drivers/irqchip/irq-tegra.c
index e1f771c72fc4..ad3e2c1b3c87 100644
--- a/drivers/irqchip/irq-tegra.c
+++ b/drivers/irqchip/irq-tegra.c
@@ -148,10 +148,10 @@ static int tegra_ictlr_suspend(void)
lic->cop_iep[i] = readl_relaxed(ictlr + ICTLR_COP_IEP_CLASS);

/* Disable COP interrupts */
- writel_relaxed(~0ul, ictlr + ICTLR_COP_IER_CLR);
+ writel_relaxed(GENMASK(31, 0), ictlr + ICTLR_COP_IER_CLR);

/* Disable CPU interrupts */
- writel_relaxed(~0ul, ictlr + ICTLR_CPU_IER_CLR);
+ writel_relaxed(GENMASK(31, 0), ictlr + ICTLR_CPU_IER_CLR);

/* Enable the wakeup sources of ictlr */
writel_relaxed(lic->ictlr_wake_mask[i], ictlr + ICTLR_CPU_IER_SET);
@@ -172,12 +172,12 @@ static void tegra_ictlr_resume(void)

writel_relaxed(lic->cpu_iep[i],
ictlr + ICTLR_CPU_IEP_CLASS);
- writel_relaxed(~0ul, ictlr + ICTLR_CPU_IER_CLR);
+ writel_relaxed(GENMASK(31, 0), ictlr + ICTLR_CPU_IER_CLR);
writel_relaxed(lic->cpu_ier[i],
ictlr + ICTLR_CPU_IER_SET);
writel_relaxed(lic->cop_iep[i],
ictlr + ICTLR_COP_IEP_CLASS);
- writel_relaxed(~0ul, ictlr + ICTLR_COP_IER_CLR);
+ writel_relaxed(GENMASK(31, 0), ictlr + ICTLR_COP_IER_CLR);
writel_relaxed(lic->cop_ier[i],
ictlr + ICTLR_COP_IER_SET);
}
@@ -312,7 +312,7 @@ static int __init tegra_ictlr_init(struct device_node *node,
lic->base[i] = base;

/* Disable all interrupts */
- writel_relaxed(~0UL, base + ICTLR_CPU_IER_CLR);
+ writel_relaxed(GENMASK(31, 0), base + ICTLR_CPU_IER_CLR);
/* All interrupts target IRQ */
writel_relaxed(0, base + ICTLR_CPU_IEP_CLASS);

--
2.33.1

2022-04-28 15:18:32

by Sai Prakash Ranjan

Subject: [PATCHv11 5/6] lib: Add register read/write tracing support

From: Prasad Sodagudi <[email protected]>

Generic MMIO read/write accessors, i.e. __raw_{read,write}{b,l,w,q}, are
typically used to read from and write to memory-mapped registers. These
accesses can cause hangs or undefined behaviour in the following cases:

* If the access to the register space is unclocked, for example: if
there is an access to multimedia(MM) block registers without MM
clocks.

* If the register space is protected and not set to be accessible from
non-secure world, for example: only EL3 (EL: Exception level) access
is allowed and any EL2/EL1 access is forbidden.

* If xPU(memory/register protection units) is controlling access to
certain memory/register space for specific clients.

and more...

Such cases usually result in an instant reboot, SErrors, or NOC/interconnect
hangs. Tracing these register accesses can be very helpful for debugging
such issues, both during initial development and in later stages.

So use ftrace trace events to log such MMIO register accesses. Ftrace
provides a rich feature set, such as early enablement of trace events,
filtering, dumping ftrace logs to the console, and more.

Sample output:

rwmmio_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
rwmmio_post_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
rwmmio_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 addr=0xfffffbfffdbff610
rwmmio_post_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 val=0x0 addr=0xfffffbfffdbff610

Signed-off-by: Prasad Sodagudi <[email protected]>
Co-developed-by: Sai Prakash Ranjan <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
---
arch/Kconfig | 3 ++
arch/arm64/Kconfig | 1 +
include/trace/events/rwmmio.h | 97 +++++++++++++++++++++++++++++++++++
lib/Kconfig | 7 +++
lib/Makefile | 2 +
lib/trace_readwrite.c | 47 +++++++++++++++++
6 files changed, 157 insertions(+)
create mode 100644 include/trace/events/rwmmio.h
create mode 100644 lib/trace_readwrite.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 678a80713b21..efbbc36658dc 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1315,6 +1315,9 @@ config ARCH_HAS_ELFCORE_COMPAT
config ARCH_HAS_PARANOID_L1D_FLUSH
bool

+config ARCH_HAVE_TRACE_MMIO_ACCESS
+ bool
+
config DYNAMIC_SIGFRAME
bool

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 09b885cc4db5..321ae97df987 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -46,6 +46,7 @@ config ARM64
select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_ELF_PROT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+ select ARCH_HAVE_TRACE_MMIO_ACCESS
select ARCH_INLINE_READ_LOCK if !PREEMPTION
select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
diff --git a/include/trace/events/rwmmio.h b/include/trace/events/rwmmio.h
new file mode 100644
index 000000000000..82edee9bf716
--- /dev/null
+++ b/include/trace/events/rwmmio.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2021-2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM rwmmio
+
+#if !defined(_TRACE_RWMMIO_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_RWMMIO_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(rwmmio_rw_template,
+
+ TP_PROTO(unsigned long caller, u64 val, u8 width, volatile void __iomem *addr),
+
+ TP_ARGS(caller, val, width, addr),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, caller)
+ __field(unsigned long, addr)
+ __field(u64, val)
+ __field(u8, width)
+ ),
+
+ TP_fast_assign(
+ __entry->caller = caller;
+ __entry->val = val;
+ __entry->addr = (unsigned long)(void *)addr;
+ __entry->width = width;
+ ),
+
+ TP_printk("%pS width=%d val=%#llx addr=%#lx",
+ (void *)(unsigned long)__entry->caller, __entry->width,
+ __entry->val, __entry->addr)
+);
+
+DEFINE_EVENT(rwmmio_rw_template, rwmmio_write,
+ TP_PROTO(unsigned long caller, u64 val, u8 width, volatile void __iomem *addr),
+ TP_ARGS(caller, val, width, addr)
+);
+
+DEFINE_EVENT(rwmmio_rw_template, rwmmio_post_write,
+ TP_PROTO(unsigned long caller, u64 val, u8 width, volatile void __iomem *addr),
+ TP_ARGS(caller, val, width, addr)
+);
+
+TRACE_EVENT(rwmmio_read,
+
+ TP_PROTO(unsigned long caller, u8 width, const volatile void __iomem *addr),
+
+ TP_ARGS(caller, width, addr),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, caller)
+ __field(unsigned long, addr)
+ __field(u8, width)
+ ),
+
+ TP_fast_assign(
+ __entry->caller = caller;
+ __entry->addr = (unsigned long)(void *)addr;
+ __entry->width = width;
+ ),
+
+ TP_printk("%pS width=%d addr=%#lx",
+ (void *)(unsigned long)__entry->caller, __entry->width, __entry->addr)
+);
+
+TRACE_EVENT(rwmmio_post_read,
+
+ TP_PROTO(unsigned long caller, u64 val, u8 width, const volatile void __iomem *addr),
+
+ TP_ARGS(caller, val, width, addr),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, caller)
+ __field(unsigned long, addr)
+ __field(u64, val)
+ __field(u8, width)
+ ),
+
+ TP_fast_assign(
+ __entry->caller = caller;
+ __entry->val = val;
+ __entry->addr = (unsigned long)(void *)addr;
+ __entry->width = width;
+ ),
+
+ TP_printk("%pS width=%d val=%#llx addr=%#lx",
+ (void *)(unsigned long)__entry->caller, __entry->width,
+ __entry->val, __entry->addr)
+);
+
+#endif /* _TRACE_RWMMIO_H */
+
+#include <trace/define_trace.h>
diff --git a/lib/Kconfig b/lib/Kconfig
index c80fde816a7e..ea520c315c0f 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -119,6 +119,13 @@ config INDIRECT_IOMEM_FALLBACK
mmio accesses when the IO memory address is not a registered
emulated region.

+config TRACE_MMIO_ACCESS
+ bool "Register read/write tracing"
+ depends on TRACING && ARCH_HAVE_TRACE_MMIO_ACCESS
+ help
+ Create tracepoints for MMIO read/write operations. These trace events
+ can be used for logging all MMIO read/write operations.
+
source "lib/crypto/Kconfig"

config CRC_CCITT
diff --git a/lib/Makefile b/lib/Makefile
index 300f569c626b..43813b0061cd 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -152,6 +152,8 @@ lib-y += logic_pio.o

lib-$(CONFIG_INDIRECT_IOMEM) += logic_iomem.o

+obj-$(CONFIG_TRACE_MMIO_ACCESS) += trace_readwrite.o
+
obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o

obj-$(CONFIG_BTREE) += btree.o
diff --git a/lib/trace_readwrite.c b/lib/trace_readwrite.c
new file mode 100644
index 000000000000..88637038b30c
--- /dev/null
+++ b/lib/trace_readwrite.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Register read and write tracepoints
+ *
+ * Copyright (c) 2021-2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/ftrace.h>
+#include <linux/module.h>
+#include <asm-generic/io.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/rwmmio.h>
+
+#ifdef CONFIG_TRACE_MMIO_ACCESS
+void log_write_mmio(u64 val, u8 width, volatile void __iomem *addr,
+ unsigned long caller_addr)
+{
+ trace_rwmmio_write(caller_addr, val, width, addr);
+}
+EXPORT_SYMBOL_GPL(log_write_mmio);
+EXPORT_TRACEPOINT_SYMBOL_GPL(rwmmio_write);
+
+void log_post_write_mmio(u64 val, u8 width, volatile void __iomem *addr,
+ unsigned long caller_addr)
+{
+ trace_rwmmio_post_write(caller_addr, val, width, addr);
+}
+EXPORT_SYMBOL_GPL(log_post_write_mmio);
+EXPORT_TRACEPOINT_SYMBOL_GPL(rwmmio_post_write);
+
+void log_read_mmio(u8 width, const volatile void __iomem *addr,
+ unsigned long caller_addr)
+{
+ trace_rwmmio_read(caller_addr, width, addr);
+}
+EXPORT_SYMBOL_GPL(log_read_mmio);
+EXPORT_TRACEPOINT_SYMBOL_GPL(rwmmio_read);
+
+void log_post_read_mmio(u64 val, u8 width, const volatile void __iomem *addr,
+ unsigned long caller_addr)
+{
+ trace_rwmmio_post_read(caller_addr, val, width, addr);
+}
+EXPORT_SYMBOL_GPL(log_post_read_mmio);
+EXPORT_TRACEPOINT_SYMBOL_GPL(rwmmio_post_read);
+#endif /* CONFIG_TRACE_MMIO_ACCESS */
--
2.33.1

2022-04-28 16:47:39

by Sai Prakash Ranjan

Subject: [PATCHv11 1/6] arm64: io: Use asm-generic high level MMIO accessors

Remove the custom arm64 MMIO accessors read{b,w,l,q} and write{b,w,l,q},
along with their relaxed versions, in favour of the asm-generic accessors.
Also define the set of IO barriers (ar/bw versions) that the asm-generic
code uses, so that the arm64-specific variants override the generic defaults.
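
For readers unfamiliar with the hook scheme, the sketch below shows the
general shape of a generic accessor built from a raw access plus
before/after barrier hooks. It is a userspace mock, not the asm-generic
code: every sketch_* name is hypothetical, the hooks only record that they
ran, and the little-endian conversion done by the real accessors is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Records which step ran, in order, so the sequence can be inspected. */
static char sketch_trace[16];
static size_t sketch_trace_len;

static void sketch_mark(char c)
{
	assert(sketch_trace_len < sizeof(sketch_trace) - 1);
	sketch_trace[sketch_trace_len++] = c;
	sketch_trace[sketch_trace_len] = '\0';
}

/* Hypothetical stand-ins for the hooks an architecture may override. */
#define sketch_io_br()	sketch_mark('b')			/* before read  */
#define sketch_io_ar(v)	((void)(v), sketch_mark('a'))		/* after read   */
#define sketch_io_bw()	sketch_mark('B')			/* before write */
#define sketch_io_aw()	sketch_mark('A')			/* after write  */

/* Raw accesses: no ordering guarantees of their own. */
static uint32_t sketch_raw_readl(const volatile uint32_t *addr)
{
	sketch_mark('r');
	return *addr;
}

static void sketch_raw_writel(uint32_t value, volatile uint32_t *addr)
{
	sketch_mark('w');
	*addr = value;
}

/* Generic accessors: the raw access wrapped by the barrier hooks. */
static uint32_t barrier_sketch_readl(const volatile uint32_t *addr)
{
	uint32_t val;

	sketch_io_br();
	val = sketch_raw_readl(addr);
	sketch_io_ar(val);
	return val;
}

static void barrier_sketch_writel(uint32_t value, volatile uint32_t *addr)
{
	sketch_io_bw();
	sketch_raw_writel(value, addr);
	sketch_io_aw();
}
```

In the diff below, arm64 supplies real barriers for the after-read and
before-write hooks and leaves the before-read and after-write hooks empty.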

Suggested-by: Arnd Bergmann <[email protected]>
Signed-off-by: Sai Prakash Ranjan <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
---
arch/arm64/include/asm/io.h | 41 ++++++++-----------------------------
1 file changed, 8 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 7fd836bea7eb..1b436810d779 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -91,7 +91,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
}

/* IO barriers */
-#define __iormb(v) \
+#define __io_ar(v) \
({ \
unsigned long tmp; \
\
@@ -108,39 +108,14 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
: "memory"); \
})

-#define __io_par(v) __iormb(v)
-#define __iowmb() dma_wmb()
-#define __iomb() dma_mb()
-
-/*
- * Relaxed I/O memory access primitives. These follow the Device memory
- * ordering rules but do not guarantee any ordering relative to Normal memory
- * accesses.
- */
-#define readb_relaxed(c) ({ u8 __r = __raw_readb(c); __r; })
-#define readw_relaxed(c) ({ u16 __r = le16_to_cpu((__force __le16)__raw_readw(c)); __r; })
-#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; })
-#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64)__raw_readq(c)); __r; })
+#define __io_bw() dma_wmb()
+#define __io_br(v)
+#define __io_aw(v)

-#define writeb_relaxed(v,c) ((void)__raw_writeb((v),(c)))
-#define writew_relaxed(v,c) ((void)__raw_writew((__force u16)cpu_to_le16(v),(c)))
-#define writel_relaxed(v,c) ((void)__raw_writel((__force u32)cpu_to_le32(v),(c)))
-#define writeq_relaxed(v,c) ((void)__raw_writeq((__force u64)cpu_to_le64(v),(c)))
-
-/*
- * I/O memory access primitives. Reads are ordered relative to any
- * following Normal memory access. Writes are ordered relative to any prior
- * Normal memory access.
- */
-#define readb(c) ({ u8 __v = readb_relaxed(c); __iormb(__v); __v; })
-#define readw(c) ({ u16 __v = readw_relaxed(c); __iormb(__v); __v; })
-#define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(__v); __v; })
-#define readq(c) ({ u64 __v = readq_relaxed(c); __iormb(__v); __v; })
-
-#define writeb(v,c) ({ __iowmb(); writeb_relaxed((v),(c)); })
-#define writew(v,c) ({ __iowmb(); writew_relaxed((v),(c)); })
-#define writel(v,c) ({ __iowmb(); writel_relaxed((v),(c)); })
-#define writeq(v,c) ({ __iowmb(); writeq_relaxed((v),(c)); })
+/* arm64-specific, don't use in portable drivers */
+#define __iormb(v) __io_ar(v)
+#define __iowmb() __io_bw()
+#define __iomb() dma_mb()

/*
* I/O port access primitives.
--
2.33.1