From: Zhen Lei <[email protected]>
v5 --> v6:
1. Use print_hex_dump() to dump the memory of slab object.
2. Add a new dump prefix DUMP_PREFIX_ADDRESS_LOW16
3. Minimize the output width of the offset
v4 --> v5:
1. Add Reviewed-by Acked-by for patch 1/3
2. Add patch 3/3:
mm: Dump the memory of slab object in kmem_dump_obj()
v3 --> v4:
1. Remove kmem_valid_obj() and convert kmem_dump_obj() to work the same way
as vmalloc_dump_obj().
2. In kernel/rcu/rcu.h
-#include <linux/mm.h>
+#include <linux/slab.h>
v2 --> v3:
1. I made statistics about the source of 'rhp'. kmem_valid_obj() accounts for
more than 97.5%, and vmalloc accounts for less than 1%. So change call
mem_dump_obj() to call kmem_dump_obj() can meet debugging requirements and
avoid the potential deadlock risk of vmalloc_dump_obj().
- mem_dump_obj(rhp);
+ if (kmem_valid_obj(rhp))
+ kmem_dump_obj(rhp);
The discussion about vmap_area_lock deadlock in v2:
https://lkml.org/lkml/2022/11/11/493
2. Provide static inline empty functions for kmem_valid_obj() and kmem_dump_obj()
when CONFIG_PRINTK=n.
v1 --> v2:
1. Remove condition "(unsigned long)rhp->func & 0x3", it have problems on x86.
2. Paul E. McKenney helped me update the commit message, thanks.
Zhen Lei (5):
hexdump: add a new dump prefix DUMP_PREFIX_ADDRESS_LOW16
hexdump: minimize the output width of the offset
mm: Remove kmem_valid_obj()
mm: Dump the memory of slab object in kmem_dump_obj()
rcu: Dump memory object info if callback function is invalid
include/linux/printk.h | 1 +
include/linux/slab.h | 5 ++--
kernel/rcu/rcu.h | 7 +++++
kernel/rcu/srcutiny.c | 1 +
kernel/rcu/srcutree.c | 1 +
kernel/rcu/tasks.h | 1 +
kernel/rcu/tiny.c | 1 +
kernel/rcu/tree.c | 1 +
lib/hexdump.c | 17 +++++++++--
mm/slab_common.c | 68 ++++++++++++++++++++++--------------------
mm/util.c | 4 +--
11 files changed, 67 insertions(+), 40 deletions(-)
--
2.34.1
From: Zhen Lei <[email protected]>
Currently, function print_hex_dump() supports three dump prefixes:
DUMP_PREFIX_NONE, DUMP_PREFIX_ADDRESS and DUMP_PREFIX_OFFSET. But for some
usage scenarios, they don't work perfectly. For example, dump the content
of one task's stack. In order to quickly identify a stack frame,
DUMP_PREFIX_ADDRESS is preferred. But printing multiple 64-bit addresses
is a bit unwise when the 'sp' value is already printed. It is redundant
and unintuitive.
For example:
dump memory at sp=ffff800080883a90:
ffff800080883a90: 80883ac0 ffff8000 3d8e936c ffffbd5b
ffff800080883aa0: 5833f000 ffff3580 00000001 00000000
ffff800080883ab0: 40299840 ffff3580 590dfa00 ffff3580
ffff800080883ac0: 80883b30 ffff8000 3d938b28 ffffbd5b
ffff800080883ad0: 40877180 ffff3580 590dfa00 ffff3580
ffff800080883ae0: 4090f600 ffff3580 80883cb0 ffff8000
ffff800080883af0: 00000010 00000000 00000000 00000000
ffff800080883b00: 4090f700 ffff3580 00000001 00000000
Generally, we do not dump more than 64 KB memory. It is sufficient to
print only the lower 16 bits of the address.
dump memory at sp=ffff800080883a90:
3a90: 80883ac0 ffff8000 3d8e936c ffffbd5b
3aa0: 5833f000 ffff3580 00000001 00000000
3ab0: 40299840 ffff3580 590dfa00 ffff3580
3ac0: 80883b30 ffff8000 3d938b28 ffffbd5b
3ad0: 40877180 ffff3580 590dfa00 ffff3580
3ae0: 4090f600 ffff3580 80883cb0 ffff8000
3af0: 00000010 00000000 00000000 00000000
3b00: 4090f700 ffff3580 00000001 00000000
Another benefit of adding DUMP_PREFIX_ADDRESS_LOW16 is that we don't have
to worry about %p outputting address as hashed value.
Signed-off-by: Zhen Lei <[email protected]>
---
include/linux/printk.h | 1 +
lib/hexdump.c | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 8ef499ab3c1ed2e..ccad9e8eaaf0c31 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -704,6 +704,7 @@ extern const struct file_operations kmsg_fops;
enum {
DUMP_PREFIX_NONE,
DUMP_PREFIX_ADDRESS,
+ DUMP_PREFIX_ADDRESS_LOW16,
DUMP_PREFIX_OFFSET
};
extern int hex_dump_to_buffer(const void *buf, size_t len, int rowsize,
diff --git a/lib/hexdump.c b/lib/hexdump.c
index 06833d404398d74..247c8765cc7ca3f 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -281,6 +281,10 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
printk("%s%s%p: %s\n",
level, prefix_str, ptr + i, linebuf);
break;
+ case DUMP_PREFIX_ADDRESS_LOW16:
+ printk("%s%s%04lx: %s\n", level,
+ prefix_str, 0xffff & (unsigned long)(ptr + i), linebuf);
+ break;
case DUMP_PREFIX_OFFSET:
printk("%s%s%.8x: %s\n", level, prefix_str, i, linebuf);
break;
--
2.34.1
From: Zhen Lei <[email protected]>
The offset of case DUMP_PREFIX_OFFSET always starts from 0. Currently,
the output width is fixed to 8. Therefore, the high-order bits filled
with zeros are meaningless except for increasing the number of characters
to be printed. Let's minimize the output width of the offset to improve
readability.
Before:
dump_size=36:
00000000: c0 ba 8c 80 00 80 ff ff 6c 93 ee 2f ee bf ff ff
00000010: 00 50 1e 98 ff 27 ff ff 01 00 00 00 00 00 00 00
00000020: 80 ca 2f 98
After:
dump_size=8:
0: c0 ba 89 80 00 80 ff ff
dump_size=36:
00: c0 3a 91 80 00 80 ff ff 6c 93 ae 76 30 ce ff ff
10: 00 60 cd 60 7d 4e ff ff 01 00 00 00 00 00 00 00
20: 40 9e 29 40
dump_size=300:
000: c0 ba 8d 80 00 80 ff ff 6c 93 ce d4 78 a7 ff ff
010: 00 00 16 18 0c 40 ff ff 01 00 00 00 00 00 00 00
020: 01 00 00 00 00 00 00 00 e8 bc 8d 80 00 80 ff ff
... ...
110: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
120: 00 08 12 01 0c 40 ff ff 00 00 01 00
Signed-off-by: Zhen Lei <[email protected]>
---
lib/hexdump.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/lib/hexdump.c b/lib/hexdump.c
index 247c8765cc7ca3f..d3c5b7bb1b8813b 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -263,12 +263,21 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
const void *buf, size_t len, bool ascii)
{
const u8 *ptr = buf;
- int i, linelen, remaining = len;
+ int i, linelen, width = 0, remaining = len;
unsigned char linebuf[32 * 3 + 2 + 32 + 1];
if (rowsize != 16 && rowsize != 32)
rowsize = 16;
+ if (prefix_type == DUMP_PREFIX_OFFSET) {
+ unsigned long tmp = len;
+
+ do {
+ width++;
+ tmp >>= 4;
+ } while (tmp);
+ }
+
for (i = 0; i < len; i += rowsize) {
linelen = min(remaining, rowsize);
remaining -= rowsize;
@@ -286,7 +295,7 @@ void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
prefix_str, 0xffff & (unsigned long)(ptr + i), linebuf);
break;
case DUMP_PREFIX_OFFSET:
- printk("%s%s%.8x: %s\n", level, prefix_str, i, linebuf);
+ printk("%s%s%0*x: %s\n", level, prefix_str, width, i, linebuf);
break;
default:
printk("%s%s%s\n", level, prefix_str, linebuf);
--
2.34.1
From: Zhen Lei <[email protected]>
When a structure containing an RCU callback rhp is (incorrectly) freed
and reallocated after rhp is passed to call_rcu(), it is not unusual for
rhp->func to be set to NULL. This defeats the debugging prints used by
__call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y,
which expect to identify the offending code using the identity of this
function.
And in kernels build without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things
are even worse, as can be seen from this splat:
Unable to handle kernel NULL pointer dereference at virtual address 0
... ...
PC is at 0x0
LR is at rcu_do_batch+0x1c0/0x3b8
... ...
(rcu_do_batch) from (rcu_core+0x1d4/0x284)
(rcu_core) from (__do_softirq+0x24c/0x344)
(__do_softirq) from (__irq_exit_rcu+0x64/0x108)
(__irq_exit_rcu) from (irq_exit+0x8/0x10)
(irq_exit) from (__handle_domain_irq+0x74/0x9c)
(__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
(gic_handle_irq) from (__irq_svc+0x5c/0x94)
(__irq_svc) from (arch_cpu_idle+0x20/0x3c)
(arch_cpu_idle) from (default_idle_call+0x4c/0x78)
(default_idle_call) from (do_idle+0xf8/0x150)
(do_idle) from (cpu_startup_entry+0x18/0x20)
(cpu_startup_entry) from (0xc01530)
This commit therefore adds calls to mem_dump_obj(rhp) to output some
information, for example:
slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
This provides the rough size of the memory block and the offset of the
rcu_head structure, which as least provides at least a few clues to help
locate the problem. If the problem is reproducible, additional slab
debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can
provide significantly more information.
Signed-off-by: Zhen Lei <[email protected]>
---
kernel/rcu/rcu.h | 7 +++++++
kernel/rcu/srcutiny.c | 1 +
kernel/rcu/srcutree.c | 1 +
kernel/rcu/tasks.h | 1 +
kernel/rcu/tiny.c | 1 +
kernel/rcu/tree.c | 1 +
6 files changed, 12 insertions(+)
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index d1dcb09750efbd6..bc81582238b9846 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -10,6 +10,7 @@
#ifndef __LINUX_RCU_H
#define __LINUX_RCU_H
+#include <linux/slab.h>
#include <trace/events/rcu.h>
/*
@@ -248,6 +249,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
}
#endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
+static inline void debug_rcu_head_callback(struct rcu_head *rhp)
+{
+ if (unlikely(!rhp->func))
+ kmem_dump_obj(rhp);
+}
+
extern int rcu_cpu_stall_suppress_at_boot;
static inline bool rcu_stall_is_suppressed_at_boot(void)
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 336af24e0fe358a..c38e5933a5d6937 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -138,6 +138,7 @@ void srcu_drive_gp(struct work_struct *wp)
while (lh) {
rhp = lh;
lh = lh->next;
+ debug_rcu_head_callback(rhp);
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index f1a905200fc2f79..833a8f848a90ae6 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1710,6 +1710,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
rhp = rcu_cblist_dequeue(&ready_cbs);
for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
debug_rcu_head_unqueue(rhp);
+ debug_rcu_head_callback(rhp);
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 7294be62727b12c..148ac6a464bfb12 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -538,6 +538,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
len = rcl.len;
for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
+ debug_rcu_head_callback(rhp);
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 42f7589e51e09e7..fec804b7908032d 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -97,6 +97,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
trace_rcu_invoke_callback("", head);
f = head->func;
+ debug_rcu_head_callback(head);
WRITE_ONCE(head->func, (rcu_callback_t)0L);
f(head);
rcu_lock_release(&rcu_callback_map);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7c79480bfaa04e4..927c5ba0ae42269 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2135,6 +2135,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
trace_rcu_invoke_callback(rcu_state.name, rhp);
f = rhp->func;
+ debug_rcu_head_callback(rhp);
WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
f(rhp);
--
2.34.1
From: Zhen Lei <[email protected]>
The contents of the slab object may contain some magic words and other
useful information that may be helpful in locating problems such as
memory corruption and use-after-free.
To avoid print flooding, dump up to "16 * sizeof(int) = 64" bytes
centered on argument 'ojbect'.
For example:
slab kmalloc-64 start ffff4043802d8b40 pointer offset 24 size 64
[8b40]: 12345678 00000000 8092d000 ffff8000
[8b50]: 00101000 00000000 8199ee00 ffff4043
[8b60]: 00000000 00000000 00000000 00000100
[8b70]: 00000000 9abcdef0 a8744de4 ffffc7fe
Signed-off-by: Zhen Lei <[email protected]>
---
mm/slab_common.c | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index ee6ed6dd7ba9fa5..3564e60a2584b12 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -553,7 +553,7 @@ static void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *
bool kmem_dump_obj(void *object)
{
char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc";
- int i;
+ int i, object_size = 0;
struct slab *slab;
unsigned long ptroffset;
struct kmem_obj_info kp = { };
@@ -580,12 +580,33 @@ bool kmem_dump_obj(void *object)
ptroffset = ((char *)object - (char *)kp.kp_objp) - kp.kp_data_offset;
pr_cont(" pointer offset %lu", ptroffset);
}
- if (kp.kp_slab_cache && kp.kp_slab_cache->object_size)
- pr_cont(" size %u", kp.kp_slab_cache->object_size);
+ if (kp.kp_slab_cache && kp.kp_slab_cache->object_size) {
+ object_size = kp.kp_slab_cache->object_size;
+ pr_cont(" size %u", object_size);
+ }
if (kp.kp_ret)
pr_cont(" allocated at %pS\n", kp.kp_ret);
else
pr_cont("\n");
+
+ /* Dump a small piece of memory centered on 'object' */
+ if (kp.kp_objp && object_size) {
+ void *p = object;
+ int dump_size = 64;
+
+ p += dump_size / 2;
+ if (p > kp.kp_objp + object_size)
+ p = kp.kp_objp + object_size;
+
+ p -= dump_size;
+ if (p < kp.kp_objp)
+ p = kp.kp_objp;
+
+ dump_size = min(object_size, dump_size);
+ print_hex_dump(KERN_INFO, "",
+ DUMP_PREFIX_ADDRESS_LOW16, 16, 4, p, dump_size, false);
+ }
+
for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
if (!kp.kp_stack[i])
break;
--
2.34.1
On Fri, Aug 04, 2023 at 05:11:30PM +0800, [email protected] wrote:
> From: Zhen Lei <[email protected]>
>
> v5 --> v6:
> 1. Use print_hex_dump() to dump the memory of slab object.
> 2. Add a new dump prefix DUMP_PREFIX_ADDRESS_LOW16
> 3. Minimize the output width of the offset
>
> v4 --> v5:
> 1. Add Reviewed-by Acked-by for patch 1/3
> 2. Add patch 3/3:
> mm: Dump the memory of slab object in kmem_dump_obj()
>
> v3 --> v4:
> 1. Remove kmem_valid_obj() and convert kmem_dump_obj() to work the same way
> as vmalloc_dump_obj().
> 2. In kernel/rcu/rcu.h
> -#include <linux/mm.h>
> +#include <linux/slab.h>
>
> v2 --> v3:
> 1. I made statistics about the source of 'rhp'. kmem_valid_obj() accounts for
> more than 97.5%, and vmalloc accounts for less than 1%. So change call
> mem_dump_obj() to call kmem_dump_obj() can meet debugging requirements and
> avoid the potential deadlock risk of vmalloc_dump_obj().
> - mem_dump_obj(rhp);
> + if (kmem_valid_obj(rhp))
> + kmem_dump_obj(rhp);
>
> The discussion about vmap_area_lock deadlock in v2:
> https://lkml.org/lkml/2022/11/11/493
>
> 2. Provide static inline empty functions for kmem_valid_obj() and kmem_dump_obj()
> when CONFIG_PRINTK=n.
>
> v1 --> v2:
> 1. Remove condition "(unsigned long)rhp->func & 0x3", it have problems on x86.
> 2. Paul E. McKenney helped me update the commit message, thanks.
I would be happy to take the patch that Matthew and Vlastimil are happy
with, and also the one against RCU. But unless you tell me otherwise,
I will assume that you would prefer me to wait until the entire series
is ready. The best way to tell me otherwise is of course to resend just
those two patches in their own series. ;-)
Thanx, Paul
> Zhen Lei (5):
> hexdump: add a new dump prefix DUMP_PREFIX_ADDRESS_LOW16
> hexdump: minimize the output width of the offset
> mm: Remove kmem_valid_obj()
> mm: Dump the memory of slab object in kmem_dump_obj()
> rcu: Dump memory object info if callback function is invalid
>
> include/linux/printk.h | 1 +
> include/linux/slab.h | 5 ++--
> kernel/rcu/rcu.h | 7 +++++
> kernel/rcu/srcutiny.c | 1 +
> kernel/rcu/srcutree.c | 1 +
> kernel/rcu/tasks.h | 1 +
> kernel/rcu/tiny.c | 1 +
> kernel/rcu/tree.c | 1 +
> lib/hexdump.c | 17 +++++++++--
> mm/slab_common.c | 68 ++++++++++++++++++++++--------------------
> mm/util.c | 4 +--
> 11 files changed, 67 insertions(+), 40 deletions(-)
>
> --
> 2.34.1
>
On 2023/8/5 1:31, Paul E. McKenney wrote:
> On Fri, Aug 04, 2023 at 05:11:30PM +0800, [email protected] wrote:
>> From: Zhen Lei <[email protected]>
>>
>> v5 --> v6:
>> 1. Use print_hex_dump() to dump the memory of slab object.
>> 2. Add a new dump prefix DUMP_PREFIX_ADDRESS_LOW16
>> 3. Minimize the output width of the offset
>>
>> v4 --> v5:
>> 1. Add Reviewed-by Acked-by for patch 1/3
>> 2. Add patch 3/3:
>> mm: Dump the memory of slab object in kmem_dump_obj()
>>
>> v3 --> v4:
>> 1. Remove kmem_valid_obj() and convert kmem_dump_obj() to work the same way
>> as vmalloc_dump_obj().
>> 2. In kernel/rcu/rcu.h
>> -#include <linux/mm.h>
>> +#include <linux/slab.h>
>>
>> v2 --> v3:
>> 1. I made statistics about the source of 'rhp'. kmem_valid_obj() accounts for
>> more than 97.5%, and vmalloc accounts for less than 1%. So change call
>> mem_dump_obj() to call kmem_dump_obj() can meet debugging requirements and
>> avoid the potential deadlock risk of vmalloc_dump_obj().
>> - mem_dump_obj(rhp);
>> + if (kmem_valid_obj(rhp))
>> + kmem_dump_obj(rhp);
>>
>> The discussion about vmap_area_lock deadlock in v2:
>> https://lkml.org/lkml/2022/11/11/493
>>
>> 2. Provide static inline empty functions for kmem_valid_obj() and kmem_dump_obj()
>> when CONFIG_PRINTK=n.
>>
>> v1 --> v2:
>> 1. Remove condition "(unsigned long)rhp->func & 0x3", it have problems on x86.
>> 2. Paul E. McKenney helped me update the commit message, thanks.
>
> I would be happy to take the patch that Matthew and Vlastimil are happy
> with, and also the one against RCU. But unless you tell me otherwise,
> I will assume that you would prefer me to wait until the entire series
> is ready. The best way to tell me otherwise is of course to resend just
> those two patches in their own series. ;-)
Yes, I also feel this snowball rolling bigger and bigger. Let me resend the two
RCU-related patches that we've discussed OK.
>
> Thanx, Paul
>
>> Zhen Lei (5):
>> hexdump: add a new dump prefix DUMP_PREFIX_ADDRESS_LOW16
>> hexdump: minimize the output width of the offset
>> mm: Remove kmem_valid_obj()
>> mm: Dump the memory of slab object in kmem_dump_obj()
>> rcu: Dump memory object info if callback function is invalid
>>
>> include/linux/printk.h | 1 +
>> include/linux/slab.h | 5 ++--
>> kernel/rcu/rcu.h | 7 +++++
>> kernel/rcu/srcutiny.c | 1 +
>> kernel/rcu/srcutree.c | 1 +
>> kernel/rcu/tasks.h | 1 +
>> kernel/rcu/tiny.c | 1 +
>> kernel/rcu/tree.c | 1 +
>> lib/hexdump.c | 17 +++++++++--
>> mm/slab_common.c | 68 ++++++++++++++++++++++--------------------
>> mm/util.c | 4 +--
>> 11 files changed, 67 insertions(+), 40 deletions(-)
>>
>> --
>> 2.34.1
>>
> .
>
--
Regards,
Zhen Lei