2023-11-16 12:31:05

by Naresh Kamboju

[permalink] [raw]
Subject: selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))

Following kernel crash noticed while running selftests: ftrace on arm64 Juno-r2
device running stable-rc linux-6.6.y kernel.

This kernel crash is hard to reproduce.

Reported-by: Linux Kernel Functional Testing <[email protected]>

kselftest: Running tests in ftrace
TAP version 13
1..1
# timeout set to 0
# selftests: ftrace: ftracetest-ktap
# unlink: cannot unlink '/opt/kselftests/default-in-kernel/ftrace/logs/latest': No such file or directory
# TAP version 13
# 1..130
# ok 1 Basic trace file check
...
# ok 44 ftrace - test for function traceon/off triggers
# ok 45 ftrace - test tracing error log support
[ 617.613901] ------------[ cut here ]------------
[ 617.618870] list_del corruption. prev->next should be ffff000837184758, but was ffff000837184638. (prev=ffff0008359f82e8)
[ 617.629894] WARNING: CPU: 1 PID: 13512 at lib/list_debug.c:62 __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))
[ 617.639496] Modules linked in: onboard_usb_hub hdlcd tda998x crct10dif_ce drm_dma_helper cec drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last unloaded: trace_printk]
[ 617.655832] CPU: 1 PID: 13512 Comm: ls Not tainted 6.6.2-rc1 #1
[ 617.661765] Hardware name: ARM Juno development board (r2) (DT)
[ 617.667692] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 617.674668] pc : __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))
[ 617.680339] lr : __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))
[ 617.686009] sp : ffff80008507b910
[ 617.689325] x29: ffff80008507b910 x28: 0000000000000003 x27: 0000000000000001
[ 617.696482] x26: ffff000834694788 x25: ffff000835a39c88 x24: 0000000000000000
[ 617.703637] x23: 0000000000000000 x22: ffff0008359f8248 x21: ffff000837184720
[ 617.710792] x20: ffff0008359f8248 x19: ffff000837184758 x18: 0000000000000010
[ 617.717948] x17: 20747562202c3835 x16: 3734383137333830 x15: 3030666666662065
[ 617.725102] x14: 6220646c756f6873 x13: 2938653238663935 x12: 3338303030666666
[ 617.732257] x11: 663d766572702820 x10: 2e38333634383137 x9 : ffff800080145634
[ 617.739411] x8 : ffff80008507b638 x7 : 0000000000017fe8 x6 : 0000000000057fa8
[ 617.746566] x5 : 0000000000000fff x4 : ffff00097ed27d50 x3 : ffff8008fc5d3000
[ 617.753720] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0008010722c0
[ 617.760874] Call trace:
[ 617.763320] __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))
[ 617.768643] __dentry_kill (include/linux/list.h:215 (discriminator 1) fs/dcache.c:550 (discriminator 1) fs/dcache.c:603 (discriminator 1))
[ 617.772311] dput (fs/dcache.c:896)
[ 617.775281] create_dentry (fs/tracefs/event_inode.c:404)
[ 617.779040] dcache_dir_open_wrapper (fs/tracefs/event_inode.c:536)
[ 617.783580] do_dentry_open (fs/open.c:929)
[ 617.787424] vfs_open (fs/open.c:1064)
[ 617.790572] path_openat (fs/namei.c:3640 fs/namei.c:3797)
[ 617.794153] do_filp_open (fs/namei.c:3824)
[ 617.797733] do_sys_openat2 (fs/open.c:1422)
[ 617.801490] __arm64_sys_openat (fs/open.c:1448)
[ 617.805508] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56)
[ 617.809267] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2))
[ 617.814069] do_el0_svc (arch/arm64/kernel/syscall.c:156)
[ 617.817391] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679)
[ 617.820454] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
[ 617.824820] el0t_64_sync (arch/arm64/kernel/entry.S:595)
[ 617.828488] ---[ end trace 0000000000000000 ]---
[ 617.833214] ------------[ cut here ]------------
[ 617.837840] WARNING: CPU: 2 PID: 13523 at fs/dcache.c:365 dentry_free (fs/dcache.c:365 (discriminator 1))
[ 617.845173] Modules linked in: onboard_usb_hub hdlcd tda998x crct10dif_ce drm_dma_helper cec drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last unloaded: trace_printk]
[ 617.861503] CPU: 2 PID: 13523 Comm: rmdir Tainted: G W 6.6.2-rc1 #1
[ 617.869175] Hardware name: ARM Juno development board (r2) (DT)
[ 617.875102] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 617.882077] pc : dentry_free (fs/dcache.c:365 (discriminator 1))
[ 617.885835] lr : __dentry_kill (fs/dcache.c:623)
[ 617.889937] sp : ffff800085123a30
[ 617.893252] x29: ffff800085123a30 x28: ffff0008371845a8 x27: ffff0008371845a8
[ 617.900409] x26: ffff000837184600 x25: ffff000837184638 x24: ffff0008359f82a0
[ 617.907565] x23: ffff000835a39d28 x22: ffff0008359f8248 x21: ffff000837184720
[ 617.914719] x20: ffff0008359f8248 x19: ffff0008371846c8 x18: 0000000000000000
[ 617.921874] x17: 000000040044ffff x16: 00500074b5503510 x15: 0000000000000000
[ 617.929029] x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
[ 617.936184] x11: 7f7f7f7f7f7f7f7f x10: 0000000000000c00 x9 : ffff80008042bce0
[ 617.943339] x8 : ffff800085123968 x7 : 0000000000000000 x6 : 00000000000001fb
[ 617.950493] x5 : 0000000000000001 x4 : ffff80008328f000 x3 : ffff80008328f2e8
[ 617.957647] x2 : 0000000000000002 x1 : ffff80008042bcd8 x0 : ffff800080428e90
[ 617.964801] Call trace:
[ 617.967246] dentry_free (fs/dcache.c:365 (discriminator 1))
[ 617.970656] __dentry_kill (fs/dcache.c:623)
[ 617.974409] dput (fs/dcache.c:896)
[ 617.977380] simple_recursive_removal (include/linux/fs.h:807 fs/libfs.c:552)
[ 617.982008] eventfs_remove (fs/tracefs/event_inode.c:946)
[ 617.985679] remove_event_file_dir (include/linux/list.h:124 include/linux/list.h:215 include/linux/list.h:229 kernel/trace/trace_events.c:1017)
[ 617.989958] event_trace_del_tracer (kernel/trace/trace_events.c:3620 (discriminator 3) kernel/trace/trace_events.c:3772 (discriminator 3))
[ 617.994324] __remove_instance (kernel/trace/trace.c:9682 kernel/trace/trace.c:9663)
[ 617.998344] instance_rmdir (kernel/trace/trace.c:9741)
[ 618.002010] tracefs_syscall_rmdir (fs/tracefs/inode.c:134)
[ 618.006289] vfs_rmdir (fs/namei.c:4204 fs/namei.c:4181)
[ 618.009608] do_rmdir (fs/namei.c:4263 (discriminator 1))
[ 618.012928] __arm64_sys_unlinkat (fs/namei.c:4441 (discriminator 2) fs/namei.c:4435 (discriminator 2) fs/namei.c:4435 (discriminator 2))
[ 618.017118] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56)
[ 618.020876] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2))
[ 618.025678] do_el0_svc (arch/arm64/kernel/syscall.c:156)
[ 618.029000] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679)
[ 618.032060] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
[ 618.036427] el0t_64_sync (arch/arm64/kernel/entry.S:595)
[ 618.040094] ---[ end trace 0000000000000000 ]---
[ 618.071010] Unable to handle kernel execute from non-executable memory at virtual address ffff000835b0b758
[ 618.080715] Mem abort info:
[ 618.083536] ESR = 0x000000008600000f
[ 618.087306] EC = 0x21: IABT (current EL), IL = 32 bits
[ 618.092646] SET = 0, FnV = 0
[ 618.095717] EA = 0, S1PTW = 0
[ 618.098869] FSC = 0x0f: level 3 permission fault
[ 618.103680] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000826a6000
[ 618.110405] [ffff000835b0b758] pgd=18000009ffdf8003, p4d=18000009ffdf8003, pud=18000009ffa0e003, pmd=18000009ff860003, pte=00680008b5b0b707
[ 618.123000] Internal error: Oops: 000000008600000f [#1] PREEMPT SMP
[ 618.129287] Modules linked in: onboard_usb_hub hdlcd tda998x crct10dif_ce drm_dma_helper cec drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last unloaded: trace_printk]
[ 618.145624] CPU: 2 PID: 13523 Comm: rmdir Tainted: G W 6.6.2-rc1 #1
[ 618.153298] Hardware name: ARM Juno development board (r2) (DT)
[ 618.159225] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 618.166203] pc : 0xffff000835b0b758
[ 618.169703] lr : rcu_core (arch/arm64/include/asm/preempt.h:13 (discriminator 1) kernel/rcu/tree.c:2146 (discriminator 1) kernel/rcu/tree.c:2403 (discriminator 1))
[ 618.173377] sp : ffff80008394be60
[ 618.176692] x29: ffff80008394be60 x28: ffff0008278e3400 x27: 000000000000000a
[ 618.183848] x26: 0000000000000000 x25: 0000000000000000 x24: ffff80008394bf10
[ 618.191003] x23: ffff800082752008 x22: ffff80008328ea80 x21: ffff80008332f700
[ 618.198157] x20: ffff00097ed59440 x19: 0000000000000001 x18: 0000000000000000
[ 618.205311] x17: ffff8008fc5f2000 x16: ffff800083948000 x15: 0000000000000000
[ 618.212468] x14: 0000000000000000 x13: 0000000000000030 x12: 00000000000163f1
[ 618.219627] x11: 0000000000000001 x10: 000000010001369b x9 : ffff800080174e34
[ 618.226782] x8 : ffff80008394bd98 x7 : 0000000000000000 x6 : 0000000000000100
[ 618.233939] x5 : 0000000000000100 x4 : ffff80008328f000 x3 : ffff80008328f2e8
[ 618.241100] x2 : 0000000000000000 x1 : ffff000835b0b758 x0 : ffff000837184778
[ 618.248257] Call trace:
[ 618.250707] 0xffff000835b0b758
[ 618.253851] rcu_core_si (kernel/rcu/tree.c:2421)
[ 618.257261] __do_softirq (arch/arm64/include/asm/jump_label.h:21 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:554)
[ 618.260929] ____do_softirq (arch/arm64/kernel/irq.c:81)
[ 618.264597] call_on_irq_stack (arch/arm64/kernel/entry.S:892)
[ 618.268527] do_softirq_own_stack (arch/arm64/kernel/irq.c:86)
[ 618.272717] irq_exit_rcu (arch/arm64/include/asm/percpu.h:44 kernel/softirq.c:612 kernel/softirq.c:634 kernel/softirq.c:644)
[ 618.276299] el1_interrupt (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/entry-common.c:246 arch/arm64/kernel/entry-common.c:505 arch/arm64/kernel/entry-common.c:517)
[ 618.279883] el1h_64_irq_handler (arch/arm64/kernel/entry-common.c:523)
[ 618.283989] el1h_64_irq (arch/arm64/kernel/entry.S:591)
[ 618.287394] __srcu_read_unlock (kernel/rcu/srcutree.c:730)
[ 618.291409] srcu_gp_start_if_needed (kernel/rcu/srcutree.c:1298)
[ 618.296034] call_srcu (kernel/rcu/srcutree.c:1362)
[ 618.299266] eventfs_remove_rec (fs/tracefs/event_inode.c:921)
[ 618.303286] eventfs_remove_rec (fs/tracefs/event_inode.c:909 (discriminator 5))
[ 618.307303] eventfs_remove (fs/tracefs/event_inode.c:939)
[ 618.310973] remove_event_file_dir (include/linux/list.h:124 include/linux/list.h:215 include/linux/list.h:229 kernel/trace/trace_events.c:1017)
[ 618.315252] event_trace_del_tracer (kernel/trace/trace_events.c:3620 (discriminator 3) kernel/trace/trace_events.c:3772 (discriminator 3))
[ 618.319618] __remove_instance (kernel/trace/trace.c:9682 kernel/trace/trace.c:9663)
[ 618.323637] instance_rmdir (kernel/trace/trace.c:9741)
[ 618.327304] tracefs_syscall_rmdir (fs/tracefs/inode.c:134)
[ 618.331582] vfs_rmdir (fs/namei.c:4204 fs/namei.c:4181)
[ 618.334903] do_rmdir (fs/namei.c:4263 (discriminator 1))
[ 618.338223] __arm64_sys_unlinkat (fs/namei.c:4441 (discriminator 2) fs/namei.c:4435 (discriminator 2) fs/namei.c:4435 (discriminator 2))
[ 618.342413] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56)
[ 618.346171] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2))
[ 618.350973] do_el0_svc (arch/arm64/kernel/syscall.c:156)
[ 618.354294] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679)
[ 618.357355] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
[ 618.361721] el0t_64_sync (arch/arm64/kernel/entry.S:595)
[ 618.365391] Code: 35b0b748 ffff0008 35b0b748 ffff0008 (37184778)
All code
========
0: 35b0b748 cbnz w8, 0xfffffffffff616e8
4: ffff0008 .inst 0xffff0008 ; undefined
8: 35b0b748 cbnz w8, 0xfffffffffff616f0
c: ffff0008 .inst 0xffff0008 ; undefined
10:* 37184778 tbnz w24, #3, 0x8fc <-- trapping instruction

Code starting with the faulting instruction
===========================================
0: 37184778 tbnz w24, #3, 0x8ec
[ 618.371494] ---[ end trace 0000000000000000 ]---
[ 618.376117] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 618.383001] SMP: stopping secondary CPUs
[ 618.387117] Kernel Offset: disabled
[ 618.390607] CPU features: 0x1,0000020c,3c020000,0000421b
[ 618.395927] Memory Limit: none
[ 618.398986] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---


Links:
- https://lkft.validation.linaro.org/scheduler/job/7030448#L4546

metadata:
git_ref: linux-6.6.y
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
git_sha: df34d612fd4ef266814366c8101094b7f83b6a92
git_describe: v6.6.1-604-gdf34d612fd4e
kernel_version: 6.6.2-rc1
kernel-config:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2YE2AiHE0lU8S30LM8h9i7wempi/config
artifact-location:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2YE2AiHE0lU8S30LM8h9i7wempi/
toolchain: gcc-13


--
Linaro LKFT
https://lkft.linaro.org


2023-11-20 20:36:31

by Steven Rostedt

[permalink] [raw]
Subject: Re: selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))

On Thu, 16 Nov 2023 18:00:16 +0530
Naresh Kamboju <[email protected]> wrote:

> Following kernel crash noticed while running selftests: ftrace on arm64 Juno-r2
> device running stable-rc linux-6.6.y kernel.
>
> This kernel crash is hard to reproduce.
>

Can you test this patch.

Note, there's a similar bug on 6.7-rc1 which I'll fix first. And when
that's accepted, I'll push this one for v6.6. This may be two patches as
one if the d_invalidate() issue, and another is a memory leak fix.

-- Steve

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 5fcfb634fec2..b60048469df1 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -289,6 +289,8 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry)
ef = dentry->d_fsdata;
if (ef)
free_ef(ef);
+ else
+ kfree(ei);
return;
}

@@ -342,7 +344,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
static struct dentry *
create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
{
- bool invalidate = false;
struct dentry *dentry;

mutex_lock(&eventfs_mutex);
@@ -387,23 +388,24 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
eventfs_post_create_dir(ef);
dentry->d_fsdata = ef;
} else {
- /* A race here, should try again (unless freed) */
- invalidate = true;
-
/*
+ * If we are here then the directory is being freed.
+ * The simple_recursive_removal() will get rid of the dentry
+ * here.
+ */
+ dentry = NULL;
+ /*
* Should never happen unless we get here due to being freed.
* Otherwise it means two dentries exist with the same name.
*/
WARN_ON_ONCE(!ef->is_freed);
}
mutex_unlock(&eventfs_mutex);
- if (invalidate)
- d_invalidate(dentry);

- if (lookup || invalidate)
+ if (lookup)
dput(dentry);

- return invalidate ? NULL : dentry;
+ return dentry;
}

static bool match_event_file(struct eventfs_file *ef, const char *name)
--
2.42.0

2023-11-22 14:20:03

by Naresh Kamboju

[permalink] [raw]
Subject: Re: selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))

Hi Steven,



On Tue, 21 Nov 2023 at 02:06, Steven Rostedt <[email protected]> wrote:
>
> On Thu, 16 Nov 2023 18:00:16 +0530
> Naresh Kamboju <[email protected]> wrote:
>
> > Following kernel crash noticed while running selftests: ftrace on arm64 Juno-r2
> > device running stable-rc linux-6.6.y kernel.
> >
> > This kernel crash is hard to reproduce.
> >
>
> Can you test this patch.

I have applied your patch on top of Linux stable-rc linux-6.6.y and tested
selftests: ftrace: for 100 iterations on arm64 Juno-r2 device.

Out of 100 test runs I see the following exception only once.
I do not see crashes related to the previous report.


Test log:
----
# ok 8 trace_pipe and trace_marker
...
[ 282.726999] Unexpected kernel BRK exception at EL1
[ 282.731840] Internal error: BRK handler: 00000000f20003e8 [#1] PREEMPT SMP
[ 282.738752] Modules linked in: ftrace_direct ftrace_direct_too
hdlcd tda998x onboard_usb_hub drm_dma_helper cec crct10dif_ce
drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last
unloaded: ftrace_direct]
[ 282.758152] CPU: 0 PID: 987 Comm: rmmod Not tainted 6.6.2-rc1 #1
[ 282.764191] Hardware name: ARM Juno development board (r2) (DT)
[ 282.770138] pstate: a00000c5 (NzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 282.777133] pc : ring_buffer_lock_reserve
(kernel/trace/ring_buffer.c:3304 (discriminator 1)
kernel/trace/ring_buffer.c:3819 (discriminator 1))
[ 282.782230] lr : trace_function (kernel/trace/trace.c:992
kernel/trace/trace.c:3078)
[ 282.786360] sp : ffff80008428ba20
[ 282.789695] x29: ffff80008428ba20 x28: ffff00082bb367c0 x27: 0000000000000000
[ 282.796898] x26: ffff800080371d84 x25: 0000000000000000 x24: ffff00097ed1e4b0
[ 282.804100] x23: ffff000800018400 x22: 0000000000000000 x21: 0000000000000018
[ 282.811302] x20: ffff000800018400 x19: ffff000800041400 x18: ffffffffffffffff
[ 282.818504] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081d2ec58
[ 282.825706] x14: ffffffffffffffff x13: ffff80008428beb0 x12: 000002550000ffff
[ 282.832908] x11: ffff800083343078 x10: ffff80008428bbc0 x9 : ffff8000803e3110
[ 282.840110] x8 : ffff80008428ba98 x7 : 0000000000000000 x6 : 0000000000000009
[ 282.847311] x5 : 0000000000000081 x4 : fffffffffffffffb x3 : 0000000000000001
[ 282.854513] x2 : ffffffffffffffff x1 : 0000000000000000 x0 : 0000000000000000
[ 282.861715] Call trace:
[ 282.864179] ring_buffer_lock_reserve
(kernel/trace/ring_buffer.c:3304 (discriminator 1)
kernel/trace/ring_buffer.c:3819 (discriminator 1))
[ 282.868920] trace_function (kernel/trace/trace.c:992
kernel/trace/trace.c:3078)
[ 282.872702] tracer_hardirqs_off (kernel/trace/trace_irqsoff.c:279
kernel/trace/trace_irqsoff.c:397 kernel/trace/trace_irqsoff.c:619
kernel/trace/trace_irqsoff.c:616)
[ 282.877009] trace_hardirqs_off.part.0
(kernel/trace/trace_preemptirq.c:78 (discriminator 1))
[ 282.881748] trace_hardirqs_off (kernel/trace/trace_preemptirq.c:94)
[ 282.885791] obj_cgroup_charge (mm/memcontrol.c:3236 (discriminator
1) mm/memcontrol.c:3364 (discriminator 1))
[ 282.889918] kmem_cache_alloc (mm/slab.h:503 (discriminator 2)
mm/slab.h:714 (discriminator 2) mm/slub.c:3460 (discriminator 2)
mm/slub.c:3486 (discriminator 2) mm/slub.c:3493 (discriminator 2)
mm/slub.c:3502 (discriminator 2))
[ 282.893873] __anon_vma_prepare (mm/rmap.c:197)
[ 282.897999] __handle_mm_fault (mm/memory.c:4111 (discriminator 2)
mm/memory.c:3670 (discriminator 2) mm/memory.c:4981 (discriminator 2)
mm/memory.c:5122 (discriminator 2))
[ 282.902213] handle_mm_fault (mm/memory.c:5287)
[ 282.906077] do_page_fault (arch/arm64/mm/fault.c:513
arch/arm64/mm/fault.c:626)
[ 282.909856] do_translation_fault (arch/arm64/mm/fault.c:714)
[ 282.914068] do_mem_abort (arch/arm64/mm/fault.c:846 (discriminator 1))
[ 282.917591] el0_da (arch/arm64/include/asm/daifflags.h:28
arch/arm64/kernel/entry-common.c:133
arch/arm64/kernel/entry-common.c:144
arch/arm64/kernel/entry-common.c:547)
[ 282.920590] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:700)
[ 282.924894] el0t_64_sync (arch/arm64/kernel/entry.S:595)
[ 282.928593] Code: 88037c22 35ffffa3 17fffff8 94549eb9 (d4207d00)
All code
========
0: 88037c22 stxr w3, w2, [x1]
4: 35ffffa3 cbnz w3, 0xfffffffffffffff8
8: 17fffff8 b 0xffffffffffffffe8
c: 94549eb9 bl 0x1527af0
10:* d4207d00 brk #0x3e8 <-- trapping instruction

Code starting with the faulting instruction
===========================================
0: d4207d00 brk #0x3e8
[ 282.934716] ---[ end trace 0000000000000000 ]---
[ 282.939358] note: rmmod[987] exited with irqs disabled
[ 282.945441] note: rmmod[987] exited with preempt_count 2
[ 282.951337] ------------[ cut here ]------------
[ 282.955980] WARNING: CPU: 0 PID: 0 at kernel/context_tracking.c:128
ct_kernel_exit.constprop.0 (kernel/context_tracking.c:128
(discriminator 1))
[ 282.965683] Modules linked in: ftrace_direct ftrace_direct_too
hdlcd tda998x onboard_usb_hub drm_dma_helper cec crct10dif_ce
drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last
unloaded: ftrace_direct]
[ 282.985075] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D
6.6.2-rc1 #1
[ 282.992767] Hardware name: ARM Juno development board (r2) (DT)
[ 282.998713] pstate: 200003c5 (nzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 283.005708] pc : ct_kernel_exit.constprop.0
(kernel/context_tracking.c:128 (discriminator 1))
[ 283.010965] lr : ct_idle_enter (kernel/context_tracking.c:321)
[ 283.014915] sp : ffff800083283b90
[ 283.018249] x29: ffff800083283b90 x28: 00000000826a40ac x27: 0000000000000000
[ 283.025452] x26: 00000000fef770dc x25: 0000000000000001 x24: ffff000822878880
[ 283.032654] x23: 0000000000000001 x22: 00000000000003c0 x21: 0000000000010000
[ 283.039856] x20: ffff80008172c118 x19: ffff00097ed16da0 x18: 0000000000000010
[ 283.047059] x17: 0000000000000000 x16: 3030303030303020 x15: 0000000000000000
[ 283.054260] x14: 0000000000000000 x13: 3220746e756f635f x12: 0000000000000003
[ 283.061462] x11: ffff800083343078 x10: ffff800083283bc0 x9 : ffff800080039620
[ 283.068665] x8 : ffff800083283af8 x7 : 0000000000000000 x6 : 000000000000003f
[ 283.075867] x5 : 4000000000000002 x4 : ffff8008fc5b4000 x3 : ffff800083283b90
[ 283.083069] x2 : ffff800082762da0 x1 : 4000000000000000 x0 : ffff800082762da0
[ 283.090271] Call trace:
[ 283.092736] ct_kernel_exit.constprop.0
(kernel/context_tracking.c:128 (discriminator 1))
[ 283.097646] ct_idle_enter (kernel/context_tracking.c:321)
[ 283.101249] cpu_suspend (arch/arm64/kernel/suspend.c:130)
[ 283.104854] psci_cpu_suspend_enter (drivers/firmware/psci/psci.c:488)
[ 283.109244] psci_enter_idle_state
(drivers/cpuidle/cpuidle-psci.c:190 (discriminator 10))
[ 283.113544] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
[ 283.117757] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
[ 283.121361] do_idle (kernel/sched/idle.c:134
kernel/sched/idle.c:215 kernel/sched/idle.c:282)
[ 283.124619] cpu_startup_entry (kernel/sched/idle.c:380 (discriminator 1))
[ 283.128572] rest_init (include/linux/cpumask.h:1058 init/main.c:700)
[ 283.131913] arch_call_rest_init+0x18/0x20
[ 283.136042] start_kernel (init/main.c:882 (discriminator 1))
[ 283.139734] __primary_switched (arch/arm64/kernel/head.S:524)
[ 283.143775] ---[ end trace 0000000000000000 ]---

Links:
- https://lkft.validation.linaro.org/scheduler/job/7043802#L1756

--
Linaro LKFT
https://lkft.linaro.org

> Note, there's a similar bug on 6.7-rc1 which I'll fix first. And when
> that's accepted, I'll push this one for v6.6. This may be two patches as
> one if the d_invalidate() issue, and another is a memory leak fix.
>
> -- Steve
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 5fcfb634fec2..b60048469df1 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -289,6 +289,8 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry)
> ef = dentry->d_fsdata;
> if (ef)
> free_ef(ef);
> + else
> + kfree(ei);
> return;
> }
>
> @@ -342,7 +344,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> static struct dentry *
> create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> {
> - bool invalidate = false;
> struct dentry *dentry;
>
> mutex_lock(&eventfs_mutex);
> @@ -387,23 +388,24 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> eventfs_post_create_dir(ef);
> dentry->d_fsdata = ef;
> } else {
> - /* A race here, should try again (unless freed) */
> - invalidate = true;
> -
> /*
> + * If we are here then the directory is being freed.
> + * The simple_recursive_removal() will get rid of the dentry
> + * here.
> + */
> + dentry = NULL;
> + /*
> * Should never happen unless we get here due to being freed.
> * Otherwise it means two dentries exist with the same name.
> */
> WARN_ON_ONCE(!ef->is_freed);
> }
> mutex_unlock(&eventfs_mutex);
> - if (invalidate)
> - d_invalidate(dentry);
>
> - if (lookup || invalidate)
> + if (lookup)
> dput(dentry);
>
> - return invalidate ? NULL : dentry;
> + return dentry;
> }
>
> static bool match_event_file(struct eventfs_file *ef, const char *name)
> --
> 2.42.0
>

2023-11-22 15:13:04

by Steven Rostedt

[permalink] [raw]
Subject: Re: selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))

On Wed, 22 Nov 2023 19:49:43 +0530
Naresh Kamboju <[email protected]> wrote:

> Hi Steven,
>
>
>
> On Tue, 21 Nov 2023 at 02:06, Steven Rostedt <[email protected]> wrote:
> >
> > On Thu, 16 Nov 2023 18:00:16 +0530
> > Naresh Kamboju <[email protected]> wrote:
> >
> > > Following kernel crash noticed while running selftests: ftrace on arm64 Juno-r2
> > > device running stable-rc linux-6.6.y kernel.
> > >
> > > This kernel crash is hard to reproduce.
> > >
> >
> > Can you test this patch.
>
> I have applied your patch on top of Linux stable-rc linux-6.6.y and tested
> selftests: ftrace: for 100 iterations on arm64 Juno-r2 device.
>
> Out of 100 test runs I see the following exception only once.
> I do not see crashes related to the previous report.

This one looks unrelated to the other crash.

>
>
> Test log:
> ----
> # ok 8 trace_pipe and trace_marker
> ...
> [ 282.726999] Unexpected kernel BRK exception at EL1

What's a "BRK exception"?

> [ 282.731840] Internal error: BRK handler: 00000000f20003e8 [#1] PREEMPT SMP
> [ 282.738752] Modules linked in: ftrace_direct ftrace_direct_too
> hdlcd tda998x onboard_usb_hub drm_dma_helper cec crct10dif_ce
> drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last
> unloaded: ftrace_direct]
> [ 282.758152] CPU: 0 PID: 987 Comm: rmmod Not tainted 6.6.2-rc1 #1

So this happened on removing a module? Unloading ftrace_direct?

This could be a direct trampoline bug.

But it doesn't look to be related to eventfs, so I'm going with my other patches.

Thanks,

-- Steve


> [ 282.764191] Hardware name: ARM Juno development board (r2) (DT)
> [ 282.770138] pstate: a00000c5 (NzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 282.777133] pc : ring_buffer_lock_reserve
> (kernel/trace/ring_buffer.c:3304 (discriminator 1)
> kernel/trace/ring_buffer.c:3819 (discriminator 1))
> [ 282.782230] lr : trace_function (kernel/trace/trace.c:992
> kernel/trace/trace.c:3078)
> [ 282.786360] sp : ffff80008428ba20
> [ 282.789695] x29: ffff80008428ba20 x28: ffff00082bb367c0 x27: 0000000000000000
> [ 282.796898] x26: ffff800080371d84 x25: 0000000000000000 x24: ffff00097ed1e4b0
> [ 282.804100] x23: ffff000800018400 x22: 0000000000000000 x21: 0000000000000018
> [ 282.811302] x20: ffff000800018400 x19: ffff000800041400 x18: ffffffffffffffff
> [ 282.818504] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081d2ec58
> [ 282.825706] x14: ffffffffffffffff x13: ffff80008428beb0 x12: 000002550000ffff
> [ 282.832908] x11: ffff800083343078 x10: ffff80008428bbc0 x9 : ffff8000803e3110
> [ 282.840110] x8 : ffff80008428ba98 x7 : 0000000000000000 x6 : 0000000000000009
> [ 282.847311] x5 : 0000000000000081 x4 : fffffffffffffffb x3 : 0000000000000001
> [ 282.854513] x2 : ffffffffffffffff x1 : 0000000000000000 x0 : 0000000000000000
> [ 282.861715] Call trace:
> [ 282.864179] ring_buffer_lock_reserve
> (kernel/trace/ring_buffer.c:3304 (discriminator 1)
> kernel/trace/ring_buffer.c:3819 (discriminator 1))
> [ 282.868920] trace_function (kernel/trace/trace.c:992
> kernel/trace/trace.c:3078)
> [ 282.872702] tracer_hardirqs_off (kernel/trace/trace_irqsoff.c:279
> kernel/trace/trace_irqsoff.c:397 kernel/trace/trace_irqsoff.c:619
> kernel/trace/trace_irqsoff.c:616)
> [ 282.877009] trace_hardirqs_off.part.0
> (kernel/trace/trace_preemptirq.c:78 (discriminator 1))
> [ 282.881748] trace_hardirqs_off (kernel/trace/trace_preemptirq.c:94)
> [ 282.885791] obj_cgroup_charge (mm/memcontrol.c:3236 (discriminator
> 1) mm/memcontrol.c:3364 (discriminator 1))
> [ 282.889918] kmem_cache_alloc (mm/slab.h:503 (discriminator 2)
> mm/slab.h:714 (discriminator 2) mm/slub.c:3460 (discriminator 2)
> mm/slub.c:3486 (discriminator 2) mm/slub.c:3493 (discriminator 2)
> mm/slub.c:3502 (discriminator 2))
> [ 282.893873] __anon_vma_prepare (mm/rmap.c:197)
> [ 282.897999] __handle_mm_fault (mm/memory.c:4111 (discriminator 2)
> mm/memory.c:3670 (discriminator 2) mm/memory.c:4981 (discriminator 2)
> mm/memory.c:5122 (discriminator 2))
> [ 282.902213] handle_mm_fault (mm/memory.c:5287)
> [ 282.906077] do_page_fault (arch/arm64/mm/fault.c:513
> arch/arm64/mm/fault.c:626)
> [ 282.909856] do_translation_fault (arch/arm64/mm/fault.c:714)
> [ 282.914068] do_mem_abort (arch/arm64/mm/fault.c:846 (discriminator 1))
> [ 282.917591] el0_da (arch/arm64/include/asm/daifflags.h:28
> arch/arm64/kernel/entry-common.c:133
> arch/arm64/kernel/entry-common.c:144
> arch/arm64/kernel/entry-common.c:547)
> [ 282.920590] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:700)
> [ 282.924894] el0t_64_sync (arch/arm64/kernel/entry.S:595)
> [ 282.928593] Code: 88037c22 35ffffa3 17fffff8 94549eb9 (d4207d00)
> All code
> ========
> 0: 88037c22 stxr w3, w2, [x1]
> 4: 35ffffa3 cbnz w3, 0xfffffffffffffff8
> 8: 17fffff8 b 0xffffffffffffffe8
> c: 94549eb9 bl 0x1527af0
> 10:* d4207d00 brk #0x3e8 <-- trapping instruction
>
> Code starting with the faulting instruction
> ===========================================
> 0: d4207d00 brk #0x3e8
> [ 282.934716] ---[ end trace 0000000000000000 ]---
> [ 282.939358] note: rmmod[987] exited with irqs disabled
> [ 282.945441] note: rmmod[987] exited with preempt_count 2
> [ 282.951337] ------------[ cut here ]------------
> [ 282.955980] WARNING: CPU: 0 PID: 0 at kernel/context_tracking.c:128
> ct_kernel_exit.constprop.0 (kernel/context_tracking.c:128
> (discriminator 1))
> [ 282.965683] Modules linked in: ftrace_direct ftrace_direct_too
> hdlcd tda998x onboard_usb_hub drm_dma_helper cec crct10dif_ce
> drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last
> unloaded: ftrace_direct]
> [ 282.985075] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D
> 6.6.2-rc1 #1
> [ 282.992767] Hardware name: ARM Juno development board (r2) (DT)
> [ 282.998713] pstate: 200003c5 (nzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 283.005708] pc : ct_kernel_exit.constprop.0
> (kernel/context_tracking.c:128 (discriminator 1))
> [ 283.010965] lr : ct_idle_enter (kernel/context_tracking.c:321)
> [ 283.014915] sp : ffff800083283b90
> [ 283.018249] x29: ffff800083283b90 x28: 00000000826a40ac x27: 0000000000000000
> [ 283.025452] x26: 00000000fef770dc x25: 0000000000000001 x24: ffff000822878880
> [ 283.032654] x23: 0000000000000001 x22: 00000000000003c0 x21: 0000000000010000
> [ 283.039856] x20: ffff80008172c118 x19: ffff00097ed16da0 x18: 0000000000000010
> [ 283.047059] x17: 0000000000000000 x16: 3030303030303020 x15: 0000000000000000
> [ 283.054260] x14: 0000000000000000 x13: 3220746e756f635f x12: 0000000000000003
> [ 283.061462] x11: ffff800083343078 x10: ffff800083283bc0 x9 : ffff800080039620
> [ 283.068665] x8 : ffff800083283af8 x7 : 0000000000000000 x6 : 000000000000003f
> [ 283.075867] x5 : 4000000000000002 x4 : ffff8008fc5b4000 x3 : ffff800083283b90
> [ 283.083069] x2 : ffff800082762da0 x1 : 4000000000000000 x0 : ffff800082762da0
> [ 283.090271] Call trace:
> [ 283.092736] ct_kernel_exit.constprop.0
> (kernel/context_tracking.c:128 (discriminator 1))
> [ 283.097646] ct_idle_enter (kernel/context_tracking.c:321)
> [ 283.101249] cpu_suspend (arch/arm64/kernel/suspend.c:130)
> [ 283.104854] psci_cpu_suspend_enter (drivers/firmware/psci/psci.c:488)
> [ 283.109244] psci_enter_idle_state
> (drivers/cpuidle/cpuidle-psci.c:190 (discriminator 10))
> [ 283.113544] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
> [ 283.117757] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
> [ 283.121361] do_idle (kernel/sched/idle.c:134
> kernel/sched/idle.c:215 kernel/sched/idle.c:282)
> [ 283.124619] cpu_startup_entry (kernel/sched/idle.c:380 (discriminator 1))
> [ 283.128572] rest_init (include/linux/cpumask.h:1058 init/main.c:700)
> [ 283.131913] arch_call_rest_init+0x18/0x20
> [ 283.136042] start_kernel (init/main.c:882 (discriminator 1))
> [ 283.139734] __primary_switched (arch/arm64/kernel/head.S:524)
> [ 283.143775] ---[ end trace 0000000000000000 ]---
>
> Links:
> - https://lkft.validation.linaro.org/scheduler/job/7043802#L1756
>
> --
> Linaro LKFT
> https://lkft.linaro.org
>
> > Note, there's a similar bug on 6.7-rc1 which I'll fix first. And when
> > that's accepted, I'll push this one for v6.6. This may be two patches as
> > one if the d_invalidate() issue, and another is a memory leak fix.
> >
> > -- Steve
> >
> > diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> > index 5fcfb634fec2..b60048469df1 100644
> > --- a/fs/tracefs/event_inode.c
> > +++ b/fs/tracefs/event_inode.c
> > @@ -289,6 +289,8 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry)
> > ef = dentry->d_fsdata;
> > if (ef)
> > free_ef(ef);
> > + else
> > + kfree(ei);
> > return;
> > }
> >
> > @@ -342,7 +344,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> > static struct dentry *
> > create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> > {
> > - bool invalidate = false;
> > struct dentry *dentry;
> >
> > mutex_lock(&eventfs_mutex);
> > @@ -387,23 +388,24 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> > eventfs_post_create_dir(ef);
> > dentry->d_fsdata = ef;
> > } else {
> > - /* A race here, should try again (unless freed) */
> > - invalidate = true;
> > -
> > /*
> > + * If we are here then the directory is being freed.
> > + * The simple_recursive_removal() will get rid of the dentry
> > + * here.
> > + */
> > + dentry = NULL;
> > + /*
> > * Should never happen unless we get here due to being freed.
> > * Otherwise it means two dentries exist with the same name.
> > */
> > WARN_ON_ONCE(!ef->is_freed);
> > }
> > mutex_unlock(&eventfs_mutex);
> > - if (invalidate)
> > - d_invalidate(dentry);
> >
> > - if (lookup || invalidate)
> > + if (lookup)
> > dput(dentry);
> >
> > - return invalidate ? NULL : dentry;
> > + return dentry;
> > }
> >
> > static bool match_event_file(struct eventfs_file *ef, const char *name)
> > --
> > 2.42.0
> >

2023-11-23 17:51:06

by Mark Rutland

[permalink] [raw]
Subject: Re: selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))

On Wed, Nov 22, 2023 at 10:12:51AM -0500, Steven Rostedt wrote:
> On Wed, 22 Nov 2023 19:49:43 +0530
> Naresh Kamboju <[email protected]> wrote:
>
> > Hi Steven,
> >
> >
> >
> > On Tue, 21 Nov 2023 at 02:06, Steven Rostedt <[email protected]> wrote:
> > >
> > > On Thu, 16 Nov 2023 18:00:16 +0530
> > > Naresh Kamboju <[email protected]> wrote:
> > [ 282.726999] Unexpected kernel BRK exception at EL1
>
> What's a "BRK exception"?

That's triggered by a BRK instruction (software breakpoint), we use it like UD2
on x86.

> > [ 282.731840] Internal error: BRK handler: 00000000f20003e8 [#1] PREEMPT SMP

That lump of hex here means "this was triggered by a BRK #1000" instruction.

That immediate (0x3e8 / 1000) is what GCC/Clang use for __builtin_trap(), which
is generated for a bunch of reasons, usually a sanitizer like UBSAN, or options
to trap on runtime issues.

Naresh, where can I find the config used for this run? I can try to reproduce
this and investigate.

Thanks,
Mark.

> > [ 282.738752] Modules linked in: ftrace_direct ftrace_direct_too
> > hdlcd tda998x onboard_usb_hub drm_dma_helper cec crct10dif_ce
> > drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last
> > unloaded: ftrace_direct]
> > [ 282.758152] CPU: 0 PID: 987 Comm: rmmod Not tainted 6.6.2-rc1 #1
>
> So this happened on removing a module? Unloading ftrace_direct?
>
> This could be a direct trampoline bug.
>
> But it doesn't look to be related to eventfs, so I'm going with my other patches.
>
> Thanks,
>
> -- Steve
>
>
> > [ 282.764191] Hardware name: ARM Juno development board (r2) (DT)
> > [ 282.770138] pstate: a00000c5 (NzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 282.777133] pc : ring_buffer_lock_reserve
> > (kernel/trace/ring_buffer.c:3304 (discriminator 1)
> > kernel/trace/ring_buffer.c:3819 (discriminator 1))
> > [ 282.782230] lr : trace_function (kernel/trace/trace.c:992
> > kernel/trace/trace.c:3078)
> > [ 282.786360] sp : ffff80008428ba20
> > [ 282.789695] x29: ffff80008428ba20 x28: ffff00082bb367c0 x27: 0000000000000000
> > [ 282.796898] x26: ffff800080371d84 x25: 0000000000000000 x24: ffff00097ed1e4b0
> > [ 282.804100] x23: ffff000800018400 x22: 0000000000000000 x21: 0000000000000018
> > [ 282.811302] x20: ffff000800018400 x19: ffff000800041400 x18: ffffffffffffffff
> > [ 282.818504] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081d2ec58
> > [ 282.825706] x14: ffffffffffffffff x13: ffff80008428beb0 x12: 000002550000ffff
> > [ 282.832908] x11: ffff800083343078 x10: ffff80008428bbc0 x9 : ffff8000803e3110
> > [ 282.840110] x8 : ffff80008428ba98 x7 : 0000000000000000 x6 : 0000000000000009
> > [ 282.847311] x5 : 0000000000000081 x4 : fffffffffffffffb x3 : 0000000000000001
> > [ 282.854513] x2 : ffffffffffffffff x1 : 0000000000000000 x0 : 0000000000000000
> > [ 282.861715] Call trace:
> > [ 282.864179] ring_buffer_lock_reserve
> > (kernel/trace/ring_buffer.c:3304 (discriminator 1)
> > kernel/trace/ring_buffer.c:3819 (discriminator 1))
> > [ 282.868920] trace_function (kernel/trace/trace.c:992
> > kernel/trace/trace.c:3078)
> > [ 282.872702] tracer_hardirqs_off (kernel/trace/trace_irqsoff.c:279
> > kernel/trace/trace_irqsoff.c:397 kernel/trace/trace_irqsoff.c:619
> > kernel/trace/trace_irqsoff.c:616)
> > [ 282.877009] trace_hardirqs_off.part.0
> > (kernel/trace/trace_preemptirq.c:78 (discriminator 1))
> > [ 282.881748] trace_hardirqs_off (kernel/trace/trace_preemptirq.c:94)
> > [ 282.885791] obj_cgroup_charge (mm/memcontrol.c:3236 (discriminator
> > 1) mm/memcontrol.c:3364 (discriminator 1))
> > [ 282.889918] kmem_cache_alloc (mm/slab.h:503 (discriminator 2)
> > mm/slab.h:714 (discriminator 2) mm/slub.c:3460 (discriminator 2)
> > mm/slub.c:3486 (discriminator 2) mm/slub.c:3493 (discriminator 2)
> > mm/slub.c:3502 (discriminator 2))
> > [ 282.893873] __anon_vma_prepare (mm/rmap.c:197)
> > [ 282.897999] __handle_mm_fault (mm/memory.c:4111 (discriminator 2)
> > mm/memory.c:3670 (discriminator 2) mm/memory.c:4981 (discriminator 2)
> > mm/memory.c:5122 (discriminator 2))
> > [ 282.902213] handle_mm_fault (mm/memory.c:5287)
> > [ 282.906077] do_page_fault (arch/arm64/mm/fault.c:513
> > arch/arm64/mm/fault.c:626)
> > [ 282.909856] do_translation_fault (arch/arm64/mm/fault.c:714)
> > [ 282.914068] do_mem_abort (arch/arm64/mm/fault.c:846 (discriminator 1))
> > [ 282.917591] el0_da (arch/arm64/include/asm/daifflags.h:28
> > arch/arm64/kernel/entry-common.c:133
> > arch/arm64/kernel/entry-common.c:144
> > arch/arm64/kernel/entry-common.c:547)
> > [ 282.920590] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:700)
> > [ 282.924894] el0t_64_sync (arch/arm64/kernel/entry.S:595)
> > [ 282.928593] Code: 88037c22 35ffffa3 17fffff8 94549eb9 (d4207d00)
> > All code
> > ========
> > 0: 88037c22 stxr w3, w2, [x1]
> > 4: 35ffffa3 cbnz w3, 0xfffffffffffffff8
> > 8: 17fffff8 b 0xffffffffffffffe8
> > c: 94549eb9 bl 0x1527af0
> > 10:* d4207d00 brk #0x3e8 <-- trapping instruction
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: d4207d00 brk #0x3e8
> > [ 282.934716] ---[ end trace 0000000000000000 ]---
> > [ 282.939358] note: rmmod[987] exited with irqs disabled
> > [ 282.945441] note: rmmod[987] exited with preempt_count 2
> > [ 282.951337] ------------[ cut here ]------------
> > [ 282.955980] WARNING: CPU: 0 PID: 0 at kernel/context_tracking.c:128
> > ct_kernel_exit.constprop.0 (kernel/context_tracking.c:128
> > (discriminator 1))
> > [ 282.965683] Modules linked in: ftrace_direct ftrace_direct_too
> > hdlcd tda998x onboard_usb_hub drm_dma_helper cec crct10dif_ce
> > drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last
> > unloaded: ftrace_direct]
> > [ 282.985075] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D
> > 6.6.2-rc1 #1
> > [ 282.992767] Hardware name: ARM Juno development board (r2) (DT)
> > [ 282.998713] pstate: 200003c5 (nzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 283.005708] pc : ct_kernel_exit.constprop.0
> > (kernel/context_tracking.c:128 (discriminator 1))
> > [ 283.010965] lr : ct_idle_enter (kernel/context_tracking.c:321)
> > [ 283.014915] sp : ffff800083283b90
> > [ 283.018249] x29: ffff800083283b90 x28: 00000000826a40ac x27: 0000000000000000
> > [ 283.025452] x26: 00000000fef770dc x25: 0000000000000001 x24: ffff000822878880
> > [ 283.032654] x23: 0000000000000001 x22: 00000000000003c0 x21: 0000000000010000
> > [ 283.039856] x20: ffff80008172c118 x19: ffff00097ed16da0 x18: 0000000000000010
> > [ 283.047059] x17: 0000000000000000 x16: 3030303030303020 x15: 0000000000000000
> > [ 283.054260] x14: 0000000000000000 x13: 3220746e756f635f x12: 0000000000000003
> > [ 283.061462] x11: ffff800083343078 x10: ffff800083283bc0 x9 : ffff800080039620
> > [ 283.068665] x8 : ffff800083283af8 x7 : 0000000000000000 x6 : 000000000000003f
> > [ 283.075867] x5 : 4000000000000002 x4 : ffff8008fc5b4000 x3 : ffff800083283b90
> > [ 283.083069] x2 : ffff800082762da0 x1 : 4000000000000000 x0 : ffff800082762da0
> > [ 283.090271] Call trace:
> > [ 283.092736] ct_kernel_exit.constprop.0
> > (kernel/context_tracking.c:128 (discriminator 1))
> > [ 283.097646] ct_idle_enter (kernel/context_tracking.c:321)
> > [ 283.101249] cpu_suspend (arch/arm64/kernel/suspend.c:130)
> > [ 283.104854] psci_cpu_suspend_enter (drivers/firmware/psci/psci.c:488)
> > [ 283.109244] psci_enter_idle_state
> > (drivers/cpuidle/cpuidle-psci.c:190 (discriminator 10))
> > [ 283.113544] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:267)
> > [ 283.117757] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2))
> > [ 283.121361] do_idle (kernel/sched/idle.c:134
> > kernel/sched/idle.c:215 kernel/sched/idle.c:282)
> > [ 283.124619] cpu_startup_entry (kernel/sched/idle.c:380 (discriminator 1))
> > [ 283.128572] rest_init (include/linux/cpumask.h:1058 init/main.c:700)
> > [ 283.131913] arch_call_rest_init+0x18/0x20
> > [ 283.136042] start_kernel (init/main.c:882 (discriminator 1))
> > [ 283.139734] __primary_switched (arch/arm64/kernel/head.S:524)
> > [ 283.143775] ---[ end trace 0000000000000000 ]---
> >
> > Links:
> > - https://lkft.validation.linaro.org/scheduler/job/7043802#L1756
> >
> > --
> > Linaro LKFT
> > https://lkft.linaro.org
> >
> > > Note, there's a similar bug on 6.7-rc1 which I'll fix first. And when
> > > that's accepted, I'll push this one for v6.6. This may be two patches as
> > > one if the d_invalidate() issue, and another is a memory leak fix.
> > >
> > > -- Steve
> > >
> > > diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> > > index 5fcfb634fec2..b60048469df1 100644
> > > --- a/fs/tracefs/event_inode.c
> > > +++ b/fs/tracefs/event_inode.c
> > > @@ -289,6 +289,8 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry)
> > > ef = dentry->d_fsdata;
> > > if (ef)
> > > free_ef(ef);
> > > + else
> > > + kfree(ei);
> > > return;
> > > }
> > >
> > > @@ -342,7 +344,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> > > static struct dentry *
> > > create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> > > {
> > > - bool invalidate = false;
> > > struct dentry *dentry;
> > >
> > > mutex_lock(&eventfs_mutex);
> > > @@ -387,23 +388,24 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> > > eventfs_post_create_dir(ef);
> > > dentry->d_fsdata = ef;
> > > } else {
> > > - /* A race here, should try again (unless freed) */
> > > - invalidate = true;
> > > -
> > > /*
> > > + * If we are here then the directory is being freed.
> > > + * The simple_recursive_removal() will get rid of the dentry
> > > + * here.
> > > + */
> > > + dentry = NULL;
> > > + /*
> > > * Should never happen unless we get here due to being freed.
> > > * Otherwise it means two dentries exist with the same name.
> > > */
> > > WARN_ON_ONCE(!ef->is_freed);
> > > }
> > > mutex_unlock(&eventfs_mutex);
> > > - if (invalidate)
> > > - d_invalidate(dentry);
> > >
> > > - if (lookup || invalidate)
> > > + if (lookup)
> > > dput(dentry);
> > >
> > > - return invalidate ? NULL : dentry;
> > > + return dentry;
> > > }
> > >
> > > static bool match_event_file(struct eventfs_file *ef, const char *name)
> > > --
> > > 2.42.0
> > >
>

2023-11-24 08:56:11

by Naresh Kamboju

[permalink] [raw]
Subject: Re: selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))

Hi Mark,

On Thu, 23 Nov 2023 at 23:20, Mark Rutland <[email protected]> wrote:
>
> On Wed, Nov 22, 2023 at 10:12:51AM -0500, Steven Rostedt wrote:
> > On Wed, 22 Nov 2023 19:49:43 +0530
> > Naresh Kamboju <[email protected]> wrote:
> >
> > > Hi Steven,
> > >
> > >
> > >
> > > On Tue, 21 Nov 2023 at 02:06, Steven Rostedt <[email protected]> wrote:
> > > >
> > > > On Thu, 16 Nov 2023 18:00:16 +0530
> > > > Naresh Kamboju <[email protected]> wrote:
> > > [ 282.726999] Unexpected kernel BRK exception at EL1
> >
> > What's a "BRK exception"?
>
> That's triggered by a BRK instruction (software breakpoint), we use it like UD2
> on x86.
>
> > > [ 282.731840] Internal error: BRK handler: 00000000f20003e8 [#1] PREEMPT SMP
>
> That lump of hex here means "this was triggered by a BRK #1000" instruction.
>
> That immediate (0x3e8 / 1000) is what GCC/Clang use for __builtin_trap(), which
> is generated for a bunch of reasons, usually a sanitizer like UBSAN, or options
> to trap on runtime issues.

The reported "Internal error: BRK handler:'' has been noticed several times
on arm64 Juno-r2, Raspberry-pi-4 and qemu-arm64 in the past but the occurrence
is very rare (1%) and hard to reproduce.

The old report links [1] [2] [3].

As you suspected, the following configs are enabled on this build.

CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
CONFIG_UBSAN=y
CONFIG_UBSAN_TRAP=y
CONFIG_CC_HAS_UBSAN_BOUNDS_STRICT=y
CONFIG_UBSAN_BOUNDS=y
CONFIG_UBSAN_BOUNDS_STRICT=y
CONFIG_UBSAN_SHIFT=y
# CONFIG_UBSAN_DIV_ZERO is not set
# CONFIG_UBSAN_UNREACHABLE is not set
CONFIG_UBSAN_BOOL=y
CONFIG_UBSAN_ENUM=y
CONFIG_UBSAN_SANITIZE_ALL=y
# CONFIG_TEST_UBSAN is not set

>
> Naresh, where can I find the config used for this run? I can try to reproduce
> this and investigate.

The build artifacts are located in this link [4] and attached.
- https://storage.tuxsuite.com/public/linaro/naresh/builds/2YTWTd3fPQdanNEe0Oz59XoRfa1/

May be useful links for the previous reports on this warning / internal error:

Following boot warnings and crashes noticed on arm64 Rpi4 device running
Linux next-20230621 kernel.
[1] - https://lore.kernel.org/all/CA+G9fYuifLivwhCh33kedtpU=6zUpTQ_uSkESyzdRKYp8WbTFQ@mail.gmail.com/

While running selftest clone3 test cases on qemu-arm64 the following
kernel crash reported on Linux mainline kernel 6.4.0.
[2] - https://lore.kernel.org/all/CA+G9fYsuc8D98BtW9rX0ahS9Rfqyn-5CALYWTy6fr_ypJqEErA@mail.gmail.com/

The kernel crash reported on arm64 juno-r2 device with kselftest-merge config
while booting Linux next-20220513 kernel.
[3] - https://lore.kernel.org/all/CA+G9fYtsp-1pi6d4J71BPYh-msjzbVt_-v3YrUu12dXPeyqTDg@mail.gmail.com/

>
> Thanks,
> Mark.

- Naresh


Attachments:
arm64-Juno-r2.config (296.64 kB)