2018-08-23 02:39:50

by Chen, Rong A

[permalink] [raw]
Subject: [lkp-robot] [tracing] c3bc8fd637: WARNING:at_kernel/rcu/tree.c:#rcu_irq_enter

FYI, we noticed the following commit (built with gcc-6):

commit: c3bc8fd637a9623f5c507bd18f9677effbddf584 ("tracing: Centralize preemptirq tracepoints and unify their usage")
https://git.kernel.org/cgit/linux/kernel/git/rostedt/linux-trace.git ftrace/core

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------+------------+------------+
| | e6753f23d9 | c3bc8fd637 |
+------------------------------------------------------------------+------------+------------+
| boot_successes | 19 | 4 |
| boot_failures | 2 | 15 |
| invoked_oom-killer:gfp_mask=0x | 2 | 3 |
| Mem-Info | 2 | 3 |
| Out_of_memory:Kill_process | 1 | 3 |
| RIP:copy_user_generic_string | 1 | 1 |
| RIP:__clear_user | 1 | |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 1 | 1 |
| WARNING:at_kernel/rcu/tree.c:#rcu_irq_enter | 0 | 12 |
| RIP:rcu_irq_enter | 0 | 12 |
| RIP:arch_local_irq_restore | 0 | 12 |
| WARNING:at_kernel/events/core.c:#perf_event_task_tick | 0 | 12 |
| RIP:perf_event_task_tick | 0 | 12 |
| WARNING:at_kernel/rcu/tree.c:#rcu_irq_exit | 0 | 11 |
| RIP:rcu_irq_exit | 0 | 11 |
| RIP:check_poison_obj | 0 | 1 |
| RIP:get_page_from_freelist | 0 | 1 |
| RIP:locks_remove_posix | 0 | 1 |
+------------------------------------------------------------------+------------+------------+



[ 11.209512] WARNING: CPU: 0 PID: 24 at kernel/rcu/tree.c:999 rcu_irq_enter+0x55/0x77
[ 11.209512] CPU: 0 PID: 24 Comm: kworker/0:1 Tainted: G T 4.18.0-rc6-00023-gc3bc8fd #1
[ 11.209512] Workqueue: events test_mutex_work
[ 11.209512] RIP: 0010:rcu_irq_enter+0x55/0x77
[ 11.209512] Code: 00 00 00 75 27 83 b8 d8 0a 00 00 00 74 1e 80 3d 48 e3 f0 01 00 75 15 48 c7 c7 1a 1a 8b 82 c6 05 38 e3 f0 01 01 e8 fc f5 f9 ff <0f> 0b 48 83 7b 08 00 75 13 65 48 8b 04 25 c0 4e 01 00 c7 80 40 19
[ 11.209512] RSP: 0000:ffff88013f403fc8 EFLAGS: 00010082
[ 11.209512] RAX: 0000000000000000 RBX: ffff88013f419c60 RCX: ffff88013f403e44
[ 11.209512] RDX: 0000000000000003 RSI: 0000000000000003 RDI: 0000000000000046
[ 11.209512] RBP: ffff88013f403fe8 R08: 0000000000000001 R09: 0000000000000001
[ 11.209512] R10: 0000000000000000 R11: ffffffff82c98b38 R12: 0000000000000000
[ 11.209512] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 11.209512] FS: 0000000000000000(0000) GS:ffff88013f400000(0000) knlGS:0000000000000000
[ 11.209512] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.209512] CR2: 0000000000000000 CR3: 0000000002c57001 CR4: 00000000001606f0
[ 11.209512] Call Trace:
[ 11.209512] <IRQ>
[ 11.209512] irq_enter+0xa/0x60
[ 11.209512] scheduler_ipi+0x4b/0x8a
[ 11.209512] reschedule_interrupt+0xf/0x20
[ 11.209512] </IRQ>
[ 11.209512] RIP: 0010:arch_local_irq_restore+0x2/0x8
[ 11.209512] Code: 15 48 c7 c6 19 8c 8c 82 48 c7 c7 ca 58 8b 82 e8 da 0b fc ff 0f 0b 48 c7 43 58 00 00 00 00 5b c3 9c 58 0f 1f 44 00 00 c3 57 9d <0f> 1f 44 00 00 c3 e8 eb ff ff ff 48 89 c2 fa 66 0f 1f 44 00 00 48
[ 11.209512] RSP: 0000:ffffc900000cbe60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
[ 11.209512] RAX: ffff88013ead8100 RBX: ffff88013ea0aa80 RCX: 0000000000000000
[ 11.209512] RDX: 0000000000000001 RSI: ffffffff83e00190 RDI: 0000000000000246
[ 11.209512] RBP: ffffc900000cbed0 R08: ffffffff83e00190 R09: ffffffff8323b3ec
[ 11.209512] R10: ffffc900000cbdf8 R11: 0000000000000002 R12: ffff88013f4217c0
[ 11.209512] R13: ffff88013f424600 R14: 0000000000000000 R15: 0000000000000000
[ 11.209512] process_one_work+0x1ff/0x4a9
[ 11.209512] ? process_one_work+0x199/0x4a9
[ 11.209512] ? process_scheduled_works+0x2c/0x2c
[ 11.209512] worker_thread+0x1b2/0x261
[ 11.209512] ? process_scheduled_works+0x2c/0x2c
[ 11.209512] kthread+0xf5/0xfa
[ 11.209512] ? queuing_blocked+0x32/0x32
[ 11.209512] ret_from_fork+0x3a/0x50
[ 11.209512] irq event stamp: 71539
[ 11.209512] hardirqs last enabled at (71539): [<0000000000000000>] (null)
[ 11.209512] hardirqs last disabled at (71538): [<0000000000000000>] (null)
[ 11.209512] softirqs last enabled at (71496): [<ffffffff8220030d>] __do_softirq+0x30d/0x38f
[ 11.209512] softirqs last disabled at (71489): [<ffffffff810daae3>] irq_exit+0x63/0xd4
[ 11.209512] _warn_unseeded_randomness: 1 callbacks suppressed
[ 11.209512] random: get_random_bytes called from init_oops_id+0x22/0x31 with crng_init=0
[ 11.209512] ---[ end trace ad4d30e5b1ea3d93 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
lkp


Attachments:
(No filename) (5.97 kB)
config-4.18.0-rc6-00023-gc3bc8fd (124.77 kB)
job-script (3.98 kB)
dmesg.xz (20.45 kB)
trinity (6.22 kB)
Download all attachments

2018-08-28 19:55:16

by Joel Fernandes

[permalink] [raw]
Subject: Re: [lkp-robot] [tracing] c3bc8fd637: WARNING:at_kernel/rcu/tree.c:#rcu_irq_enter

On Thu, Aug 23, 2018 at 10:38:39AM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-6):
>
> commit: c3bc8fd637a9623f5c507bd18f9677effbddf584 ("tracing: Centralize preemptirq tracepoints and unify their usage")
> https://git.kernel.org/cgit/linux/kernel/git/rostedt/linux-trace.git ftrace/core
>
> in testcase: trinity
> with following parameters:
>
> runtime: 300s
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/

This appears to not be testing the tip of ftrace/core or Linus's tree which
contains several other patches/fixes ontop.

thanks,

- Joel


2020-09-25 04:46:36

by Naresh Kamboju

[permalink] [raw]
Subject: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

From stable rc 4.18.1 onwards to today's stable rc 4.19.147

There are two problems while running LTP tracing tests
1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and qemu_x86_64 [2]
Please refer to the full test logs from below links.

The first bad commit found by git bisect.
commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
tracing: Centralize preemptirq tracepoints and unify their usage

Reported-by: Naresh Kamboju <[email protected]>

easily reproducible on qemu
steps to reproduce:
# Boot qemu x86_64 with trace configs enabled.
# cd /opt/ltp
# ./runltp -f tracing

metadata:
git branch: linux-4.19.y
git repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
make_kernelversion: 4.19.147
kernel-config:
https://builds.tuxbuild.com/lOpUmeYR2e1pzvYdlLgGqw/kernel.config


Crash log on qemu_i386
---------------------------------

ftrace-stress-test 1 TINFO: Start pid15=2414
/opt/ltp/testcases/bin/ftrace_stress/ftrace_buffer_size_kb.sh
ftrace-stress-test 1 TINFO: Start pid16=2415
/opt/ltp/testcases/bin/ftrace_stress/ftrace_tracing_cpumask.sh
ftrace-stress-test 1 TINFO: Start pid17=2416
/opt/ltp/testcases/bin/ftrace_stress/ftrace_set_ftrace_filter.sh
[ 38.479869] Scheduler tracepoints stat_sleep, stat_iowait,
stat_blocked and stat_runtime require the kernel parameter
schedstats=enable or kernel.sched_schedstats=1
Sep 23 18:39:40 intel-core2-32 user.warn kernel: [ 38.479869]
Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and
stat_runtime require the kernel parameter schedstats=enable or
kernel.sched_schedstats=1
[ 38.549712] cat[2583]: segfault at 0 ip b7f81767 sp bfbb3a20 error
4 in ld-2.27.so[b7f6c000+25000]
[ 38.550427] sh[2582]: segfault at 467 ip b7fba0d8 sp bfacdb04 error
4 in ld-2.27.so[b7f9f000+25000]
[ 38.551386] Code: 50 8d 86 84 62 ff ff 50 e8 86 a9 ff ff 83 c4 10
89 c2 83 f8 ff 0f 84 72 01 00 00 8b b6 e4 08 00 00 83 fe 10 0f 86 56
01 00 00 <81> 38 6c 64 2e 73 0f 85 1d 01 00 00 81 78 04 6f 2d 31 2e 0f
85 10
[ 38.552710] Code: 40 38 d5 74 ea 80 fd 00 74 12 c1 e9 10 40 38 d1
74 dd 80 f9 00 74 05 40 38 d5 74 d3 31 c0 eb cf 66 90 8b 4c 24 04 8b
54 24 08 <8a> 01 3a 02 75 09 41 42 84 c0 75 f4 31 c0 c3 b8 01 00 00 00
b9 ff
[ 38.556010] systemd-journal[1327]: segfault at 5e ip b7c61e12 sp
bff45044 error 6 in libc-2.27.so[b7b29000+1cc000]
[ 38.558971] sh[2584]: segfault at 0 ip b7f30c15 sp bfbce710 error 14
[ 38.559387] audit: type=1701 audit(1600886380.372:3):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2582
comm=\"sh\" exe=\"/bin/bash.bash\" sig=11 res=1
[ 38.559411] audit: type=1701 audit(1600886380.372:4):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2583
comm=\"cat\" exe=\"/bin/cat.coreutils\" sig=11 res=1
[ 38.560079] Code: 66 0f 7f 5c 3a f0 72 30 66 0f 6f 54 38 10 83 e9
20 66 0f 6f 5c 38 20 66 0f 6f cb 66 0f 3a 0f da 08 66 0f 3a 0f d4 08
8d 7f 20 <66> 0f 7f 54 3a e0 66 0f 7f 5c 3a f0 73 a0 8d 49 20 01 cf 01
fa 8d
[ 38.560811] audit: type=1701 audit(1600886380.373:5):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1327
comm=\"systemd-journal\" exe=\"/lib/systemd/systemd-journald\" sig=11
res=1
[ 38.561615] Code: Bad RIP value.
[ 38.564712] Core dump to |/bin/false pipe failed
[ 38.566144] Core dump to |/bin/false pipe failed
[ 38.566213] Core dump to |/bin/false pipe failed
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.549712]
cat[2583]: segfault at 0 ip b7f81767 sp bfbb3a20 error 4 in
ld-2.27.so[b7f6c000+25000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.550427]
sh[2582]: segfault at 467 ip b7fba0d8 sp bfacdb04 error 4 in
ld-2.27.so[b7f9f000+25000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.551386] Code:
50 8d 86 84 62 ff ff 50 e8 86 a9 ff ff 83 c4 10 89 c2 83 f8 ff 0f 84
72 01 00 00 8b b6 e4 08 00 00 83 fe 10 0f 86 56 01 00 00 <81> 38 6c 64
2e 73 0f 85 1d 01 00 00 81 78 04 6f 2d 31 2e 0f 85 10
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.552710] Code:
40 38 d5 74 ea 80 fd 00 74 12 c1 e9 10 40 38 d1 74 dd 80 f9 00 74 05
40 38 d5 74 d3 31 c0 eb cf 66 90 8b 4c 24 04 8b 54 24 08 <8a> 01 3a 02
75 09 41 42 84 c0 75[ 38.582519] systemd[1]: segfault at 1 ip
b7de036e sp bfd888e0 error 7 in
libsystemd-shared-237.so[b7cd4000+1e2000]
f4 31 c0 c3 b8 [ 38.584227] Code: 46 18 83 e0 1f 83 c8 20 88 46 18
89 f0 e8 ba da ff ff 85 c0 89 c3 0f 88 e0 00 00 00 8b 44 24 24 31 db
85 c0 74 06 8b 44 24 24 <89> 30 83 c4 0c 89 d8 5b 5e 5f 5d c3 8d b6 00
00 00 00 8d 83 1c 1c
01 00 00 00 b9 ff
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.556010]
systemd-journal[1327]: segfau[ 38.587783] systemd[1]: segfault at 0
ip b7a9fbe3 sp bfd88000 error 7 in libc-2.27.so[b79e5000+1cc000]
lt at 5e ip b7c6[ 38.589349] Code: 14 8b 4c 24 10 8b 5c 24 0c b8 72
00 00 00 65 ff 15 10 00 00 00 5b 5e 3d 01 f0 ff ff 0f 83 75 e3 f5 ff
c3 66 90 66 90 55 57 56 <53> 83 ec 1c 8b 5c 24 30 8b 4c 24 34 8b 54 24
38 8b 74 24 3c 65 8b
1e12 sp bff45044 error 6 in libc-2.27.so[b7b29000+1cc000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.558971]
sh[2584]: segfault at 0 ip b7f30c15 sp bfbce710 error 14
Sep 23 18:39:40 intel-core2-32 user.notice kernel: [ 38.559387]
audit: type=1701 audit(1600886380.372:3): auid=4294967295 uid=0 gid=0
ses=4294967295 subj=kernel pid=2582 comm=\"sh\" exe=\"/bin/bash.bash\"
sig=11 res=1
Sep 23 18:39:40 intel-core2-32 user.notice kernel: [ 38.559411]
audit: type=1701 audit(1600886380.372:4): auid=4294967295 uid=0 gid=0
ses=4294967295 subj=kernel pid=2583 comm=\"cat\"
exe=\"/bin/cat.coreutils\" sig=11 res=1
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.560079] Code:
66 0f 7f 5c 3a f0 72 30 66 0f 6f 54 38 10 83 e9 20 66 0f 6f 5c 38 20
66 0f 6f [ 38.598779] Core dump to |/bin/false pipe failed
cb 66 0f 3a 0f da 08 66 0f 3a 0f d4 08 8d 7f 20 <66> 0f 7f 54 3a e0 66
0f 7f 5c 3a f0 73 a0 8d 49 20 01 cf 01 fa[ 38.600281] Core dump to
|/bin/false pipe failed
8d
Sep 23 18:39:40 intel-core2-32 user.notice kernel: [ 38.560811]
audit: type=1701 audit(1600886380.373:5): auid=4294967295 uid=0 gid=0
ses=4294967295 subj=kernel pid=1327 comm=\"systemd-j[ 38.602831]
Core dump to |/bin/false pipe failed
ournal\" exe=\"/li[ 38.603778] Kernel panic - not syncing: Attempted
to kill init! exitcode=0x0000000b
[ 38.603778]
[ 38.604715] CPU: 3 PID: 1 Comm: systemd Not tainted 4.19.147 #1
[ 38.605408] Core dump to |/bin/false pipe failed
b/[sy s t e3m8d./6s0y4s7t1e5m] Hardware name: QEMU Standard PC (i440FX
+ PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 38.604715] Call Trace:
[ 38.604715] dump_stack+0x6e/0x9e
[ 38.604715] panic+0x94/0x1d1
[ 38.608596] do_exit.cold+0x8d/0xa6
[ 38.608596] do_group_exit+0x36/0xa0
[ 38.608596] get_signal+0x131/0x6c0
[ 38.608596] do_signal+0x29/0x5a0
[ 38.608596] ? trace_buffer_unlock_commit_regs+0x54/0x80
d-[j o u rna3ld\"8. s60i8g5=96] ? trace_event_buffer_commit+0x5d/0x1c0
[ 38.608596] ? trace_event_buffer_reserve+0x50/0x80
1[1 r e3s8=1
Sep 23.608596] exit_to_usermode_loop+0x8f/0x100
[ 38.608596] ? __do_page_fault+0x470/0x470
[ 38.608596] prepare_exit_to_usermode+0x57/0x80
[ 38.608596] resume_userspace+0xd/0x12
[ 1 8 :3389.:604805 9i6n]t eElI-P: 0xb7a9fbe3
[ 38.608596] Code: 14 8b 4c 24 10 8b 5c 24 0c b8 72 00 00 00 65 ff
15 10 00 00 00 5b 5e 3d 01 f0 ff ff 0f 83 75 e3 f5 ff c3 66 90 66 90
55 57 56 <53> 83 ec 1c 8b 5c 24 30 8b 4c 24 34 8b 54 24 38 8b 74 24 3c
65 8b
[ 38.617637] EAX: 00000000 EBX: b7efb2e4 ECX: 00000000 EDX: 00000a1d
[ 38.617637] ESI: bfd88100 EDI: bfd88180 EBP: 00000000 ESP: bfd88000
[ 38.617637] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00010246
core2-32 user.info kernel: [ 38.561615] Code: Bad RIP value.
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.564712] Core
dump to |/bin/false pipe failed
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.566144] Core
dump to |/bin/false pipe failed
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.566213] Core
dump to |/bin/false pipe failed
Sep 23 18:39:40 intel-core2-32 user.emerg kernel: [ 38.582519]
systemd[1]: segfault at 1 ip b7de036e sp bfd888e0 error 7 in
libsystemd-shared-237.so[b7cd4000+1e2000]
Sep 23 18:39:40 intel-core2-32 user.emerg kernel: [ 38.584227] Code:
46 18 83 e0 1f 83 c8 20 88 46 18 89 f0 e8 ba da ff ff 85 c0 89 c3 0f
88 e0 00 00 00 8b 44 24 24 31 db 85 c0 74 06 8b 44 24 24 <89> 30 83 c4
0c 89 d8 5b 5e 5f 5d c3 8d b6 00 00 00 00 8d 83 1c 1c
Sep 23 18:39:40 intel-core2-32 user.emerg kernel: [ 38.587783]
systemd[1]: segfault at 0 ip b7a9fbe3 sp bfd88000 error 7 in
libc-2.27.so[b79e5000+1cc000]
Sep 23 18:39:40 intel-core2-32 user.emerg kernel: [ 38.589349] Code:
14 8b 4c 24 10 8b 5c 24 0c b8 72 00 00 00 65 ff 15 10 00 00 00 5b 5e
3d 01 f0 ff ff 0f 83 75 e3 f5 ff c3 66 90 66 90 55 57 56 <53> 83 ec 1c
8b 5c 24 30 8b 4c 24 34 8b 54 24 38 8b 74 24 3c 65 8b
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.598779] Core
dump to |/bin/false pipe failed
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.600281] Core
dump to |/bin/false pipe failed
[ 38.617637] Kernel Offset: 0x10c00000 from 0xc1000000 (relocation
range: 0xc0000000-0xf77fdfff)
[ 38.617637] ---[ end Kernel panic - not syncing: Attempted to kill
init! exitcode=0x0000000b
[ 38.617637] ]---
[ 38.622610] ------------[ cut here ]------------
[ 38.622610] sched: Unexpected reschedule of offline CPU#0!
[ 38.622610] WARNING: CPU: 3 PID: 1 at arch/x86/kernel/smp.c:128
native_smp_send_reschedule+0x3b/0x50
[ 38.622610] Modules linked in:
[ 38.622610] CPU: 3 PID: 1 Comm: systemd Not tainted 4.19.147 #1
[ 38.622610] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.12.0-1 04/01/2014
[ 38.622610] EIP: native_smp_send_reschedule+0x3b/0x50
[ 38.622610] Code: 8b 15 20 4a b8 d2 8b 4a 18 ba fd 00 00 00 e8 a4
02 b0 00 c9 c3 8d b4 26 00 00 00 00 8d 76 00 50 68 24 a7 a8 d2 e8 9f
f3 01 00 <0f> 0b 58 5a c9 c3 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
90 0f
[ 38.622610] EAX: 0000002e EBX: f456d800 ECX: f51eab88 EDX: 00000007
[ 38.622610] ESI: d2dfcf00 EDI: 00000000 EBP: f4d1dca4 ESP: f4d1dc9c
[ 38.622610] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010092
[ 38.622610] CR0: 80050033 CR2: 00000000 CR3: 33f02000 CR4: 003406d0
[ 38.622610] Call Trace:
[ 38.622610] try_to_wake_up+0x38b/0x400
[ 38.622610] ? wait_woken+0x70/0x70
[ 38.622610] default_wake_function+0x10/0x20
[ 38.622610] autoremove_wake_function+0x14/0x40
[ 38.622610] __wake_up_common+0x6b/0x130
[ 38.622610] __wake_up_common_lock+0x77/0xb0
[ 38.622610] __wake_up+0x12/0x20
[ 38.622610] wake_up_klogd_work_func+0x66/0x70
[ 38.622610] irq_work_run_list+0x48/0x80
[ 38.622610] ? setup_data_read+0xd0/0xd0
[ 38.622610] irq_work_run+0x25/0x40
[ 38.622610] flush_smp_call_function_queue+0x66/0xe0
[ 38.622610] generic_smp_call_function_single_interrupt+0x12/0x2a
[ 38.622610] smp_call_function_interrupt+0x3a/0xd0
[ 38.622610] call_function_interrupt+0xd7/0xdc
[ 38.622610] EIP: panic+0x194/0x1d1
[ 38.622610] Code: 83 c3 64 eb b7 83 3d 4c d5 e1 d2 00 74 05 e8 e6
f7 01 00 68 80 d5 e1 d2 68 f8 04 a9 d2 e8 47 7c 05 00 e8 42 28 0d 00
fb 31 db <58> 5a 39 fb 7c 18 83 f6 01 8b 15 40 d5 e1 d2 89 f0 e8 31 0d
ae 00
[ 38.622610] EAX: d1ccf04e EBX: 00000000 ECX: 40000000 EDX: 80000000
[ 38.622610] ESI: 00000000 EDI: 00000000 EBP: f4d1de20 ESP: f4d1de08
[ 38.622610] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000246
[ 38.622610] ? panic+0x191/0x1d1
[ 38.622610] do_exit.cold+0x8d/0xa6
[ 38.622610] do_group_exit+0x36/0xa0
[ 38.622610] get_signal+0x131/0x6c0
[ 38.622610] do_signal+0x29/0x5a0
[ 38.622610] ? trace_buffer_unlock_commit_regs+0x54/0x80
[ 38.622610] ? trace_event_buffer_commit+0x5d/0x1c0
[ 38.622610] ? trace_event_buffer_reserve+0x50/0x80
[ 38.622610] exit_to_usermode_loop+0x8f/0x100
[ 38.622610] ? __do_page_fault+0x470/0x470
[ 38.622610] prepare_exit_to_usermode+0x57/0x80
[ 38.622610] resume_userspace+0xd/0x12
[ 38.622610] EIP: 0xb7a9fbe3
[ 38.622610] Code: 14 8b 4c 24 10 8b 5c 24 0c b8 72 00 00 00 65 ff
15 10 00 00 00 5b 5e 3d 01 f0 ff ff 0f 83 75 e3 f5 ff c3 66 90 66 90
55 57 56 <53> 83 ec 1c 8b 5c 24 30 8b 4c 24 34 8b 54 24 38 8b 74 24 3c
65 8b
[ 38.622610] EAX: 00000000 EBX: b7efb2e4 ECX: 00000000 EDX: 00000a1d
[ 38.622610] ESI: bfd88100 EDI: bfd88180 EBP: 00000000 ESP: bfd88000
[ 38.622610] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00010246
[ 38.622610] ---[ end trace 16b4da0e9552fc5a ]---


Crash log on qemu_x86_64
-------------------------------------

ftrace-stress-test 1 TINFO: Start pid16=2519
/opt/ltp/testcases/bin/ftrace_stress/ftrace_tracing_cpumask.sh
ftrace-stress-test 1 TINFO: Start pid17=2520
/opt/ltp/testcases/bin/ftrace_stress/ftrace_set_ftrace_filter.sh
[ 37.455928] Scheduler tracepoints stat_sleep, stat_iowait,
stat_blocked and stat_runtime require the kernel parameter
schedstats=enable or kernel.sched_schedstats=1
Sep 23 18:30:46 intel-corei7-64 user.warn kernel: [ 37.455928]
Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and
stat_runtime require the kernel parameter schedstats=enable or
kernel.sched_schedstats=1
Sep 23 18:30:46 intel-corei7-64 daemon.info avahi-daemon[2069]:
Withdrawing address record for 10.66.17.129 on enp0s3.
Sep 23 18:30:46 intel-corei7-64 daemon.warn avahi-daemon[2069]: Host
name conflict, retrying with intel-corei7-74
Sep 23 18:30:46 intel-corei7-64 daemon.info avahi-daemon[2069]:
Registering new address record for fe80::1b53:d658:9c7c:5b03 on
enp0s3.*.
Sep 23 18:30:46 intel-corei7-64 daemon.info avahi-daemon[2069]:
Registering new address record for 10.66.17.129 on enp0s3.IPv4.
[ 38.065573] cat[2507]: segfault at 0 ip 00007ff47cb93810 sp
00007fff74060ce8 error 14
[ 38.065833] sh[2519]: segfault at 0 ip 000000000042bc9e sp
00007fffeae3b3c0 error 7
[ 38.065894] in libc-2.27.so[7ff47cad2000+1aa000]
[ 38.067380] cat[2532]: segfault at 0 ip 00007f5565234810 sp
00007fffcf0bb1b8 error 14
[ 38.067698] in bash.bash[400000+f3000]
[ 38.068171] Code: 41 56 45 31 f6 41 55 41 54 41 bc c8 00 00 00 55
53 48 81 ec 68 0e 00 00 c7 05 8a 49 2d 00 00 00 00 00 48 8d ac 24 e0
01 00 00 <66> 89 74 24 50 48 8d 5c 24 50 48 89 6c 24 20 c7 05 99 49 2d
00 fe
[ 38.069425] Code: Bad RIP value.
[ 38.069821] klogd[2071]: segfault at 1ff ip 00007fc2806db17b sp
00007ffed1ee9f10 error 6
[ 38.070702] audit: type=1701 audit(1600885846.865:3):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2519
comm=\"sh\" exe=\"/bin/bash.bash\" sig=11 res=1
[ 38.071592] in libc-2.27.so[7f5565173000+1aa000]
[ 38.072685] in libc-2.27.so[7fc28065c000+1aa000]
[ 38.073082] Code: 48 29 d8 31 f6 48 8d 3c 19 49 39 d5 40 0f 95 c6
48 83 cb 01 48 83 c8 01 49 89 7d 60 48 c1 e6 02 48 89 da 48 09 f2 48
89 51 08 <48> 89 47 08 e9 95 fe ff ff 48 8d 3d 9d af 0f 00 e8 60 ca ff
ff 48
[ 38.075649] audit: type=1701 audit(1600885846.868:4):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2071
comm=\"klogd\" exe=\"/bin/busybox.nosuid\" sig=11 res=1
[ 38.076574] Code: Bad RIP value.
[ 38.079615] sh[2518]: segfault at ff ip 00007f580bee417b sp
00007ffce56b4180 error 6 in libc-2.27.so[7f580be65000+1aa000]
[ 38.087568] Code: 48 29 d8 31 f6 48 8d 3c 19 49 39 d5 40 0f 95 c6
48 83 cb 01 48 83 c8 01 49 89 7d 60 48 c1 e6 02 48 89 da 48 09 f2 48
89 51 08 <48> 89 47 08 e9 95 fe ff ff 48 8d 3d 9d af 0f 00 e8 60 ca ff
ff 48
[ 38.094737] cat[2699]: segfault at 0 ip 00007f1d6e9d6030 sp
00007ffdbad5e380 error 14 in ld-2.27.so[7f1d6e9d5000+25000]
[ 38.097246] Code: Bad RIP value.
[ 38.101982] audit: type=1701 audit(1600885846.898:6):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2532
comm=\"cat\" exe=\"/bin/cat.coreutils\" sig=11 res=1
[ 38.102536] Core dump to |/bin/false pipe failed
[ 38.105603] Core dump to |/bin/false pipe failed
ftrace_stack_max_size.sh: line 26: 2698 Segmentation fault cat
\"$TRACING_PATH\"/stack_max_size > /dev/null
ftrace_trace_pipe.sh: line 36: 2507 Segmentation fault (core
dumped) cat \"$TRACING_PATH\"/trace_pipe > /dev/null
ftrace_trace.sh: line 25: 2532 Segmentation fault (core dumped)
cat \"$TRACING_PATH\"/trace > /dev/null
ftrace_trace_stat.sh: line 44: 2699 Segmentation fault (core
dumped) cat \"$TRACING_PATH\"/trace_stat/function${cpu} > /dev/null
2>&1
Sep 23 18:30:49 intel-corei7-64 daemon.info avahi-daemon[2069]: Server
startup complete. Host name is intel-corei7-74.local. Local service
cookie is 673226982.
Sep 23 18:30:49 intel-corei7-64 daemon.info avahi-daemon[2069]:
Service \"intel-corei7-74\" (/services/ssh.service) successfully
established.
Sep 23 18:30:49 intel-corei7-64 daemon.info avahi-daemon[2069]:
Service \"intel-corei7-74\" (/services/sftp-ssh.service) successfully
established.
[ 48.755576] show_signal_msg: 4 callbacks suppressed
[ 48.755753] sh[2912]: segfault at 1 ip 000000000047098f sp
00007ffe9f95d980 error 4 in bash.bash[400000+f3000]
[ 48.760451] Code: 8b 41 20 8b 41 14 48 89 cf 4c 89 45 a8 44 8b 79
20 44 8b 71 08 89 43 14 e8 be 3b fc ff 4c 8b 45 a8 41 83 fc 13 77 34
44 89 e0 <ff> 24 c5 e0 fb 4b 00 66 2e 0f 1f 84 00 00 00 00 00 44 89 f7
e8 58
[ 48.769218] audit: type=1701 audit(1600885857.562:10):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2912
comm=\"sh\" exe=\"/bin/bash.bash\" sig=11 res=1
[ 48.774574] systemd[1]: segfault at 0 ip 00007fb40e076853 sp
00007ffeaa93ee50 error 7 in
libsystemd-shared-237.so[7fb40df64000+1a3000]
[ 48.779661] Code: 9b 63 05 00 e8 be b8 f2 ff 66 0f 1f 44 00 00 41
f6 44 24 28 1f 0f 85 84 04 00 00 41 8b 06 41 f6 44 24 29 01 0f 85 25
04 00 00 <41> 89 44 24 70 be 01 00 00 00 4c 89 e7 e8 0b b7 ff ff 85 c0
0f 88
[ 48.784452] ftrace_stress_t[2479]: segfault at 0 ip
0000000000446999 sp 00007ffeabb6ab00 error 7 in
bash.bash[400000+f3000]
[ 48.789662] Code: 90 48 8b 1b 48 85 db 0f 84 54 02 00 00 44 3b 6b
08 75 ee 8b 74 24 18 44 89 ef e8 42 eb fe ff 8b 44 24 18 31 d2 3d ff
ff 00 00 <89> 43 0c 0f 94 c2 89 53 10 0f 84 68 02 00 00 8b 7c 24 1c 83
05 e5
[ 48.836666] Core dump to |/bin/false pipe failed
[ 48.897839] systemd-journal[1442]: segfault at 24 ip
00007f2600e10913 sp 00007ffe69ebb798 error 6 in
libc-2.27.so[7f2600cdf000+1aa000]
[ 48.903524] Code: 47 d4 4c 89 4f dc 4c 89 57 e4 4c 89 5f ec 48 89
4f f4 89 57 fc c3 90 4c 8b 4e dc 4c 8b 56 e4 4c 8b 5e ec 48 8b 4e f4
8b 56 fc <4c> 89 4f dc 4c 89 57 e4 4c 89 5f ec 48 89 4f f4 89 57 fc c3
66 0f
[ 48.918479] audit: type=1701 audit(1600885857.686:11):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2914
comm=\"systemd\" exe=\"/lib/systemd/systemd\" sig=11 res=1
[ 48.921093] audit: type=1701 audit(1600885857.705:12):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1442
comm=\"systemd-journal\" exe=\"/lib/systemd/systemd-journald\" sig=11
res=1
[ 48.927551] audit: type=1701 audit(1600885857.723:13):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2479
comm=\"ftrace_stress_t\" exe=\"/bin/bash.bash\" sig=11 res=1
ftrace_stack_trace.sh: line 35: 2912 Segmentation fault cat
\"$TRACING_PATH\"/stack_trace > /dev/null
[ 49.101773] (agetty): 41 output lines suppressed due to ratelimiting
[ 49.382601] systemd[2806]: [email protected]: Failed to
connect stdout to the journal socket, ignoring: Connection refused
Sep 23 18:30:58 intel-corei7-64 authpriv.err agetty[2805]: /dev/ttyS2: not a tty
Sep 23 18:30:58 intel-corei7-64 authpriv.err agetty[2806]: /dev/ttyS1: not a tty
[ 49.707082] audit: type=1701 audit(1600885858.503:14):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2919
comm=\"dynamic_debug01\" exe=\"/bin/bash.bash\" sig=11 res=1
[ 49.707084] ltp-pan[2314]: segfault at 1 ip 00000000004037b0 sp
00007ffeb10df110 error 7 in ltp-pan[400000+7000]
[ 49.711151] Code: e8 c5 e0 ff ff 48 85 c0 0f 85 9a 04 00 00 8b bc
24 d8 00 00 00 e8 80 e0 ff ff 83 7c 24 38 00 74 09 8b 7c 24 30 e8 70
e0 ff ff <44> 89 2b 48 8b 3d 5e 4b 20 00 44 89 ee c7 43 04 00 00 00 00
49 8b
[ 49.715540] Core dump to |/bin/false pipe failed
[ 49.718565] Core dump to |/bin/false pipe failed
[ 49.722630] Core dump to |/bin/false pipe failed
[ 49.805761] Core dump to |/bin/false pipe failed
[ 49.807855] Core dump to |/bin/false pipe failed
[ 49.810211] Core dump to |/bin/false pipe failed
[ 49.812494] Core dump to |/bin/false pipe failed
[ 49.815536] Core dump to |/bin/false pipe failed
[ 49.818732] Core dump to |/bin/false pipe failed
[ 49.822770] Core dump to |/bin/false pipe failed
[ 49.826468] Core dump to |/bin/false pipe failed
[ 49.830480] Core dump to |/bin/false pipe failed
[ 50.651176] avahi-daemon (2069) used greatest stack depth: 12184 bytes left
[ 74.858293] show_signal_msg: 22 callbacks suppressed
[ 74.858310] dbus-daemon[2096]: segfault at 1ff ip 00007f86246c717b
sp 00007ffc0e7b6b40 error 6 in libc-2.27.so[7f8624648000+1aa000]
[ 74.859556] systemd-logind[2074]: Failed to abandon session scope,
ignoring: Connection timed out
[ 74.860273] Code: 48 29 d8 31 f6 48 8d 3c 19 49 39 d5 40 0f 95 c6
48 83 cb 01 48 83 c8 01 49 89 7d 60 48 c1 e6 02 48 89 da 48 09 f2 48
89 51 08 <48> 89 47 08 e9 95 fe ff ff 48 8d 3d 9d af 0f 00 e8 60 ca ff
ff 48
[ 74.871558] audit: type=1701 audit(1600885883.668:28):
auid=4294967295 uid=996 gid=994 ses=4294967295 subj=kernel pid=2096
comm=\"dbus-daemon\" exe=\"/usr/bin/dbus-daemon\" sig=11 res=1
Sep 23 18:31:23 intel-corei7-64 [ 74.877646] NetworkManager[2098]:
segfault at 0 ip 00007fe5d0fb30d5 sp 00007fff3c9695e0 error 7 in
libc-2.27.so[7fe5d0f36000+1aa000]
authpriv.notice polkitd[2161]: Lost the name org[ 74.881756] Code:
4c 0f be 04 0e 4c 3b 05 29 02 33 00 4c 89 c7 73 34 48 83 f9 3f 0f 87
ea 03 00 00 48 8d 04 ce 49 83 c4 10 83 c7 01 48 8b 50 40 <49> 89 14 24
4c 89 60 40 40 88 3c 0e 48 83 c4 48 5b 5d 41 5c 41 5d
.freedesktop.PolicyKit1 - exiting[ 74.887291] audit: type=1701
audit(1600885883.684:29): auid=4294967295 uid=0 gid=0 ses=4294967295
subj=kernel pid=2098 comm=\"NetworkManager\"
exe=\"/usr/sbin/NetworkManager\" sig=11 res=1
[ 74.891439] Core dump to |/bin/false pipe failed
[ 74.896814] NetworkManager (2098) used greatest stack depth: 11912 bytes left
[ 258.966773] isc-worker0000[2169]: segfault at 4 ip 000000000043bc05
sp 00007f59aab33c78 error 7 in dhclient[400000+5b000]
[ 258.969115] Code: 00 00 00 00 00 b8 03 00 06 00 48 85 ff 74 0b 48
85 f6 74 06 48 83 3f 00 74 0b c3 66 2e 0f 1f 84 00 00 00 00 00 48 89
37 31 c0 <83> 46 1c 01 c3 66 0f 1f 44 00 00 e9 cb 7d fc ff 90 66 2e 0f
1f 84
[ 258.972803] audit: type=1701 audit(1600886067.769:30):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2162
comm=\"isc-worker0000\" exe=\"/sbin/dhclient\" sig=11 res=1
[ 258.975070] Core dump to |/bin/false pipe failed
[ 353.027171] kworker/dying (1398) used greatest stack depth: 11904 bytes left


Full test log links,
[1] https://lkft.validation.linaro.org/scheduler/job/1784423#L1656
[2] https://lkft.validation.linaro.org/scheduler/job/1784270#L1582


The git bisect confirms the first bad patch to cause these problems.

---
commit c3bc8fd637a9623f5c507bd18f9677effbddf584
Author: Joel Fernandes (Google) <[email protected]>
Date: Mon Jul 30 15:24:23 2018 -0700

tracing: Centralize preemptirq tracepoints and unify their usage

This patch detaches the preemptirq tracepoints from the tracers and
keeps it separate.

Advantages:
* Lockdep and irqsoff event can now run in parallel since they no longer
have their own calls.

* This unifies the usecase of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event.
3 users of the events exist:
- Lockdep
- irqsoff and preemptoff tracers
- irqs and preempt trace events

The unification cleans up several ifdefs and makes the code in preempt
tracer and irqsoff tracers simpler. It gets rid of all the horrific
ifdeferry around PROVE_LOCKING and makes configuration of the different
users of the tracepoints more easy and understandable. It also gets rid
of the time_* function calls from the lockdep hooks used to call into
the preemptirq tracer which is not needed anymore. The negative delta in
lines of code in this patch is quite large too.

In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this,
the web of config options for preempt/irq toggle tracepoints and its
users becomes:

PREEMPT_TRACER PREEMPTIRQ_EVENTS IRQSOFF_TRACER PROVE_LOCKING
| | \ | |
\ (selects) / \ \ (selects) /
TRACE_PREEMPT_TOGGLE ----> TRACE_IRQFLAGS
\ /
\ (depends on) /
PREEMPTIRQ_TRACEPOINTS

Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all tests cases are
passing.

I also injected issues by not registering lockdep probes onto the
tracepoints and I see failures to confirm that the probes are indeed
working.

This series + lockdep probes not registered (just to inject errors):
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
[ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

With this series + lockdep probes registered, all locking tests pass:

[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

Link: http://lkml.kernel.org/r/[email protected]

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Namhyung Kim <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

include/linux/ftrace.h | 11 +-
include/linux/irqflags.h | 11 +-
include/linux/lockdep.h | 8 +-
include/linux/preempt.h | 2 +-
include/trace/events/preemptirq.h | 23 ++--
init/main.c | 5 +-
kernel/locking/lockdep.c | 35 +++---
kernel/sched/core.c | 2 +-
kernel/trace/Kconfig | 22 +++-
kernel/trace/Makefile | 2 +-
kernel/trace/trace_irqsoff.c | 231 ++++++++++----------------------------
kernel/trace/trace_preemptirq.c | 72 ++++++++++++
12 files changed, 195 insertions(+), 229 deletions(-)
create mode 100644 kernel/trace/trace_preemptirq.c

--
Linaro LKFT
https://lkft.linaro.org

2020-09-25 05:17:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> >From stable rc 4.18.1 onwards to today's stable rc 4.19.147
>
> There are two problems while running LTP tracing tests
> 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> 2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and qemu_x86_64 [2]
> Please refer to the full test logs from below links.
>
> The first bad commit found by git bisect.
> commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
> tracing: Centralize preemptirq tracepoints and unify their usage
>
> Reported-by: Naresh Kamboju <[email protected]>

So this also is reproducable in 5.4 and Linus's tree right now?

Or are newer kernels working fine?

thanks,

greg k-h

2020-09-25 07:28:46

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

On Fri, 25 Sep 2020 at 10:45, Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> > >From stable rc 4.18.1 onwards to today's stable rc 4.19.147
> >
> > There are two problems while running LTP tracing tests
> > 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> > 2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and qemu_x86_64 [2]
> > Please refer to the full test logs from below links.
> >
> > The first bad commit found by git bisect.
> > commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
> > tracing: Centralize preemptirq tracepoints and unify their usage
> >
> > Reported-by: Naresh Kamboju <[email protected]>
>
> So this also is reproducable in 5.4 and Linus's tree right now?

No.
The reported issues are not reproducible on 5.4, 5.8 and Linus's tree.

>
> Or are newer kernels working fine?

No.
There are different issues while testing LTP tracing on 5.4, 5.8 and
Linus 's 5.9.

NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442 dev_watchdog+0x4c7/0x4d0
https://lore.kernel.org/stable/CA+G9fYtS_nAX=sPV8zTTs-nOdpJ4uxk9sqeHOZNuS4WLvBcPGg@mail.gmail.com/

I see this on 5.4, 5.8 and Linus 's 5.9.
rcu: INFO: rcu_sched self-detected stall on CPU
? ftrace_graph_caller+0xc0/0xc0
https://lore.kernel.org/stable/CA+G9fYsdTLRj55_bvod8Sf+0zvK0RRMp5+FeJcOx5oAcAKOGXA@mail.gmail.com/T/#u

>
> thanks,
>
> greg k-h

- Naresh

2020-09-25 14:58:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

On Fri, 25 Sep 2020 12:55:13 +0530
Naresh Kamboju <[email protected]> wrote:

> On Fri, 25 Sep 2020 at 10:45, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> > > >From stable rc 4.18.1 onwards to today's stable rc 4.19.147
> > >
> > > There are two problems while running LTP tracing tests
> > > 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> > > 2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and qemu_x86_64 [2]
> > > Please refer to the full test logs from below links.
> > >
> > > The first bad commit found by git bisect.
> > > commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
> > > tracing: Centralize preemptirq tracepoints and unify their usage
> > >
> > > Reported-by: Naresh Kamboju <[email protected]>
> >
> > So this also is reproducable in 5.4 and Linus's tree right now?
>
> No.
> The reported issues are not reproducible on 5.4, 5.8 and Linus's tree.

The crash looks like its cr3 related, which I believe Peter Zijlstra
did a restructuring of that code to not let it be an issue anymore.
I'll have to look deeper. The rework may be too intrusive to backport,
but we do have other work arounds for this issue if that would be
acceptable for backporting.

>
> >
> > Or are newer kernels working fine?
>
> No.
> There are different issues while testing LTP tracing on 5.4, 5.8 and
> Linus 's 5.9.
>
> NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
> WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442 dev_watchdog+0x4c7/0x4d0
> https://lore.kernel.org/stable/CA+G9fYtS_nAX=sPV8zTTs-nOdpJ4uxk9sqeHOZNuS4WLvBcPGg@mail.gmail.com/
>
> I see this on 5.4, 5.8 and Linus 's 5.9.
> rcu: INFO: rcu_sched self-detected stall on CPU
> ? ftrace_graph_caller+0xc0/0xc0
> https://lore.kernel.org/stable/CA+G9fYsdTLRj55_bvod8Sf+0zvK0RRMp5+FeJcOx5oAcAKOGXA@mail.gmail.com/T/#u

I've seen that too and couldn't bisect it down to any such commit. I'm
not sure if it is even a bug per-se, because in my test suite, I've
commented out the warning, and the system still remains stable.

-- Steve

2020-09-25 15:01:39

by Steven Rostedt

[permalink] [raw]
Subject: Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

On Fri, 25 Sep 2020 10:54:58 -0400
Steven Rostedt <[email protected]> wrote:


> The crash looks like its cr3 related, which I believe Peter Zijlstra

s/cr3/cr2/

-- Steve


> did a restructuring of that code to not let it be an issue anymore.
> I'll have to look deeper. The rework may be too intrusive to backport,
> but we do have other work arounds for this issue if that would be
> acceptable for backporting.
>

2020-09-25 15:08:34

by Steven Rostedt

[permalink] [raw]
Subject: Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

On Fri, 25 Sep 2020 10:59:14 -0400
Steven Rostedt <[email protected]> wrote:

> On Fri, 25 Sep 2020 10:54:58 -0400
> Steven Rostedt <[email protected]> wrote:
>
>
> > The crash looks like its cr3 related, which I believe Peter Zijlstra
>
> s/cr3/cr2/
>

Specifically, commits:

a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 corruption")
6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering of saved CR2 value")
b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to do_async_page_fault()")

(which are in 5.4 but not 4.19)

But again, is this too intrusive. There was a workaround that was
original proposed, but Peter didn't want any more band-aids, and did
the restructuring, but as you can see from the two other patches, it
makes it a bit more high risk.

-- Steve

2020-09-25 15:14:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

On Fri, Sep 25, 2020 at 11:07:06AM -0400, Steven Rostedt wrote:
> On Fri, 25 Sep 2020 10:59:14 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > On Fri, 25 Sep 2020 10:54:58 -0400
> > Steven Rostedt <[email protected]> wrote:
> >
> >
> > > The crash looks like its cr3 related, which I believe Peter Zijlstra
> >
> > s/cr3/cr2/
> >
>
> Specifically, commits:
>
> a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 corruption")
> 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering of saved CR2 value")
> b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to do_async_page_fault()")
>
> (which are in 5.4 but not 4.19)
>
> But again, is this too intrusive. There was a workaround that was
> original proposed, but Peter didn't want any more band-aids, and did
> the restructuring, but as you can see from the two other patches, it
> makes it a bit more high risk.

If those are known to work, why can't I take them as-is?

thanks,

greg k-h

2020-09-25 15:34:25

by Steven Rostedt

[permalink] [raw]
Subject: Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

On Fri, 25 Sep 2020 17:12:45 +0200
Greg Kroah-Hartman <[email protected]> wrote:

> > Specifically, commits:
> >
> > a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 corruption")
> > 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering of saved CR2 value")
> > b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to do_async_page_fault()")
> >
> > (which are in 5.4 but not 4.19)
> >
> > But again, is this too intrusive. There was a workaround that was
> > original proposed, but Peter didn't want any more band-aids, and did
> > the restructuring, but as you can see from the two other patches, it
> > makes it a bit more high risk.
>
> If those are known to work, why can't I take them as-is?

If they apply without tweaks, I say "Go for it" ;-)

My worry is that they may have other unknown dependencies. And I only
looked at what was applied between 4.19 and 5.4 mainline. I haven't
looked at what else may have been backported to fix the above three
commits.

-- Steve

2020-10-05 13:19:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

On Fri, Sep 25, 2020 at 11:30:49AM -0400, Steven Rostedt wrote:
> On Fri, 25 Sep 2020 17:12:45 +0200
> Greg Kroah-Hartman <[email protected]> wrote:
>
> > > Specifically, commits:
> > >
> > > a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 corruption")
> > > 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering of saved CR2 value")
> > > b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to do_async_page_fault()")
> > >
> > > (which are in 5.4 but not 4.19)
> > >
> > > But again, is this too intrusive. There was a workaround that was
> > > original proposed, but Peter didn't want any more band-aids, and did
> > > the restructuring, but as you can see from the two other patches, it
> > > makes it a bit more high risk.
> >
> > If those are known to work, why can't I take them as-is?
>
> If they apply without tweaks, I say "Go for it" ;-)
>
> My worry is that they may have other unknown dependencies. And I only
> looked at what was applied between 4.19 and 5.4 mainline. I haven't
> looked at what else may have been backported to fix the above three
> commits.

I tried to backport the above series, and quickly gave up, as yes, you
are right, the dependencies are deep and messy from what I can tell.

WHat's wrong with just moving to 5.4? :)

thanks,

greg k-h