Greetings,
FYI, we noticed the following commit (built with gcc-9):
commit: b031a684bfd01d633c79d281bd0cf11c2f834ada ("printk: remove logbuf_lock writer-protection of ringbuffer")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: rcutorture
version:
with following parameters:
runtime: 300s
test: cpuhotplug
torture_type: tasks
test-description: rcutorture is rcutorture kernel module load/unload test.
test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
+------------------------------------------------+------------+------------+
| | 6b916706f8 | b031a684bf |
+------------------------------------------------+------------+------------+
| boot_successes | 66 | 18 |
| boot_failures | 2 | 26 |
| INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 1 | |
| RIP:enqueue_hrtimer | 1 | |
| RIP:__memset | 1 | |
| RIP:clear_page_rep | 2 | 2 |
| BUG:kernel_hang_in_boot_stage | 2 | |
| INFO:rcu_sched_self-detected_stall_on_CPU | 1 | 4 |
| INFO:rcu_tasks_detected_stalls_on_tasks | 0 | 22 |
| RIP:kernel_init_free_pages | 0 | 1 |
| IP-Config:Auto-configuration_of_network_failed | 0 | 3 |
| RIP:zone_page_state | 0 | 1 |
+------------------------------------------------+------------+------------+
If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>
[ 952.147986] INFO: rcu_tasks detected stalls on tasks:
[ 952.149313] 000000008d44d6d1: .. nvcsw: 0/0 holdout: 1 idle_cpu: -1/1
[ 952.150866] task:dmesg state:R running task stack: 0 pid: 1771 ppid: 573 flags:0x20020000
[ 952.153385] Call Trace:
[ 952.154090] __schedule (kbuild/src/consumer/kernel/sched/core.c:3779 kbuild/src/consumer/kernel/sched/core.c:4528)
[ 952.154990] ? firmware_map_remove (kbuild/src/consumer/kernel/sched/core.c:4411)
[ 952.156159] ? ksys_read (kbuild/src/consumer/fs/read_write.c:637)
[ 952.157063] schedule (kbuild/src/consumer/include/linux/thread_info.h:84 (discriminator 1) kbuild/src/consumer/include/linux/sched.h:1897 (discriminator 1) kbuild/src/consumer/kernel/sched/core.c:4608 (discriminator 1))
[ 952.157895] exit_to_user_mode_prepare (kbuild/src/consumer/kernel/entry/common.c:160 kbuild/src/consumer/kernel/entry/common.c:191)
[ 952.158984] syscall_exit_to_user_mode (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:41 kbuild/src/consumer/arch/x86/include/asm/nospec-branch.h:288 kbuild/src/consumer/arch/x86/include/asm/entry-common.h:80 kbuild/src/consumer/kernel/entry/common.c:133 kbuild/src/consumer/kernel/entry/common.c:268)
[ 952.160252] entry_INT80_compat (kbuild/src/consumer/arch/x86/entry/entry_64_compat.S:412)
[ 952.161424] RIP: 0023:0xf7eeca02
[ 952.162254] RSP: 002b:00000000fff98e04 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 952.164203] RAX: ffffffffffffffe0 RBX: 0000000000000004 RCX: 0000000056573234
[ 952.165898] RDX: 0000000000001fff RSI: 00000000000001e0 RDI: 0000000056573234
[ 952.167484] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[ 952.169182] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 952.170897] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 952.189825] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.191987]
[ 352.522192] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.192011]
[ 952.222122] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.224619]
[ 352.539354]
[ 98.566609] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.224648]
[ 952.264449]
[ 352.555366] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.264484]
[ 952.267017] 00000000974cbae1: .. nvcsw: 2/2 holdout: 1 idle_cpu: -1/1
[ 952.268469] task:sed state:R running task stack: 0 pid: 424 ppid: 396 flags:0x20020000
[ 952.270501] Call Trace:
[ 952.270986] __schedule (kbuild/src/consumer/kernel/sched/core.c:3779 kbuild/src/consumer/kernel/sched/core.c:4528)
[ 952.271861] ? firmware_map_remove (kbuild/src/consumer/kernel/sched/core.c:4411)
[ 952.272870] ? ksys_write (kbuild/src/consumer/fs/read_write.c:661)
[ 952.273709] schedule (kbuild/src/consumer/include/linux/thread_info.h:84 (discriminator 1) kbuild/src/consumer/include/linux/sched.h:1897 (discriminator 1) kbuild/src/consumer/kernel/sched/core.c:4608 (discriminator 1))
[ 952.274495] exit_to_user_mode_prepare (kbuild/src/consumer/kernel/entry/common.c:160 kbuild/src/consumer/kernel/entry/common.c:191)
[ 952.275516] syscall_exit_to_user_mode (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:41 kbuild/src/consumer/arch/x86/include/asm/nospec-branch.h:288 kbuild/src/consumer/arch/x86/include/asm/entry-common.h:80 kbuild/src/consumer/kernel/entry/common.c:133 kbuild/src/consumer/kernel/entry/common.c:268)
[ 952.276650] entry_INT80_compat (kbuild/src/consumer/arch/x86/entry/entry_64_compat.S:412)
[ 952.277673] RIP: 0023:0xf7f93a02
[ 952.278497] RSP: 002b:00000000ff8db2f4 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
[ 952.280235] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 000000005666f220
[ 952.281813] RDX: 0000000000000001 RSI: 00000000f7f59d60 RDI: 0000000000000001
[ 952.283347] RBP: 000000005666f220 R08: 0000000000000000 R09: 0000000000000000
[ 952.284919] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 952.286507] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 952.294673] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.328047] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.336658]
[ 352.572231]
[ 98.585525] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.336693]
[ 952.359450]
[ 352.594642] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.359483]
[ 952.377678] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.409955] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.412504]
[ 352.610470]
[ 98.605699] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.412531]
[ 952.444293]
[ 352.625170] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.444328]
[ 952.453669] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.480340]
[ 352.638172]
[ 98.622917] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.480372]
[ 952.521255] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.532279]
[ 352.668363] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.532313]
[ 952.546877] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.589718]
[ 352.679476]
[ 98.638686] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.589794]
[ 952.604011] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.622613]
[ 352.694706] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.622648]
[ 952.631723] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.662230]
[ 352.706582]
[ 98.654747] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.662270]
[ 952.692429] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.700017]
[ 352.738289] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.700056]
[ 952.732376] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.762275] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.764584]
[ 352.740552]
[ 98.670545] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.764616]
[ 952.809608] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 952.811420]
[ 352.764442]
To reproduce:
# build kernel
cd linux
cp config-5.10.0-rc5-gb031a684bfd0 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
Thanks,
Oliver Sang
On Fri 2021-01-22 16:13:11, kernel test robot wrote:
>
> Greetings,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: b031a684bfd01d633c79d281bd0cf11c2f834ada ("printk: remove logbuf_lock writer-protection of ringbuffer")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
With this commit, many printk() callers might add messages to
the new lockless ring buffer in parallel.
The printk() calls are still serialized later, when they
try to get access to the console. This happens in
console_trylock_spinning(). There are actually two locks used:
sem->lock and console_owner_lock.
The only exception is printk_deferred(), where the console handling
is deferred via irq_work.
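Roughly, the flow after this commit looks like this (a simplified
sketch of vprintk_emit(), not the exact code):

	/* Storing is now lockless and may run on all CPUs in parallel. */
	vprintk_store(facility, level, dev_info, fmt, args);

	/* Console handling is still serialized via console_sem and
	 * console_owner. printk_deferred() skips this step and defers
	 * it to irq_work instead. */
	if (console_trylock_spinning())
		console_unlock();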
> in testcase: rcutorture
> version:
> with following parameters:
>
> runtime: 300s
> test: cpuhotplug
> torture_type: tasks
>
> test-description: rcutorture is rcutorture kernel module load/unload test.
> test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +------------------------------------------------+------------+------------+
> | | 6b916706f8 | b031a684bf |
> +------------------------------------------------+------------+------------+
> | boot_successes | 66 | 18 |
> | boot_failures | 2 | 26 |
> | INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 1 | |
> | RIP:enqueue_hrtimer | 1 | |
> | RIP:__memset | 1 | |
> | RIP:clear_page_rep | 2 | 2 |
> | BUG:kernel_hang_in_boot_stage | 2 | |
> | INFO:rcu_sched_self-detected_stall_on_CPU | 1 | 4 |
> | INFO:rcu_tasks_detected_stalls_on_tasks | 0 | 22 |
> | RIP:kernel_init_free_pages | 0 | 1 |
> | IP-Config:Auto-configuration_of_network_failed | 0 | 3 |
> | RIP:zone_page_state | 0 | 1 |
> +------------------------------------------------+------------+------------+
>
>
> If you fix the issue, kindly add the following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> [ 952.147986] INFO: rcu_tasks detected stalls on tasks:
> [ 952.149313] 000000008d44d6d1: .. nvcsw: 0/0 holdout: 1 idle_cpu: -1/1
^^^^^^^^^^ ^^^^
There are no voluntary context switches. I wonder if some code relied
on the voluntary context switches done by printk(). But that would
only be a printk_deferred() caller.
Do I get it correctly that the task has been running on CPU 1?
> [ 952.150866] task:dmesg state:R running task stack: 0 pid: 1771 ppid: 573 flags:0x20020000
^^^^^
It is interesting that it is "dmesg". I guess that it did read the
log. So, it still should be synchronized by logbuf_lock.
I wonder if it is a hint or just a random victim.
> [ 952.153385] Call Trace:
> [ 952.154090] __schedule (kbuild/src/consumer/kernel/sched/core.c:3779 kbuild/src/consumer/kernel/sched/core.c:4528)
> [ 952.154990] ? firmware_map_remove (kbuild/src/consumer/kernel/sched/core.c:4411)
> [ 952.156159] ? ksys_read (kbuild/src/consumer/fs/read_write.c:637)
> [ 952.157063] schedule (kbuild/src/consumer/include/linux/thread_info.h:84 (discriminator 1) kbuild/src/consumer/include/linux/sched.h:1897 (discriminator 1) kbuild/src/consumer/kernel/sched/core.c:4608 (discriminator 1))
> [ 952.157895] exit_to_user_mode_prepare (kbuild/src/consumer/kernel/entry/common.c:160 kbuild/src/consumer/kernel/entry/common.c:191)
> [ 952.158984] syscall_exit_to_user_mode (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:41 kbuild/src/consumer/arch/x86/include/asm/nospec-branch.h:288 kbuild/src/consumer/arch/x86/include/asm/entry-common.h:80 kbuild/src/consumer/kernel/entry/common.c:133 kbuild/src/consumer/kernel/entry/common.c:268)
> [ 952.160252] entry_INT80_compat (kbuild/src/consumer/arch/x86/entry/entry_64_compat.S:412)
> [ 952.161424] RIP: 0023:0xf7eeca02
> [ 952.162254] RSP: 002b:00000000fff98e04 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> [ 952.164203] RAX: ffffffffffffffe0 RBX: 0000000000000004 RCX: 0000000056573234
> [ 952.165898] RDX: 0000000000001fff RSI: 00000000000001e0 RDI: 0000000056573234
> [ 952.167484] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
> [ 952.169182] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 952.170897] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 952.189825] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.191987]
> [ 352.522192] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.192011]
> [ 952.222122] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.224619]
> [ 352.539354]
> [ 98.566609] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.224648]
> [ 952.264449]
> [ 352.555366] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.264484]
> [ 952.267017] 00000000974cbae1: .. nvcsw: 2/2 holdout: 1 idle_cpu: -1/1
This task did two voluntary context switches.
It is again on CPU 1. The timestamp of the message is about 0.1 sec
after the previous one. It might suggest that it was printed by the
same check_all_holdout_tasks() cycle.
I still have to investigate what holdout means. I wonder which task
actually blocked the CPU and caused the RCU stall.
Might there be a missing RCU unlock somewhere?
> [ 952.268469] task:sed state:R running task stack: 0 pid: 424 ppid: 396 flags:0x20020000
Anyway, this is "sed". It is not directly related to the printk() interface.
> [ 952.270501] Call Trace:
> [ 952.270986] __schedule (kbuild/src/consumer/kernel/sched/core.c:3779 kbuild/src/consumer/kernel/sched/core.c:4528)
> [ 952.271861] ? firmware_map_remove (kbuild/src/consumer/kernel/sched/core.c:4411)
> [ 952.272870] ? ksys_write (kbuild/src/consumer/fs/read_write.c:661)
> [ 952.273709] schedule (kbuild/src/consumer/include/linux/thread_info.h:84 (discriminator 1) kbuild/src/consumer/include/linux/sched.h:1897 (discriminator 1) kbuild/src/consumer/kernel/sched/core.c:4608 (discriminator 1))
> [ 952.274495] exit_to_user_mode_prepare (kbuild/src/consumer/kernel/entry/common.c:160 kbuild/src/consumer/kernel/entry/common.c:191)
> [ 952.275516] syscall_exit_to_user_mode (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:41 kbuild/src/consumer/arch/x86/include/asm/nospec-branch.h:288 kbuild/src/consumer/arch/x86/include/asm/entry-common.h:80 kbuild/src/consumer/kernel/entry/common.c:133 kbuild/src/consumer/kernel/entry/common.c:268)
> [ 952.276650] entry_INT80_compat (kbuild/src/consumer/arch/x86/entry/entry_64_compat.S:412)
> [ 952.277673] RIP: 0023:0xf7f93a02
> [ 952.278497] RSP: 002b:00000000ff8db2f4 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
> [ 952.280235] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 000000005666f220
> [ 952.281813] RDX: 0000000000000001 RSI: 00000000f7f59d60 RDI: 0000000000000001
> [ 952.283347] RBP: 000000005666f220 R08: 0000000000000000 R09: 0000000000000000
> [ 952.284919] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 952.286507] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 952.294673] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.328047] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.336658]
> [ 352.572231]
> [ 98.585525] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.336693]
> [ 952.359450]
> [ 352.594642] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.359483]
> [ 952.377678] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.409955] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.412504]
> [ 352.610470]
> [ 98.605699] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.412531]
> [ 952.444293]
> [ 352.625170] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.444328]
> [ 952.453669] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 952.480340]
> [ 352.638172]
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-5.10.0-rc5-gb031a684bfd0 .config
> make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
I am still struggling with lkp-tests. I have never used them before
and have trouble getting them to work. They ask for a newer qemu at
the moment. It means that I am still not able to reproduce it :-/
I wonder if there is a way to reproduce this without the lkp-tests
framework.
Best Regards,
Petr
On (21/01/22 16:13), kernel test robot wrote:
[..]
>
> +------------------------------------------------+------------+------------+
> | | 6b916706f8 | b031a684bf |
> +------------------------------------------------+------------+------------+
> | boot_successes | 66 | 18 |
> | boot_failures | 2 | 26 |
> | INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 1 | |
> | RIP:enqueue_hrtimer | 1 | |
> | RIP:__memset | 1 | |
> | RIP:clear_page_rep | 2 | 2 |
> | BUG:kernel_hang_in_boot_stage | 2 | |
> | INFO:rcu_sched_self-detected_stall_on_CPU | 1 | 4 |
> | INFO:rcu_tasks_detected_stalls_on_tasks | 0 | 22 |
> | RIP:kernel_init_free_pages | 0 | 1 |
> | IP-Config:Auto-configuration_of_network_failed | 0 | 3 |
> | RIP:zone_page_state | 0 | 1 |
> +------------------------------------------------+------------+------------+
A side note:
> # printk and dmesg options
> #
> CONFIG_PRINTK_TIME=y
> # CONFIG_PRINTK_CALLER is not set
> CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
> CONFIG_CONSOLE_LOGLEVEL_QUIET=4
Could you please keep CONFIG_PRINTK_CALLER enabled at all times?
-ss
On (21/01/22 17:21), Petr Mladek wrote:
[..]
> > [ 952.271861] ? firmware_map_remove (kbuild/src/consumer/kernel/sched/core.c:4411)
> > [ 952.272870] ? ksys_write (kbuild/src/consumer/fs/read_write.c:661)
> > [ 952.273709] schedule (kbuild/src/consumer/include/linux/thread_info.h:84 (discriminator 1) kbuild/src/consumer/include/linux/sched.h:1897 (discriminator 1) kbuild/src/consumer/kernel/sched/core.c:4608 (discriminator 1))
> > [ 952.274495] exit_to_user_mode_prepare (kbuild/src/consumer/kernel/entry/common.c:160 kbuild/src/consumer/kernel/entry/common.c:191)
> > [ 952.275516] syscall_exit_to_user_mode (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:41 kbuild/src/consumer/arch/x86/include/asm/nospec-branch.h:288 kbuild/src/consumer/arch/x86/include/asm/entry-common.h:80 kbuild/src/consumer/kernel/entry/common.c:133 kbuild/src/consumer/kernel/entry/common.c:268)
>
> > [ 952.276650] entry_INT80_compat (kbuild/src/consumer/arch/x86/entry/entry_64_compat.S:412)
> > [ 952.277673] RIP: 0023:0xf7f93a02
> > [ 952.278497] RSP: 002b:00000000ff8db2f4 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
> > [ 952.280235] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 000000005666f220
> > [ 952.281813] RDX: 0000000000000001 RSI: 00000000f7f59d60 RDI: 0000000000000001
> > [ 952.283347] RBP: 000000005666f220 R08: 0000000000000000 R09: 0000000000000000
> > [ 952.284919] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > [ 952.286507] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > [ 952.294673] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.328047] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.336658]
> > [ 352.572231]
> > [ 98.585525] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.336693]
> > [ 952.359450]
> > [ 352.594642] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.359483]
> > [ 952.377678] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.409955] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.412504]
> > [ 352.610470]
> > [ 98.605699] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.412531]
> > [ 952.444293]
> > [ 352.625170] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.444328]
> > [ 952.453669] tasks-torture:torture_onoff task: online 0 failed: errno -5
> > [ 952.480340]
> > [ 352.638172]
> >
> > To reproduce:
> >
> > # build kernel
> > cd linux
> > cp config-5.10.0-rc5-gb031a684bfd0 .config
> > make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
>
> I am still struggling with lkp-tests. I have never used them before
> and have trouble getting them to work. They ask for a newer qemu at
> the moment. It means that I am still not able to reproduce it :-/
>
> I wonder if there is a way to reproduce this without the lkp-tests
> framework.
AFAIU this is just 'modprobe rcutorture'
-ss
On 2021-01-26, Sergey Senozhatsky <[email protected]> wrote:
> AFAIU this is just 'modprobe rcutorture'
I was finally able to trigger this by using my workstation:
Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
Debian/buster
QEMU 3.1.0 (Debian 1:3.1+dfsg-8+deb10u8)
instead of my build server:
Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Debian/bullseye
QEMU 5.2.0 (Debian 1:5.2+dfsg-3)
I don't know what factors contributed, but I wanted to share my success
in reproducing the reported problem. Instead of using lkp, I am manually
booting with:
$ kvm -cpu host -smp 2 -m 8G -nographic \
-kernel bzImage -append "console=ttyS0,115200 loglevel=6" \
-initrd initrd
My initrd is a minimal busybox rootfs that runs:
# modprobe rcutorture onoff_interval=3 onoff_holdoff=30 torture_type=tasks
(Those are the same modprobe parameters used by the lkp job.)
After about a minute I see:
[ 47.268292] tasks-torture: rcu_torture_read_exit: Start of episode
[ 51.273365] tasks-torture: rcu_torture_read_exit: End of episode
[ 55.823306] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
[ 55.824350] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 55.830661] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 55.848524] tasks-torture:torture_onoff task: online 0 failed: errno -5
I will start to debug this now.
John Ogness
On 2021-01-27, John Ogness <[email protected]> wrote:
> I was finally able to trigger this by using my workstation
Well, I don't know how reliably I can trigger this. I think my time will
be better spent getting lkp to trigger it, since that appears to be reliable.
It took me an hour to trigger this one and I have the feeling I got
lucky with it.
[ 1854.183816] INFO: rcu_tasks detected stalls on tasks:
[ 1854.184502] 0000000036ce200f: .. nvcsw: 0/0 holdout: 1 idle_cpu: -1/1
[ 1854.185234] task:dd state:R running task stack: 0 pid:19891 ppid: 668 flags:0x00000000
[ 1854.185273] Call Trace:
[ 1854.185275] __schedule+0xfa9/0x1056
[ 1854.185283] ? firmware_map_remove+0x172/0x172
[ 1854.185300] ? ksys_write+0x147/0x180
[ 1854.185304] schedule+0x170/0x1e3
[ 1854.185307] exit_to_user_mode_prepare+0x27/0xac
[ 1854.185312] syscall_exit_to_user_mode+0x41/0x4f
[ 1854.185316] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1854.185321] RIP: 0033:0x7f99bf52fc00
[ 1854.185324] RSP: 002b:00007ffd75722bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1854.185328] RAX: 0000000000000001 RBX: 00007f99bfa1b690 RCX: 00007f99bf52fc00
[ 1854.185331] RDX: 0000000000000001 RSI: 0000560056ce3010 RDI: 0000000000000001
[ 1854.185335] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[ 1854.185338] R10: 00007ffd757229c0 R11: 0000000000000246 R12: 0000560056ce3010
[ 1854.185341] R13: 0000000000000001 R14: 0000560056ce3010 R15: 0000560056ce3010
To generate syscalls and interrupts, "dd" was doing:
dd if=/dev/zero bs=1 count=100000000 | nc 10.0.2.2 12345
I'm not even sure how to go about debugging this. The only printk load I
can really see is the:
"tasks-torture:torture_onoff task: online 0 failed: errno -5"
error messages. But they are only occurring at a rate of 40-50 per
second. They are coming from the pr_alert() in kernel/torture.c:174, so
it should still be hitting spinlocks (either from the console driver or
@console_owner_lock).
I will re-focus on reproducing this with lkp.
John Ogness
On (21/01/27 22:28), John Ogness wrote:
>
> # modprobe rcutorture onoff_interval=3 onoff_holdoff=30 torture_type=tasks
>
> (Those are the same modprobe parameters used by the lkp job.)
>
> After about a minute I see:
>
> [ 47.268292] tasks-torture: rcu_torture_read_exit: Start of episode
> [ 51.273365] tasks-torture: rcu_torture_read_exit: End of episode
> [ 55.823306] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
> [ 55.824350] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 55.830661] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 55.848524] tasks-torture:torture_onoff task: online 0 failed: errno -5
Just out of curiosity, is this seen only with the printk commit in
question applied? do_boot_cpu() doesn't seem to call a lot of printk-s,
only pr_debug("Setting warm reset code and vector.\n"), as far as I can
tell.
I can't repro this on my arm64 board. Maybe it's too slow. Race condition?
-ss
On 2021-01-28, Sergey Senozhatsky <[email protected]> wrote:
>> # modprobe rcutorture onoff_interval=3 onoff_holdoff=30 torture_type=tasks
>>
>> (Those are the same modprobe parameters used by the lkp job.)
>>
>> After about a minute I see:
>>
>> [ 47.268292] tasks-torture: rcu_torture_read_exit: Start of episode
>> [ 51.273365] tasks-torture: rcu_torture_read_exit: End of episode
>> [ 55.823306] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
>> [ 55.824350] tasks-torture:torture_onoff task: online 0 failed: errno -5
>> [ 55.830661] tasks-torture:torture_onoff task: online 0 failed: errno -5
>> [ 55.848524] tasks-torture:torture_onoff task: online 0 failed: errno -5
>
> Just out of curiosity, is this seen only with the printk commit in
> question applied?
No. The error messages are not related to the commit. But they are
really the only thing printk'ing when the rcu stall happens. So you
probably do need to see them in order to reproduce this. (Assuming this
is somehow printk related.)
John Ogness
On Thu 2021-01-28 12:32:46, John Ogness wrote:
> On 2021-01-28, Sergey Senozhatsky <[email protected]> wrote:
> >> # modprobe rcutorture onoff_interval=3 onoff_holdoff=30 torture_type=tasks
> >>
> >> (Those are the same modprobe parameters used by the lkp job.)
> >>
> >> After about a minute I see:
> >>
> >> [ 47.268292] tasks-torture: rcu_torture_read_exit: Start of episode
> >> [ 51.273365] tasks-torture: rcu_torture_read_exit: End of episode
> >> [ 55.823306] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
> >> [ 55.824350] tasks-torture:torture_onoff task: online 0 failed: errno -5
> >> [ 55.830661] tasks-torture:torture_onoff task: online 0 failed: errno -5
> >> [ 55.848524] tasks-torture:torture_onoff task: online 0 failed: errno -5
> >
> > Just out of curiosity, is this seen only with the printk commit in
> > question applied?
>
> No. The error messages are not related to the commit. But they are
> really the only thing printk'ing when the rcu stall happens. So you
> probably do need to see them in order to reproduce this. (Assuming this
> is somehow printk related.)
If you are able to reproduce this, it might be interesting to see
backtraces from all CPUs when RCU stall gets detected. Or even printing
all processes, even the sleeping ones.
I am still curious how this could be printk() related.
Best Regards,
Petr
On 2021-01-28, Petr Mladek <[email protected]> wrote:
> If you are able to reproduce this, it might be interesting to see
> backtraces from all CPUs when RCU stall gets detected. Or even
> printing all processes, even the sleeping ones.
OK, I now have lkp running on my intel core i5 machine and it reliably
reproduces this. First let me say a word about lkp. The instructions in
the report email are a bit misleading. There are lots of steps that it
doesn't mention.
First I needed to install lkp and its dependencies. I did all of this as
root because lkp seems to get confused when sudo is involved.
# git clone https://github.com/intel/lkp-tests.git
# cd lkp-tests
# make install
# lkp install
Then I built the kernel and modules using a helper script to set up the
environment for me:
----- BEGIN /tmp/mk.sh -----
#!/bin/sh
export INSTALL_MOD_PATH=`pwd`
export HOSTCC=gcc-9
export CC=gcc-9
export ARCH=x86_64
exec make "$@"
----- END /tmp/mk.sh -----
# cd linux
# git checkout -b lkp b031a684bfd01d633c79d281bd0cf11c2f834ada
# cp /tmp/config-5.10.0-rc5-gb031a684bfd0 .config
# /tmp/mk.sh -j `nproc` olddefconfig prepare modules_prepare bzImage modules modules_install
Then I created the modules.cgz:
# find lib/modules | cpio -H newc -o | gzip -9c > modules.cgz
I put kernel and modules in /tmp/:
# cp arch/x86/boot/bzImage modules.cgz /tmp/
And ran lkp inside script(1) so I have a logfile:
# script
# lkp qemu -k /tmp/bzImage -m /tmp/modules.cgz /tmp/job-script
And reliably I see the first stall at about 925 seconds:
[ 926.386441][ T22] INFO: rcu_tasks detected stalls on tasks:
[ 926.387310][ T22] 0000000041eb5240: .. nvcsw: 0/0 holdout: 1 idle_cpu: -1/1
[ 926.388196][ T22] task:dmesg state:R running task stack: 0 pid: 1752 ppid: 511 flags:0x20020000
[ 926.389509][ T22] Call Trace:
[ 926.389962][ T22] __schedule+0xa12/0xab5
[ 926.390534][ T22] ? firmware_map_remove+0xd1/0xd1
[ 926.391190][ T22] ? ksys_read+0x116/0x150
[ 926.391773][ T22] schedule+0x16c/0x1df
[ 926.392317][ T22] exit_to_user_mode_loop+0x28/0x84
[ 926.392977][ T22] exit_to_user_mode_prepare+0x1d/0x4a
[ 926.393671][ T22] syscall_exit_to_user_mode+0x41/0x4f
[ 926.394357][ T22] entry_INT80_compat+0x71/0x76
[ 926.394982][ T22] RIP: 0023:0xf7f0fa02
[ 926.395528][ T22] RSP: 002b:00000000ff850364 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 926.396590][ T22] RAX: ffffffffffffffe0 RBX: 0000000000000004 RCX: 00000000565c8234
[ 926.397607][ T22] RDX: 0000000000001fff RSI: 00000000000001e0 RDI: 00000000565c8234
[ 926.398617][ T22] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[ 926.399629][ T22] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 926.400643][ T22] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
But actually it is not the rcu stall that interests me so much. I am
more interested in some bizarre console output I am seeing. Here is a
snippet:
[ 903.008486][ T356] [ 778.732883] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.008496][ T356]
[ 903.014733][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.030232][ T356] [ 778.735197] [ 655.203106] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.030242][ T356]
[ 903.040809][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.044327][ T356] [ 778.746898] [ 655.214214] [ 531.584847] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.044337][ T356]
[ 903.048732][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.066039][ T356] [ 778.751162] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.066050][ T356]
[ 903.070123][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.077137][ T356] [ 778.768744] [ 655.227204] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.077147][ T356]
[ 903.092792][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.095313][ T356] [ 778.773836] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.095323][ T356]
[ 903.106788][ T356] [ 778.781780] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.106798][ T356]
[ 903.121986][ T356] [ 778.791183] [ 655.230287] [ 531.589973] [ 407.116952] [ 283.068815] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.121997][ T356]
[ 903.134120][ T356] [ 778.797950] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.134130][ T356]
[ 903.148741][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.151626][ T356] [ 778.805164] [ 655.241173] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.151637][ T356]
[ 903.160126][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.166936][ T356] [ 778.818740] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.166946][ T356]
[ 903.175785][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.185955][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.189448][ T356] [ 778.825864] [ 655.250559] [ 531.607066] [ 407.120936] tasks-torture:torture_onoff task: online 0 failed: errno -5
These multi-timestamp lines are odd. And they are almost exactly 124
seconds apart. What is going on there? The investigation continues...
John Ogness
On Thu 2021-01-28 16:48:29, John Ogness wrote:
> On 2021-01-28, Petr Mladek <[email protected]> wrote:
> > If you are able to reproduce this, it might be interesting to see
> > backtraces from all CPUs when RCU stall gets detected. Or even
> > printing all processes, even the sleeping ones.
>
> OK, I now have lkp running on my intel core i5 machine and it reliably
> reproduces this. First let me say a word about lkp. The instructions in
> the report email are a bit misleading. There are lots of steps that it
> doesn't mention.
>
> First I needed to install lkp and its dependencies. I did all of this as
> root because lkp seems to get confused when sudo is involved.
>
> # git clone https://github.com/intel/lkp-tests.git
> # cd lkp-tests
> # make install
> # lkp install
>
> Then I built the kernel and modules using a helper script to set up the
> environment for me:
>
> ----- BEGIN /tmp/mk.sh -----
> #!/bin/sh
>
> export INSTALL_MOD_PATH=`pwd`
> export HOSTCC=gcc-9
> export CC=gcc-9
> export ARCH=x86_64
>
> exec make "$@"
> ----- END /tmp/mk.sh -----
>
> # cd linux
> # git checkout -b lkp b031a684bfd01d633c79d281bd0cf11c2f834ada
> # cp /tmp/config-5.10.0-rc5-gb031a684bfd0 .config
> # /tmp/mk.sh -j `nproc` olddefconfig prepare modules_prepare bzImage modules modules_install
>
> Then I created the modules.cgz:
>
> # find lib/modules | cpio -H newc -o | gzip -9c > modules.cgz
>
> I put kernel and modules in /tmp/:
>
> # cp arch/x86/boot/bzImage modules.cgz /tmp/
>
> And ran lkp inside script(1) so I have a logfile:
>
> # script
> # lkp qemu -k /tmp/bzImage -m /tmp/modules.cgz /tmp/job-script
>
> And reliably I see the first stall at about 925 seconds:
>
> [ 926.386441][ T22] INFO: rcu_tasks detected stalls on tasks:
> [ 926.387310][ T22] 0000000041eb5240: .. nvcsw: 0/0 holdout: 1 idle_cpu: -1/1
> [ 926.388196][ T22] task:dmesg state:R running task stack: 0 pid: 1752 ppid: 511 flags:0x20020000
> [ 926.389509][ T22] Call Trace:
> [ 926.389962][ T22] __schedule+0xa12/0xab5
> [ 926.390534][ T22] ? firmware_map_remove+0xd1/0xd1
> [ 926.391190][ T22] ? ksys_read+0x116/0x150
> [ 926.391773][ T22] schedule+0x16c/0x1df
> [ 926.392317][ T22] exit_to_user_mode_loop+0x28/0x84
> [ 926.392977][ T22] exit_to_user_mode_prepare+0x1d/0x4a
> [ 926.393671][ T22] syscall_exit_to_user_mode+0x41/0x4f
> [ 926.394357][ T22] entry_INT80_compat+0x71/0x76
> [ 926.394982][ T22] RIP: 0023:0xf7f0fa02
> [ 926.395528][ T22] RSP: 002b:00000000ff850364 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> [ 926.396590][ T22] RAX: ffffffffffffffe0 RBX: 0000000000000004 RCX: 00000000565c8234
> [ 926.397607][ T22] RDX: 0000000000001fff RSI: 00000000000001e0 RDI: 00000000565c8234
> [ 926.398617][ T22] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
> [ 926.399629][ T22] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 926.400643][ T22] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>
> But actually it is not the rcu stall that interests me so much. I am
> more interested in some bizarre console output I am seeing. Here is a
> snippet:
>
> [ 903.008486][ T356] [ 778.732883] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.008496][ T356]
> [ 903.014733][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.030232][ T356] [ 778.735197] [ 655.203106] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.030242][ T356]
> [ 903.040809][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.044327][ T356] [ 778.746898] [ 655.214214] [ 531.584847] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.044337][ T356]
> [ 903.048732][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.066039][ T356] [ 778.751162] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.066050][ T356]
> [ 903.070123][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.077137][ T356] [ 778.768744] [ 655.227204] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.077147][ T356]
> [ 903.092792][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.095313][ T356] [ 778.773836] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.095323][ T356]
> [ 903.106788][ T356] [ 778.781780] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.106798][ T356]
> [ 903.121986][ T356] [ 778.791183] [ 655.230287] [ 531.589973] [ 407.116952] [ 283.068815] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.121997][ T356]
> [ 903.134120][ T356] [ 778.797950] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.134130][ T356]
> [ 903.148741][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.151626][ T356] [ 778.805164] [ 655.241173] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.151637][ T356]
> [ 903.160126][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.166936][ T356] [ 778.818740] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.166946][ T356]
> [ 903.175785][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.185955][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
> [ 903.189448][ T356] [ 778.825864] [ 655.250559] [ 531.607066] [ 407.120936] tasks-torture:torture_onoff task: online 0 failed: errno -5
>
> These multi-timestamp lines are odd. And they are almost exactly 124
> seconds apart.
Yup, this looks suspicious.
Here are a few ideas. Feel free to ignore them and do your own
debugging. I do not want to distract you.
The test produces the same messages over and over again. It is
possible that something has overflowed after some number of messages,
and the regular intervals are just a coincidence.
Are all messages broken this way?
Or does it start later?
Are these exact messages printed correctly at some point?
Is this console output? Or dmesg?
Are you able to reproduce it with the current Linus's master?
I wonder if it is something that we have already fixed.
Best Regards,
Petr
On 2021-01-28, Petr Mladek <[email protected]> wrote:
>> [ 903.189448][ T356] [ 778.825864] [ 655.250559] [ 531.607066] [ 407.120936] tasks-torture:torture_onoff task: online 0 failed: errno -5
>>
>> These multi-timestamp lines are odd. And they are almost exactly 124
>> seconds apart.
>
> Yup, this looks suspicious.
>
> Here are a few ideas. Feel free to ignore them and do your own
> debugging. I do not want to distract you.
>
> The test produces the same messages over and over again. It is
> possible that something has overflowed after some number of messages,
> and the regular intervals are just a coincidence.
>
> Are all messages broken this way? Or does it start later?
They are all OK until about 325 seconds. From the console output I see
that a couple seconds before there is:
[ 324.855442] 2021-01-28 17:26:24 rmmod rcutorture
> Are these exact messages printed correctly at some point?
Yes. Although once the multi-timestamps show up, there are empty lines
after the correct printing. These have a different timestamp, so I am
thinking they are probably failed prb_reserve() descriptors.
[ 324.817659] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 324.826111] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 324.855442] 2021-01-28 17:26:24 rmmod rcutorture
[ 324.855454]
[ 324.867910] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 324.875689] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 324.886632] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 324.894394] tasks-torture:torture_onoff task: online 0 failed: errno -5
...
[ 325.270579] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.278558] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.292705] [ 167.091786] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.292716]
[ 325.298241] [ 167.098621] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.298251]
[ 325.302613] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.319995] [ 167.105552] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.320006]
[ 325.325588] [ 167.120620] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.325597]
[ 325.335480] [ 167.131624] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.335490]
[ 325.346576] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.357828] [ 167.138722] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.357840]
[ 325.362434] [ 167.157591] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 325.362445]
> Is this console output? Or dmesg?
Console output. I still need to figure out how to get at dmesg from
lkp. In particular, the output from /dev/kmsg would be quite
interesting.
> Are you able to reproduce it with the current Linus's master?
> I wonder if it is something that we have already fixed.
Unfortunately when I switch to Linus's master, the error messages go
away. So it probably won't trigger.
However, I did forward port all printk changes and I am still seeing the
same behavior.
I think once I get at /dev/kmsg, there should be some clarity.
John Ogness
On 2021-01-28, John Ogness <[email protected]> wrote:
> [ 903.189448][ T356] [ 778.825864] [ 655.250559] [ 531.607066] [ 407.120936] tasks-torture:torture_onoff task: online 0 failed: errno -5
So I at least found out what these multi-timestamp messages are (thanks
to reading /dev/kmsg). lkp is directing all its test output to
/dev/kmsg. This is why we see messages like:
2021-01-28 21:15:15 rmmod rcutorture
The final line of its test is "dmesg | grep torture:". So it is dumping
dmesg output into /dev/kmsg! This causes the timestamp to be included in
the text of the new messages so we later see 2 timestamps. After 2
minutes it has reached the messages it fed into /dev/kmsg and feeds them
in again (thus a 3rd timestamp).
Here is a snippet from /dev/kmsg when it starts feeding dmesg into
/dev/kmsg. If the first number is >= 8, it is coming from userspace.
1,24066,323390711,-;tasks-torture:torture_onoff task: online 0 failed: errno -5
1,24067,323444452,-;tasks-torture:torture_onoff task: online 0 failed: errno -5
1,24068,323461363,-;tasks-torture:torture_onoff task: online 0 failed: errno -5
13,24069,323469754,-;2021-01-28 21:15:15 rmmod rcutorture
12,24070,323469775,-;
1,24071,323481652,-;tasks-torture:torture_onoff task: online 0 failed: errno -5
1,24072,323495879,-;tasks-torture:torture_onoff task: online 0 failed: errno -5
1,24073,323503276,-;tasks-torture:torture_onoff task: online 0 failed: errno -5
1,24074,323510766,-;tasks-torture:torture_onoff task: online 0 failed: errno -5
1,24075,323519658,-;tasks-torture:torture_onoff task: online 0 failed: errno -5
13,24076,323522146,-;[ 158.513122] tasks-torture:torture_onoff task: online 0 failed: errno -5
12,24077,323522157,-;
13,24078,323526545,-;[ 158.519672] tasks-torture:torture_onoff task: online 0 failed: errno -5
12,24079,323526556,-;
13,24080,323529365,-;[ 158.534695] tasks-torture:torture_onoff task: online 0 failed: errno -5
12,24081,323529374,-;
13,24082,323531951,-;[ 158.549755] tasks-torture:torture_onoff task: online 0 failed: errno -5
12,24083,323531961,-;
I don't know why it is doing the message text with loglevel 5 (NOTICE)
followed by a blank line with loglevel 4 (WARNING), but they are
definitely coming from userspace.
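(For reference, the first field of a /dev/kmsg record is
facility << 3 | level: 13 = (1 << 3) | 5 is facility 1 (user) with
level 5, and 12 = (1 << 3) | 4 is facility 1 with level 4. The kernel's
own records use facility 0, which is why the torture messages show a
bare 1, i.e. pr_alert().)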
Anyway, enough with that madness. I will now return my focus to the rcu
stall and see how printk could be responsible.
John Ogness
On 2021-01-22, kernel test robot <[email protected]> wrote:
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: b031a684bfd01d633c79d281bd0cf11c2f834ada ("printk: remove logbuf_lock writer-protection of ringbuffer")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
I am still investigating this in my spare cycles. I have tried to reduce
the changes to determine what is causing the problem.
For example, instead of applying the problematic commit, I can reproduce
the problem with:
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 150bfde41ba1..17a92a76cae2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1943,7 +1943,10 @@ int vprintk_store(int facility, int level,
* prb_reserve_in_last() and prb_reserve() purposely invalidate the
* structure when they fail.
*/
- prb_rec_init_wr(&r, text_len);
+ if (text_len < 200)
+ prb_rec_init_wr(&r, 200);
+ else
+ prb_rec_init_wr(&r, text_len);
if (!prb_reserve(&e, prb, &r)) {
/* truncate the message if it is too long for empty buffer */
truncate_msg(&text_len, &trunc_msg_len);
which simply makes sure at least 200-byte datablocks are allocated. This
totally different change causes the _exact_ same rcu stall
behavior/pattern as the problematic commit that removes the spinlock
usage. That pattern is:
1. printk() is getting called about 65 times a second with:
"tasks-torture:torture_onoff task: online 0 failed: errno -5"
2. About 60 seconds after "rmmod rcutorture" is called, the following
rcu warning appears:
[ 390.440235] ??? Writer stall state RTWS_SYNC(7) g0 f0x0 ->state 0x2 cpu 1
[ 390.441250] task:rcu_torture_wri state:D stack: 0 pid: 505 ppid: 2 flags:0x00004000
[ 390.442473] Call Trace:
[ 390.442938] __schedule+0xa12/0xab5
[ 390.443495] ? firmware_map_remove+0xd1/0xd1
[ 390.444151] ? ttwu_do_wakeup+0x1d/0x2dd
[ 390.444764] schedule+0x16c/0x1df
[ 390.445311] schedule_timeout+0x77/0x1b4
[ 390.445911] ? usleep_range+0x130/0x130
[ 390.446522] ? do_raw_spin_lock+0x74/0xbf
[ 390.447180] ? queued_read_unlock+0x19/0x19
[ 390.447825] ? __prepare_to_swait+0xa4/0x141
[ 390.448491] __wait_for_common+0x22d/0x337
[ 390.449139] ? usleep_range+0x130/0x130
[ 390.450860] ? _raw_spin_lock_irqsave+0x81/0xd1
[ 390.451543] ? bit_wait_io_timeout+0x8a/0x8a
[ 390.452196] __wait_rcu_gp+0x20d/0x229
[ 390.452788] synchronize_rcu_tasks_generic.isra.0+0x7e/0xb2
[ 390.453604] ? rcu_tasks_rude_wait_gp+0x3d/0x3d
[ 390.454334] ? call_rcu_tasks_generic+0x134/0x134
[ 390.455055] ? rcu_torture_alloc+0x50/0xbc [rcutorture]
[ 390.455847] ? rcu_unexpedite_gp+0x19/0x19
[ 390.456488] ? delay_tsc+0x41/0x7c
[ 390.457053] rcu_torture_writer+0x815/0xaea [rcutorture]
[ 390.457889] ? rcu_torture_pipe_update+0x100/0x100 [rcutorture]
[ 390.458801] ? test_bit+0x15/0x1f
[ 390.459349] ? __kthread_parkme+0xb2/0x142
[ 390.459968] kthread+0x2ff/0x30e
[ 390.460539] ? rcu_torture_pipe_update+0x100/0x100 [rcutorture]
[ 390.461408] ? kthread_create_worker_on_cpu+0xce/0xce
[ 390.462171] ret_from_fork+0x22/0x30
3. About 9 minutes later, 2 stalls are reported:
[ 926.880657] INFO: rcu_tasks detected stalls on tasks:
[ 926.881387] 000000003a288f24: .. nvcsw: 0/0 holdout: 1 idle_cpu: -1/1
[ 926.882313] task:dmesg state:R running task stack: 0 pid: 1717 ppid: 497 flags:0x20020000
[ 926.883727] Call Trace:
[ 926.884148] __schedule+0xa12/0xab5
[ 926.884706] ? firmware_map_remove+0xd1/0xd1
[ 926.885355] ? ksys_read+0x116/0x150
[ 926.885930] schedule+0x16c/0x1df
[ 926.886458] exit_to_user_mode_loop+0x28/0x84
[ 926.887123] exit_to_user_mode_prepare+0x1d/0x4a
[ 926.887823] syscall_exit_to_user_mode+0x41/0x4f
[ 926.888513] entry_INT80_compat+0x71/0x76
[ 926.889131] RIP: 0023:0xf7fcca02
[ 926.889658] RSP: 002b:00000000ffd230c4 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 926.890780] RAX: 0000000000000050 RBX: 0000000000000004 RCX: 0000000056601234
[ 926.891720] RDX: 0000000000001fff RSI: 00000000000001e0 RDI: 0000000056601234
[ 926.892724] RBP: 00000000ffd231d8 R08: 0000000000000000 R09: 0000000000000000
[ 926.893733] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 926.894745] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
and
[ 926.931950] 00000000a4109e4e: .. nvcsw: 21326/21326 holdout: 1 idle_cpu: -1/1
[ 926.932883] task:dmesg state:R running task stack: 0 pid: 447 ppid: 442 flags:0x20020000
[ 926.934286] Call Trace:
[ 926.934714] __schedule+0xa12/0xab5
[ 926.935268] ? firmware_map_remove+0xd1/0xd1
[ 926.935919] ? ksys_read+0x116/0x150
[ 926.936478] schedule+0x16c/0x1df
[ 926.937012] exit_to_user_mode_loop+0x28/0x84
[ 926.937674] exit_to_user_mode_prepare+0x1d/0x4a
[ 926.938363] syscall_exit_to_user_mode+0x41/0x4f
[ 926.939059] entry_INT80_compat+0x71/0x76
[ 926.939680] RIP: 0023:0xf7f9da02
[ 926.940193] RSP: 002b:00000000ffdb2864 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 926.941301] RAX: ffffffffffffffe0 RBX: 0000000000000003 RCX: 0000000056659234
[ 926.942307] RDX: 0000000000001fff RSI: 00000000000001e0 RDI: 0000000056659234
[ 926.943312] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[ 926.944313] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 926.945314] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
This pattern is _always_ the same (with either my simple change or the
problematic commit applied). Obviously the removal of the spinlock
usage is not the issue. But I am concerned that the ringbuffer is
somehow involved. I have tried to reproduce this problem doing
non-ringbuffer activity, but have not had success.
Also, the problem disappears if a newer kernel is used. So maybe there
was something fixed in rcu or an rcu user. But still, it is very odd
that the ringbuffer is triggering it.
I will continue investigating this.
Also, I plan to send a patch to lkp so that the test script is not
doing:
dmesg > /dev/kmsg
Although this may be a great test for printk, for rcutorture it would be
more appropriate to do something like:
dmesg > /tmpfile
cat /tmpfile > /dev/kmsg
to avoid the endless read/feed cycle.
John Ogness
On 2021-01-22, kernel test robot <[email protected]> wrote:
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: b031a684bfd01d633c79d281bd0cf11c2f834ada ("printk: remove logbuf_lock writer-protection of ringbuffer")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
So I finally tracked down the problem. And yes, it is a problem with the
ringbuffer. And it turns out this is the same problem reported here [0].
If message sizes average larger than expected (more than 32 characters),
the data_ring will wrap before the desc_ring. Once the data_ring wraps,
it will start invalidating descriptors. These invalid descriptors hang
around until they are eventually recycled (when the desc_ring
wraps). Readers do not care about invalid descriptors, but they still
have to iterate past them. If the average message size is much larger
than 32 characters, then there will be many invalid descriptors
preceding the valid descriptors.
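To put rough numbers on it: the ring is dimensioned in
kernel/printk/printk.c with PRB_AVGBITS = 5, i.e. one descriptor per 32
bytes of data ring. As an illustration (the exact sizes depend on
CONFIG_LOG_BUF_SHIFT), a 1 MiB data ring comes with 2^15 = 32768
descriptors; if messages actually average 128 bytes, only about 8192
records fit in the data ring at a time, leaving on the order of 24000
invalid descriptors that readers must skip.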
For this particular LKP report, the RCU stalls started happening as the
number of invalid descriptors approached 17000. The reason this causes a
problem is because of the function prb_first_valid_seq(). It starts at
the oldest descriptor and searches to find the oldest _valid_
descriptor. In this case, it had to iterate past 17000 descriptors every
time.
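In pseudo-code, that search is essentially a linear scan (oldest_seq()
and read_record() are illustrative stand-ins, not the actual
ringbuffer internals):

	struct printk_record rec;
	u64 seq = oldest_seq(desc_ring);	/* tail of the desc_ring */

	/* Skip descriptors whose data was already overwritten. */
	while (!read_record(rb, seq, &rec))
		seq++;

	return seq;	/* the oldest _valid_ sequence number */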
prb_first_valid_seq() is used in devkmsg_read() and in
devkmsg_poll(). And worse, it is called with local interrupts disabled
and logbuf_lock locked.
The solution is to avoid using prb_first_valid_seq() if possible. And
indeed, in both of these cases it is a simple change:
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c8847ee571f0..76e8df20fdf9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -736,9 +736,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
logbuf_lock_irq();
}
- if (user->seq < prb_first_valid_seq(prb)) {
+ if (r->info->seq != user->seq) {
/* our last seen message is gone, return error and reset */
- user->seq = prb_first_valid_seq(prb);
+ user->seq = r->info->seq;
ret = -EPIPE;
logbuf_unlock_irq();
goto out;
@@ -813,6 +813,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
{
struct devkmsg_user *user = file->private_data;
+ struct printk_info info;
__poll_t ret = 0;
if (!user)
@@ -821,9 +822,9 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
poll_wait(file, &log_wait, wait);
logbuf_lock_irq();
- if (prb_read_valid(prb, user->seq, NULL)) {
+ if (prb_read_valid_info(prb, user->seq, &info, NULL)) {
/* return error when data has vanished underneath us */
- if (user->seq < prb_first_valid_seq(prb))
+ if (info.seq != user->seq)
ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
else
ret = EPOLLIN|EPOLLRDNORM;
Once logbuf_lock and safe buffer usage are removed, this inefficiency
wouldn't matter to the kernel anyway. But I am glad we hit it while it
still mattered because we should not be carelessly wasting CPU cycles
for any task.
Interestingly enough, LTP reported a problem with this code back in July
2020. The "invalid descriptor issue" was clearly stated [1] and Petr
even made a suggestion [2] which is nearly identical to how I propose to
fix it here.
prb_first_valid_seq() is used unnecessarily in some syslog and devkmsg
locations as well. And prb_first_valid_seq() itself can also be slightly
improved.
I am preparing a patch against linux-next for this. And although the
current situation is not pretty, I do not think it needs to be rushed
for 5.11. It is an inefficiency that occurs if the average message size
greatly exceeds 32 bytes and the ringbuffer is being blasted by new
messages and userspace is reading the ringbuffer.
John Ogness
[0] https://lkml.kernel.org/r/[email protected]
[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/20200709105906.GC11164@alley
On Thu 2021-02-04 22:38:02, John Ogness wrote:
> On 2021-01-22, kernel test robot <[email protected]> wrote:
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: b031a684bfd01d633c79d281bd0cf11c2f834ada ("printk: remove logbuf_lock writer-protection of ringbuffer")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> So I finally tracked down the problem. And yes, it is a problem with the
> ringbuffer. And it turns out this is the same problem reported here [0].
Uff. Great work!
> If message sizes average larger than expected (more than 32 characters),
> the data_ring will wrap before the desc_ring. Once the data_ring wraps,
> it will start invalidating descriptors. These invalid descriptors hang
> around until they are eventually recycled (when the desc_ring
> wraps). Readers do not care about invalid descriptors, but they still
> have to iterate past them. If the average message size is much larger
> than 32 characters, then there will be many invalid descriptors
> preceding the valid descriptors.
>
> For this particular LKP report, the RCU stalls started happening as the
> number of invalid descriptors approached 17000. The reason this causes a
> problem is because of the function prb_first_valid_seq(). It starts at
> the oldest descriptor and searches to find the oldest _valid_
> descriptor. In this case, it had to iterate past 17000 descriptors every
> time.
Sigh. Good to know that iterating over that many descriptors has
such a cost.
> prb_first_valid_seq() is used in devkmsg_read() and in
> devkmsg_poll(). And worse, it is called with local interrupts disabled
> and logbuf_lock locked.
>
> The solution is to avoid using prb_first_valid_seq() if possible. And
> indeed, in both of these cases it is a simple change:
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c8847ee571f0..76e8df20fdf9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -736,9 +736,9 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
> logbuf_lock_irq();
> }
>
> - if (user->seq < prb_first_valid_seq(prb)) {
> + if (r->info->seq != user->seq) {
NOTE: This made me realize one thing. We return -EPIPE even when
prb_reserve() was able to reserve the descriptor but not the data
space. It means that the message was already lost when it was
being stored.
It makes sense. It seems that we correctly handle even the situation
when an empty line is stored. But it is somewhat less obvious,
and we should keep it in mind.
> /* our last seen message is gone, return error and reset */
> - user->seq = prb_first_valid_seq(prb);
> + user->seq = r->info->seq;
> ret = -EPIPE;
> logbuf_unlock_irq();
> goto out;
> @@ -813,6 +813,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
> static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
> {
> struct devkmsg_user *user = file->private_data;
> + struct printk_info info;
> __poll_t ret = 0;
>
> if (!user)
> @@ -821,9 +822,9 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
> poll_wait(file, &log_wait, wait);
>
> logbuf_lock_irq();
> - if (prb_read_valid(prb, user->seq, NULL)) {
> + if (prb_read_valid_info(prb, user->seq, &info, NULL)) {
> /* return error when data has vanished underneath us */
> - if (user->seq < prb_first_valid_seq(prb))
> + if (info.seq != user->seq)
> ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI;
> else
> ret = EPOLLIN|EPOLLRDNORM;
>
The change looks good to me. Please, send it as a proper patch.
> Once logbuf_lock and safe buffer usage are removed, this inefficiency
> wouldn't matter to the kernel anyway. But I am glad we hit it while it
> still mattered because we should not be carelessly wasting CPU cycles
> for any task.
+1
> Interestingly enough, LTP reported a problem with this code back in July
> 2020. The "invalid descriptor issue" was clearly stated [1] and Petr
> even made a suggestion [2] which is nearly identical to how I propose to
> fix it here.
>
> prb_first_valid_seq() is used unnecessarily in some syslog and devkmsg
> locations as well. And prb_first_valid_seq() itself can also be slightly
> improved.
I am curious what idea you have ;-) I wonder if it might help to cache
the last known first_valid_seq, e.g. in struct prb_desc_ring.
But it might be tricky because it must be lockless and it is a 64-bit
value. It might be solved by some lazy approach, e.g. a trylock.
But there is still the question of who should pay the price for this.
IMHO, it should be the reader side.
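Something like this, perhaps, just to illustrate the lazy/trylock idea
(a hypothetical sketch; search_from() stands in for the existing linear
search, and it ignores the 64-bit atomicity problem mentioned above):

	static u64 cached_first_seq;
	static DEFINE_SPINLOCK(first_seq_lock);

	u64 prb_first_valid_seq_cached(void)
	{
		/* The reader pays: advance the cache when uncontended. */
		if (spin_trylock(&first_seq_lock)) {
			cached_first_seq = search_from(cached_first_seq);
			spin_unlock(&first_seq_lock);
		}
		/* May be slightly stale; not atomic on 32-bit as-is. */
		return READ_ONCE(cached_first_seq);
	}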
> I am preparing a patch against linux-next for this. And although the
> current situation is not pretty, I do not think it needs to be rushed
> for 5.11. It is an inefficiency that occurs if the average message size
> greatly exceeds 32 bytes and the ringbuffer is being blasted by new
> messages and userspace is reading the ringbuffer.
I agree. The 5.11 release is getting close. And it might go in via
stable quite quickly.
Best Regards,
Petr