Greetings,
FYI, we noticed the following commit (built with gcc-11):
commit: e55e386993704ea537d58cf0ded57678f70b1a77 ("md/raid5-cache: Add RCU protection to conf->log accesses")
https://github.com/sbates130272/linux-p2pmem.git md-bug
in testcase: mdadm-selftests
version: mdadm-selftests-x86_64-5f41845-1_20220325
with the following parameters:
disk: 1HDD
test_prefix: 20
ucode: 0x28
on test machine: 8 threads, 1 socket, Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz with 16G memory
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>
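
For context: the splat below looks consistent with the named commit submitting a bio from inside the new RCU read-side critical section. On kernels without preemptible RCU, rcu_read_lock() maps to preempt_disable(), which would match the "in_atomic(): 1 ... preempt_count: 1" in the report, while submit_bio_noacct() may sleep, so the might_sleep() check at block/blk-core.c:781 fires from r5l_flush_stripe_to_raid() (drivers/md/raid5-cache.c:1341 in the backtrace). A minimal sketch of the suspected pattern follows; apart from conf->log (named in the commit title) and the functions visible in the backtrace, the names are illustrative:

/* Illustrative sketch only; not the actual patched code. */
static void flush_sketch(struct r5conf *conf)
{
	struct r5l_log *log;

	rcu_read_lock();			/* !PREEMPT_RCU: preempt_count -> 1 */
	log = rcu_dereference(conf->log);	/* the access the commit protects */
	if (log)
		/*
		 * submit_bio_noacct() may block (e.g. waiting to enter the
		 * queue), hence "BUG: sleeping function called from invalid
		 * context" while still inside the read-side section.
		 */
		submit_bio_noacct(&log->flush_bio);
	rcu_read_unlock();
}
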
[ 89.177263][ T1008] BUG: sleeping function called from invalid context at block/blk-core.c:781
[ 89.185915][ T1008] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1008, name: md0_raid5
[ 89.194741][ T1008] preempt_count: 1, expected: 0
[ 89.199480][ T1008] CPU: 3 PID: 1008 Comm: md0_raid5 Not tainted 5.18.0-rc3-00094-ge55e38699370 #1
[ 89.208460][ T1008] Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F9 04/21/2015
[ 89.217939][ T1008] Call Trace:
[ 89.221092][ T1008] <TASK>
[ 89.223893][ T1008] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 89.228269][ T1008] __might_resched.cold (kernel/sched/core.c:9740)
[ 89.233322][ T1008] submit_bio_noacct (block/blk-core.c:781)
[ 89.238113][ T1008] ? bio_init (arch/x86/include/asm/atomic.h:41 include/linux/atomic/atomic-instrumented.h:42 block/bio.c:274)
[ 89.242327][ T1008] r5l_flush_stripe_to_raid (drivers/md/raid5-cache.c:1341) raid456
[ 89.248636][ T1008] ? load_balance (kernel/sched/fair.c:10902)
[ 89.253356][ T1008] handle_active_stripes+0x359/0x580 raid456
[ 89.260422][ T1008] ? retry_aligned_read (drivers/md/raid5.c:6403) raid456
[ 89.266369][ T1008] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170)
[ 89.271246][ T1008] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169)
[ 89.276211][ T1008] ? release_stripe_list (drivers/md/raid5.c:370) raid456
[ 89.282151][ T1008] raid5d (drivers/md/raid5.c:6551) raid456
[ 89.286881][ T1008] ? raid5_do_work (drivers/md/raid5.c:6505) raid456
[ 89.292367][ T1008] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 89.297680][ T1008] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 89.303356][ T1008] ? finish_wait (include/linux/list.h:134 include/linux/list.h:206 kernel/sched/wait.c:407)
[ 89.307716][ T1008] md_thread (drivers/md/md.c:7902)
[ 89.311833][ T1008] ? super_90_load (drivers/md/md.c:909)
[ 89.317067][ T1008] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 89.322769][ T1008] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414)
[ 89.328543][ T1008] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4))
[ 89.333352][ T1008] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2153 (discriminator 1) kernel/sched/core.c:6462 (discriminator 1))
[ 89.337460][ T1008] ? super_90_load (drivers/md/md.c:909)
[ 89.342719][ T1008] kthread (kernel/kthread.c:376)
[ 89.346671][ T1008] ? kthread_complete_and_exit (kernel/kthread.c:331)
[ 89.352157][ T1008] ret_from_fork (arch/x86/entry/entry_64.S:304)
[ 89.356449][ T1008] </TASK>
[ 89.658538][ T1054] md0: detected capacity change from 107520 to 0
[ 89.664741][ T1054] md: md0 stopped.
[ 90.016999][ T1066] md/raid:md0: device loop2 operational as raid disk 2
[ 90.023748][ T1066] md/raid:md0: device loop1 operational as raid disk 1
[ 90.030558][ T1066] md/raid:md0: device loop0 operational as raid disk 0
[ 90.038551][ T1066] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
[ 90.047912][ T1066] md/raid:md0: starting from clean shutdown
[ 90.070777][ T1066] md0: detected capacity change from 0 to 107520
[ 90.104394][ T1076] md: recovery of RAID array md0
[ 92.038319][ T1076] md: md0: recovery done.
[ 92.466320][ T1073] BUG: sleeping function called from invalid context at block/blk-core.c:781
[ 92.474957][ T1073] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1073, name: md0_raid5
[ 92.483819][ T1073] preempt_count: 1, expected: 0
[ 92.488565][ T1073] CPU: 5 PID: 1073 Comm: md0_raid5 Tainted: G W 5.18.0-rc3-00094-ge55e38699370 #1
[ 92.498942][ T1073] Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F9 04/21/2015
[ 92.508452][ T1073] Call Trace:
[ 92.511620][ T1073] <TASK>
[ 92.514421][ T1073] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 92.518805][ T1073] __might_resched.cold (kernel/sched/core.c:9740)
[ 92.523875][ T1073] submit_bio_noacct (block/blk-core.c:781)
[ 92.528677][ T1073] ? bio_init (arch/x86/include/asm/atomic.h:41 include/linux/atomic/atomic-instrumented.h:42 block/bio.c:274)
[ 92.532879][ T1073] r5l_flush_stripe_to_raid (drivers/md/raid5-cache.c:1341) raid456
[ 92.539176][ T1073] handle_active_stripes+0x359/0x580 raid456
[ 92.546245][ T1073] ? retry_aligned_read (drivers/md/raid5.c:6403) raid456
[ 92.552198][ T1073] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170)
[ 92.557113][ T1073] ? blk_start_plug_nr_ios (block/blk-core.c:1240)
[ 92.562443][ T1073] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170)
[ 92.567342][ T1073] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169)
[ 92.572339][ T1073] ? release_stripe_list (drivers/md/raid5.c:370) raid456
[ 92.578285][ T1073] raid5d (drivers/md/raid5.c:6551) raid456
[ 92.583002][ T1073] ? raid5_do_work (drivers/md/raid5.c:6505) raid456
[ 92.588503][ T1073] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 92.593832][ T1073] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 92.599500][ T1073] ? finish_wait (include/linux/list.h:134 include/linux/list.h:206 kernel/sched/wait.c:407)
[ 92.603877][ T1073] md_thread (drivers/md/md.c:7902)
[ 92.607985][ T1073] ? super_90_load (drivers/md/md.c:909)
[ 92.613219][ T1073] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 92.618895][ T1073] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414)
[ 92.624658][ T1073] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4))
[ 92.629462][ T1073] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2153 (discriminator 1) kernel/sched/core.c:6462 (discriminator 1))
[ 92.633568][ T1073] ? super_90_load (drivers/md/md.c:909)
[ 92.638803][ T1073] kthread (kernel/kthread.c:376)
[ 92.642747][ T1073] ? kthread_complete_and_exit (kernel/kthread.c:331)
[ 92.648245][ T1073] ret_from_fork (arch/x86/entry/entry_64.S:304)
[ 92.652559][ T1073] </TASK>
[ 92.948929][ T1113] md0: detected capacity change from 107520 to 0
[ 92.955175][ T1113] md: md0 stopped.
[ 93.026649][ T1120] debugfs: Directory 'md0' with parent 'block' already present!
[ 93.034818][ T1120] md: md0 stopped.
[ 93.038880][ T1120] md: md0 stopped.
[ 93.072111][ T1130] md: md0 stopped.
[ 93.130654][ T1135] debugfs: Directory 'md0' with parent 'block' already present!
[ 93.139203][ T1135] md: md0 stopped.
[ 93.147556][ T1135] md/raid:md0: device loop0 operational as raid disk 0
[ 93.154322][ T1135] md/raid:md0: device loop3 operational as raid disk 3
[ 93.161738][ T1135] md/raid:md0: device loop2 operational as raid disk 2
[ 93.168536][ T1135] md/raid:md0: device loop1 operational as raid disk 1
[ 93.169095][ T372] /lkp/benchmarks/mdadm-selftests/tests/20raid5journal... IPMI BMC is not supported on this machine, skip bmc-watchdog setup!
[ 93.175316][ T372]
[ 93.177818][ T1135] md/raid:md0: journal disk is missing, force array readonly
[ 93.197775][ T1135] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[ 93.206200][ T1135] md0: detected capacity change from 0 to 107520
[ 93.280028][ T1155] md0: detected capacity change from 107520 to 0
[ 93.286281][ T1155] md: md0 stopped.
[ 93.655138][ T1164] md/raid:md0: device loop2 operational as raid disk 2
[ 93.661910][ T1164] md/raid:md0: device loop1 operational as raid disk 1
[ 93.668660][ T1164] md/raid:md0: device loop0 operational as raid disk 0
[ 93.676532][ T1164] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
[ 93.685650][ T1164] md/raid:md0: starting from clean shutdown
[ 93.717900][ T1164] md0: detected capacity change from 0 to 107520
[ 93.784831][ T1174] md: recovery of RAID array md0
[ 96.004310][ T1174] md: md0: recovery done.
[ 96.655923][ T1171] BUG: sleeping function called from invalid context at block/blk-core.c:781
[ 96.664595][ T1171] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1171, name: md0_raid5
[ 96.673412][ T1171] preempt_count: 1, expected: 0
[ 96.678177][ T1171] CPU: 0 PID: 1171 Comm: md0_raid5 Tainted: G W 5.18.0-rc3-00094-ge55e38699370 #1
[ 96.688614][ T1171] Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F9 04/21/2015
[ 96.698108][ T1171] Call Trace:
[ 96.701262][ T1171] <TASK>
[ 96.704070][ T1171] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 96.708439][ T1171] __might_resched.cold (kernel/sched/core.c:9740)
[ 96.713508][ T1171] submit_bio_noacct (block/blk-core.c:781)
[ 96.718319][ T1171] ? bio_init (arch/x86/include/asm/atomic.h:41 include/linux/atomic/atomic-instrumented.h:42 block/bio.c:274)
[ 96.722548][ T1171] r5l_flush_stripe_to_raid (drivers/md/raid5-cache.c:1341) raid456
[ 96.728849][ T1171] ? load_balance (kernel/sched/fair.c:10902)
[ 96.733562][ T1171] handle_active_stripes+0x359/0x580 raid456
[ 96.740637][ T1171] ? retry_aligned_read (drivers/md/raid5.c:6403) raid456
[ 96.746606][ T1171] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170)
[ 96.751503][ T1171] ? _raw_spin_lock_bh (kernel/locking/spinlock.c:169)
[ 96.756487][ T1171] ? release_stripe_list (drivers/md/raid5.c:370) raid456
[ 96.762441][ T1171] raid5d (drivers/md/raid5.c:6551) raid456
[ 96.767174][ T1171] ? raid5_do_work (drivers/md/raid5.c:6505) raid456
[ 96.772676][ T1171] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 96.778005][ T1171] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 96.783683][ T1171] ? finish_wait (include/linux/list.h:134 include/linux/list.h:206 kernel/sched/wait.c:407)
[ 96.788068][ T1171] md_thread (drivers/md/md.c:7902)
[ 96.792176][ T1171] ? super_90_load (drivers/md/md.c:909)
[ 96.797419][ T1171] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 96.803087][ T1171] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414)
[ 96.808859][ T1171] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4))
[ 96.813678][ T1171] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2153 (discriminator 1) kernel/sched/core.c:6462 (discriminator 1))
[ 96.817786][ T1171] ? super_90_load (drivers/md/md.c:909)
[ 96.823021][ T1171] kthread (kernel/kthread.c:376)
[ 96.826955][ T1171] ? kthread_complete_and_exit (kernel/kthread.c:331)
[ 96.832459][ T1171] ret_from_fork (arch/x86/entry/entry_64.S:304)
[ 96.836740][ T1171] </TASK>
[ 97.142371][ T1219] md0: detected capacity change from 107520 to 0
[ 97.148616][ T1219] md: md0 stopped.
[ 97.303158][ T1236] md/raid:md0: device loop3 operational as raid disk 3
[ 97.309921][ T1236] md/raid:md0: device loop2 operational as raid disk 2
[ 97.316662][ T1236] md/raid:md0: device loop1 operational as raid disk 1
[ 97.323396][ T1236] md/raid:md0: device loop0 operational as raid disk 0
[ 97.331323][ T1236] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[ 97.341004][ T1236] md/raid:md0: recovering 0 data-only stripes and 1 data-parity stripes
[ 97.364607][ T1236] md0: detected capacity change from 0 to 107520
[ 97.595426][ T1246] md0: detected capacity change from 107520 to 0
[ 97.601652][ T1246] md: md0 stopped.
[ 97.681071][ T1250] debugfs: Directory 'md0' with parent 'block' already present!
[ 97.735167][ T1262] md/raid:md0: device loop3 operational as raid disk 3
[ 97.741921][ T1262] md/raid:md0: device loop2 operational as raid disk 2
[ 97.748656][ T1262] md/raid:md0: device loop1 operational as raid disk 1
[ 97.755383][ T1262] md/raid:md0: device loop0 operational as raid disk 0
[ 97.763326][ T1262] md/raid:md0: journal disk is missing, force array readonly
[ 97.770602][ T1262] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[ 97.778987][ T1262] md0: detected capacity change from 0 to 107520
[ 97.847467][ T1271] md0: detected capacity change from 107520 to 0
[ 97.853688][ T1271] md: md0 stopped.
[ 97.913480][ T372] ERROR: dmesg prints errors when testing 20raid5journal!
[ 97.933286][ T372] FAILED - see /var/tmp/20raid5journal.log and /var/tmp/fail20raid5journal.log for details
[ 99.026901][ T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-e55e386993704ea537d58cf0ded57678f70b1a77-20220508-24454-1hrpobn-5.yaml&job_state=post_run -O /dev/null
[ 100.754729][ T372] kill 580 vmstat --timestamp -n 10
[ 100.763240][ T372] kill 578 dmesg --follow --decode
[ 100.774962][ T372] wait for background processes: 587 582 oom-killer meminfo
[ 106.728036][ T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-e55e386993704ea537d58cf0ded57678f70b1a77-20220508-24454-1hrpobn-5.yaml&loadavg=1.00%200.40%200.15%201/190%201421&start_time=1651964881&end_time=1651964947&version=/lkp/lkp/.src-20220507-195649:72233616:be1045ec3& -O /dev/null
[ 106.779860][ T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-e55e386993704ea537d58cf0ded57678f70b1a77-20220508-24454-1hrpobn-5.yaml&job_state=finished -O /dev/null
[ 106.820412][ T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-post-run?job_file=/lkp/jobs/scheduled/lkp-hsw-d05/mdadm-selftests-1HDD-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-e55e386993704ea537d58cf0ded57678f70b1a77-20220508-24454-1hrpobn-5.yaml -O /dev/null
[ 107.521530][ T372] LKP: stdout: 358: LKP: rebooting forcely
[ 107.564739][ T372] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-wtmp?tbox_name=lkp-hsw-d05&tbox_state=rebooting -O /dev/null
[ 109.025846][ T1] kvm: exiting hardware virtualization
[ 109.032774][ T1] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[ 109.038801][ T1] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 109.062244][ T1] e1000e: EEE TX LPI TIMER: 00000011
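
For what it's worth, the usual way out of this class of bug is to keep the RCU read-side section around the pointer load only and do the sleeping submission after rcu_read_unlock(), pinning the object with a reference taken under the lock. The sketch below is illustrative only: struct r5l_log has no such refcount in v5.18, so the 'ref' field and r5l_log_put() are hypothetical, and a real fix would need its own lifetime rule:

/*
 * Hypothetical pattern, not a tested patch: 'ref' and r5l_log_put()
 * do not exist in struct r5l_log; they stand in for whatever keeps
 * the log alive once we leave the read-side section.
 */
static void flush_sketch_fixed(struct r5conf *conf)
{
	struct r5l_log *log;

	rcu_read_lock();
	log = rcu_dereference(conf->log);
	if (log && !refcount_inc_not_zero(&log->ref))	/* hypothetical field */
		log = NULL;				/* log is being torn down */
	rcu_read_unlock();

	if (log) {
		submit_bio_noacct(&log->flush_bio);	/* safe to sleep here */
		r5l_log_put(log);			/* hypothetical release */
	}
}
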
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
--
0-DAY CI Kernel Test Service
https://01.org/lkp