2022-07-25 08:43:00

by kernel test robot

[permalink] [raw]
Subject: [sched] 9ae606bc74: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]



Greeting,

FYI, we noticed the following commit (built with clang-15):

commit: 9ae606bc74dd0e58d4de894e3c5cbb9d45599267 ("sched: Introduce task_cpu_possible_mask() to limit fallback rq selection")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: rcutorture
version:
with following parameters:

runtime: 300s
test: cpuhotplug
torture_type: trivial

test-description: rcutorture is rcutorture kernel module load/unload test.
test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+-------------------------------------------------------------------------+------------+------------+
| | 304000390f | 9ae606bc74 |
+-------------------------------------------------------------------------+------------+------------+
| WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture] | 120 | 120 |
| RIP:synchronize_rcu_trivial[rcutorture] | 120 | 120 |
| WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity | 120 | 120 |
| RIP:rcutorture_sched_setaffinity | 120 | 120 |
| WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture] | 0 | 36 |
| RIP:rcu_torture_stats_print[rcutorture] | 0 | 36 |
+-------------------------------------------------------------------------+------------+------------+


please be noted, since 9ae606bc74 is kind of old, we also tested on a latest
mainline commit:
commit 515f71412bb73ebd7f41f90e1684fc80b8730789
Merge: 301c8949322fe cf5029d5dd7cb
Author: Linus Torvalds <[email protected]>
Date: Sat Jul 23 10:22:26 2022 -0700

and confirmed the
WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
still exists.

attached one dmesg dmesg-515f71412bb7.xz
and the config-5.19.0-rc7-00199-g515f71412bb7
FYI


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 99.503911][ T506] trivial-torture: rtc: 00000000dec4ec70 ver: 16769 tfle: 0 rta: 16769 rtaf: 0 rtf: 16760 rtmbe: 0 rtmbkf: 0/0 rtbe: 0 rtbke: 0 rtb
re: 0 rtbf: 0 rtb: 0 nt: 3554 onoff: 270/270:270/271 3,10:2,25 1465:3523 (HZ=300) barrier: 0/0:0 read-exits: 67 nocb-toggles: 0:0
[ 99.507244][ T506] trivial-torture: !!!
[ 99.507339][ T506] ------------[ cut here ]------------
[ 99.512281][ T506] WARNING: CPU: 0 PID: 506 at kernel/rcu/rcutorture.c:1801 rcu_torture_stats_print+0x1025/0x1260 [rcutorture]
[ 99.513884][ T506] Modules linked in: rcutorture torture
[ 99.516444][ T506] CPU: 0 PID: 506 Comm: rcu_torture_sta Tainted: G W 5.13.0-rc6-00093-g9ae606bc74dd #1
[ 99.517918][ T506] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 99.521583][ T506] RIP: 0010:rcu_torture_stats_print+0x1025/0x1260 [rcutorture]
[ 99.523642][ T506] Code: 60 bc 02 00 00 0f 84 96 f6 ff ff 90 0f 0b 90 48 83 3d 6e bc 02 00 00 0f 84 92 f6 ff ff 90 0f 0b 90 84 db 0f 84 8e f6 ff ff 90 <0f> 0b 90 e9 85 f6 ff ff 48 c7 c1 d8 0b 6b 85 80 e1 07 80 c1 03 38
[ 99.526196][ T506] RSP: 0018:ffff888132a67db0 EFLAGS: 00010202
[ 99.526985][ T506] RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffffa0013653
[ 99.528002][ T506] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa003fbc0
[ 99.528971][ T506] RBP: ffff888132a67ec8 R08: dffffc0000000000 R09: 0000000000000003
[ 99.529974][ T506] R10: ffffffffffffffff R11: dffffc0000000001 R12: 0000000000004181
[ 99.531041][ T506] R13: dffffc0000000000 R14: 0000000000000000 R15: ffffffffa0018ea0
[ 99.532077][ T506] FS: 0000000000000000(0000) GS:ffff8883a0400000(0000) knlGS:0000000000000000
[ 99.533182][ T506] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.534043][ T506] CR2: 00007f5d92379020 CR3: 000000012cb70000 CR4: 00000000000406b0
[ 99.535043][ T506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 99.536037][ T506] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 99.536994][ T506] Call Trace:
[ 99.537480][ T506] ? _raw_spin_unlock_irqrestore+0x1/0x40
[ 99.538257][ T506] rcu_torture_stats+0x4e/0x3c0 [rcutorture]
[ 99.539048][ T506] kthread+0x3b4/0x4a0
[ 99.539558][ T506] ? rcu_nocb_toggle+0x310/0x310 [rcutorture]
[ 99.540313][ T506] ? kthread_unuse_mm+0x1b0/0x1b0
[ 99.541027][ T506] ret_from_fork+0x1f/0x30
[ 99.541620][ T506] irq event stamp: 389
[ 99.542151][ T506] hardirqs last enabled at (397): [<ffffffff81359084>] console_unlock+0xa44/0xb00
[ 99.559466][ T506] hardirqs last disabled at (406): [<ffffffff81358800>] console_unlock+0x1c0/0xb00
[ 99.560696][ T506] softirqs last enabled at (190): [<ffffffff81226a38>] __irq_exit_rcu+0x1d8/0x270
[ 99.561828][ T506] softirqs last disabled at (175): [<ffffffff81226a38>] __irq_exit_rcu+0x1d8/0x270
[ 99.563020][ T506] ---[ end trace 878a4da7320b65f4 ]---
[ 99.563855][ T506] Reader Pipe: 35595715 901 1 0 0 0 0 0 0 0 0
[ 99.564762][ T506] trivial-torture: Reader Batch: 35596617 0 0 0 0 0 0 0 0 0 0
[ 99.565821][ T506] trivial-torture: Free-Block Circulation: 16768 16768 16767 16766 16765 16764 16763 16762 16761 16760 0
[ 99.604335][ T516] smpboot: CPU 1 is now offline
[ 99.617526][ T516] x86: Booting SMP configuration:
[ 99.618193][ T516] smpboot: Booting Node 0 Processor 1 APIC 0x1



To reproduce:

# build kernel
cd linux
cp config-5.13.0-rc6-00093-g9ae606bc74dd .config
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (6.45 kB)
config-5.13.0-rc6-00093-g9ae606bc74dd (132.76 kB)
job-script (4.97 kB)
dmesg.xz (68.52 kB)
rcutorture (10.11 kB)
config-5.19.0-rc7-00199-g515f71412bb7 (138.68 kB)
dmesg-515f71412bb7.xz (52.05 kB)
Download all attachments

2022-07-25 09:33:10

by Will Deacon

[permalink] [raw]
Subject: Re: [sched] 9ae606bc74: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]

On Mon, Jul 25, 2022 at 04:12:57PM +0800, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with clang-15):
>
> commit: 9ae606bc74dd0e58d4de894e3c5cbb9d45599267 ("sched: Introduce task_cpu_possible_mask() to limit fallback rq selection")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: rcutorture
> version:
> with following parameters:
>
> runtime: 300s
> test: cpuhotplug
> torture_type: trivial
>
> test-description: rcutorture is rcutorture kernel module load/unload test.
> test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +-------------------------------------------------------------------------+------------+------------+
> | | 304000390f | 9ae606bc74 |
> +-------------------------------------------------------------------------+------------+------------+
> | WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture] | 120 | 120 |
> | RIP:synchronize_rcu_trivial[rcutorture] | 120 | 120 |
> | WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity | 120 | 120 |
> | RIP:rcutorture_sched_setaffinity | 120 | 120 |
> | WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture] | 0 | 36 |
> | RIP:rcu_torture_stats_print[rcutorture] | 0 | 36 |
> +-------------------------------------------------------------------------+------------+------------+
>
>
> please be noted, since 9ae606bc74 is kind of old, we also tested on a latest
> mainline commit:
> commit 515f71412bb73ebd7f41f90e1684fc80b8730789
> Merge: 301c8949322fe cf5029d5dd7cb
> Author: Linus Torvalds <[email protected]>
> Date: Sat Jul 23 10:22:26 2022 -0700
>
> and confirmed the
> WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
> still exists.

I'm not convinced by the bisection -- that commit should't have any effect
on x86.

Will

2022-07-29 05:55:29

by kernel test robot

[permalink] [raw]
Subject: Re: [sched] 9ae606bc74: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]

hi Will,

On Mon, Jul 25, 2022 at 10:20:58AM +0100, Will Deacon wrote:
> On Mon, Jul 25, 2022 at 04:12:57PM +0800, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with clang-15):
> >
> > commit: 9ae606bc74dd0e58d4de894e3c5cbb9d45599267 ("sched: Introduce task_cpu_possible_mask() to limit fallback rq selection")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: rcutorture
> > version:
> > with following parameters:
> >
> > runtime: 300s
> > test: cpuhotplug
> > torture_type: trivial
> >
> > test-description: rcutorture is rcutorture kernel module load/unload test.
> > test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
> >
> >
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> > +-------------------------------------------------------------------------+------------+------------+
> > | | 304000390f | 9ae606bc74 |
> > +-------------------------------------------------------------------------+------------+------------+
> > | WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture] | 120 | 120 |
> > | RIP:synchronize_rcu_trivial[rcutorture] | 120 | 120 |
> > | WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity | 120 | 120 |
> > | RIP:rcutorture_sched_setaffinity | 120 | 120 |
> > | WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture] | 0 | 36 |
> > | RIP:rcu_torture_stats_print[rcutorture] | 0 | 36 |
> > +-------------------------------------------------------------------------+------------+------------+
> >
> >
> > please be noted, since 9ae606bc74 is kind of old, we also tested on a latest
> > mainline commit:
> > commit 515f71412bb73ebd7f41f90e1684fc80b8730789
> > Merge: 301c8949322fe cf5029d5dd7cb
> > Author: Linus Torvalds <[email protected]>
> > Date: Sat Jul 23 10:22:26 2022 -0700
> >
> > and confirmed the
> > WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
> > still exists.
>
> I'm not convinced by the bisection -- that commit should't have any effect
> on x86.

Thanks a lot for your information!
we will do some further tests to see if below part could impact x86.
will update you next week. thanks

@@ -3124,9 +3124,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)

/* Look for allowed, online CPU in same node. */
for_each_cpu(dest_cpu, nodemask) {
- if (!cpu_active(dest_cpu))
- continue;
- if (cpumask_test_cpu(dest_cpu, p->cpus_ptr))
+ if (is_cpu_allowed(p, dest_cpu))
return dest_cpu;
}
}


>
> Will

2022-08-07 06:10:55

by kernel test robot

[permalink] [raw]
Subject: Re: [sched] 9ae606bc74: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]

hi Will,

On Fri, Jul 29, 2022 at 01:18:49PM +0800, Oliver Sang wrote:
> hi Will,
>
> On Mon, Jul 25, 2022 at 10:20:58AM +0100, Will Deacon wrote:
> > On Mon, Jul 25, 2022 at 04:12:57PM +0800, kernel test robot wrote:
> > >
> > >
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with clang-15):
> > >
> > > commit: 9ae606bc74dd0e58d4de894e3c5cbb9d45599267 ("sched: Introduce task_cpu_possible_mask() to limit fallback rq selection")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > in testcase: rcutorture
> > > version:
> > > with following parameters:
> > >
> > > runtime: 300s
> > > test: cpuhotplug
> > > torture_type: trivial
> > >
> > > test-description: rcutorture is rcutorture kernel module load/unload test.
> > > test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
> > >
> > >
> > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > >
> > >
> > > +-------------------------------------------------------------------------+------------+------------+
> > > | | 304000390f | 9ae606bc74 |
> > > +-------------------------------------------------------------------------+------------+------------+
> > > | WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture] | 120 | 120 |
> > > | RIP:synchronize_rcu_trivial[rcutorture] | 120 | 120 |
> > > | WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity | 120 | 120 |
> > > | RIP:rcutorture_sched_setaffinity | 120 | 120 |
> > > | WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture] | 0 | 36 |
> > > | RIP:rcu_torture_stats_print[rcutorture] | 0 | 36 |
> > > +-------------------------------------------------------------------------+------------+------------+
> > >
> > >
> > > please be noted, since 9ae606bc74 is kind of old, we also tested on a latest
> > > mainline commit:
> > > commit 515f71412bb73ebd7f41f90e1684fc80b8730789
> > > Merge: 301c8949322fe cf5029d5dd7cb
> > > Author: Linus Torvalds <[email protected]>
> > > Date: Sat Jul 23 10:22:26 2022 -0700
> > >
> > > and confirmed the
> > > WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
> > > still exists.
> >
> > I'm not convinced by the bisection -- that commit should't have any effect
> > on x86.


recently we updated our clang to version 16 so we rerun this case, then found
the issue also could be reproduced on parent, though the rate seems quite
smaller than this commit.

304000390f88d049 9ae606bc74dd0e58d4de894e3c5
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
3:300 36% 112:300 dmesg.RIP:rcu_torture_stats_print[rcutorture]
300:300 -0% 299:300 dmesg.RIP:rcutorture_sched_setaffinity
300:300 -0% 299:300 dmesg.RIP:synchronize_rcu_trivial[rcutorture]
3:300 36% 112:300 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
300:300 -0% 299:300 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture]
300:300 -0% 299:300 dmesg.WARNING:at_kernel/rcu/update.c:#rcutorture_sched_setaffinity


we also checked the dmesg, confirmed they have same Call Trace and similar
context when the issue reproduced. so this is a false positive.

sorry if this caused any inconvenience.

>
> Thanks a lot for your information!
> we will do some further tests to see if below part could impact x86.
> will update you next week. thanks
>
> @@ -3124,9 +3124,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>
> /* Look for allowed, online CPU in same node. */
> for_each_cpu(dest_cpu, nodemask) {
> - if (!cpu_active(dest_cpu))
> - continue;
> - if (cpumask_test_cpu(dest_cpu, p->cpus_ptr))
> + if (is_cpu_allowed(p, dest_cpu))
> return dest_cpu;
> }
> }
>
>
> >
> > Will