LinuxLists.cc - [syzbot] [kernel?] WARNING in c

2023-03-10 00:42:52

Subject: [syzbot] [kernel?] WARNING in c_start (2)

Hello,

syzbot found the following issue on:

HEAD commit: 44889ba56cbb Merge tag 'net-6.3-rc2' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13933824c80000
kernel config: https://syzkaller.appspot.com/x/.config?x=d677dcd743eef09
dashboard link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

------------[ cut here ]------------
WARNING: CPU: 3 PID: 5136 at include/linux/cpumask.h:143 cpu_max_bits_warn include/linux/cpumask.h:143 [inline]
WARNING: CPU: 3 PID: 5136 at include/linux/cpumask.h:143 cpumask_check include/linux/cpumask.h:150 [inline]
WARNING: CPU: 3 PID: 5136 at include/linux/cpumask.h:143 cpumask_next include/linux/cpumask.h:212 [inline]
WARNING: CPU: 3 PID: 5136 at include/linux/cpumask.h:143 c_start+0x1da/0x250 arch/x86/kernel/cpu/proc.c:156
Modules linked in:
CPU: 3 PID: 5136 Comm: syz-fuzzer Not tainted 6.3.0-rc1-syzkaller-00106-g44889ba56cbb #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:cpu_max_bits_warn include/linux/cpumask.h:143 [inline]
RIP: 0010:cpumask_check include/linux/cpumask.h:150 [inline]
RIP: 0010:cpumask_next include/linux/cpumask.h:212 [inline]
RIP: 0010:c_start+0x1da/0x250 arch/x86/kernel/cpu/proc.c:156
Code: 49 11 8c e8 88 a9 51 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 f3 48 0f bc db e8 72 a9 51 00 89 db e9 38 ff ff ff e8 66 a9 51 00 <0f> 0b e9 c1 fe ff ff 48 c7 c7 80 7f 77 8e e8 d3 49 a2 00 e9 67 ff
RSP: 0018:ffffc90001ee7c68 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000000
RDX: ffff888023996000 RSI: ffffffff813279da RDI: 0000000000000004
RBP: ffffffff8e777f80 R08: 0000000000000004 R09: 0000000000000007
R10: 0000000000000004 R11: 0000000000000000 R12: ffff88801f2ded60
R13: 0000000000000008 R14: 0000000000000004 R15: 0000000000000000
FS: 0000000001ec53d0(0000) GS:ffff88802cb80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f08961ed698 CR3: 00000000141a7000 CR4: 0000000000150ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
seq_read_iter+0x2ca/0x12d0 fs/seq_file.c:225
proc_reg_read_iter+0x115/0x2d0 fs/proc/inode.c:301
call_read_iter include/linux/fs.h:1845 [inline]
new_sync_read fs/read_write.c:389 [inline]
vfs_read+0x681/0x930 fs/read_write.c:470
ksys_read+0x12b/0x250 fs/read_write.c:613
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x403ace
Code: 48 89 6c 24 38 48 8d 6c 24 38 e8 0d 00 00 00 48 8b 6c 24 38 48 83 c4 40 c3 cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
RSP: 002b:000000c000561308 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000403ace
RDX: 0000000000000b49 RSI: 000000c0003ae4b7 RDI: 0000000000000003
RBP: 000000c000561348 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 000000c000561488
R13: 0000000000000000 R14: 000000c0000061a0 R15: 0000000000000001
</TASK>

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

2023-03-12 02:58:16

by Tetsuo Handa

[permalink] [raw]

Subject: Re: [syzbot] [kernel?] WARNING in c_start (2)

Hello, Linus.

syzbot is unable to test kernels due to hitting WARN_ON_ONCE(cpu >= bits) upon
"cat /proc/cpuinfo" request.

Since commit 596ff4a09b898179 ("cpumask: re-introduce constant-sized cpumask optimizations")
changed to pass "small_cpumask_bits" instead of "nr_cpumask_bits" to find_next_bit(),
find_next_bit() returning small_cpumask_bits causes c_next() to go beyond nr_cpumask_bits.
I think that we need to make sure that cpumask_next() and friends would not return cpu id
beyond nr_cpumask_bits.

Before commit 596ff4a09b898179:

----------
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 099b6f0d96bd..d2e49c74284a 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -152,10 +152,11 @@ static int show_cpuinfo(struct seq_file *m, void *v)
}

static void *c_start(struct seq_file *m, loff_t *pos)
{
*pos = cpumask_next(*pos - 1, cpu_online_mask);
+ pr_info("*pos = %u, nr_cpu_ids = %u\n", *pos, nr_cpu_ids);
if ((*pos) < nr_cpu_ids)
return &cpu_data(*pos);
return NULL;
}

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 10c92bd9b807..3d7127ebeb44 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -172,10 +172,11 @@ static inline unsigned int cpumask_last(const struct cpumask *srcp)
* Returns >= nr_cpu_ids if no further cpus set.
*/
static inline
unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
+ pr_info("bits = 0x%016lx, nr_cpumask_bits = %u, n + 1 = %u\n", *cpumask_bits(srcp), nr_cpumask_bits, n + 1);
/* -1 is a legal arg here. */
if (n != -1)
cpumask_check(n);
return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
}
----------

----------
[ 11.626748] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 0
[ 11.626751] *pos = 0, nr_cpu_ids = 10
[ 11.626765] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 1
[ 11.626766] *pos = 1, nr_cpu_ids = 10
[ 11.626775] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 2
[ 11.626775] *pos = 2, nr_cpu_ids = 10
[ 11.626783] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 3
[ 11.626783] *pos = 3, nr_cpu_ids = 10
[ 11.626791] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 4
[ 11.626791] *pos = 4, nr_cpu_ids = 10
[ 11.626819] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 4
[ 11.626820] *pos = 4, nr_cpu_ids = 10
[ 11.626828] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 5
[ 11.626828] *pos = 5, nr_cpu_ids = 10
[ 11.626835] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 6
[ 11.626836] *pos = 6, nr_cpu_ids = 10
[ 11.626843] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 7
[ 11.626844] *pos = 7, nr_cpu_ids = 10
[ 11.626851] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 8
[ 11.626851] *pos = 8, nr_cpu_ids = 10
[ 11.626862] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 8
[ 11.626863] *pos = 8, nr_cpu_ids = 10
[ 11.626870] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 9
[ 11.626871] *pos = 9, nr_cpu_ids = 10
[ 11.626878] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 10
[ 11.626879] *pos = 10, nr_cpu_ids = 10
[ 11.626883] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 10
[ 11.626884] *pos = 10, nr_cpu_ids = 10
[ 11.639390] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 0
[ 11.639392] *pos = 0, nr_cpu_ids = 10
[ 11.639406] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 1
[ 11.639407] *pos = 1, nr_cpu_ids = 10
[ 11.639416] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 2
[ 11.639417] *pos = 2, nr_cpu_ids = 10
[ 11.639424] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 3
[ 11.639425] *pos = 3, nr_cpu_ids = 10
[ 11.639432] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 4
[ 11.639433] *pos = 4, nr_cpu_ids = 10
[ 11.639459] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 4
[ 11.639459] *pos = 4, nr_cpu_ids = 10
[ 11.639467] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 5
[ 11.639467] *pos = 5, nr_cpu_ids = 10
[ 11.639475] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 6
[ 11.639475] *pos = 6, nr_cpu_ids = 10
[ 11.639483] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 7
[ 11.639483] *pos = 7, nr_cpu_ids = 10
[ 11.639490] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 8
[ 11.639491] *pos = 8, nr_cpu_ids = 10
[ 11.639503] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 8
[ 11.639503] *pos = 8, nr_cpu_ids = 10
[ 11.639510] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 9
[ 11.639511] *pos = 9, nr_cpu_ids = 10
[ 11.639518] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 10
[ 11.639519] *pos = 10, nr_cpu_ids = 10
[ 11.639524] bits = 0x00000000000003ff, nr_cpumask_bits = 10, n + 1 = 10
[ 11.639524] *pos = 10, nr_cpu_ids = 10
----------

After commit 596ff4a09b898179:

----------
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 099b6f0d96bd..d2e49c74284a 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -152,10 +152,11 @@ static int show_cpuinfo(struct seq_file *m, void *v)
}

static void *c_start(struct seq_file *m, loff_t *pos)
{
*pos = cpumask_next(*pos - 1, cpu_online_mask);
+ pr_info("*pos = %u, nr_cpu_ids = %u\n", *pos, nr_cpu_ids);
if ((*pos) < nr_cpu_ids)
return &cpu_data(*pos);
return NULL;
}

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 8fbe76607965..0aaa0a32a39f 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -205,10 +205,11 @@ static inline unsigned int cpumask_last(const struct cpumask *srcp)
* Returns >= nr_cpu_ids if no further cpus set.
*/
static inline
unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
+ pr_info("bits = 0x%016lx, small_cpumask_bits = %u, n + 1 = %u\n", *cpumask_bits(srcp), small_cpumask_bits, n + 1);
/* -1 is a legal arg here. */
if (n != -1)
cpumask_check(n);
return find_next_bit(cpumask_bits(srcp), small_cpumask_bits, n + 1);
}
----------

----------
[ 10.954521] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 0
[ 10.954525] *pos = 0, nr_cpu_ids = 10
[ 10.954544] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 1
[ 10.954545] *pos = 1, nr_cpu_ids = 10
[ 10.954557] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 2
[ 10.954557] *pos = 2, nr_cpu_ids = 10
[ 10.954568] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 3
[ 10.954569] *pos = 3, nr_cpu_ids = 10
[ 10.954580] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 4
[ 10.954580] *pos = 4, nr_cpu_ids = 10
[ 10.954625] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 4
[ 10.954627] *pos = 4, nr_cpu_ids = 10
[ 10.954638] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 5
[ 10.954638] *pos = 5, nr_cpu_ids = 10
[ 10.954649] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 6
[ 10.954649] *pos = 6, nr_cpu_ids = 10
[ 10.954660] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 7
[ 10.954660] *pos = 7, nr_cpu_ids = 10
[ 10.954671] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 8
[ 10.954671] *pos = 8, nr_cpu_ids = 10
[ 10.954689] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 8
[ 10.954689] *pos = 8, nr_cpu_ids = 10
[ 10.954700] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 9
[ 10.954701] *pos = 9, nr_cpu_ids = 10
[ 10.954711] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 10
[ 10.954712] *pos = 64, nr_cpu_ids = 10
[ 10.954720] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 64
[ 10.954830] ------------[ cut here ]------------
[ 10.954832] WARNING: CPU: 0 PID: 417 at include/linux/cpumask.h:143 c_start+0xb4/0xe0
[ 10.954838] Modules linked in: sch_fq_codel(E) msr(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) raid6_pq(E) xor(E) libcrc32c(E) raid1(E) raid0(E) multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) ahci(E) cryptd(E) psmouse(E) libahci(E) e1000(E) i2c_piix4(E) pata_acpi(E)
[ 10.954862] CPU: 0 PID: 417 Comm: grep Tainted: G E 6.2.0+ #950
[ 10.954865] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 10.954866] RIP: 0010:c_start+0xb4/0xe0
[ 10.954868] Code: 82 e9 01 48 39 c3 7d 17 48 83 fb 40 73 2a 48 c7 c0 a0 98 01 00 48 03 04 dd e0 c9 c7 a6 eb 02 31 c0 48 83 c4 08 5b 41 5e 5d c3 <0f> 0b 83 fb 3f 76 8e be 40 00 00 00 eb ad 48 c7 c7 10 59 24 a7 48
[ 10.954870] RSP: 0018:ffff9d5d80943d58 EFLAGS: 00010293
[ 10.954872] RAX: 000000000000003f RBX: 0000000000000040 RCX: e79682b5424b6f00
[ 10.954874] RDX: 0000000000001fff RSI: 0000000000000000 RDI: ffff8eb1db821608
[ 10.954877] RBP: ffff9d5d80943d70 R08: 3436203d2031202b R09: 206e202c3436203d
[ 10.954878] R10: 203d20737469625f R11: 6b73616d7570635f R12: ffff8eb1c764e650
[ 10.954879] R13: ffff9d5d80943e20 R14: ffff8eb1c764e640 R15: 0000000000000000
[ 10.954880] FS: 00007ff2c79b7740(0000) GS:ffff8eb1db800000(0000) knlGS:0000000000000000
[ 10.954882] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.954884] CR2: 0000559fdf1c6000 CR3: 0000000108b9a004 CR4: 00000000000306f0
[ 10.954887] Call Trace:
[ 10.954888] <TASK>
[ 10.954890] seq_read_iter+0x11c/0x410
[ 10.954895] proc_reg_read_iter+0xe3/0x110
[ 10.954898] vfs_read+0x23c/0x2a0
[ 10.954901] ksys_read+0x6d/0xd0
[ 10.954902] __x64_sys_read+0x1f/0x30
[ 10.954904] do_syscall_64+0x68/0x90
[ 10.954907] ? exc_page_fault+0x92/0x240
[ 10.954909] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 10.954911] RIP: 0033:0x7ff2c7714992
[ 10.954913] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 10.954915] RSP: 002b:00007fff5a069b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 10.954917] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007ff2c7714992
[ 10.954918] RDX: 0000000000016000 RSI: 0000559fdf1c658a RDI: 0000000000000003
[ 10.954919] RBP: 0000559fdf1c658a R08: 0000000000019000 R09: 0000000000000000
[ 10.954920] R10: 0000000000000066 R11: 0000000000000246 R12: 0000000000000003
[ 10.954921] R13: 0000000000016000 R14: 00007fff5a069be0 R15: 0000000000016000
[ 10.954923] </TASK>
[ 10.954924] ---[ end trace 0000000000000000 ]---
[ 10.954925] *pos = 64, nr_cpu_ids = 10
[ 10.977061] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 0
[ 10.977064] *pos = 0, nr_cpu_ids = 10
[ 10.977079] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 1
[ 10.977080] *pos = 1, nr_cpu_ids = 10
[ 10.977089] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 2
[ 10.977090] *pos = 2, nr_cpu_ids = 10
[ 10.977098] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 3
[ 10.977098] *pos = 3, nr_cpu_ids = 10
[ 10.977106] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 4
[ 10.977107] *pos = 4, nr_cpu_ids = 10
[ 10.977133] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 4
[ 10.977134] *pos = 4, nr_cpu_ids = 10
[ 10.977142] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 5
[ 10.977143] *pos = 5, nr_cpu_ids = 10
[ 10.977150] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 6
[ 10.977151] *pos = 6, nr_cpu_ids = 10
[ 10.977159] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 7
[ 10.977159] *pos = 7, nr_cpu_ids = 10
[ 10.977167] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 8
[ 10.977167] *pos = 8, nr_cpu_ids = 10
[ 10.977179] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 8
[ 10.977180] *pos = 8, nr_cpu_ids = 10
[ 10.977187] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 9
[ 10.977188] *pos = 9, nr_cpu_ids = 10
[ 10.977196] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 10
[ 10.977196] *pos = 64, nr_cpu_ids = 10
[ 10.977201] bits = 0x00000000000003ff, small_cpumask_bits = 64, n + 1 = 64
[ 10.979923] *pos = 64, nr_cpu_ids = 10
----------

On 2023/03/10 9:42, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 44889ba56cbb Merge tag 'net-6.3-rc2' of git://git.kernel.o..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13933824c80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=d677dcd743eef09
> dashboard link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

2023-03-12 07:43:50

by Linus Torvalds

[permalink] [raw]

Subject: Re: [syzbot] [kernel?] WARNING in c_start (2)

On Sat, Mar 11, 2023 at 6:57 PM Tetsuo Handa
<[email protected]> wrote:
>
> syzbot is unable to test kernels due to hitting WARN_ON_ONCE(cpu >= bits) upon
> "cat /proc/cpuinfo" request.
>
> Since commit 596ff4a09b898179 ("cpumask: re-introduce constant-sized cpumask optimizations")
> changed to pass "small_cpumask_bits" instead of "nr_cpumask_bits" to find_next_bit(),
> find_next_bit() returning small_cpumask_bits causes c_next() to go beyond nr_cpumask_bits.
> I think that we need to make sure that cpumask_next() and friends would not return cpu id
> beyond nr_cpumask_bits.

Ahh. yes.

It's the same old "cpumask scanning should be testing >= nr_cpu_ids"
thing, but c_start() does

*pos = cpumask_next(*pos - 1, cpu_online_mask);

and basically assumes that it is "== nr_cpu_ids" for the end
condition, and uses the value next time around.

And if it is *exactly* nr_cpu_ids, then the next time it gets called,
the "*pos - 1" means that it's all ok.

But if it's > nr_cpu_ids, then next time the "-1" doesn't do anything
useful and the input is still larger than the number of CPU ids.

The core *works* correctly, but it triggers that warning because it is
not doing that test properly.

That c_start() function is ugly, but the simplest patch is probably
this one-liner (whitespace-damaged but hopefully really obvious):

--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -156,6 +156,7 @@
*pos = cpumask_next(*pos - 1, cpu_online_mask);
if ((*pos) < nr_cpu_ids)
return &cpu_data(*pos);
+ *pos = nr_cpu_ids;
return NULL;
}

which just caps that ">= nr_cpu_ids" case down to nr_cpu_ids.

Does that fix your test-case for you?

I'm not entirely convinced we shouldn't clean stuff up with a slightly
bigger patch, though. Instead of capping the 'pos', just testing it
seems the kind of more obvious thing. This code had similar problems
before. So an alternative patch (still whitespace-damaged) would be
something like

--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -153,8 +153,12 @@

static void *c_start(struct seq_file *m, loff_t *pos)
{
- *pos = cpumask_next(*pos - 1, cpu_online_mask);
- if ((*pos) < nr_cpu_ids)
+ loff_t prev = *pos;
+
+ if (prev >= nr_cpu_ids)
+ return NULL;
+ *pos = cpumask_next(prev - 1, cpu_online_mask);
+ if (*pos < nr_cpu_ids)
return &cpu_data(*pos);
return NULL;
}

which is a few lines more of patch, but stops depending on that "pos
has to end up exactly at nr_cpu_ids" thing.

Either patch should result in the same thing and hopefully fix your
warning, so I guess it's just a matter of taste.

Linus

2023-03-12 09:50:37

by Tetsuo Handa

[permalink] [raw]

Subject: Re: [syzbot] [kernel?] WARNING in c_start (2)

On 2023/03/12 16:43, Linus Torvalds wrote:
> On Sat, Mar 11, 2023 at 6:57 PM Tetsuo Handa
> <[email protected]> wrote:
>>
>> syzbot is unable to test kernels due to hitting WARN_ON_ONCE(cpu >= bits) upon
>> "cat /proc/cpuinfo" request.
>>
>> Since commit 596ff4a09b898179 ("cpumask: re-introduce constant-sized cpumask optimizations")
>> changed to pass "small_cpumask_bits" instead of "nr_cpumask_bits" to find_next_bit(),
>> find_next_bit() returning small_cpumask_bits causes c_next() to go beyond nr_cpumask_bits.
>> I think that we need to make sure that cpumask_next() and friends would not return cpu id
>> beyond nr_cpumask_bits.
>
> Ahh. yes.
>
> It's the same old "cpumask scanning should be testing >= nr_cpu_ids"
> thing, but c_start() does
>
> *pos = cpumask_next(*pos - 1, cpu_online_mask);
>
> and basically assumes that it is "== nr_cpu_ids" for the end
> condition, and uses the value next time around.
>
> And if it is *exactly* nr_cpu_ids, then the next time it gets called,
> the "*pos - 1" means that it's all ok.
>
> But if it's > nr_cpu_ids, then next time the "-1" doesn't do anything
> useful and the input is still larger than the number of CPU ids.
>
> The core *works* correctly, but it triggers that warning because it is
> not doing that test properly.

Right. The fix that works for "cat /procc/cpuinfo" case is

Subject: cpumask: adjust valid cpu range check in cpumask_next()

Since commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations") changed to pass "small_cpumask_bits" (which is a build-time
constant if NR_CPUS <= BITS_PER_LONG) instead of "nr_cpumask_bits" (which
is not a build-time constant) when cpumask_next() calls find_next_bit(),
the caller of cpumask_next() started observing core id which is beyond
number of available cores.

If all callers of cpumask_next() are prepared for observing core id which
is beyond nr_cpumask_bits, we can preserve optimization introduced by that
commit.

But we need to treat small_cpumask_bits - 1 (which happens when
cpumask_next() is called after cpumask_next() returned small_cpumask_bits)
as well as -1 (which happens when cpumask_next() is called for the first
time while reading /proc/cpuinfo).

Reported-by: syzbot <[email protected]>
Link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32
Fixes: 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations")
---
include/linux/cpumask.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 63d637d18e79..f96f46326e32 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -207,8 +207,8 @@ static inline unsigned int cpumask_last(const struct cpumask *srcp)
static inline
unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
- /* -1 is a legal arg here. */
- if (n != -1)
+ /* -1 and small_cpumask_bits-1 are legal here. */
+ if (n != -1 && n != small_cpumask_bits - 1)
cpumask_check(n);
return find_next_bit(cpumask_bits(srcp), small_cpumask_bits, n + 1);
}
--
2.34.1

but we need to audit all cpumask_next*() users. That would take for a while
waiting for responses.

>
> That c_start() function is ugly, but the simplest patch is probably
> this one-liner (whitespace-damaged but hopefully really obvious):
>
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -156,6 +156,7 @@
> *pos = cpumask_next(*pos - 1, cpu_online_mask);
> if ((*pos) < nr_cpu_ids)
> return &cpu_data(*pos);
> + *pos = nr_cpu_ids;
> return NULL;
> }
>
>
> which just caps that ">= nr_cpu_ids" case down to nr_cpu_ids.
>
> Does that fix your test-case for you?

Yes. But other architectures will need the same fix.

>
> I'm not entirely convinced we shouldn't clean stuff up with a slightly
> bigger patch, though. Instead of capping the 'pos', just testing it
> seems the kind of more obvious thing. This code had similar problems
> before. So an alternative patch (still whitespace-damaged) would be
> something like
>
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -153,8 +153,12 @@
>
> static void *c_start(struct seq_file *m, loff_t *pos)
> {
> - *pos = cpumask_next(*pos - 1, cpu_online_mask);
> - if ((*pos) < nr_cpu_ids)
> + loff_t prev = *pos;
> +
> + if (prev >= nr_cpu_ids)
> + return NULL;
> + *pos = cpumask_next(prev - 1, cpu_online_mask);
> + if (*pos < nr_cpu_ids)
> return &cpu_data(*pos);
> return NULL;
> }
>
> which is a few lines more of patch, but stops depending on that "pos
> has to end up exactly at nr_cpu_ids" thing.

More worrisome thing for me is subtle changes in other locations.
For example, arch/ia64/kernel/mca.c is comparing the result of cpumask_next()
in different ways; one with nr_cpu_ids and the other with NR_CPUS. Maybe simply
commit 5dd3c9949a3e ("cpumask: prepare for iterators to only go to
nr_cpu_ids/nr_cpumask_bits.: ia64") forgot to update ia64_mca_cpe_int_caller().
Until all callers are checked, I afraid suddenly changing cpumask_next() to
return from nr_cpu_ids to NR_CPUS might break something.

static irqreturn_t
ia64_mca_cmc_int_caller(int cmc_irq, void *arg)
{
static int start_count = -1;
unsigned int cpuid;

cpuid = smp_processor_id();

/* If first cpu, update count */
if (start_count == -1)
start_count = IA64_LOG_COUNT(SAL_INFO_TYPE_CMC);

ia64_mca_cmc_int_handler(cmc_irq, arg);

cpuid = cpumask_next(cpuid+1, cpu_online_mask);

if (cpuid < nr_cpu_ids) {
ia64_send_ipi(cpuid, IA64_CMCP_VECTOR, IA64_IPI_DM_INT, 0);
} else {
/* If no log record, switch out of polling mode */
(...snipped...)
}

static irqreturn_t
ia64_mca_cpe_int_caller(int cpe_irq, void *arg)
{
static int start_count = -1;
static int poll_time = MIN_CPE_POLL_INTERVAL;
unsigned int cpuid;

cpuid = smp_processor_id();

/* If first cpu, update count */
if (start_count == -1)
start_count = IA64_LOG_COUNT(SAL_INFO_TYPE_CPE);

ia64_mca_cpe_int_handler(cpe_irq, arg);

cpuid = cpumask_next(cpuid+1, cpu_online_mask);

if (cpuid < NR_CPUS) {
ia64_send_ipi(cpuid, IA64_CPEP_VECTOR, IA64_IPI_DM_INT, 0);
} else {
/*
* If a log was recorded, increase our polling frequency,
(...snipped...)
}

>
> Either patch should result in the same thing and hopefully fix your
> warning, so I guess it's just a matter of taste.
>
> Linus
>

Since currently 99% of syzbot crashes are this bug, I want to fix as soon as
possible. Thus, regarding this cycle, restoring old behavior seems the safer.