LinuxLists.cc - [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop

2023-02-14 13:24:24

Subject: [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110

Hello,

recently when testing kernel with debug options set from net-next [1]
and bpf-next [2] the following call trace happens:

[ 92.539335] be2net 0000:04:00.0: FW config: function_mode=0x10003,
function_caps=0x7
[ 92.559345] scsi host1: BC_356 : error in cmd completion: Subsystem
: 1 Opcode : 191 status(compl/extd)=2/30
[ 92.560448] scsi host1: BG_1597 : HBA error recovery not supported
[ 92.587657] be2net 0000:04:00.0: Max: txqs 16, rxqs 17, rss 16, eqs 16, vfs 0
[ 92.588471] be2net 0000:04:00.0: Max: uc-macs 30, mc-macs 64, vlans 64
[ 93.731235] be2net 0000:04:00.0: enabled 8 MSI-x vector(s) for NIC
[ 93.749741] ==================================================================
[ 93.750521] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110
[ 93.751233] Read of size 8 at addr ffff888104719758 by task kworker/0:2/108
[ 93.751601]
[ 93.752087] CPU: 0 PID: 108 Comm: kworker/0:2 Tainted: G I
6.2.0-rc7 #1
[ 93.752549] Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
[ 93.752884] Workqueue: events work_for_cpu_fn
[ 93.753510] Call Trace:
[ 93.753687] <TASK>
[ 93.754215] dump_stack_lvl+0x55/0x71
[ 93.754449] print_report+0x184/0x4b1
[ 93.754697] ? __virt_addr_valid+0xe8/0x160
[ 93.754972] ? hop_cmp+0x26/0x110
[ 93.755533] kasan_report+0xa5/0xe0
[ 93.756193] ? hop_cmp+0x26/0x110
[ 93.756767] ? __pfx_hop_cmp+0x10/0x10
[ 93.756990] ? hop_cmp+0x26/0x110
[ 93.757556] ? __pfx_hop_cmp+0x10/0x10
[ 93.757774] ? bsearch+0x53/0x80
[ 93.758838] ? sched_numa_find_nth_cpu+0x128/0x360
[ 93.759492] ? __pfx_sched_numa_find_nth_cpu+0x10/0x10
[ 93.759792] ? alloc_cpumask_var_node+0x38/0x60
[ 93.760419] ? rcu_read_lock_sched_held+0x3f/0x80
[ 93.761060] ? trace_kmalloc+0x33/0xf0
[ 93.761306] ? __kmalloc_node+0x76/0xc0
[ 93.761528] ? cpumask_local_spread+0x44/0xc0
[ 93.762192] ? be_setup_queues+0x13b/0x3c0 [be2net]
[ 93.762957] ? be_setup+0x663/0xa60 [be2net]
[ 93.763795] ? __pfx_be_setup+0x10/0x10 [be2net]
[ 93.764523] ? is_module_address+0x2b/0x50
[ 93.764744] ? is_module_address+0x2b/0x50
[ 93.764996] ? static_obj+0x6b/0x80
[ 93.765865] ? lockdep_init_map_type+0xcf/0x370
[ 93.766527] ? be_probe+0x825/0xcd0 [be2net]
[ 93.767224] ? __pfx_be_probe+0x10/0x10 [be2net]
[ 93.767932] ? preempt_count_sub+0xb7/0x100
[ 93.768181] ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 93.768450] ? __pfx_be_probe+0x10/0x10 [be2net]
[ 93.769162] ? local_pci_probe+0x77/0xc0
[ 93.769392] ? __pfx_local_pci_probe+0x10/0x10
[ 93.770007] ? work_for_cpu_fn+0x29/0x40
[ 93.770253] ? process_one_work+0x543/0xa20
[ 93.770490] ? __pfx_process_one_work+0x10/0x10
[ 93.797773pin_lock+0x10/0x10
[ 93.871656] ? __list_add_valid+0x3f/0x70
[ 93.871874] ? move_linked_works+0x103/0x140
[ 93.872487] ? worker_thread+0x364/0x630
[ 93.872704] ? __kthread_parkme+0xd8/0xf0
[ 93.872919] ? __pfx_worker_thread+0x10/0x10
[ 93.873513] ? kthread+0x17e/0x1b0
[ 93.874055] ? __pfx_kthread+0x10/0x10
[ 93.874290] ? ret_from_fork+0x2c/0x50
[ 93.874541] </TASK>
[ 93.874727]
[ 93.875188] Allocated by task 1:
[ 93.875733] kasan_save_stack+0x34/0x60
[ 93.875942] kasan_set_track+0x21/0x30
[ 93.876164] __kasan_kmalloc+0xa9/0xb0
[ 93.876373] __kmalloc+0x57/0xd0
[ 93.876918] sched_init_numa+0x21f/0x7e0
[ 93.877146] sched_init_smp+0x6d/0x113
[ 93.877358] kernel_init_freeable+0x2a3/0x4a0
[ 93.877993] kernel_init+0x18/0x160
[ 93.878592] ret_from_fork+0x2c/0x50
[ 93.878811]
[ 93.879278] The buggy address belongs to the object at ffff888104719760
[ 93.879278] which belongs to the cache kmalloc-16 of size 16
[ 93.879926] The buggy address is located 8 bytes to the left of
[ 93.879926] 16-byte region [ffff888104719760, ffff888104719770)
[ 94.363686] flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
[ 94.381131] raw: 0017ffffc0000200 ffff88810004c580 ffffea000400df50
ffffea0004165190
[ 94.381554] raw: 0000000000000000 00000000001c001c 00000001ffffffff
0000000000000000
[ 94.381958] page dumped because: kasan: bad access detected
[ 94.382249]
[ 94.382710] Memory state around the buggy address:
[ 94.383319] ffff888104719600: fc fc fc fc fc fc fc fc fa fb fc fc
fc fc fc fc
[ 94.384066] ffff888104719680: fc fc fc fc fc fc fc fc fc fc 00 00
fc fc fc fc
[ 94.384841] >ffff888104719700: fc fc fc fc fc fc fc fc fc fc fc fc
00 00 fc fc
[ 94.385573] ^
[ 94.386251] ffff888104719780: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc 00 00
[ 94.386989] ffff888104719800: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 94.387710] ==================================================================

full console log:
https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/3762562309/redhat:776235046/build_x86_64_redhat:776235046-x86_64-kernel-debug/tests/1/results_0001/job.01/recipes/13385613/tasks/5/logs/test_console.log

test logs: https://datawarehouse.cki-project.org/kcidb/tests/7075911
cki issue tracker: https://datawarehouse.cki-project.org/issue/1896

kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/776235046/build%20x86_64%20debug/3762562279/artifacts/kernel-bpf-next-redhat_776235046-x86_64-kernel-debug.config
kernel tarball:
https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/776235046/publish%20x86_64%20debug/3762562289/artifacts/kernel-bpf-next-redhat_776235046-x86_64-kernel-debug.tar.gz

The first commit we tested that we hit the problem is [3], but we
didn't bisect it to know what commit introduced the issue.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
[2] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
[3] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=0243d3dfe274832aa0a16214499c208122345173

Thanks,
Bruno Goncalves

2023-02-14 14:32:57

by Yury Norov

[permalink] [raw]

Subject: Re: [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110

On Tue, Feb 14, 2023 at 02:23:06PM +0100, Bruno Goncalves wrote:
> Hello,
>
> recently when testing kernel with debug options set from net-next [1]
> and bpf-next [2] the following call trace happens:
>
Hi Bruno,

Thanks for report.

This looks weird, because the hop_cmp() spent for 3 month in -next till
now. Anyways, can you please share your NUMA configuration so I'll try
to reproduce the bug locally? What 'numactl -H' outputs?

Thanks,
Yury

> [ 92.539335] be2net 0000:04:00.0: FW config: function_mode=0x10003,
> function_caps=0x7
> [ 92.559345] scsi host1: BC_356 : error in cmd completion: Subsystem
> : 1 Opcode : 191 status(compl/extd)=2/30
> [ 92.560448] scsi host1: BG_1597 : HBA error recovery not supported
> [ 92.587657] be2net 0000:04:00.0: Max: txqs 16, rxqs 17, rss 16, eqs 16, vfs 0
> [ 92.588471] be2net 0000:04:00.0: Max: uc-macs 30, mc-macs 64, vlans 64
> [ 93.731235] be2net 0000:04:00.0: enabled 8 MSI-x vector(s) for NIC
> [ 93.749741] ==================================================================
> [ 93.750521] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110
> [ 93.751233] Read of size 8 at addr ffff888104719758 by task kworker/0:2/108
> [ 93.751601]
> [ 93.752087] CPU: 0 PID: 108 Comm: kworker/0:2 Tainted: G I
> 6.2.0-rc7 #1
> [ 93.752549] Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
> [ 93.752884] Workqueue: events work_for_cpu_fn
> [ 93.753510] Call Trace:
> [ 93.753687] <TASK>
> [ 93.754215] dump_stack_lvl+0x55/0x71
> [ 93.754449] print_report+0x184/0x4b1
> [ 93.754697] ? __virt_addr_valid+0xe8/0x160
> [ 93.754972] ? hop_cmp+0x26/0x110
> [ 93.755533] kasan_report+0xa5/0xe0
> [ 93.756193] ? hop_cmp+0x26/0x110
> [ 93.756767] ? __pfx_hop_cmp+0x10/0x10
> [ 93.756990] ? hop_cmp+0x26/0x110
> [ 93.757556] ? __pfx_hop_cmp+0x10/0x10
> [ 93.757774] ? bsearch+0x53/0x80
> [ 93.758838] ? sched_numa_find_nth_cpu+0x128/0x360
> [ 93.759492] ? __pfx_sched_numa_find_nth_cpu+0x10/0x10
> [ 93.759792] ? alloc_cpumask_var_node+0x38/0x60
> [ 93.760419] ? rcu_read_lock_sched_held+0x3f/0x80
> [ 93.761060] ? trace_kmalloc+0x33/0xf0
> [ 93.761306] ? __kmalloc_node+0x76/0xc0
> [ 93.761528] ? cpumask_local_spread+0x44/0xc0
> [ 93.762192] ? be_setup_queues+0x13b/0x3c0 [be2net]
> [ 93.762957] ? be_setup+0x663/0xa60 [be2net]
> [ 93.763795] ? __pfx_be_setup+0x10/0x10 [be2net]
> [ 93.764523] ? is_module_address+0x2b/0x50
> [ 93.764744] ? is_module_address+0x2b/0x50
> [ 93.764996] ? static_obj+0x6b/0x80
> [ 93.765865] ? lockdep_init_map_type+0xcf/0x370
> [ 93.766527] ? be_probe+0x825/0xcd0 [be2net]
> [ 93.767224] ? __pfx_be_probe+0x10/0x10 [be2net]
> [ 93.767932] ? preempt_count_sub+0xb7/0x100
> [ 93.768181] ? _raw_spin_unlock_irqrestore+0x35/0x60
> [ 93.768450] ? __pfx_be_probe+0x10/0x10 [be2net]
> [ 93.769162] ? local_pci_probe+0x77/0xc0
> [ 93.769392] ? __pfx_local_pci_probe+0x10/0x10
> [ 93.770007] ? work_for_cpu_fn+0x29/0x40
> [ 93.770253] ? process_one_work+0x543/0xa20
> [ 93.770490] ? __pfx_process_one_work+0x10/0x10
> [ 93.797773pin_lock+0x10/0x10
> [ 93.871656] ? __list_add_valid+0x3f/0x70
> [ 93.871874] ? move_linked_works+0x103/0x140
> [ 93.872487] ? worker_thread+0x364/0x630
> [ 93.872704] ? __kthread_parkme+0xd8/0xf0
> [ 93.872919] ? __pfx_worker_thread+0x10/0x10
> [ 93.873513] ? kthread+0x17e/0x1b0
> [ 93.874055] ? __pfx_kthread+0x10/0x10
> [ 93.874290] ? ret_from_fork+0x2c/0x50
> [ 93.874541] </TASK>
> [ 93.874727]
> [ 93.875188] Allocated by task 1:
> [ 93.875733] kasan_save_stack+0x34/0x60
> [ 93.875942] kasan_set_track+0x21/0x30
> [ 93.876164] __kasan_kmalloc+0xa9/0xb0
> [ 93.876373] __kmalloc+0x57/0xd0
> [ 93.876918] sched_init_numa+0x21f/0x7e0
> [ 93.877146] sched_init_smp+0x6d/0x113
> [ 93.877358] kernel_init_freeable+0x2a3/0x4a0
> [ 93.877993] kernel_init+0x18/0x160
> [ 93.878592] ret_from_fork+0x2c/0x50
> [ 93.878811]
> [ 93.879278] The buggy address belongs to the object at ffff888104719760
> [ 93.879278] which belongs to the cache kmalloc-16 of size 16
> [ 93.879926] The buggy address is located 8 bytes to the left of
> [ 93.879926] 16-byte region [ffff888104719760, ffff888104719770)
> [ 94.363686] flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
> [ 94.381131] raw: 0017ffffc0000200 ffff88810004c580 ffffea000400df50
> ffffea0004165190
> [ 94.381554] raw: 0000000000000000 00000000001c001c 00000001ffffffff
> 0000000000000000
> [ 94.381958] page dumped because: kasan: bad access detected
> [ 94.382249]
> [ 94.382710] Memory state around the buggy address:
> [ 94.383319] ffff888104719600: fc fc fc fc fc fc fc fc fa fb fc fc
> fc fc fc fc
> [ 94.384066] ffff888104719680: fc fc fc fc fc fc fc fc fc fc 00 00
> fc fc fc fc
> [ 94.384841] >ffff888104719700: fc fc fc fc fc fc fc fc fc fc fc fc
> 00 00 fc fc
> [ 94.385573] ^
> [ 94.386251] ffff888104719780: fc fc fc fc fc fc fc fc fc fc fc fc
> fc fc 00 00
> [ 94.386989] ffff888104719800: fc fc fc fc fc fc fc fc fc fc fc fc
> fc fc fc fc
> [ 94.387710] ==================================================================
>
> full console log:
> https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/3762562309/redhat:776235046/build_x86_64_redhat:776235046-x86_64-kernel-debug/tests/1/results_0001/job.01/recipes/13385613/tasks/5/logs/test_console.log
>
> test logs: https://datawarehouse.cki-project.org/kcidb/tests/7075911
> cki issue tracker: https://datawarehouse.cki-project.org/issue/1896
>
> kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/776235046/build%20x86_64%20debug/3762562279/artifacts/kernel-bpf-next-redhat_776235046-x86_64-kernel-debug.config
> kernel tarball:
> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/776235046/publish%20x86_64%20debug/3762562289/artifacts/kernel-bpf-next-redhat_776235046-x86_64-kernel-debug.tar.gz
>
> The first commit we tested that we hit the problem is [3], but we
> didn't bisect it to know what commit introduced the issue.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=0243d3dfe274832aa0a16214499c208122345173
>
> Thanks,
> Bruno Goncalves

2023-02-15 08:25:58

by Bruno Goncalves

[permalink] [raw]

Subject: Re: [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110

On Tue, 14 Feb 2023 at 15:32, Yury Norov <[email protected]> wrote:
>
> On Tue, Feb 14, 2023 at 02:23:06PM +0100, Bruno Goncalves wrote:
> > Hello,
> >
> > recently when testing kernel with debug options set from net-next [1]
> > and bpf-next [2] the following call trace happens:
> >
> Hi Bruno,
>
> Thanks for report.
>
> This looks weird, because the hop_cmp() spent for 3 month in -next till
> now. Anyways, can you please share your NUMA configuration so I'll try
> to reproduce the bug locally? What 'numactl -H' outputs?
>

Here is the output:

numactl -H
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 32063 MB
node 0 free: 31610 MB
node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 1 size: 32248 MB
node 1 free: 31909 MB
node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 2 size: 32248 MB
node 2 free: 31551 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 32239 MB
node 3 free: 31468 MB
node distances:
node 0 1 2 3
0: 10 21 31 21
1: 21 10 21 31
2: 31 21 10 21
3: 21 31 21 10

Bruno

> Thanks,
> Yury
>
> > [ 92.539335] be2net 0000:04:00.0: FW config: function_mode=0x10003,
> > function_caps=0x7
> > [ 92.559345] scsi host1: BC_356 : error in cmd completion: Subsystem
> > : 1 Opcode : 191 status(compl/extd)=2/30
> > [ 92.560448] scsi host1: BG_1597 : HBA error recovery not supported
> > [ 92.587657] be2net 0000:04:00.0: Max: txqs 16, rxqs 17, rss 16, eqs 16, vfs 0
> > [ 92.588471] be2net 0000:04:00.0: Max: uc-macs 30, mc-macs 64, vlans 64
> > [ 93.731235] be2net 0000:04:00.0: enabled 8 MSI-x vector(s) for NIC
> > [ 93.749741] ==================================================================
> > [ 93.750521] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110
> > [ 93.751233] Read of size 8 at addr ffff888104719758 by task kworker/0:2/108
> > [ 93.751601]
> > [ 93.752087] CPU: 0 PID: 108 Comm: kworker/0:2 Tainted: G I
> > 6.2.0-rc7 #1
> > [ 93.752549] Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
> > [ 93.752884] Workqueue: events work_for_cpu_fn
> > [ 93.753510] Call Trace:
> > [ 93.753687] <TASK>
> > [ 93.754215] dump_stack_lvl+0x55/0x71
> > [ 93.754449] print_report+0x184/0x4b1
> > [ 93.754697] ? __virt_addr_valid+0xe8/0x160
> > [ 93.754972] ? hop_cmp+0x26/0x110
> > [ 93.755533] kasan_report+0xa5/0xe0
> > [ 93.756193] ? hop_cmp+0x26/0x110
> > [ 93.756767] ? __pfx_hop_cmp+0x10/0x10
> > [ 93.756990] ? hop_cmp+0x26/0x110
> > [ 93.757556] ? __pfx_hop_cmp+0x10/0x10
> > [ 93.757774] ? bsearch+0x53/0x80
> > [ 93.758838] ? sched_numa_find_nth_cpu+0x128/0x360
> > [ 93.759492] ? __pfx_sched_numa_find_nth_cpu+0x10/0x10
> > [ 93.759792] ? alloc_cpumask_var_node+0x38/0x60
> > [ 93.760419] ? rcu_read_lock_sched_held+0x3f/0x80
> > [ 93.761060] ? trace_kmalloc+0x33/0xf0
> > [ 93.761306] ? __kmalloc_node+0x76/0xc0
> > [ 93.761528] ? cpumask_local_spread+0x44/0xc0
> > [ 93.762192] ? be_setup_queues+0x13b/0x3c0 [be2net]
> > [ 93.762957] ? be_setup+0x663/0xa60 [be2net]
> > [ 93.763795] ? __pfx_be_setup+0x10/0x10 [be2net]
> > [ 93.764523] ? is_module_address+0x2b/0x50
> > [ 93.764744] ? is_module_address+0x2b/0x50
> > [ 93.764996] ? static_obj+0x6b/0x80
> > [ 93.765865] ? lockdep_init_map_type+0xcf/0x370
> > [ 93.766527] ? be_probe+0x825/0xcd0 [be2net]
> > [ 93.767224] ? __pfx_be_probe+0x10/0x10 [be2net]
> > [ 93.767932] ? preempt_count_sub+0xb7/0x100
> > [ 93.768181] ? _raw_spin_unlock_irqrestore+0x35/0x60
> > [ 93.768450] ? __pfx_be_probe+0x10/0x10 [be2net]
> > [ 93.769162] ? local_pci_probe+0x77/0xc0
> > [ 93.769392] ? __pfx_local_pci_probe+0x10/0x10
> > [ 93.770007] ? work_for_cpu_fn+0x29/0x40
> > [ 93.770253] ? process_one_work+0x543/0xa20
> > [ 93.770490] ? __pfx_process_one_work+0x10/0x10
> > [ 93.797773pin_lock+0x10/0x10
> > [ 93.871656] ? __list_add_valid+0x3f/0x70
> > [ 93.871874] ? move_linked_works+0x103/0x140
> > [ 93.872487] ? worker_thread+0x364/0x630
> > [ 93.872704] ? __kthread_parkme+0xd8/0xf0
> > [ 93.872919] ? __pfx_worker_thread+0x10/0x10
> > [ 93.873513] ? kthread+0x17e/0x1b0
> > [ 93.874055] ? __pfx_kthread+0x10/0x10
> > [ 93.874290] ? ret_from_fork+0x2c/0x50
> > [ 93.874541] </TASK>
> > [ 93.874727]
> > [ 93.875188] Allocated by task 1:
> > [ 93.875733] kasan_save_stack+0x34/0x60
> > [ 93.875942] kasan_set_track+0x21/0x30
> > [ 93.876164] __kasan_kmalloc+0xa9/0xb0
> > [ 93.876373] __kmalloc+0x57/0xd0
> > [ 93.876918] sched_init_numa+0x21f/0x7e0
> > [ 93.877146] sched_init_smp+0x6d/0x113
> > [ 93.877358] kernel_init_freeable+0x2a3/0x4a0
> > [ 93.877993] kernel_init+0x18/0x160
> > [ 93.878592] ret_from_fork+0x2c/0x50
> > [ 93.878811]
> > [ 93.879278] The buggy address belongs to the object at ffff888104719760
> > [ 93.879278] which belongs to the cache kmalloc-16 of size 16
> > [ 93.879926] The buggy address is located 8 bytes to the left of
> > [ 93.879926] 16-byte region [ffff888104719760, ffff888104719770)
> > [ 94.363686] flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
> > [ 94.381131] raw: 0017ffffc0000200 ffff88810004c580 ffffea000400df50
> > ffffea0004165190
> > [ 94.381554] raw: 0000000000000000 00000000001c001c 00000001ffffffff
> > 0000000000000000
> > [ 94.381958] page dumped because: kasan: bad access detected
> > [ 94.382249]
> > [ 94.382710] Memory state around the buggy address:
> > [ 94.383319] ffff888104719600: fc fc fc fc fc fc fc fc fa fb fc fc
> > fc fc fc fc
> > [ 94.384066] ffff888104719680: fc fc fc fc fc fc fc fc fc fc 00 00
> > fc fc fc fc
> > [ 94.384841] >ffff888104719700: fc fc fc fc fc fc fc fc fc fc fc fc
> > 00 00 fc fc
> > [ 94.385573] ^
> > [ 94.386251] ffff888104719780: fc fc fc fc fc fc fc fc fc fc fc fc
> > fc fc 00 00
> > [ 94.386989] ffff888104719800: fc fc fc fc fc fc fc fc fc fc fc fc
> > fc fc fc fc
> > [ 94.387710] ==================================================================
> >
> > full console log:
> > https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/3762562309/redhat:776235046/build_x86_64_redhat:776235046-x86_64-kernel-debug/tests/1/results_0001/job.01/recipes/13385613/tasks/5/logs/test_console.log
> >
> > test logs: https://datawarehouse.cki-project.org/kcidb/tests/7075911
> > cki issue tracker: https://datawarehouse.cki-project.org/issue/1896
> >
> > kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/776235046/build%20x86_64%20debug/3762562279/artifacts/kernel-bpf-next-redhat_776235046-x86_64-kernel-debug.config
> > kernel tarball:
> > https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/776235046/publish%20x86_64%20debug/3762562289/artifacts/kernel-bpf-next-redhat_776235046-x86_64-kernel-debug.tar.gz
> >
> > The first commit we tested that we hit the problem is [3], but we
> > didn't bisect it to know what commit introduced the issue.
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=0243d3dfe274832aa0a16214499c208122345173
> >
> > Thanks,
> > Bruno Goncalves
>

2023-02-16 00:31:42

by Yury Norov

[permalink] [raw]

Subject: Re: [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110

+ Kees Cook <[email protected]>
+ Miguel Ojeda <[email protected]>
+ Nick Desaulniers <[email protected]>

On Wed, Feb 15, 2023 at 09:24:52AM +0100, Bruno Goncalves wrote:
> On Tue, 14 Feb 2023 at 15:32, Yury Norov <[email protected]> wrote:
> >
> > On Tue, Feb 14, 2023 at 02:23:06PM +0100, Bruno Goncalves wrote:
> > > Hello,
> > >
> > > recently when testing kernel with debug options set from net-next [1]
> > > and bpf-next [2] the following call trace happens:
> > >
> > Hi Bruno,
> >
> > Thanks for report.
> >
> > This looks weird, because the hop_cmp() spent for 3 month in -next till
> > now. Anyways, can you please share your NUMA configuration so I'll try
> > to reproduce the bug locally? What 'numactl -H' outputs?
> >
>
> Here is the output:
>
> numactl -H
> available: 4 nodes (0-3)
> node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
> node 0 size: 32063 MB
> node 0 free: 31610 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
> node 1 size: 32248 MB
> node 1 free: 31909 MB
> node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
> node 2 size: 32248 MB
> node 2 free: 31551 MB
> node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
> node 3 size: 32239 MB
> node 3 free: 31468 MB
> node distances:
> node 0 1 2 3
> 0: 10 21 31 21
> 1: 21 10 21 31
> 2: 31 21 10 21
> 3: 21 31 21 10
>
> Bruno

So, I was able to reproduce it, and it seems like a compiler issue.

The problem is that hop_cmp() calculates pointer to a previous hop
object unconditionally at the beginning of the function:

struct cpumask **prev_hop = *((struct cpumask ***)b - 1);

Obviously, for the first hop, there's no such thing like a previous
one, and later in the code 'prev_hop' is used conditionally on that:

k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);

To me the code above looks like it instructs the compiler to dereference
'b - 1' only if b != k->masks, i.e. when b is not the first hop. But GCC
does that unconditionally, which looks wrong.

If I defer dereferencing manually like in the snippet below, the kasan
warning goes away.

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 48838a05c008..5f297f81c574 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2081,14 +2081,14 @@ struct __cmp_key {

static int hop_cmp(const void *a, const void *b)
{
- struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
struct cpumask **cur_hop = *(struct cpumask ***)b;
struct __cmp_key *k = (struct __cmp_key *)a;

if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
return 1;

- k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
+ k->w = (b == k->masks) ? 0 :
+ cpumask_weight_and(k->cpus, (*((struct cpumask ***)b - 1))[k->node]);
if (k->w <= k->cpu)
return 0;

I don't understand why GCC doesn't optimize out unneeded dereferencing.
It does that even if I replace ternary operator with if-else construction.
To me it looks like a compiler bug.

However, I acknowledge that I'm not a great expert in C standard, so
it's quite possible that there may be some rule that prevents from
doing such optimizations, even for non-volatile variables.

Adding compiler people. Guys, could you please clarify on that?
If it's my fault, I'll submit fix shortly.

Thanks,
Yury

2023-02-16 01:51:56

by Kees Cook

[permalink] [raw]

Subject: Re: [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110

On February 15, 2023 4:31:32 PM PST, Yury Norov <[email protected]> wrote:
>+ Kees Cook <[email protected]>
>+ Miguel Ojeda <[email protected]>
>+ Nick Desaulniers <[email protected]>
>
>On Wed, Feb 15, 2023 at 09:24:52AM +0100, Bruno Goncalves wrote:
>> On Tue, 14 Feb 2023 at 15:32, Yury Norov <[email protected]> wrote:
>> >
>> > On Tue, Feb 14, 2023 at 02:23:06PM +0100, Bruno Goncalves wrote:
>> > > Hello,
>> > >
>> > > recently when testing kernel with debug options set from net-next [1]
>> > > and bpf-next [2] the following call trace happens:
>> > >
>> > Hi Bruno,
>> >
>> > Thanks for report.
>> >
>> > This looks weird, because the hop_cmp() spent for 3 month in -next till
>> > now. Anyways, can you please share your NUMA configuration so I'll try
>> > to reproduce the bug locally? What 'numactl -H' outputs?
>> >
>>
>> Here is the output:
>>
>> numactl -H
>> available: 4 nodes (0-3)
>> node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
>> node 0 size: 32063 MB
>> node 0 free: 31610 MB
>> node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
>> node 1 size: 32248 MB
>> node 1 free: 31909 MB
>> node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
>> node 2 size: 32248 MB
>> node 2 free: 31551 MB
>> node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
>> node 3 size: 32239 MB
>> node 3 free: 31468 MB
>> node distances:
>> node 0 1 2 3
>> 0: 10 21 31 21
>> 1: 21 10 21 31
>> 2: 31 21 10 21
>> 3: 21 31 21 10
>>
>> Bruno
>
>So, I was able to reproduce it, and it seems like a compiler issue.
>
>The problem is that hop_cmp() calculates pointer to a previous hop
>object unconditionally at the beginning of the function:
>
> struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
>
>Obviously, for the first hop, there's no such thing like a previous
>one, and later in the code 'prev_hop' is used conditionally on that:
>
> k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
>
>To me the code above looks like it instructs the compiler to dereference
>'b - 1' only if b != k->masks, i.e. when b is not the first hop. But GCC
>does that unconditionally, which looks wrong.
>
>If I defer dereferencing manually like in the snippet below, the kasan
>warning goes away.
>
>diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>index 48838a05c008..5f297f81c574 100644
>--- a/kernel/sched/topology.c
>+++ b/kernel/sched/topology.c
>@@ -2081,14 +2081,14 @@ struct __cmp_key {
>
> static int hop_cmp(const void *a, const void *b)
> {
>- struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
> struct cpumask **cur_hop = *(struct cpumask ***)b;
> struct __cmp_key *k = (struct __cmp_key *)a;
>
> if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
> return 1;
>
>- k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
>+ k->w = (b == k->masks) ? 0 :
>+ cpumask_weight_and(k->cpus, (*((struct cpumask ***)b - 1))[k->node]);
> if (k->w <= k->cpu)
> return 0;
>
>I don't understand why GCC doesn't optimize out unneeded dereferencing.
>It does that even if I replace ternary operator with if-else construction.
>To me it looks like a compiler bug.
>
>However, I acknowledge that I'm not a great expert in C standard, so
>it's quite possible that there may be some rule that prevents from
>doing such optimizations, even for non-volatile variables.
>
>Adding compiler people. Guys, could you please clarify on that?
>If it's my fault, I'll submit fix shortly.

My understanding is that without getting inlined, the compiler cannot evaluate "b == k->masks" at compile time (if it can at all). So since the dereference is part of variable initialization, it's not executed later: it's executed at function entry.

Regardless, this whole function looks kind of hard to read. Why not fully expand it with the if/else logic and put any needed variables into the respective clauses? Then humans can read it and the compiler will optimize it down just as efficiently.

-Kees

--
Kees Cook

2023-02-16 16:34:09

by Yury Norov

[permalink] [raw]

Subject: Re: [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110

On Wed, Feb 15, 2023 at 05:51:39PM -0800, Kees Cook wrote:
> On February 15, 2023 4:31:32 PM PST, Yury Norov <[email protected]> wrote:
> >+ Kees Cook <[email protected]>
> >+ Miguel Ojeda <[email protected]>
> >+ Nick Desaulniers <[email protected]>
> >
> >On Wed, Feb 15, 2023 at 09:24:52AM +0100, Bruno Goncalves wrote:
> >> On Tue, 14 Feb 2023 at 15:32, Yury Norov <[email protected]> wrote:
> >> >
> >> > On Tue, Feb 14, 2023 at 02:23:06PM +0100, Bruno Goncalves wrote:
> >> > > Hello,
> >> > >
> >> > > recently when testing kernel with debug options set from net-next [1]
> >> > > and bpf-next [2] the following call trace happens:
> >> > >
> >> > Hi Bruno,
> >> >
> >> > Thanks for report.
> >> >
> >> > This looks weird, because the hop_cmp() spent for 3 month in -next till
> >> > now. Anyways, can you please share your NUMA configuration so I'll try
> >> > to reproduce the bug locally? What 'numactl -H' outputs?
> >> >
> >>
> >> Here is the output:
> >>
> >> numactl -H
> >> available: 4 nodes (0-3)
> >> node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
> >> node 0 size: 32063 MB
> >> node 0 free: 31610 MB
> >> node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
> >> node 1 size: 32248 MB
> >> node 1 free: 31909 MB
> >> node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
> >> node 2 size: 32248 MB
> >> node 2 free: 31551 MB
> >> node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
> >> node 3 size: 32239 MB
> >> node 3 free: 31468 MB
> >> node distances:
> >> node 0 1 2 3
> >> 0: 10 21 31 21
> >> 1: 21 10 21 31
> >> 2: 31 21 10 21
> >> 3: 21 31 21 10
> >>
> >> Bruno
> >
> >So, I was able to reproduce it, and it seems like a compiler issue.
> >
> >The problem is that hop_cmp() calculates pointer to a previous hop
> >object unconditionally at the beginning of the function:
> >
> > struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
> >
> >Obviously, for the first hop, there's no such thing like a previous
> >one, and later in the code 'prev_hop' is used conditionally on that:
> >
> > k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
> >
> >To me the code above looks like it instructs the compiler to dereference
> >'b - 1' only if b != k->masks, i.e. when b is not the first hop. But GCC
> >does that unconditionally, which looks wrong.
> >
> >If I defer dereferencing manually like in the snippet below, the kasan
> >warning goes away.
> >
> >diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> >index 48838a05c008..5f297f81c574 100644
> >--- a/kernel/sched/topology.c
> >+++ b/kernel/sched/topology.c
> >@@ -2081,14 +2081,14 @@ struct __cmp_key {
> >
> > static int hop_cmp(const void *a, const void *b)
> > {
> >- struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
> > struct cpumask **cur_hop = *(struct cpumask ***)b;
> > struct __cmp_key *k = (struct __cmp_key *)a;
> >
> > if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
> > return 1;
> >
> >- k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
> >+ k->w = (b == k->masks) ? 0 :
> >+ cpumask_weight_and(k->cpus, (*((struct cpumask ***)b - 1))[k->node]);
> > if (k->w <= k->cpu)
> > return 0;
> >
> >I don't understand why GCC doesn't optimize out unneeded dereferencing.
> >It does that even if I replace ternary operator with if-else construction.
> >To me it looks like a compiler bug.
> >
> >However, I acknowledge that I'm not a great expert in C standard, so
> >it's quite possible that there may be some rule that prevents from
> >doing such optimizations, even for non-volatile variables.
> >
> >Adding compiler people. Guys, could you please clarify on that?
> >If it's my fault, I'll submit fix shortly.
>
> My understanding is that without getting inlined, the compiler cannot evaluate "b == k->masks" at compile time (if it can at all). So since the dereference is part of variable initialization, it's not executed later: it's executed at function entry.
>
> Regardless, this whole function looks kind of hard to read. Why not fully expand it with the if/else logic and put any needed variables into the respective clauses? Then humans can read it and the compiler will optimize it down just as efficiently.

Yes, changing it to if-else would be a simplest solution.

Thanks,
Yury