bpf_sk_storage maps use multiple spin locks to reduce contention.
The number of locks to use is determined by the number of possible CPUs.
With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used.
When updating elements, the correct lock is determined with hash_ptr().
Calling hash_ptr() with 0 bits is undefined behavior, as it does:
x >> (64 - bits)
Using the value results in an out of bounds memory access.
In my case, this manifested itself as a page fault when raw_spin_lock_bh()
is called later, when running the self tests:
./tools/testing/selftests/bpf/test_verifier 773 775
[ 16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8
[ 16.367139] #PF: supervisor write access in kernel mode
[ 16.367751] #PF: error_code(0x0002) - not-present page
[ 16.368323] PGD 35a01067 P4D 35a01067 PUD 0
[ 16.368796] Oops: 0002 [#1] SMP PTI
[ 16.369175] CPU: 0 PID: 189 Comm: test_verifier Not tainted 5.2.0-rc2+ #10
[ 16.369960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 16.371021] RIP: 0010:_raw_spin_lock_bh (/home/afabre/linux/./include/trace/events/initcall.h:48)
[ 16.371571] Code: 02 00 00 31 c0 ba ff 00 00 00 3e 0f b1 17 75 01 c3 e9 82 12 5f ff 66 90 65 81 05 ad 14 6f 41 00 02 00 00 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 01 c3 89 c6 e9 f0 02 5f ff b8 00 02 00 00 3e 0f c1
All code
========
0: 02 00 add (%rax),%al
2: 00 31 add %dh,(%rcx)
4: c0 ba ff 00 00 00 3e sarb $0x3e,0xff(%rdx)
b: 0f b1 17 cmpxchg %edx,(%rdi)
e: 75 01 jne 0x11
10: c3 retq
11: e9 82 12 5f ff jmpq 0xffffffffff5f1298
16: 66 90 xchg %ax,%ax
18: 65 81 05 ad 14 6f 41 addl $0x200,%gs:0x416f14ad(%rip) # 0x416f14d0
1f: 00 02 00 00
23: 31 c0 xor %eax,%eax
25: ba 01 00 00 00 mov $0x1,%edx
2a: 3e 0f b1 17 cmpxchg %edx,%ds:*(%rdi) <-- trapping instruction
2e: 75 01 jne 0x31
30: c3 retq
31: 89 c6 mov %eax,%esi
33: e9 f0 02 5f ff jmpq 0xffffffffff5f0328
38: b8 00 02 00 00 mov $0x200,%eax
3d: 3e ds
3e: 0f .byte 0xf
3f: c1 .byte 0xc1
Code starting with the faulting instruction
===========================================
0: 3e 0f b1 17 cmpxchg %edx,%ds:(%rdi)
4: 75 01 jne 0x7
6: c3 retq
7: 89 c6 mov %eax,%esi
9: e9 f0 02 5f ff jmpq 0xffffffffff5f02fe
e: b8 00 02 00 00 mov $0x200,%eax
13: 3e ds
14: 0f .byte 0xf
15: c1 .byte 0xc1
[ 16.373398] RSP: 0018:ffffa759809d3be0 EFLAGS: 00010246
[ 16.373954] RAX: 0000000000000000 RBX: ffff8fe7a66f93f0 RCX: 0000000000000040
[ 16.374645] RDX: 0000000000000001 RSI: ffff8fdaf9f0d180 RDI: ffff8fe7a66f93f8
[ 16.375338] RBP: ffff8fdaf9f0d180 R08: ffff8fdafba2c320 R09: ffff8fdaf9f0d0c0
[ 16.376028] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fdafa346700
[ 16.376719] R13: ffff8fe7a66f93f8 R14: ffff8fdaf9f0d0c0 R15: 0000000000000001
[ 16.377413] FS: 00007fda724c0740(0000) GS:ffff8fdafba00000(0000) knlGS:0000000000000000
[ 16.378204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.378763] CR2: ffff8fe7a66f93f8 CR3: 0000000139d1c006 CR4: 0000000000360ef0
[ 16.379453] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.380144] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 16.380864] Call Trace:
[ 16.381112] selem_link_map (/home/afabre/linux/./include/linux/compiler.h:221 /home/afabre/linux/net/core/bpf_sk_storage.c:243)
[ 16.381476] sk_storage_update (/home/afabre/linux/net/core/bpf_sk_storage.c:355 /home/afabre/linux/net/core/bpf_sk_storage.c:414)
[ 16.381888] bpf_sk_storage_get (/home/afabre/linux/net/core/bpf_sk_storage.c:760 /home/afabre/linux/net/core/bpf_sk_storage.c:741)
[ 16.382285] ___bpf_prog_run (/home/afabre/linux/kernel/bpf/core.c:1447)
[ 16.382679] ? __bpf_prog_run32 (/home/afabre/linux/kernel/bpf/core.c:1603)
[ 16.383074] ? alloc_file_pseudo (/home/afabre/linux/fs/file_table.c:232)
[ 16.383486] ? kvm_clock_get_cycles (/home/afabre/linux/arch/x86/kernel/kvmclock.c:98)
[ 16.383906] ? ktime_get (/home/afabre/linux/kernel/time/timekeeping.c:265 /home/afabre/linux/kernel/time/timekeeping.c:369 /home/afabre/linux/kernel/time/timekeeping.c:754)
[ 16.384243] ? bpf_test_run (/home/afabre/linux/net/bpf/test_run.c:47)
[ 16.384613] ? bpf_prog_test_run_skb (/home/afabre/linux/net/bpf/test_run.c:313)
[ 16.385065] ? security_capable (/home/afabre/linux/security/security.c:696 (discriminator 19))
[ 16.385460] ? __do_sys_bpf (/home/afabre/linux/kernel/bpf/syscall.c:2072 /home/afabre/linux/kernel/bpf/syscall.c:2848)
[ 16.385854] ? __handle_mm_fault (/home/afabre/linux/mm/memory.c:3507 /home/afabre/linux/mm/memory.c:3532 /home/afabre/linux/mm/memory.c:3666 /home/afabre/linux/mm/memory.c:3897 /home/afabre/linux/mm/memory.c:4021)
[ 16.386273] ? __dentry_kill (/home/afabre/linux/fs/dcache.c:595)
[ 16.386652] ? do_syscall_64 (/home/afabre/linux/arch/x86/entry/common.c:301)
[ 16.387031] ? entry_SYSCALL_64_after_hwframe (/home/afabre/linux/./include/trace/events/initcall.h:10 /home/afabre/linux/./include/trace/events/initcall.h:10)
[ 16.387541] Modules linked in:
[ 16.387846] CR2: ffff8fe7a66f93f8
[ 16.388175] ---[ end trace 891cf27b5b9c9cc6 ]---
[ 16.388628] RIP: 0010:_raw_spin_lock_bh (/home/afabre/linux/./include/trace/events/initcall.h:48)
[ 16.389089] Code: 02 00 00 31 c0 ba ff 00 00 00 3e 0f b1 17 75 01 c3 e9 82 12 5f ff 66 90 65 81 05 ad 14 6f 41 00 02 00 00 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 01 c3 89 c6 e9 f0 02 5f ff b8 00 02 00 00 3e 0f c1
All code
========
0: 02 00 add (%rax),%al
2: 00 31 add %dh,(%rcx)
4: c0 ba ff 00 00 00 3e sarb $0x3e,0xff(%rdx)
b: 0f b1 17 cmpxchg %edx,(%rdi)
e: 75 01 jne 0x11
10: c3 retq
11: e9 82 12 5f ff jmpq 0xffffffffff5f1298
16: 66 90 xchg %ax,%ax
18: 65 81 05 ad 14 6f 41 addl $0x200,%gs:0x416f14ad(%rip) # 0x416f14d0
1f: 00 02 00 00
23: 31 c0 xor %eax,%eax
25: ba 01 00 00 00 mov $0x1,%edx
2a: 3e 0f b1 17 cmpxchg %edx,%ds:*(%rdi) <-- trapping instruction
2e: 75 01 jne 0x31
30: c3 retq
31: 89 c6 mov %eax,%esi
33: e9 f0 02 5f ff jmpq 0xffffffffff5f0328
38: b8 00 02 00 00 mov $0x200,%eax
3d: 3e ds
3e: 0f .byte 0xf
3f: c1 .byte 0xc1
Code starting with the faulting instruction
===========================================
0: 3e 0f b1 17 cmpxchg %edx,%ds:(%rdi)
4: 75 01 jne 0x7
6: c3 retq
7: 89 c6 mov %eax,%esi
9: e9 f0 02 5f ff jmpq 0xffffffffff5f02fe
e: b8 00 02 00 00 mov $0x200,%eax
13: 3e ds
14: 0f .byte 0xf
15: c1 .byte 0xc1
[ 16.390899] RSP: 0018:ffffa759809d3be0 EFLAGS: 00010246
[ 16.391410] RAX: 0000000000000000 RBX: ffff8fe7a66f93f0 RCX: 0000000000000040
[ 16.392102] RDX: 0000000000000001 RSI: ffff8fdaf9f0d180 RDI: ffff8fe7a66f93f8
[ 16.392795] RBP: ffff8fdaf9f0d180 R08: ffff8fdafba2c320 R09: ffff8fdaf9f0d0c0
[ 16.393481] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fdafa346700
[ 16.394169] R13: ffff8fe7a66f93f8 R14: ffff8fdaf9f0d0c0 R15: 0000000000000001
[ 16.394870] FS: 00007fda724c0740(0000) GS:ffff8fdafba00000(0000) knlGS:0000000000000000
[ 16.395641] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.396193] CR2: ffff8fe7a66f93f8 CR3: 0000000139d1c006 CR4: 0000000000360ef0
[ 16.396876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.397557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 16.398246] Kernel panic - not syncing: Fatal exception in interrupt
[ 16.399067] Kernel Offset: 0x3ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 16.400098] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Signed-off-by: Arthur Fabre <[email protected]>
Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
---
net/core/bpf_sk_storage.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index f40e3d35fd9c..7ae0686c5418 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -90,7 +90,13 @@ struct bpf_sk_storage {
static struct bucket *select_bucket(struct bpf_sk_storage_map *smap,
struct bpf_sk_storage_elem *selem)
{
- return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
+ /* hash_ptr is undefined behavior with 0 bits */
+ int bucket = 0;
+ if (smap->bucket_log != 0) {
+ bucket = hash_ptr(selem, smap->bucket_log);
+ }
+
+ return &smap->buckets[bucket];
}
static int omem_charge(struct sock *sk, unsigned int size)
--
2.20.1
On Thu, Jun 13, 2019 at 8:16 AM Arthur Fabre <[email protected]> wrote:
>
> bpf_sk_storage maps use multiple spin locks to reduce contention.
> The number of locks to use is determined by the number of possible CPUs.
> With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used.
>
> When updating elements, the correct lock is determined with hash_ptr().
> Calling hash_ptr() with 0 bits is undefined behavior, as it does:
>
> x >> (64 - bits)
>
> Using the value results in an out of bounds memory access.
> In my case, this manifested itself as a page fault when raw_spin_lock_bh()
> is called later, when running the self tests:
>
> ./tools/testing/selftests/bpf/test_verifier 773 775
>
> [ 16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8
> [ 16.367139] #PF: supervisor write access in kernel mode
> [ 16.367751] #PF: error_code(0x0002) - not-present page
> [ 16.368323] PGD 35a01067 P4D 35a01067 PUD 0
> [ 16.368796] Oops: 0002 [#1] SMP PTI
> [ 16.369175] CPU: 0 PID: 189 Comm: test_verifier Not tainted 5.2.0-rc2+ #10
> [ 16.369960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 16.371021] RIP: 0010:_raw_spin_lock_bh (/home/afabre/linux/./include/trace/events/initcall.h:48)
> [ 16.371571] Code: 02 00 00 31 c0 ba ff 00 00 00 3e 0f b1 17 75 01 c3 e9 82 12 5f ff 66 90 65 81 05 ad 14 6f 41 00 02 00 00 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 01 c3 89 c6 e9 f0 02 5f ff b8 00 02 00 00 3e 0f c1
> All code
> ========
> 0: 02 00 add (%rax),%al
> 2: 00 31 add %dh,(%rcx)
> 4: c0 ba ff 00 00 00 3e sarb $0x3e,0xff(%rdx)
> b: 0f b1 17 cmpxchg %edx,(%rdi)
> e: 75 01 jne 0x11
> 10: c3 retq
> 11: e9 82 12 5f ff jmpq 0xffffffffff5f1298
> 16: 66 90 xchg %ax,%ax
> 18: 65 81 05 ad 14 6f 41 addl $0x200,%gs:0x416f14ad(%rip) # 0x416f14d0
> 1f: 00 02 00 00
> 23: 31 c0 xor %eax,%eax
> 25: ba 01 00 00 00 mov $0x1,%edx
> 2a: 3e 0f b1 17 cmpxchg %edx,%ds:*(%rdi) <-- trapping instruction
> 2e: 75 01 jne 0x31
> 30: c3 retq
> 31: 89 c6 mov %eax,%esi
> 33: e9 f0 02 5f ff jmpq 0xffffffffff5f0328
> 38: b8 00 02 00 00 mov $0x200,%eax
> 3d: 3e ds
> 3e: 0f .byte 0xf
> 3f: c1 .byte 0xc1
>
> Code starting with the faulting instruction
> ===========================================
> 0: 3e 0f b1 17 cmpxchg %edx,%ds:(%rdi)
> 4: 75 01 jne 0x7
> 6: c3 retq
> 7: 89 c6 mov %eax,%esi
> 9: e9 f0 02 5f ff jmpq 0xffffffffff5f02fe
> e: b8 00 02 00 00 mov $0x200,%eax
> 13: 3e ds
> 14: 0f .byte 0xf
> 15: c1 .byte 0xc1
> [ 16.373398] RSP: 0018:ffffa759809d3be0 EFLAGS: 00010246
> [ 16.373954] RAX: 0000000000000000 RBX: ffff8fe7a66f93f0 RCX: 0000000000000040
> [ 16.374645] RDX: 0000000000000001 RSI: ffff8fdaf9f0d180 RDI: ffff8fe7a66f93f8
> [ 16.375338] RBP: ffff8fdaf9f0d180 R08: ffff8fdafba2c320 R09: ffff8fdaf9f0d0c0
> [ 16.376028] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fdafa346700
> [ 16.376719] R13: ffff8fe7a66f93f8 R14: ffff8fdaf9f0d0c0 R15: 0000000000000001
> [ 16.377413] FS: 00007fda724c0740(0000) GS:ffff8fdafba00000(0000) knlGS:0000000000000000
> [ 16.378204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 16.378763] CR2: ffff8fe7a66f93f8 CR3: 0000000139d1c006 CR4: 0000000000360ef0
> [ 16.379453] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 16.380144] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 16.380864] Call Trace:
> [ 16.381112] selem_link_map (/home/afabre/linux/./include/linux/compiler.h:221 /home/afabre/linux/net/core/bpf_sk_storage.c:243)
> [ 16.381476] sk_storage_update (/home/afabre/linux/net/core/bpf_sk_storage.c:355 /home/afabre/linux/net/core/bpf_sk_storage.c:414)
> [ 16.381888] bpf_sk_storage_get (/home/afabre/linux/net/core/bpf_sk_storage.c:760 /home/afabre/linux/net/core/bpf_sk_storage.c:741)
> [ 16.382285] ___bpf_prog_run (/home/afabre/linux/kernel/bpf/core.c:1447)
> [ 16.382679] ? __bpf_prog_run32 (/home/afabre/linux/kernel/bpf/core.c:1603)
> [ 16.383074] ? alloc_file_pseudo (/home/afabre/linux/fs/file_table.c:232)
> [ 16.383486] ? kvm_clock_get_cycles (/home/afabre/linux/arch/x86/kernel/kvmclock.c:98)
> [ 16.383906] ? ktime_get (/home/afabre/linux/kernel/time/timekeeping.c:265 /home/afabre/linux/kernel/time/timekeeping.c:369 /home/afabre/linux/kernel/time/timekeeping.c:754)
> [ 16.384243] ? bpf_test_run (/home/afabre/linux/net/bpf/test_run.c:47)
> [ 16.384613] ? bpf_prog_test_run_skb (/home/afabre/linux/net/bpf/test_run.c:313)
> [ 16.385065] ? security_capable (/home/afabre/linux/security/security.c:696 (discriminator 19))
> [ 16.385460] ? __do_sys_bpf (/home/afabre/linux/kernel/bpf/syscall.c:2072 /home/afabre/linux/kernel/bpf/syscall.c:2848)
> [ 16.385854] ? __handle_mm_fault (/home/afabre/linux/mm/memory.c:3507 /home/afabre/linux/mm/memory.c:3532 /home/afabre/linux/mm/memory.c:3666 /home/afabre/linux/mm/memory.c:3897 /home/afabre/linux/mm/memory.c:4021)
> [ 16.386273] ? __dentry_kill (/home/afabre/linux/fs/dcache.c:595)
> [ 16.386652] ? do_syscall_64 (/home/afabre/linux/arch/x86/entry/common.c:301)
> [ 16.387031] ? entry_SYSCALL_64_after_hwframe (/home/afabre/linux/./include/trace/events/initcall.h:10 /home/afabre/linux/./include/trace/events/initcall.h:10)
> [ 16.387541] Modules linked in:
> [ 16.387846] CR2: ffff8fe7a66f93f8
> [ 16.388175] ---[ end trace 891cf27b5b9c9cc6 ]---
> [ 16.388628] RIP: 0010:_raw_spin_lock_bh (/home/afabre/linux/./include/trace/events/initcall.h:48)
> [ 16.389089] Code: 02 00 00 31 c0 ba ff 00 00 00 3e 0f b1 17 75 01 c3 e9 82 12 5f ff 66 90 65 81 05 ad 14 6f 41 00 02 00 00 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 01 c3 89 c6 e9 f0 02 5f ff b8 00 02 00 00 3e 0f c1
> All code
> ========
> 0: 02 00 add (%rax),%al
> 2: 00 31 add %dh,(%rcx)
> 4: c0 ba ff 00 00 00 3e sarb $0x3e,0xff(%rdx)
> b: 0f b1 17 cmpxchg %edx,(%rdi)
> e: 75 01 jne 0x11
> 10: c3 retq
> 11: e9 82 12 5f ff jmpq 0xffffffffff5f1298
> 16: 66 90 xchg %ax,%ax
> 18: 65 81 05 ad 14 6f 41 addl $0x200,%gs:0x416f14ad(%rip) # 0x416f14d0
> 1f: 00 02 00 00
> 23: 31 c0 xor %eax,%eax
> 25: ba 01 00 00 00 mov $0x1,%edx
> 2a: 3e 0f b1 17 cmpxchg %edx,%ds:*(%rdi) <-- trapping instruction
> 2e: 75 01 jne 0x31
> 30: c3 retq
> 31: 89 c6 mov %eax,%esi
> 33: e9 f0 02 5f ff jmpq 0xffffffffff5f0328
> 38: b8 00 02 00 00 mov $0x200,%eax
> 3d: 3e ds
> 3e: 0f .byte 0xf
> 3f: c1 .byte 0xc1
>
> Code starting with the faulting instruction
> ===========================================
> 0: 3e 0f b1 17 cmpxchg %edx,%ds:(%rdi)
> 4: 75 01 jne 0x7
> 6: c3 retq
> 7: 89 c6 mov %eax,%esi
> 9: e9 f0 02 5f ff jmpq 0xffffffffff5f02fe
> e: b8 00 02 00 00 mov $0x200,%eax
> 13: 3e ds
> 14: 0f .byte 0xf
> 15: c1 .byte 0xc1
> [ 16.390899] RSP: 0018:ffffa759809d3be0 EFLAGS: 00010246
> [ 16.391410] RAX: 0000000000000000 RBX: ffff8fe7a66f93f0 RCX: 0000000000000040
> [ 16.392102] RDX: 0000000000000001 RSI: ffff8fdaf9f0d180 RDI: ffff8fe7a66f93f8
> [ 16.392795] RBP: ffff8fdaf9f0d180 R08: ffff8fdafba2c320 R09: ffff8fdaf9f0d0c0
> [ 16.393481] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fdafa346700
> [ 16.394169] R13: ffff8fe7a66f93f8 R14: ffff8fdaf9f0d0c0 R15: 0000000000000001
> [ 16.394870] FS: 00007fda724c0740(0000) GS:ffff8fdafba00000(0000) knlGS:0000000000000000
> [ 16.395641] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 16.396193] CR2: ffff8fe7a66f93f8 CR3: 0000000139d1c006 CR4: 0000000000360ef0
> [ 16.396876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 16.397557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 16.398246] Kernel panic - not syncing: Fatal exception in interrupt
> [ 16.399067] Kernel Offset: 0x3ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 16.400098] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
> Signed-off-by: Arthur Fabre <[email protected]>
> Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
> ---
> net/core/bpf_sk_storage.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index f40e3d35fd9c..7ae0686c5418 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -90,7 +90,13 @@ struct bpf_sk_storage {
> static struct bucket *select_bucket(struct bpf_sk_storage_map *smap,
> struct bpf_sk_storage_elem *selem)
> {
> - return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
> + /* hash_ptr is undefined behavior with 0 bits */
> + int bucket = 0;
> + if (smap->bucket_log != 0) {
> + bucket = hash_ptr(selem, smap->bucket_log);
> + }
Would it be better instead to make sure that bucket_log is always at
least 1? Having bucket_log as zero can bite us in the future again.
> +
> + return &smap->buckets[bucket];
> }
>
> static int omem_charge(struct sock *sk, unsigned int size)
> --
> 2.20.1
>
On Thu, Jun 13, 2019 at 10:15:38AM -0700, Andrii Nakryiko wrote:
> On Thu, Jun 13, 2019 at 8:16 AM Arthur Fabre <[email protected]> wrote:
> >
> > bpf_sk_storage maps use multiple spin locks to reduce contention.
> > The number of locks to use is determined by the number of possible CPUs.
> > With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used.
Thanks for report.
> >
> > When updating elements, the correct lock is determined with hash_ptr().
> > Calling hash_ptr() with 0 bits is undefined behavior, as it does:
> >
> > x >> (64 - bits)
> >
> > Using the value results in an out of bounds memory access.
> > In my case, this manifested itself as a page fault when raw_spin_lock_bh()
> > is called later, when running the self tests:
> >
> > ./tools/testing/selftests/bpf/test_verifier 773 775
> >
> > [ 16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8
> > [ 16.367139] #PF: supervisor write access in kernel mode
> > [ 16.367751] #PF: error_code(0x0002) - not-present page
> > [ 16.368323] PGD 35a01067 P4D 35a01067 PUD 0
> > [ 16.368796] Oops: 0002 [#1] SMP PTI
> > [ 16.369175] CPU: 0 PID: 189 Comm: test_verifier Not tainted 5.2.0-rc2+ #10
> > [ 16.369960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > [ 16.371021] RIP: 0010:_raw_spin_lock_bh (/home/afabre/linux/./include/trace/events/initcall.h:48)
> > [ 16.371571] Code: 02 00 00 31 c0 ba ff 00 00 00 3e 0f b1 17 75 01 c3 e9 82 12 5f ff 66 90 65 81 05 ad 14 6f 41 00 02 00 00 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 01 c3 89 c6 e9 f0 02 5f ff b8 00 02 00 00 3e 0f c1
> > All code
> > ========
> > 0: 02 00 add (%rax),%al
> > 2: 00 31 add %dh,(%rcx)
> > 4: c0 ba ff 00 00 00 3e sarb $0x3e,0xff(%rdx)
> > b: 0f b1 17 cmpxchg %edx,(%rdi)
> > e: 75 01 jne 0x11
> > 10: c3 retq
> > 11: e9 82 12 5f ff jmpq 0xffffffffff5f1298
> > 16: 66 90 xchg %ax,%ax
> > 18: 65 81 05 ad 14 6f 41 addl $0x200,%gs:0x416f14ad(%rip) # 0x416f14d0
> > 1f: 00 02 00 00
> > 23: 31 c0 xor %eax,%eax
> > 25: ba 01 00 00 00 mov $0x1,%edx
> > 2a: 3e 0f b1 17 cmpxchg %edx,%ds:*(%rdi) <-- trapping instruction
> > 2e: 75 01 jne 0x31
> > 30: c3 retq
> > 31: 89 c6 mov %eax,%esi
> > 33: e9 f0 02 5f ff jmpq 0xffffffffff5f0328
> > 38: b8 00 02 00 00 mov $0x200,%eax
> > 3d: 3e ds
> > 3e: 0f .byte 0xf
> > 3f: c1 .byte 0xc1
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 3e 0f b1 17 cmpxchg %edx,%ds:(%rdi)
> > 4: 75 01 jne 0x7
> > 6: c3 retq
> > 7: 89 c6 mov %eax,%esi
> > 9: e9 f0 02 5f ff jmpq 0xffffffffff5f02fe
> > e: b8 00 02 00 00 mov $0x200,%eax
> > 13: 3e ds
> > 14: 0f .byte 0xf
> > 15: c1 .byte 0xc1
> > [ 16.373398] RSP: 0018:ffffa759809d3be0 EFLAGS: 00010246
> > [ 16.373954] RAX: 0000000000000000 RBX: ffff8fe7a66f93f0 RCX: 0000000000000040
> > [ 16.374645] RDX: 0000000000000001 RSI: ffff8fdaf9f0d180 RDI: ffff8fe7a66f93f8
> > [ 16.375338] RBP: ffff8fdaf9f0d180 R08: ffff8fdafba2c320 R09: ffff8fdaf9f0d0c0
> > [ 16.376028] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fdafa346700
> > [ 16.376719] R13: ffff8fe7a66f93f8 R14: ffff8fdaf9f0d0c0 R15: 0000000000000001
> > [ 16.377413] FS: 00007fda724c0740(0000) GS:ffff8fdafba00000(0000) knlGS:0000000000000000
> > [ 16.378204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 16.378763] CR2: ffff8fe7a66f93f8 CR3: 0000000139d1c006 CR4: 0000000000360ef0
> > [ 16.379453] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 16.380144] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 16.380864] Call Trace:
> > [ 16.381112] selem_link_map (/home/afabre/linux/./include/linux/compiler.h:221 /home/afabre/linux/net/core/bpf_sk_storage.c:243)
> > [ 16.381476] sk_storage_update (/home/afabre/linux/net/core/bpf_sk_storage.c:355 /home/afabre/linux/net/core/bpf_sk_storage.c:414)
> > [ 16.381888] bpf_sk_storage_get (/home/afabre/linux/net/core/bpf_sk_storage.c:760 /home/afabre/linux/net/core/bpf_sk_storage.c:741)
> > [ 16.382285] ___bpf_prog_run (/home/afabre/linux/kernel/bpf/core.c:1447)
> > [ 16.382679] ? __bpf_prog_run32 (/home/afabre/linux/kernel/bpf/core.c:1603)
> > [ 16.383074] ? alloc_file_pseudo (/home/afabre/linux/fs/file_table.c:232)
> > [ 16.383486] ? kvm_clock_get_cycles (/home/afabre/linux/arch/x86/kernel/kvmclock.c:98)
> > [ 16.383906] ? ktime_get (/home/afabre/linux/kernel/time/timekeeping.c:265 /home/afabre/linux/kernel/time/timekeeping.c:369 /home/afabre/linux/kernel/time/timekeeping.c:754)
> > [ 16.384243] ? bpf_test_run (/home/afabre/linux/net/bpf/test_run.c:47)
> > [ 16.384613] ? bpf_prog_test_run_skb (/home/afabre/linux/net/bpf/test_run.c:313)
> > [ 16.385065] ? security_capable (/home/afabre/linux/security/security.c:696 (discriminator 19))
> > [ 16.385460] ? __do_sys_bpf (/home/afabre/linux/kernel/bpf/syscall.c:2072 /home/afabre/linux/kernel/bpf/syscall.c:2848)
> > [ 16.385854] ? __handle_mm_fault (/home/afabre/linux/mm/memory.c:3507 /home/afabre/linux/mm/memory.c:3532 /home/afabre/linux/mm/memory.c:3666 /home/afabre/linux/mm/memory.c:3897 /home/afabre/linux/mm/memory.c:4021)
> > [ 16.386273] ? __dentry_kill (/home/afabre/linux/fs/dcache.c:595)
> > [ 16.386652] ? do_syscall_64 (/home/afabre/linux/arch/x86/entry/common.c:301)
> > [ 16.387031] ? entry_SYSCALL_64_after_hwframe (/home/afabre/linux/./include/trace/events/initcall.h:10 /home/afabre/linux/./include/trace/events/initcall.h:10)
> > [ 16.387541] Modules linked in:
> > [ 16.387846] CR2: ffff8fe7a66f93f8
> > [ 16.388175] ---[ end trace 891cf27b5b9c9cc6 ]---
> > [ 16.388628] RIP: 0010:_raw_spin_lock_bh (/home/afabre/linux/./include/trace/events/initcall.h:48)
> > [ 16.389089] Code: 02 00 00 31 c0 ba ff 00 00 00 3e 0f b1 17 75 01 c3 e9 82 12 5f ff 66 90 65 81 05 ad 14 6f 41 00 02 00 00 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 01 c3 89 c6 e9 f0 02 5f ff b8 00 02 00 00 3e 0f c1
> > All code
> > ========
> > 0: 02 00 add (%rax),%al
> > 2: 00 31 add %dh,(%rcx)
> > 4: c0 ba ff 00 00 00 3e sarb $0x3e,0xff(%rdx)
> > b: 0f b1 17 cmpxchg %edx,(%rdi)
> > e: 75 01 jne 0x11
> > 10: c3 retq
> > 11: e9 82 12 5f ff jmpq 0xffffffffff5f1298
> > 16: 66 90 xchg %ax,%ax
> > 18: 65 81 05 ad 14 6f 41 addl $0x200,%gs:0x416f14ad(%rip) # 0x416f14d0
> > 1f: 00 02 00 00
> > 23: 31 c0 xor %eax,%eax
> > 25: ba 01 00 00 00 mov $0x1,%edx
> > 2a: 3e 0f b1 17 cmpxchg %edx,%ds:*(%rdi) <-- trapping instruction
> > 2e: 75 01 jne 0x31
> > 30: c3 retq
> > 31: 89 c6 mov %eax,%esi
> > 33: e9 f0 02 5f ff jmpq 0xffffffffff5f0328
> > 38: b8 00 02 00 00 mov $0x200,%eax
> > 3d: 3e ds
> > 3e: 0f .byte 0xf
> > 3f: c1 .byte 0xc1
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 3e 0f b1 17 cmpxchg %edx,%ds:(%rdi)
> > 4: 75 01 jne 0x7
> > 6: c3 retq
> > 7: 89 c6 mov %eax,%esi
> > 9: e9 f0 02 5f ff jmpq 0xffffffffff5f02fe
> > e: b8 00 02 00 00 mov $0x200,%eax
> > 13: 3e ds
> > 14: 0f .byte 0xf
> > 15: c1 .byte 0xc1
> > [ 16.390899] RSP: 0018:ffffa759809d3be0 EFLAGS: 00010246
> > [ 16.391410] RAX: 0000000000000000 RBX: ffff8fe7a66f93f0 RCX: 0000000000000040
> > [ 16.392102] RDX: 0000000000000001 RSI: ffff8fdaf9f0d180 RDI: ffff8fe7a66f93f8
> > [ 16.392795] RBP: ffff8fdaf9f0d180 R08: ffff8fdafba2c320 R09: ffff8fdaf9f0d0c0
> > [ 16.393481] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fdafa346700
> > [ 16.394169] R13: ffff8fe7a66f93f8 R14: ffff8fdaf9f0d0c0 R15: 0000000000000001
> > [ 16.394870] FS: 00007fda724c0740(0000) GS:ffff8fdafba00000(0000) knlGS:0000000000000000
> > [ 16.395641] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 16.396193] CR2: ffff8fe7a66f93f8 CR3: 0000000139d1c006 CR4: 0000000000360ef0
> > [ 16.396876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 16.397557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 16.398246] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 16.399067] Kernel Offset: 0x3ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 16.400098] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> >
> > Signed-off-by: Arthur Fabre <[email protected]>
> > Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
> > ---
> > net/core/bpf_sk_storage.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> > index f40e3d35fd9c..7ae0686c5418 100644
> > --- a/net/core/bpf_sk_storage.c
> > +++ b/net/core/bpf_sk_storage.c
> > @@ -90,7 +90,13 @@ struct bpf_sk_storage {
> > static struct bucket *select_bucket(struct bpf_sk_storage_map *smap,
> > struct bpf_sk_storage_elem *selem)
> > {
> > - return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
> > + /* hash_ptr is undefined behavior with 0 bits */
> > + int bucket = 0;
> > + if (smap->bucket_log != 0) {
> > + bucket = hash_ptr(selem, smap->bucket_log);
> > + }
>
> Would it be better instead to make sure that bucket_log is always at
> least 1? Having bucket_log as zero can bite us in the future again.
I also think it is better off to have a max_t(u32, 1, ...) done in
bpf_sk_storage_map_alloc(). Having an extra bucket should be fine in
this case.
>
> > +
> > + return &smap->buckets[bucket];
> > }
> >
> > static int omem_charge(struct sock *sk, unsigned int size)
> > --
> > 2.20.1
> >
On Thu, Jun 13, 2019 at 9:15 PM Martin Lau <[email protected]> wrote:
>
> On Thu, Jun 13, 2019 at 10:15:38AM -0700, Andrii Nakryiko wrote:
> > On Thu, Jun 13, 2019 at 8:16 AM Arthur Fabre <[email protected]> wrote:
> > >
> > > bpf_sk_storage maps use multiple spin locks to reduce contention.
> > > The number of locks to use is determined by the number of possible CPUs.
> > > With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used.
> Thanks for report.
>
> > >
> > > When updating elements, the correct lock is determined with hash_ptr().
> > > Calling hash_ptr() with 0 bits is undefined behavior, as it does:
> > >
> > > x >> (64 - bits)
> > >
> > > Using the value results in an out of bounds memory access.
> > > In my case, this manifested itself as a page fault when raw_spin_lock_bh()
> > > is called later, when running the self tests:
> > >
> > > ./tools/testing/selftests/bpf/test_verifier 773 775
> > >
> > > [ 16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8
> > > [ 16.367139] #PF: supervisor write access in kernel mode
> > > [ 16.367751] #PF: error_code(0x0002) - not-present page
> > > [ 16.368323] PGD 35a01067 P4D 35a01067 PUD 0
> > > [ 16.368796] Oops: 0002 [#1] SMP PTI
> > > [ 16.369175] CPU: 0 PID: 189 Comm: test_verifier Not tainted 5.2.0-rc2+ #10
> > > [ 16.369960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > > [ 16.371021] RIP: 0010:_raw_spin_lock_bh (/home/afabre/linux/./include/trace/events/initcall.h:48)
> > > [ 16.371571] Code: 02 00 00 31 c0 ba ff 00 00 00 3e 0f b1 17 75 01 c3 e9 82 12 5f ff 66 90 65 81 05 ad 14 6f 41 00 02 00 00 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 01 c3 89 c6 e9 f0 02 5f ff b8 00 02 00 00 3e 0f c1
> > > All code
> > > ========
> > > 0: 02 00 add (%rax),%al
> > > 2: 00 31 add %dh,(%rcx)
> > > 4: c0 ba ff 00 00 00 3e sarb $0x3e,0xff(%rdx)
> > > b: 0f b1 17 cmpxchg %edx,(%rdi)
> > > e: 75 01 jne 0x11
> > > 10: c3 retq
> > > 11: e9 82 12 5f ff jmpq 0xffffffffff5f1298
> > > 16: 66 90 xchg %ax,%ax
> > > 18: 65 81 05 ad 14 6f 41 addl $0x200,%gs:0x416f14ad(%rip) # 0x416f14d0
> > > 1f: 00 02 00 00
> > > 23: 31 c0 xor %eax,%eax
> > > 25: ba 01 00 00 00 mov $0x1,%edx
> > > 2a: 3e 0f b1 17 cmpxchg %edx,%ds:*(%rdi) <-- trapping instruction
> > > 2e: 75 01 jne 0x31
> > > 30: c3 retq
> > > 31: 89 c6 mov %eax,%esi
> > > 33: e9 f0 02 5f ff jmpq 0xffffffffff5f0328
> > > 38: b8 00 02 00 00 mov $0x200,%eax
> > > 3d: 3e ds
> > > 3e: 0f .byte 0xf
> > > 3f: c1 .byte 0xc1
> > >
> > > Code starting with the faulting instruction
> > > ===========================================
> > > 0: 3e 0f b1 17 cmpxchg %edx,%ds:(%rdi)
> > > 4: 75 01 jne 0x7
> > > 6: c3 retq
> > > 7: 89 c6 mov %eax,%esi
> > > 9: e9 f0 02 5f ff jmpq 0xffffffffff5f02fe
> > > e: b8 00 02 00 00 mov $0x200,%eax
> > > 13: 3e ds
> > > 14: 0f .byte 0xf
> > > 15: c1 .byte 0xc1
> > > [ 16.373398] RSP: 0018:ffffa759809d3be0 EFLAGS: 00010246
> > > [ 16.373954] RAX: 0000000000000000 RBX: ffff8fe7a66f93f0 RCX: 0000000000000040
> > > [ 16.374645] RDX: 0000000000000001 RSI: ffff8fdaf9f0d180 RDI: ffff8fe7a66f93f8
> > > [ 16.375338] RBP: ffff8fdaf9f0d180 R08: ffff8fdafba2c320 R09: ffff8fdaf9f0d0c0
> > > [ 16.376028] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fdafa346700
> > > [ 16.376719] R13: ffff8fe7a66f93f8 R14: ffff8fdaf9f0d0c0 R15: 0000000000000001
> > > [ 16.377413] FS: 00007fda724c0740(0000) GS:ffff8fdafba00000(0000) knlGS:0000000000000000
> > > [ 16.378204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 16.378763] CR2: ffff8fe7a66f93f8 CR3: 0000000139d1c006 CR4: 0000000000360ef0
> > > [ 16.379453] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 16.380144] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 16.380864] Call Trace:
> > > [ 16.381112] selem_link_map (/home/afabre/linux/./include/linux/compiler.h:221 /home/afabre/linux/net/core/bpf_sk_storage.c:243)
> > > [ 16.381476] sk_storage_update (/home/afabre/linux/net/core/bpf_sk_storage.c:355 /home/afabre/linux/net/core/bpf_sk_storage.c:414)
> > > [ 16.381888] bpf_sk_storage_get (/home/afabre/linux/net/core/bpf_sk_storage.c:760 /home/afabre/linux/net/core/bpf_sk_storage.c:741)
> > > [ 16.382285] ___bpf_prog_run (/home/afabre/linux/kernel/bpf/core.c:1447)
> > > [ 16.382679] ? __bpf_prog_run32 (/home/afabre/linux/kernel/bpf/core.c:1603)
> > > [ 16.383074] ? alloc_file_pseudo (/home/afabre/linux/fs/file_table.c:232)
> > > [ 16.383486] ? kvm_clock_get_cycles (/home/afabre/linux/arch/x86/kernel/kvmclock.c:98)
> > > [ 16.383906] ? ktime_get (/home/afabre/linux/kernel/time/timekeeping.c:265 /home/afabre/linux/kernel/time/timekeeping.c:369 /home/afabre/linux/kernel/time/timekeeping.c:754)
> > > [ 16.384243] ? bpf_test_run (/home/afabre/linux/net/bpf/test_run.c:47)
> > > [ 16.384613] ? bpf_prog_test_run_skb (/home/afabre/linux/net/bpf/test_run.c:313)
> > > [ 16.385065] ? security_capable (/home/afabre/linux/security/security.c:696 (discriminator 19))
> > > [ 16.385460] ? __do_sys_bpf (/home/afabre/linux/kernel/bpf/syscall.c:2072 /home/afabre/linux/kernel/bpf/syscall.c:2848)
> > > [ 16.385854] ? __handle_mm_fault (/home/afabre/linux/mm/memory.c:3507 /home/afabre/linux/mm/memory.c:3532 /home/afabre/linux/mm/memory.c:3666 /home/afabre/linux/mm/memory.c:3897 /home/afabre/linux/mm/memory.c:4021)
> > > [ 16.386273] ? __dentry_kill (/home/afabre/linux/fs/dcache.c:595)
> > > [ 16.386652] ? do_syscall_64 (/home/afabre/linux/arch/x86/entry/common.c:301)
> > > [ 16.387031] ? entry_SYSCALL_64_after_hwframe (/home/afabre/linux/./include/trace/events/initcall.h:10 /home/afabre/linux/./include/trace/events/initcall.h:10)
> > > [ 16.387541] Modules linked in:
> > > [ 16.387846] CR2: ffff8fe7a66f93f8
> > > [ 16.388175] ---[ end trace 891cf27b5b9c9cc6 ]---
> > > [ 16.388628] RIP: 0010:_raw_spin_lock_bh (/home/afabre/linux/./include/trace/events/initcall.h:48)
> > > [ 16.389089] Code: 02 00 00 31 c0 ba ff 00 00 00 3e 0f b1 17 75 01 c3 e9 82 12 5f ff 66 90 65 81 05 ad 14 6f 41 00 02 00 00 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 01 c3 89 c6 e9 f0 02 5f ff b8 00 02 00 00 3e 0f c1
> > > All code
> > > ========
> > > 0: 02 00 add (%rax),%al
> > > 2: 00 31 add %dh,(%rcx)
> > > 4: c0 ba ff 00 00 00 3e sarb $0x3e,0xff(%rdx)
> > > b: 0f b1 17 cmpxchg %edx,(%rdi)
> > > e: 75 01 jne 0x11
> > > 10: c3 retq
> > > 11: e9 82 12 5f ff jmpq 0xffffffffff5f1298
> > > 16: 66 90 xchg %ax,%ax
> > > 18: 65 81 05 ad 14 6f 41 addl $0x200,%gs:0x416f14ad(%rip) # 0x416f14d0
> > > 1f: 00 02 00 00
> > > 23: 31 c0 xor %eax,%eax
> > > 25: ba 01 00 00 00 mov $0x1,%edx
> > > 2a: 3e 0f b1 17 cmpxchg %edx,%ds:*(%rdi) <-- trapping instruction
> > > 2e: 75 01 jne 0x31
> > > 30: c3 retq
> > > 31: 89 c6 mov %eax,%esi
> > > 33: e9 f0 02 5f ff jmpq 0xffffffffff5f0328
> > > 38: b8 00 02 00 00 mov $0x200,%eax
> > > 3d: 3e ds
> > > 3e: 0f .byte 0xf
> > > 3f: c1 .byte 0xc1
> > >
> > > Code starting with the faulting instruction
> > > ===========================================
> > > 0: 3e 0f b1 17 cmpxchg %edx,%ds:(%rdi)
> > > 4: 75 01 jne 0x7
> > > 6: c3 retq
> > > 7: 89 c6 mov %eax,%esi
> > > 9: e9 f0 02 5f ff jmpq 0xffffffffff5f02fe
> > > e: b8 00 02 00 00 mov $0x200,%eax
> > > 13: 3e ds
> > > 14: 0f .byte 0xf
> > > 15: c1 .byte 0xc1
> > > [ 16.390899] RSP: 0018:ffffa759809d3be0 EFLAGS: 00010246
> > > [ 16.391410] RAX: 0000000000000000 RBX: ffff8fe7a66f93f0 RCX: 0000000000000040
> > > [ 16.392102] RDX: 0000000000000001 RSI: ffff8fdaf9f0d180 RDI: ffff8fe7a66f93f8
> > > [ 16.392795] RBP: ffff8fdaf9f0d180 R08: ffff8fdafba2c320 R09: ffff8fdaf9f0d0c0
> > > [ 16.393481] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fdafa346700
> > > [ 16.394169] R13: ffff8fe7a66f93f8 R14: ffff8fdaf9f0d0c0 R15: 0000000000000001
> > > [ 16.394870] FS: 00007fda724c0740(0000) GS:ffff8fdafba00000(0000) knlGS:0000000000000000
> > > [ 16.395641] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 16.396193] CR2: ffff8fe7a66f93f8 CR3: 0000000139d1c006 CR4: 0000000000360ef0
> > > [ 16.396876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 16.397557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 16.398246] Kernel panic - not syncing: Fatal exception in interrupt
> > > [ 16.399067] Kernel Offset: 0x3ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > [ 16.400098] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > >
> > > Signed-off-by: Arthur Fabre <[email protected]>
> > > Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
> > > ---
> > > net/core/bpf_sk_storage.c | 8 +++++++-
> > > 1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> > > index f40e3d35fd9c..7ae0686c5418 100644
> > > --- a/net/core/bpf_sk_storage.c
> > > +++ b/net/core/bpf_sk_storage.c
> > > @@ -90,7 +90,13 @@ struct bpf_sk_storage {
> > > static struct bucket *select_bucket(struct bpf_sk_storage_map *smap,
> > > struct bpf_sk_storage_elem *selem)
> > > {
> > > - return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
> > > + /* hash_ptr is undefined behavior with 0 bits */
> > > + int bucket = 0;
> > > + if (smap->bucket_log != 0) {
> > > + bucket = hash_ptr(selem, smap->bucket_log);
> > > + }
> >
> > Would it be better instead to make sure that bucket_log is always at
> > least 1? Having bucket_log as zero can bite us in the future again.
> I also think it is better off to have a max_t(u32, 1, ...) done in
> bpf_sk_storage_map_alloc(). Having an extra bucket should be fine in
> this case.
Makes sense, done in v2.
> >
> > > +
> > > + return &smap->buckets[bucket];
> > > }
> > >
> > > static int omem_charge(struct sock *sk, unsigned int size)
> > > --
> > > 2.20.1
> > >