2024-05-07 06:36:18

by Sam Sun

Subject: [Linux kernel bug] general protection fault in alloc_object

Dear developers and maintainers,

We encountered a general protection fault in the function alloc_object().
It was tested against the latest upstream Linux (tag 6.9-rc7). A C repro
and the kernel config are attached to this email.
The kernel crash log is listed below.
```
general protection fault, probably for non-canonical address
0xdffffc0040000001: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: probably user-memory-access in range
[0x0000000200000008-0x000000020000000f]
CPU: 1 PID: 8107 Comm: systemd-sysctl Not tainted 6.9.0-rc6 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:__hlist_del include/linux/list.h:990 [inline]
RIP: 0010:hlist_del include/linux/list.h:1002 [inline]
RIP: 0010:__alloc_object lib/debugobjects.c:213 [inline]
RIP: 0010:alloc_object+0x124/0x6c0 lib/debugobjects.c:226
Code: 3c 08 00 74 08 4c 89 f7 e8 69 41 58 fd 49 89 2e 48 85 ed 74 27
48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c
08 00 74 08 48 89 ef e8 3e 41 58 fd 4c 89 75 00 49 be 00 00
RSP: 0018:ffffc900027ff510 EFLAGS: 00010002
RAX: 0000000040000001 RBX: ffff88802bb21e38 RCX: dffffc0000000000
RDX: ffffffff8b6de400 RSI: ffffffff8bcdd080 RDI: ffffffff8bcdd040
RBP: 0000000200000009 R08: 0000000000000003 R09: fffff520004ffe90
R10: dffffc0000000000 R11: ffffffff81746170 R12: 1ffff110057643c7
R13: ffff88802bb21e40 R14: ffff8880be43a1e0 R15: 1ffff110057643c8
FS: 0000000000000000(0000) GS:ffff8880be400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff335b18298 CR3: 000000006660c000 CR4: 0000000000750ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
lookup_object_or_alloc lib/debugobjects.c:587 [inline]
debug_object_activate+0x23f/0x510 lib/debugobjects.c:710
debug_rcu_head_queue kernel/rcu/rcu.h:227 [inline]
__call_rcu_common kernel/rcu/tree.c:2719 [inline]
call_rcu+0x97/0xa30 kernel/rcu/tree.c:2838
remove_vma mm/mmap.c:148 [inline]
remove_mt mm/mmap.c:2289 [inline]
do_vmi_align_munmap+0x157e/0x1900 mm/mmap.c:2632
do_vmi_munmap+0x24c/0x2d0 mm/mmap.c:2696
mmap_region+0x866/0x1fb0 mm/mmap.c:2747
do_mmap+0x7d4/0xe60 mm/mmap.c:1385
vm_mmap_pgoff+0x1a3/0x400 mm/util.c:573
ksys_mmap_pgoff+0x4fb/0x6d0 mm/mmap.c:1431
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xe4/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x67/0x6f
RIP: 0033:0x7ff335b9be82
Code: eb aa 66 0f 1f 44 00 00 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd
53 89 cb 48 85 ff 74 33 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d
00 f0 ff ff 77 56 5b 5d c3 0f 1f 00 c7 05 ae 02 01 00 16 00
RSP: 002b:00007ffc44458f18 EFLAGS: 00000206 ORIG_RAX: 0000000000000009
RAX: ffffffffffffffda RBX: 0000000000000812 RCX: 00007ff335b9be82
RDX: 0000000000000003 RSI: 0000000000002000 RDI: 00007ff335b16000
RBP: 00007ff335b16000 R08: 0000000000000003 R09: 0000000000008000
R10: 0000000000000812 R11: 0000000000000206 R12: 00007ff335b79460
R13: 00007ffc44458f30 R14: 00007ffc44458fc0 R15: 00007ffc444592c0
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__hlist_del include/linux/list.h:990 [inline]
RIP: 0010:hlist_del include/linux/list.h:1002 [inline]
RIP: 0010:__alloc_object lib/debugobjects.c:213 [inline]
RIP: 0010:alloc_object+0x124/0x6c0 lib/debugobjects.c:226
Code: 3c 08 00 74 08 4c 89 f7 e8 69 41 58 fd 49 89 2e 48 85 ed 74 27
48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c
08 00 74 08 48 89 ef e8 3e 41 58 fd 4c 89 75 00 49 be 00 00
RSP: 0018:ffffc900027ff510 EFLAGS: 00010002
RAX: 0000000040000001 RBX: ffff88802bb21e38 RCX: dffffc0000000000
RDX: ffffffff8b6de400 RSI: ffffffff8bcdd080 RDI: ffffffff8bcdd040
RBP: 0000000200000009 R08: 0000000000000003 R09: fffff520004ffe90
R10: dffffc0000000000 R11: ffffffff81746170 R12: 1ffff110057643c7
R13: ffff88802bb21e40 R14: ffff8880be43a1e0 R15: 1ffff110057643c8
FS: 0000000000000000(0000) GS:ffff8880be400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff335b18298 CR3: 000000006660c000 CR4: 0000000000750ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
----------------
Code disassembly (best guess):
0: 3c 08 cmp $0x8,%al
2: 00 74 08 4c add %dh,0x4c(%rax,%rcx,1)
6: 89 f7 mov %esi,%edi
8: e8 69 41 58 fd callq 0xfd584176
d: 49 89 2e mov %rbp,(%r14)
10: 48 85 ed test %rbp,%rbp
13: 74 27 je 0x3c
15: 48 83 c5 08 add $0x8,%rbp
19: 48 89 e8 mov %rbp,%rax
1c: 48 c1 e8 03 shr $0x3,%rax
20: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
27: fc ff df
* 2a: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1) <-- trapping instruction
2e: 74 08 je 0x38
30: 48 89 ef mov %rbp,%rdi
33: e8 3e 41 58 fd callq 0xfd584176
38: 4c 89 75 00 mov %r14,0x0(%rbp)
3c: 49 rex.WB
3d: be .byte 0xbe
```
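As a reading aid, the faulting address in the log can be reproduced from the register dump. The sketch below is not part of the original report. It assumes the usual x86-64 KASAN mapping, in which one shadow byte covers 8 bytes of memory, and it uses the shadow offset that the disassembly loads into RCX. The function names and the interpretation of RBP as `&next->pprev` (the corrupted hlist `next` pointer plus 8) are assumptions made for illustration.

```python
# Constants taken from the log: RCX holds the shadow offset, and the
# "shr $0x3" instruction shows that one shadow byte covers 8 bytes.
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
KASAN_SHADOW_SCALE_SHIFT = 3
MASK64 = (1 << 64) - 1


def shadow_address(addr):
    """Shadow byte address that the instrumented code reads for addr."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def covered_range(shadow):
    """Invert the mapping: the 8-byte range described by one shadow byte."""
    start = ((shadow - KASAN_SHADOW_OFFSET) & MASK64) << KASAN_SHADOW_SCALE_SHIFT
    start &= MASK64
    return start, start + 7


# RBP at the trapping instruction, after "add $0x8,%rbp".
rbp = 0x0000000200000009
# Assumption: RBP is next + 8, so this is the bogus 'next' pointer.
next_ptr = rbp - 8

fault = shadow_address(rbp)
low, high = covered_range(fault)
print(hex(next_ptr), hex(fault), hex(low), hex(high))
```

Under these assumptions the computed shadow address is the reported non-canonical address 0xdffffc0040000001, and the covered range is the one KASAN prints, 0x0000000200000008-0x000000020000000f. The value derived for the `next` pointer, 0x200000001, is not a plausible kernel address.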
If you have any questions, please contact us.

Reported by Yue Sun <[email protected]>
Reported by xingwei lee <[email protected]>

Best Regards,
Yue


Attachments:
config (242.11 kB)
alloc_object.c (41.45 kB)

2024-05-08 14:21:03

by Thomas Gleixner

Subject: Re: [Linux kernel bug] general protection fault in alloc_object

On Tue, May 07 2024 at 14:32, Sam Sun wrote:
> ```
> general protection fault, probably for non-canonical address
> 0xdffffc0040000001: 0000 [#1] PREEMPT SMP KASAN NOPTI
> KASAN: probably user-memory-access in range
> [0x0000000200000008-0x000000020000000f]

This is a reiserfs issue. It crashes at random places:

[ 348.634665][ T5992] REISERFS (device loop0): Using tea hash to sort names
[ 348.780602][ T5993] (udev-worker)[5993]: segfault at 200000001 ip 0000000200000001 sp 00007fffca0e6190 error 14 in udevadm[5613a8f19000+1a000] likely on CPU 3 (core 0, socket 3)
[ 348.796165][ T5993] Code: Unable to access opcode bytes at 0x1ffffffd7.
[ 348.831600][ T5016] systemd-journald[5016]: /var/log/journal/a042c4e41bfd4c9697a628486ba7707d/system.journal: Journal file corrupted, rotating.
[ 348.840565][ T6004] systemd-udevd[6004]: segfault at 100040048 ip 00007fde601b58a3 sp 00007fffca0e6250 error 4 in libc.so.6[7fde60108000+155000] likely on CPU 5 (core 0, socket 5)
[ 348.844214][ T6004] Code: 89 10 49 8b b4 24 a8 10 00 00 eb 34 0f 1f 00 4c 8b 2d 69 f5 0f 00 64 45 8b 75 00 e8 27 42 fc ff e8 52 fe fa ff e9 01 fe ff ff <48> 8b 0a 48 8b 42 08 48 89 41 08 48 89 08 49 8b b4 24 a8 10 00 00
[ 356.765557][ T5992] ==================================================================
[ 356.767188][ T5992] BUG: unable to handle page fault for address: 0000000100040058
[ 356.767204][ T5992] #PF: supervisor read access in kernel mode
[ 356.767219][ T5992] #PF: error_code(0x0000) - not-present page
[ 356.767233][ T5992] PGD 80000004ca01f067 P4D 80000004ca01f067 PUD 0
[ 356.767266][ T5992] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
[ 356.767294][ T5992] CPU: 4 PID: 5992 Comm: a Not tainted 6.9.0-rc7-00012-gdccb07f2914c-dirty #43
[ 356.767325][ T5992] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 356.767342][ T5992] RIP: 0010:stack_depot_save_flags+0x14b/0x8e0

Can we just get rid of this mess?

Thanks,

tglx

2024-05-09 15:45:43

by David Sterba

Subject: Re: [Linux kernel bug] general protection fault in alloc_object

On Wed, May 08, 2024 at 04:20:53PM +0200, Thomas Gleixner wrote:
> On Tue, May 07 2024 at 14:32, Sam Sun wrote:
> > ```
> > general protection fault, probably for non-canonical address
> > 0xdffffc0040000001: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > KASAN: probably user-memory-access in range
> > [0x0000000200000008-0x000000020000000f]
>
> This is a reiserfs issue. It crashes at random places:
>
> [ 348.634665][ T5992] REISERFS (device loop0): Using tea hash to sort names
> [ 348.780602][ T5993] (udev-worker)[5993]: segfault at 200000001 ip 0000000200000001 sp 00007fffca0e6190 error 14 in udevadm[5613a8f19000+1a000] likely on CPU 3 (core 0, socket 3)
> [ 348.796165][ T5993] Code: Unable to access opcode bytes at 0x1ffffffd7.
> [ 348.831600][ T5016] systemd-journald[5016]: /var/log/journal/a042c4e41bfd4c9697a628486ba7707d/system.journal: Journal file corrupted, rotating.
> [ 348.840565][ T6004] systemd-udevd[6004]: segfault at 100040048 ip 00007fde601b58a3 sp 00007fffca0e6250 error 4 in libc.so.6[7fde60108000+155000] likely on CPU 5 (core 0, socket 5)
> [ 348.844214][ T6004] Code: 89 10 49 8b b4 24 a8 10 00 00 eb 34 0f 1f 00 4c 8b 2d 69 f5 0f 00 64 45 8b 75 00 e8 27 42 fc ff e8 52 fe fa ff e9 01 fe ff ff <48> 8b 0a 48 8b 42 08 48 89 41 08 48 89 08 49 8b b4 24 a8 10 00 00
> [ 356.765557][ T5992] ==================================================================
> [ 356.767188][ T5992] BUG: unable to handle page fault for address: 0000000100040058
> [ 356.767204][ T5992] #PF: supervisor read access in kernel mode
> [ 356.767219][ T5992] #PF: error_code(0x0000) - not-present page
> [ 356.767233][ T5992] PGD 80000004ca01f067 P4D 80000004ca01f067 PUD 0
> [ 356.767266][ T5992] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
> [ 356.767294][ T5992] CPU: 4 PID: 5992 Comm: a Not tainted 6.9.0-rc7-00012-gdccb07f2914c-dirty #43
> [ 356.767325][ T5992] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 356.767342][ T5992] RIP: 0010:stack_depot_save_flags+0x14b/0x8e0
>
> Can we just get rid of this mess?

It's been on the deprecation and removal path, scheduled for 2025.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eb103a51640ee32ab01c51e13bf8fca211f25f61
I wouldn't be surprised if somebody sends a patch on January 1 to do that.