2024-06-03 10:29:39

by syzbot

Subject: [syzbot] [kernel?] BUG: unable to handle kernel NULL pointer dereference in __hrtimer_run_queues

Hello,

syzbot found the following issue on:

HEAD commit: 4a4be1ad3a6e Revert "vfs: Delete the associated dentry whe..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1422a73c980000
kernel config: https://syzkaller.appspot.com/x/.config?x=bd6024aedb15e15c
dashboard link: https://syzkaller.appspot.com/bug?extid=558f67d44ad7f098a3de
compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15583162980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c1b514980000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/384ffdcca292/non_bootable_disk-4a4be1ad.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/75957361122b/vmlinux-4a4be1ad.xz
kernel image: https://storage.googleapis.com/syzbot-assets/6c766b0ec377/Image-4a4be1ad.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+558f67d44ad7f098a3de@syzkaller.appspotmail.com

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000090
Mem abort info:
ESR = 0x0000000096000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 52-bit VAs, pgdp=000000004605bb80
[0000000000000090] pgd=08000000464ee003, p4d=08000000472aa003, pud=08000000471b8003, pmd=0000000000000000
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 3192 Comm: syz-executor607 Not tainted 6.10.0-rc1-syzkaller-00027-g4a4be1ad3a6e #0
Hardware name: linux,dummy-virt (DT)
pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : rb_next+0x1c/0x54 lib/rbtree.c:505
lr : rb_erase_cached include/linux/rbtree.h:124 [inline]
lr : timerqueue_del+0x38/0x70 lib/timerqueue.c:57
sp : ffff800080003e70
x29: ffff800080003e70 x28: 0000000000000000 x27: fff000007f8cf780
x26: 0000000000000001 x25: 00000000000000c0 x24: 0000001f0198bc90
x23: fff000007f8cf780 x22: fff000007f8cf7e0 x21: fff000007f8cf780
x20: fff000007f8cf7e0 x19: ffff800088c3bd60 x18: 0000000000000000
x17: fff07ffffd319000 x16: ffff800080000000 x15: 0000ffffef309d38
x14: 00000000000003bb x13: 0000000000000000 x12: ffff8000825e0028
x11: 0000000000000001 x10: 0000000000000200 x9 : 0000000000200000
x8 : 0008000000000000 x7 : ff7ffffffffffbff x6 : 00000000019a23f5
x5 : fff07ffffd319000 x4 : 000000000a2dca90 x3 : ffff800088c3bd60
x2 : ff7000007f8cf8e8 x1 : 0000000000000080 x0 : 0000000000000080
Call trace:
rb_next+0x1c/0x54 lib/rbtree.c:505
__remove_hrtimer kernel/time/hrtimer.c:1118 [inline]
__run_hrtimer kernel/time/hrtimer.c:1667 [inline]
__hrtimer_run_queues+0x104/0x1bc kernel/time/hrtimer.c:1751
hrtimer_interrupt+0xe8/0x244 kernel/time/hrtimer.c:1813
timer_handler drivers/clocksource/arm_arch_timer.c:674 [inline]
arch_timer_handler_phys+0x2c/0x44 drivers/clocksource/arm_arch_timer.c:692
handle_percpu_devid_irq+0x84/0x130 kernel/irq/chip.c:942
generic_handle_irq_desc include/linux/irqdesc.h:173 [inline]
handle_irq_desc kernel/irq/irqdesc.c:691 [inline]
generic_handle_domain_irq+0x2c/0x44 kernel/irq/irqdesc.c:747
gic_handle_irq+0x40/0xc4 drivers/irqchip/irq-gic.c:370
call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:889
do_interrupt_handler+0x80/0x84 arch/arm64/kernel/entry-common.c:310
__el1_irq arch/arm64/kernel/entry-common.c:536 [inline]
el1_interrupt+0x34/0x64 arch/arm64/kernel/entry-common.c:551
el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:556
el1h_64_irq+0x64/0x68 arch/arm64/kernel/entry.S:594
__clear_young_dirty_ptes arch/arm64/include/asm/pgtable.h:1311 [inline]
contpte_clear_young_dirty_ptes+0x68/0x128 arch/arm64/mm/contpte.c:389
walk_pmd_range mm/pagewalk.c:143 [inline]
walk_pud_range mm/pagewalk.c:221 [inline]
walk_p4d_range mm/pagewalk.c:256 [inline]
walk_pgd_range+0x4b0/0x8a4 mm/pagewalk.c:293
__walk_page_range+0x178/0x180 mm/pagewalk.c:395
walk_page_range+0x144/0x224 mm/pagewalk.c:521
madvise_free_single_vma+0x134/0x2bc mm/madvise.c:815
madvise_dontneed_free mm/madvise.c:929 [inline]
madvise_vma_behavior+0x1d0/0x790 mm/madvise.c:1046
madvise_walk_vmas+0xbc/0x12c mm/madvise.c:1268
do_madvise+0x160/0x418 mm/madvise.c:1464
__do_sys_madvise mm/madvise.c:1481 [inline]
__se_sys_madvise mm/madvise.c:1479 [inline]
__arm64_sys_madvise+0x24/0x34 mm/madvise.c:1479
__invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
invoke_syscall+0x48/0x118 arch/arm64/kernel/syscall.c:48
el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:133
do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:152
el0_svc+0x34/0xf8 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
Code: 54000200 f9400401 b4000141 aa0103e0 (f9400821)
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
0: 54000200 b.eq 0x40 // b.none
4: f9400401 ldr x1, [x0, #8]
8: b4000141 cbz x1, 0x30
c: aa0103e0 mov x0, x1
* 10: f9400821 ldr x1, [x1, #16] <-- trapping instruction


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2024-06-03 11:04:52

by Hillf Danton

Subject: Re: [syzbot] [kernel?] BUG: unable to handle kernel NULL pointer dereference in __hrtimer_run_queues

On Mon, 03 Jun 2024 03:22:29 -0700
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000090
> CPU: 0 PID: 3192 Comm: syz-executor607 Not tainted 6.10.0-rc1-syzkaller-00027-g4a4be1ad3a6e #0
> Hardware name: linux,dummy-virt (DT)
> pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : rb_next+0x1c/0x54 lib/rbtree.c:505
> lr : rb_erase_cached include/linux/rbtree.h:124 [inline]
> lr : timerqueue_del+0x38/0x70 lib/timerqueue.c:57
> sp : ffff800080003e70
> x29: ffff800080003e70 x28: 0000000000000000 x27: fff000007f8cf780
> x26: 0000000000000001 x25: 00000000000000c0 x24: 0000001f0198bc90
> x23: fff000007f8cf780 x22: fff000007f8cf7e0 x21: fff000007f8cf780
> x20: fff000007f8cf7e0 x19: ffff800088c3bd60 x18: 0000000000000000
> x17: fff07ffffd319000 x16: ffff800080000000 x15: 0000ffffef309d38
> x14: 00000000000003bb x13: 0000000000000000 x12: ffff8000825e0028
> x11: 0000000000000001 x10: 0000000000000200 x9 : 0000000000200000
> x8 : 0008000000000000 x7 : ff7ffffffffffbff x6 : 00000000019a23f5
> x5 : fff07ffffd319000 x4 : 000000000a2dca90 x3 : ffff800088c3bd60
> x2 : ff7000007f8cf8e8 x1 : 0000000000000080 x0 : 0000000000000080
> Call trace:
> rb_next+0x1c/0x54 lib/rbtree.c:505
> __remove_hrtimer kernel/time/hrtimer.c:1118 [inline]
> __run_hrtimer kernel/time/hrtimer.c:1667 [inline]
> __hrtimer_run_queues+0x104/0x1bc kernel/time/hrtimer.c:1751
> hrtimer_interrupt+0xe8/0x244 kernel/time/hrtimer.c:1813

After scratching my head for 30 minutes, I still could not work out how
the timer was armed.

2024-06-04 12:30:18

by Thomas Gleixner

Subject: Re: [syzbot] [kernel?] BUG: unable to handle kernel NULL pointer dereference in __hrtimer_run_queues

On Mon, Jun 03 2024 at 03:22, syzbot wrote:

Cc+ ARM64 folks

Content untrimmed for reference.

> syzbot found the following issue on:
>
> HEAD commit: 4a4be1ad3a6e Revert "vfs: Delete the associated dentry whe..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1422a73c980000
> kernel config: https://syzkaller.appspot.com/x/.config?x=bd6024aedb15e15c
> dashboard link: https://syzkaller.appspot.com/bug?extid=558f67d44ad7f098a3de
> compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: arm64
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15583162980000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c1b514980000
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/384ffdcca292/non_bootable_disk-4a4be1ad.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/75957361122b/vmlinux-4a4be1ad.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/6c766b0ec377/Image-4a4be1ad.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+558f67d44ad7f098a3de@syzkaller.appspotmail.com
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000090
> Mem abort info:
> ESR = 0x0000000096000006
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x06: level 2 translation fault
> Data abort info:
> ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
> CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 52-bit VAs, pgdp=000000004605bb80
> [0000000000000090] pgd=08000000464ee003, p4d=08000000472aa003, pud=08000000471b8003, pmd=0000000000000000
> Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 3192 Comm: syz-executor607 Not tainted 6.10.0-rc1-syzkaller-00027-g4a4be1ad3a6e #0
> Hardware name: linux,dummy-virt (DT)
> pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : rb_next+0x1c/0x54 lib/rbtree.c:505
> lr : rb_erase_cached include/linux/rbtree.h:124 [inline]
> lr : timerqueue_del+0x38/0x70 lib/timerqueue.c:57
> sp : ffff800080003e70
> x29: ffff800080003e70 x28: 0000000000000000 x27: fff000007f8cf780
> x26: 0000000000000001 x25: 00000000000000c0 x24: 0000001f0198bc90
> x23: fff000007f8cf780 x22: fff000007f8cf7e0 x21: fff000007f8cf780
> x20: fff000007f8cf7e0 x19: ffff800088c3bd60 x18: 0000000000000000
> x17: fff07ffffd319000 x16: ffff800080000000 x15: 0000ffffef309d38
> x14: 00000000000003bb x13: 0000000000000000 x12: ffff8000825e0028
> x11: 0000000000000001 x10: 0000000000000200 x9 : 0000000000200000
> x8 : 0008000000000000 x7 : ff7ffffffffffbff x6 : 00000000019a23f5
> x5 : fff07ffffd319000 x4 : 000000000a2dca90 x3 : ffff800088c3bd60
> x2 : ff7000007f8cf8e8 x1 : 0000000000000080 x0 : 0000000000000080
> Call trace:
> rb_next+0x1c/0x54 lib/rbtree.c:505
> __remove_hrtimer kernel/time/hrtimer.c:1118 [inline]
> __run_hrtimer kernel/time/hrtimer.c:1667 [inline]
> __hrtimer_run_queues+0x104/0x1bc kernel/time/hrtimer.c:1751
> hrtimer_interrupt+0xe8/0x244 kernel/time/hrtimer.c:1813
> timer_handler drivers/clocksource/arm_arch_timer.c:674 [inline]
> arch_timer_handler_phys+0x2c/0x44 drivers/clocksource/arm_arch_timer.c:692
> handle_percpu_devid_irq+0x84/0x130 kernel/irq/chip.c:942
> generic_handle_irq_desc include/linux/irqdesc.h:173 [inline]
> handle_irq_desc kernel/irq/irqdesc.c:691 [inline]
> generic_handle_domain_irq+0x2c/0x44 kernel/irq/irqdesc.c:747
> gic_handle_irq+0x40/0xc4 drivers/irqchip/irq-gic.c:370
> call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:889
> do_interrupt_handler+0x80/0x84 arch/arm64/kernel/entry-common.c:310
> __el1_irq arch/arm64/kernel/entry-common.c:536 [inline]
> el1_interrupt+0x34/0x64 arch/arm64/kernel/entry-common.c:551
> el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:556
> el1h_64_irq+0x64/0x68 arch/arm64/kernel/entry.S:594
> __clear_young_dirty_ptes arch/arm64/include/asm/pgtable.h:1311 [inline]
> contpte_clear_young_dirty_ptes+0x68/0x128 arch/arm64/mm/contpte.c:389
> walk_pmd_range mm/pagewalk.c:143 [inline]
> walk_pud_range mm/pagewalk.c:221 [inline]
> walk_p4d_range mm/pagewalk.c:256 [inline]
> walk_pgd_range+0x4b0/0x8a4 mm/pagewalk.c:293
> __walk_page_range+0x178/0x180 mm/pagewalk.c:395
> walk_page_range+0x144/0x224 mm/pagewalk.c:521
> madvise_free_single_vma+0x134/0x2bc mm/madvise.c:815
> madvise_dontneed_free mm/madvise.c:929 [inline]
> madvise_vma_behavior+0x1d0/0x790 mm/madvise.c:1046
> madvise_walk_vmas+0xbc/0x12c mm/madvise.c:1268
> do_madvise+0x160/0x418 mm/madvise.c:1464
> __do_sys_madvise mm/madvise.c:1481 [inline]
> __se_sys_madvise mm/madvise.c:1479 [inline]
> __arm64_sys_madvise+0x24/0x34 mm/madvise.c:1479
> __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
> invoke_syscall+0x48/0x118 arch/arm64/kernel/syscall.c:48
> el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:133
> do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:152
> el0_svc+0x34/0xf8 arch/arm64/kernel/entry-common.c:712
> el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
> el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
> Code: 54000200 f9400401 b4000141 aa0103e0 (f9400821)
> ---[ end trace 0000000000000000 ]---
> ----------------
> Code disassembly (best guess):
> 0: 54000200 b.eq 0x40 // b.none
> 4: f9400401 ldr x1, [x0, #8]
> 8: b4000141 cbz x1, 0x30
> c: aa0103e0 mov x0, x1
> * 10: f9400821 ldr x1, [x1, #16] <-- trapping instruction

So this is the following code in rb_next():

> 4: f9400401 ldr x1, [x0, #8] // Offset 8 in @node
> 8: b4000141 cbz x1, 0x30
if (node->rb_right) {

> c: aa0103e0 mov x0, x1 // Saves node::rb_right
node = node->rb_right;

> * 10: f9400821 ldr x1, [x1, #16] <-- trapping instruction
while (node->rb_left)

> x2 : ff7000007f8cf8e8 x1 : 0000000000000080 x0 : 0000000000000080

which obviously crashes. Now the question is how does the original node
end up with node::rb_right == 0x80?

I doubt that this is a hrtimer or rbtree problem. It smells like random
data corruption caused by whatever. It might not even be an ARM64
specific issue though the C repro does not trigger on x86...

Handing it over to Catalin and Will.

Thanks,

tglx

2024-06-04 12:46:00

by Hillf Danton

Subject: Re: [syzbot] [kernel?] BUG: unable to handle kernel NULL pointer dereference in __hrtimer_run_queues

On Mon, 03 Jun 2024 03:22:29 -0700
> syzbot found the following issue on:
>
> HEAD commit: 4a4be1ad3a6e Revert "vfs: Delete the associated dentry whe..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c1b514980000

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -376,7 +376,7 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
* clearing access/dirty for the whole block.
*/
unsigned long start = addr;
- unsigned long end = start + nr;
+ unsigned long end = start + nr * PAGE_SIZE;

if (pte_cont(__ptep_get(ptep + nr - 1)))
end = ALIGN(end, CONT_PTE_SIZE);
@@ -386,7 +386,7 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
ptep = contpte_align_down(ptep);
}

- __clear_young_dirty_ptes(vma, start, ptep, end - start, flags);
+ __clear_young_dirty_ptes(vma, start, ptep, (end - start) / PAGE_SIZE, flags);
}
EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);

--

2024-06-04 13:30:09

by syzbot

Subject: Re: [syzbot] [kernel?] BUG: unable to handle kernel NULL pointer dereference in __hrtimer_run_queues

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+558f67d44ad7f098a3de@syzkaller.appspotmail.com

Tested on:

commit: 2ab79514 Merge tag 'cxl-fixes-6.10-rc3' of git://git.k..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=121dd4ac980000
kernel config: https://syzkaller.appspot.com/x/.config?x=48aeb395bedeb71f
dashboard link: https://syzkaller.appspot.com/bug?extid=558f67d44ad7f098a3de
compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=1346c226980000

Note: testing is done by a robot and is best-effort only.

2024-06-04 13:35:23

by Will Deacon

Subject: Re: [syzbot] [kernel?] BUG: unable to handle kernel NULL pointer dereference in __hrtimer_run_queues

Hi Thomas,

On Tue, Jun 04, 2024 at 02:29:57PM +0200, Thomas Gleixner wrote:
> On Mon, Jun 03 2024 at 03:22, syzbot wrote:
> Cc+ ARM64 folks
>
> Content untrimmed for reference.

Thanks! I'll trim it now...

> > __clear_young_dirty_ptes arch/arm64/include/asm/pgtable.h:1311 [inline]
> > contpte_clear_young_dirty_ptes+0x68/0x128 arch/arm64/mm/contpte.c:389
> > walk_pmd_range mm/pagewalk.c:143 [inline]
> > walk_pud_range mm/pagewalk.c:221 [inline]
> > walk_p4d_range mm/pagewalk.c:256 [inline]
> > walk_pgd_range+0x4b0/0x8a4 mm/pagewalk.c:293
> > __walk_page_range+0x178/0x180 mm/pagewalk.c:395
> > walk_page_range+0x144/0x224 mm/pagewalk.c:521
> > madvise_free_single_vma+0x134/0x2bc mm/madvise.c:815
> > madvise_dontneed_free mm/madvise.c:929 [inline]
> > madvise_vma_behavior+0x1d0/0x790 mm/madvise.c:1046
> > madvise_walk_vmas+0xbc/0x12c mm/madvise.c:1268
> > do_madvise+0x160/0x418 mm/madvise.c:1464
> > __do_sys_madvise mm/madvise.c:1481 [inline]
> > __se_sys_madvise mm/madvise.c:1479 [inline]
> > __arm64_sys_madvise+0x24/0x34 mm/madvise.c:1479
> > __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
> > invoke_syscall+0x48/0x118 arch/arm64/kernel/syscall.c:48
> > el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:133
> > do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:152
> > el0_svc+0x34/0xf8 arch/arm64/kernel/entry-common.c:712
> > el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
> > el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
> > Code: 54000200 f9400401 b4000141 aa0103e0 (f9400821)
> > ---[ end trace 0000000000000000 ]---
> > ----------------
> > Code disassembly (best guess):
> > 0: 54000200 b.eq 0x40 // b.none
> > 4: f9400401 ldr x1, [x0, #8]
> > 8: b4000141 cbz x1, 0x30
> > c: aa0103e0 mov x0, x1
> > * 10: f9400821 ldr x1, [x1, #16] <-- trapping instruction
>
> So this is the following code in rb_next():
>
> > 4: f9400401 ldr x1, [x0, #8] // Offset 8 in @node
> > 8: b4000141 cbz x1, 0x30
> if (node->rb_right) {
>
> > c: aa0103e0 mov x0, x1 // Saves node::rb_right
> node = node->rb_right;
>
> > * 10: f9400821 ldr x1, [x1, #16] <-- trapping instruction
> while (node->rb_left)
>
> > x2 : ff7000007f8cf8e8 x1 : 0000000000000080 x0 : 0000000000000080
>
> which obviously crashes. Now the question is how does the original node
> end up with node::rb_right == 0x80?
>
> I doubt that this is a hrtimer or rbtree problem. It smells like random
> data corruption caused by whatever. It might not even be an ARM64
> specific issue though the C repro does not trigger on x86...
>
> Handing it over to Catalin and Will.

I suspect this is a duplicate of:

https://lore.kernel.org/lkml/20240604110119.GA20284@willie-the-truck/

and there's a fix queued in the -mm tree.

Will

2024-06-04 16:10:13

by Thomas Gleixner

Subject: Re: [syzbot] [kernel?] BUG: unable to handle kernel NULL pointer dereference in __hrtimer_run_queues

Will!

On Tue, Jun 04 2024 at 14:34, Will Deacon wrote:
> On Tue, Jun 04, 2024 at 02:29:57PM +0200, Thomas Gleixner wrote:
>> On Mon, Jun 03 2024 at 03:22, syzbot wrote:
>> > * 10: f9400821 ldr x1, [x1, #16] <-- trapping instruction
>> while (node->rb_left)
>>
>> > x2 : ff7000007f8cf8e8 x1 : 0000000000000080 x0 : 0000000000000080
>>
>> which obviously crashes. Now the question is how does the original node
>> end up with node::rb_right == 0x80?
>>
>> I doubt that this is a hrtimer or rbtree problem. It smells like random
>> data corruption caused by whatever. It might not even be an ARM64
>> specific issue though the C repro does not trigger on x86...
>>
>> Handing it over to Catalin and Will.
>
> I suspect this is a duplicate of:
>
> https://lore.kernel.org/lkml/20240604110119.GA20284@willie-the-truck/
>
> and there's a fix queued in the -mm tree.

That looks very much so.

Thanks,

tglx