2024-03-21 04:04:34

by syzbot

Subject: [syzbot] [mm?] kernel BUG in const_folio_flags

Hello,

syzbot found the following issue on:

HEAD commit: 78c3925c048c Merge tag 'soc-late-6.9' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1267d879180000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3c2635ded15fbc9
dashboard link: https://syzkaller.appspot.com/bug?extid=3b9148f91b7869120e81
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-78c3925c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/cf2bceeccde3/vmlinux-78c3925c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/fc938dfaea6d/bzImage-78c3925c.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

veth_newlink+0x627/0xa10 drivers/net/veth.c:1895
rtnl_newlink_create net/core/rtnetlink.c:3494 [inline]
__rtnl_newlink+0x119c/0x1960 net/core/rtnetlink.c:3714
rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3727
rtnetlink_rcv_msg+0x3c7/0xe60 net/core/rtnetlink.c:6595
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:315!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 42 Comm: kcompactd0 Not tainted 6.8.0-syzkaller-11725-g78c3925c048c #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315
Code: 41 83 e4 01 44 89 e6 e8 b1 e6 a9 ff 45 84 e4 0f 85 c4 fe ff ff e8 23 ec a9 ff 48 c7 c6 e0 07 1b 8b 48 89 ef e8 34 2e ed ff 90 <0f> 0b e8 8c 6b 06 00 e9 66 fe ff ff 48 89 ef e8 7f 6b 06 00 eb b6
RSP: 0018:ffffc9000068f7f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc9000068f698
RDX: ffff88801744c880 RSI: ffffffff81e4265c RDI: ffffffff8b6f0060
RBP: ffffea0000a04c00 R08: 0000000000000000 R09: fffffbfff1f3deca
R10: ffffffff8f9ef657 R11: 0000000000000000 R12: 0000000000000000
R13: ffffea0000a04dc0 R14: 0000000000028137 R15: ffffc9000068fbe8
FS: 0000000000000000(0000) GS:ffff88802c300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe623b9138 CR3: 000000001c22c000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
folio_test_hugetlb include/linux/page-flags.h:875 [inline]
PageHuge+0x219/0x2b0 mm/hugetlb.c:2174
isolate_migratepages_block+0x4a0/0x5110 mm/compaction.c:1004
isolate_migratepages mm/compaction.c:2182 [inline]
compact_zone+0x1a5c/0x4280 mm/compaction.c:2629
kcompactd_do_work+0x340/0x720 mm/compaction.c:3100
kcompactd+0x8d7/0xde0 mm/compaction.c:3199
kthread+0x2c1/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315
Code: 41 83 e4 01 44 89 e6 e8 b1 e6 a9 ff 45 84 e4 0f 85 c4 fe ff ff e8 23 ec a9 ff 48 c7 c6 e0 07 1b 8b 48 89 ef e8 34 2e ed ff 90 <0f> 0b e8 8c 6b 06 00 e9 66 fe ff ff 48 89 ef e8 7f 6b 06 00 eb b6
RSP: 0018:ffffc9000068f7f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc9000068f698
RDX: ffff88801744c880 RSI: ffffffff81e4265c RDI: ffffffff8b6f0060
RBP: ffffea0000a04c00 R08: 0000000000000000 R09: fffffbfff1f3deca
R10: ffffffff8f9ef657 R11: 0000000000000000 R12: 0000000000000000
R13: ffffea0000a04dc0 R14: 0000000000028137 R15: ffffc9000068fbe8
FS: 0000000000000000(0000) GS:ffff88802c300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe623b9138 CR3: 000000001c22c000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2024-03-21 09:58:30

by David Hildenbrand

Subject: Re: [syzbot] [mm?] kernel BUG in const_folio_flags

On 21.03.24 10:49, Muchun Song wrote:
>
>
>> On Mar 21, 2024, at 12:04, syzbot <[email protected]> wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: 78c3925c048c Merge tag 'soc-late-6.9' of git://git.kernel...
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1267d879180000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=f3c2635ded15fbc9
>> dashboard link: https://syzkaller.appspot.com/bug?extid=3b9148f91b7869120e81
>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>> userspace arch: i386
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-78c3925c.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/cf2bceeccde3/vmlinux-78c3925c.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/fc938dfaea6d/bzImage-78c3925c.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: [email protected]
>>
>> veth_newlink+0x627/0xa10 drivers/net/veth.c:1895
>> rtnl_newlink_create net/core/rtnetlink.c:3494 [inline]
>> __rtnl_newlink+0x119c/0x1960 net/core/rtnetlink.c:3714
>> rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3727
>> rtnetlink_rcv_msg+0x3c7/0xe60 net/core/rtnetlink.c:6595
>> ------------[ cut here ]------------
>> kernel BUG at include/linux/page-flags.h:315!
>
> There is some more page dump information from the console:
>
> [ 61.367144][ T42] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888028132880 pfn:0x28130
> [ 61.371430][ T42] flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
> [ 61.374455][ T42] page_type: 0xffffffff()
> [ 61.376096][ T42] raw: 00fff80000000000 ffff888015ecd540 dead000000000100 0000000000000000
> [ 61.379994][ T42] raw: ffff888028132880 0000000000190000 00000000ffffffff 0000000000000000
>
> Alright, the page is freed (with a refcount of 0).
>
>> invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
>> CPU: 1 PID: 42 Comm: kcompactd0 Not tainted 6.8.0-syzkaller-11725-g78c3925c048c #0
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315
>
> The RIP is in const_folio_flags() (called from folio_test_hugetlb()):
>
> VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>
> It is reasonable to WARN because the page is freed (PG_head is not set
> in this case).
>
> The comment on folio_test_hugetlb() says "Caller should have a
> reference on the folio", so the caller of PageHuge() should grab
> a refcount before calling folio_test_hugetlb() since commit
> 9c5ccf2db04b. But it does not mean that the @page must be a HugeTLB page
> even if PageHuge(@page) returns true when the user does not hold
> an extra refcount on the @page. Seems the WARN could be acceptable, so
> should we remove this WARN? I am not sure. Cc more experts.

Isn't this the problem Willy is fixing with the upcoming
folio_test_hugetlb() changes?

We cannot always grab a folio reference on hugetlb folios: free hugetlb
folios have a refcount of 0.

--
Cheers,

David / dhildenb


2024-03-21 10:18:45

by Oscar Salvador

Subject: Re: [syzbot] [mm?] kernel BUG in const_folio_flags

On Thu, Mar 21, 2024 at 05:49:49PM +0800, Muchun Song wrote:
> There is some more page dump information from the console:
>
> [ 61.367144][ T42] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888028132880 pfn:0x28130
> [ 61.371430][ T42] flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
> [ 61.374455][ T42] page_type: 0xffffffff()
> [ 61.376096][ T42] raw: 00fff80000000000 ffff888015ecd540 dead000000000100 0000000000000000
> [ 61.379994][ T42] raw: ffff888028132880 0000000000190000 00000000ffffffff 0000000000000000
>
> Alright, the page is freed (with a refcount of 0).

Yes, basically the page changed between folio_test_large() (returned true
for PG_head) and the call to const_folio_flags() (which now returned
false for PG_head).

As David pointed out, Willy is working on making PageHugetlb more
robust [1].


[1] https://lore.kernel.org/linux-mm/[email protected]/

--
Oscar Salvador
SUSE Labs

2024-03-21 09:57:18

by Muchun Song

Subject: Re: [syzbot] [mm?] kernel BUG in const_folio_flags



> On Mar 21, 2024, at 12:04, syzbot <[email protected]> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 78c3925c048c Merge tag 'soc-late-6.9' of git://git.kernel...
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1267d879180000
> kernel config: https://syzkaller.appspot.com/x/.config?x=f3c2635ded15fbc9
> dashboard link: https://syzkaller.appspot.com/bug?extid=3b9148f91b7869120e81
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-78c3925c.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/cf2bceeccde3/vmlinux-78c3925c.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/fc938dfaea6d/bzImage-78c3925c.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> veth_newlink+0x627/0xa10 drivers/net/veth.c:1895
> rtnl_newlink_create net/core/rtnetlink.c:3494 [inline]
> __rtnl_newlink+0x119c/0x1960 net/core/rtnetlink.c:3714
> rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3727
> rtnetlink_rcv_msg+0x3c7/0xe60 net/core/rtnetlink.c:6595
> ------------[ cut here ]------------
> kernel BUG at include/linux/page-flags.h:315!

There is some more page dump information from the console:

[ 61.367144][ T42] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888028132880 pfn:0x28130
[ 61.371430][ T42] flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
[ 61.374455][ T42] page_type: 0xffffffff()
[ 61.376096][ T42] raw: 00fff80000000000 ffff888015ecd540 dead000000000100 0000000000000000
[ 61.379994][ T42] raw: ffff888028132880 0000000000190000 00000000ffffffff 0000000000000000

Alright, the page is freed (with a refcount of 0).

> invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 42 Comm: kcompactd0 Not tainted 6.8.0-syzkaller-11725-g78c3925c048c #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315

The RIP is in const_folio_flags() (called from folio_test_hugetlb()):

VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);

It is reasonable to WARN because the page is freed (PG_head is not set
in this case).

The comment on folio_test_hugetlb() says "Caller should have a
reference on the folio", so the caller of PageHuge() should grab
a refcount before calling folio_test_hugetlb() since commit
9c5ccf2db04b. But it does not mean that the @page must be a HugeTLB page
even if PageHuge(@page) returns true when the user does not hold
an extra refcount on the @page. Seems the WARN could be acceptable, so
should we remove this WARN? I am not sure. Cc more experts.

Thanks.

> Code: 41 83 e4 01 44 89 e6 e8 b1 e6 a9 ff 45 84 e4 0f 85 c4 fe ff ff e8 23 ec a9 ff 48 c7 c6 e0 07 1b 8b 48 89 ef e8 34 2e ed ff 90 <0f> 0b e8 8c 6b 06 00 e9 66 fe ff ff 48 89 ef e8 7f 6b 06 00 eb b6
> RSP: 0018:ffffc9000068f7f0 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc9000068f698
> RDX: ffff88801744c880 RSI: ffffffff81e4265c RDI: ffffffff8b6f0060
> RBP: ffffea0000a04c00 R08: 0000000000000000 R09: fffffbfff1f3deca
> R10: ffffffff8f9ef657 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffea0000a04dc0 R14: 0000000000028137 R15: ffffc9000068fbe8
> FS: 0000000000000000(0000) GS:ffff88802c300000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffe623b9138 CR3: 000000001c22c000 CR4: 0000000000350ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> folio_test_hugetlb include/linux/page-flags.h:875 [inline]
> PageHuge+0x219/0x2b0 mm/hugetlb.c:2174
> isolate_migratepages_block+0x4a0/0x5110 mm/compaction.c:1004
> isolate_migratepages mm/compaction.c:2182 [inline]
> compact_zone+0x1a5c/0x4280 mm/compaction.c:2629
> kcompactd_do_work+0x340/0x720 mm/compaction.c:3100
> kcompactd+0x8d7/0xde0 mm/compaction.c:3199
> kthread+0x2c1/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315
> Code: 41 83 e4 01 44 89 e6 e8 b1 e6 a9 ff 45 84 e4 0f 85 c4 fe ff ff e8 23 ec a9 ff 48 c7 c6 e0 07 1b 8b 48 89 ef e8 34 2e ed ff 90 <0f> 0b e8 8c 6b 06 00 e9 66 fe ff ff 48 89 ef e8 7f 6b 06 00 eb b6
> RSP: 0018:ffffc9000068f7f0 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc9000068f698
> RDX: ffff88801744c880 RSI: ffffffff81e4265c RDI: ffffffff8b6f0060
> RBP: ffffea0000a04c00 R08: 0000000000000000 R09: fffffbfff1f3deca
> R10: ffffffff8f9ef657 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffea0000a04dc0 R14: 0000000000028137 R15: ffffc9000068fbe8
> FS: 0000000000000000(0000) GS:ffff88802c300000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffe623b9138 CR3: 000000001c22c000 CR4: 0000000000350ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


2024-03-22 03:25:05

by Muchun Song

Subject: Re: [syzbot] [mm?] kernel BUG in const_folio_flags



> On Mar 21, 2024, at 18:20, Oscar Salvador <[email protected]> wrote:
>
> On Thu, Mar 21, 2024 at 05:49:49PM +0800, Muchun Song wrote:
>> There is some more page dump information from the console:
>>
>> [ 61.367144][ T42] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888028132880 pfn:0x28130
>> [ 61.371430][ T42] flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
>> [ 61.374455][ T42] page_type: 0xffffffff()
>> [ 61.376096][ T42] raw: 00fff80000000000 ffff888015ecd540 dead000000000100 0000000000000000
>> [ 61.379994][ T42] raw: ffff888028132880 0000000000190000 00000000ffffffff 0000000000000000
>>
>> Alright, the page is freed (with a refcount of 0).
>
> Yes, basically the page changed between folio_test_large() (returned true
> for PG_head) and the call to const_folio_flags() (which now returned
> false for PG_head).
>
> As David pointed out, Willy is working on making PageHugetlb more
> robust [1].
>
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/

Sorry, I am not on the CC list, so I didn't know this. But thank
you and David for this information, I think it could fix this problem.

Muchun,
Thanks.