2022-08-06 01:45:39

by syzbot

Subject: [syzbot] KASAN: invalid-access Read in copy_page

Hello,

syzbot found the following issue on:

HEAD commit: 9e2f40233670 Merge tag 'x86_sgx_for_v6.0-2022-08-03.1' of ..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16181cbc080000
kernel config: https://syzkaller.appspot.com/x/.config?x=886e7348b2982e4d
dashboard link: https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
compiler: aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
Read at addr f5ff000017f2e000 by task syz-executor.1/2218
Pointer tag: [f5], memory tag: [f2]

CPU: 1 PID: 2218 Comm: syz-executor.1 Not tainted 5.19.0-syzkaller-10532-g9e2f40233670 #0
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace.part.0+0xcc/0xe0 arch/arm64/kernel/stacktrace.c:182
dump_backtrace arch/arm64/kernel/stacktrace.c:188 [inline]
show_stack+0x18/0x5c arch/arm64/kernel/stacktrace.c:189
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x68/0x84 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:313 [inline]
print_report+0xfc/0x5f0 mm/kasan/report.c:429
kasan_report+0x8c/0xb0 mm/kasan/report.c:491
__do_kernel_fault+0x104/0x1c0 arch/arm64/mm/fault.c:319
do_bad_area arch/arm64/mm/fault.c:469 [inline]
do_tag_check_fault+0x78/0x90 arch/arm64/mm/fault.c:738
do_mem_abort+0x48/0xa0 arch/arm64/mm/fault.c:814
el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:366
el1h_64_sync_handler+0xb0/0xd0 arch/arm64/kernel/entry-common.c:417
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:576
copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
copy_user_highpage+0x18/0x4c arch/arm64/mm/copypage.c:34
__wp_page_copy_user mm/memory.c:2848 [inline]
wp_page_copy+0xa0/0x790 mm/memory.c:3109
do_wp_page+0x150/0x6a4 mm/memory.c:3471
handle_pte_fault mm/memory.c:4925 [inline]
__handle_mm_fault+0x6c4/0xf84 mm/memory.c:5046
handle_mm_fault+0xe8/0x25c mm/memory.c:5144
__do_page_fault arch/arm64/mm/fault.c:502 [inline]
do_page_fault+0x140/0x3b0 arch/arm64/mm/fault.c:602
do_mem_abort+0x48/0xa0 arch/arm64/mm/fault.c:814
el0_da+0x48/0xbc arch/arm64/kernel/entry-common.c:502
el0t_64_sync_handler+0x134/0x1b0 arch/arm64/kernel/entry-common.c:645
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:581

The buggy address belongs to the physical page:
page:000000003e6672be refcount:3 mapcount:2 mapping:0000000000000000 index:0xffffffffe pfn:0x57f2e
memcg:fbff00001ded8000
anon flags: 0x1ffc2800208001c(uptodate|dirty|lru|swapbacked|arch_2|node=0|zone=0|lastcpupid=0x7ff|kasantag=0xa)
raw: 01ffc2800208001c fffffc00004f91c8 fcff00001d1b1000 f1ff00000510b231
raw: 0000000ffffffffe 0000000000000000 0000000300000001 fbff00001ded8000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff000017f2de00: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
ffff000017f2df00: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
>ffff000017f2e000: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2
^
ffff000017f2e100: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2
ffff000017f2e200: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2022-09-05 21:55:59

by Andrey Konovalov

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

Hi Catalin,

Syzbot reported an issue with MTE tagging of user pages, see the report below.

Possibly, it's related to your "mm: kasan: Skip unpoisoning of user
pages" series. However, I'm not sure what the issue is.

Do you have any ideas?

Thanks!

On Sat, Aug 6, 2022 at 3:31 AM syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 9e2f40233670 Merge tag 'x86_sgx_for_v6.0-2022-08-03.1' of ..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16181cbc080000
> kernel config: https://syzkaller.appspot.com/x/.config?x=886e7348b2982e4d
> dashboard link: https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
> compiler: aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> userspace arch: arm64
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> ==================================================================
> BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
> Read at addr f5ff000017f2e000 by task syz-executor.1/2218
> Pointer tag: [f5], memory tag: [f2]
>
> CPU: 1 PID: 2218 Comm: syz-executor.1 Not tainted 5.19.0-syzkaller-10532-g9e2f40233670 #0
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> dump_backtrace.part.0+0xcc/0xe0 arch/arm64/kernel/stacktrace.c:182
> dump_backtrace arch/arm64/kernel/stacktrace.c:188 [inline]
> show_stack+0x18/0x5c arch/arm64/kernel/stacktrace.c:189
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0x68/0x84 lib/dump_stack.c:106
> print_address_description mm/kasan/report.c:313 [inline]
> print_report+0xfc/0x5f0 mm/kasan/report.c:429
> kasan_report+0x8c/0xb0 mm/kasan/report.c:491
> __do_kernel_fault+0x104/0x1c0 arch/arm64/mm/fault.c:319
> do_bad_area arch/arm64/mm/fault.c:469 [inline]
> do_tag_check_fault+0x78/0x90 arch/arm64/mm/fault.c:738
> do_mem_abort+0x48/0xa0 arch/arm64/mm/fault.c:814
> el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:366
> el1h_64_sync_handler+0xb0/0xd0 arch/arm64/kernel/entry-common.c:417
> el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:576
> copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
> copy_user_highpage+0x18/0x4c arch/arm64/mm/copypage.c:34
> __wp_page_copy_user mm/memory.c:2848 [inline]
> wp_page_copy+0xa0/0x790 mm/memory.c:3109
> do_wp_page+0x150/0x6a4 mm/memory.c:3471
> handle_pte_fault mm/memory.c:4925 [inline]
> __handle_mm_fault+0x6c4/0xf84 mm/memory.c:5046
> handle_mm_fault+0xe8/0x25c mm/memory.c:5144
> __do_page_fault arch/arm64/mm/fault.c:502 [inline]
> do_page_fault+0x140/0x3b0 arch/arm64/mm/fault.c:602
> do_mem_abort+0x48/0xa0 arch/arm64/mm/fault.c:814
> el0_da+0x48/0xbc arch/arm64/kernel/entry-common.c:502
> el0t_64_sync_handler+0x134/0x1b0 arch/arm64/kernel/entry-common.c:645
> el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:581
>
> The buggy address belongs to the physical page:
> page:000000003e6672be refcount:3 mapcount:2 mapping:0000000000000000 index:0xffffffffe pfn:0x57f2e
> memcg:fbff00001ded8000
> anon flags: 0x1ffc2800208001c(uptodate|dirty|lru|swapbacked|arch_2|node=0|zone=0|lastcpupid=0x7ff|kasantag=0xa)
> raw: 01ffc2800208001c fffffc00004f91c8 fcff00001d1b1000 f1ff00000510b231
> raw: 0000000ffffffffe 0000000000000000 0000000300000001 fbff00001ded8000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff000017f2de00: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
> ffff000017f2df00: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
> >ffff000017f2e000: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2
> ^
> ffff000017f2e100: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2
> ffff000017f2e200: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2
> ==================================================================
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

2022-09-06 13:33:19

by Catalin Marinas

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

Hi Andrey,

On Mon, Sep 05, 2022 at 11:39:24PM +0200, Andrey Konovalov wrote:
> Syzbot reported an issue with MTE tagging of user pages, see the report below.
>
> Possibly, it's related to your "mm: kasan: Skip unpoisoning of user
> pages" series. However, I'm not sure what the issue is.
[...]
> On Sat, Aug 6, 2022 at 3:31 AM syzbot
> <[email protected]> wrote:
> > BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
> > Read at addr f5ff000017f2e000 by task syz-executor.1/2218
> > Pointer tag: [f5], memory tag: [f2]
[...]
> > The buggy address belongs to the physical page:
> > page:000000003e6672be refcount:3 mapcount:2 mapping:0000000000000000 index:0xffffffffe pfn:0x57f2e
> > memcg:fbff00001ded8000
> > anon flags: 0x1ffc2800208001c(uptodate|dirty|lru|swapbacked|arch_2|node=0|zone=0|lastcpupid=0x7ff|kasantag=0xa)

It looks like a copy-on-write where the source page is tagged
(PG_mte_tagged set) but page_kasan_tag() != 0xff (kasantag == 0xa). The
page is also swap-backed. Our current assumption is that
page_kasan_tag_reset() should be called on page allocation and we should
never end up with a user page without the kasan tag reset.

I was hoping we could catch such a condition with the diff below, but it
never triggered for me, even when swapping tagged pages in and out:

-------------8<-------------------------------------------
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index b2b730233274..241c616e3685 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -62,6 +62,9 @@ void mte_sync_tags(pte_t old_pte, pte_t pte)
 	if (!check_swap && !pte_is_tagged)
 		return;
 
+	/* Pages mapped in user space should have had the kasan tag reset */
+	WARN_ON_ONCE(page_kasan_tag(page) != 0xff);
+
 	/* if PG_mte_tagged is set, tags have already been initialised */
 	for (i = 0; i < nr_pages; i++, page++) {
 		if (!test_and_set_bit(PG_mte_tagged, &page->flags))
------------------------8<-------------------------------

Does it take long to reproduce this kasan warning? If not, it may be
worth adding the above hunk, hopefully we can identify where that page
is coming from before it ends up in copy_page().

--
Catalin

2022-09-06 15:02:34

by Dmitry Vyukov

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

On Tue, 6 Sept 2022 at 15:24, Catalin Marinas <[email protected]> wrote:
>
> Hi Andrey,
>
> On Mon, Sep 05, 2022 at 11:39:24PM +0200, Andrey Konovalov wrote:
> > Syzbot reported an issue with MTE tagging of user pages, see the report below.
> >
> > Possibly, it's related to your "mm: kasan: Skip unpoisoning of user
> > pages" series. However, I'm not sure what the issue is.
> [...]
> > On Sat, Aug 6, 2022 at 3:31 AM syzbot
> > <[email protected]> wrote:
> > > BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
> > > Read at addr f5ff000017f2e000 by task syz-executor.1/2218
> > > Pointer tag: [f5], memory tag: [f2]
> [...]
> > > The buggy address belongs to the physical page:
> > > page:000000003e6672be refcount:3 mapcount:2 mapping:0000000000000000 index:0xffffffffe pfn:0x57f2e
> > > memcg:fbff00001ded8000
> > > anon flags: 0x1ffc2800208001c(uptodate|dirty|lru|swapbacked|arch_2|node=0|zone=0|lastcpupid=0x7ff|kasantag=0xa)
>
> It looks like a copy-on-write where the source page is tagged
> (PG_mte_tagged set) but page_kasan_tag() != 0xff (kasantag == 0xa). The
> page is also swap-backed. Our current assumption is that
> page_kasan_tag_reset() should be called on page allocation and we should
> never end up with a user page without the kasan tag reset.
>
> I was hoping we can catch such condition with the diff below but it
> never triggered for me even when swapping tagged pages in and out:
>
> -------------8<-------------------------------------------
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index b2b730233274..241c616e3685 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -62,6 +62,9 @@ void mte_sync_tags(pte_t old_pte, pte_t pte)
> if (!check_swap && !pte_is_tagged)
> return;
>
> + /* Pages mapped in user space should have had the kasan tag reset */
> + WARN_ON_ONCE(page_kasan_tag(page) != 0xff);
> +
> /* if PG_mte_tagged is set, tags have already been initialised */
> for (i = 0; i < nr_pages; i++, page++) {
> if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> ------------------------8<-------------------------------
>
> Does it take long to reproduce this kasan warning? If not, it may be
> worth adding the above hunk, hopefully we can identify where that page
> is coming from before it ends up in copy_page().

syzbot finds several such cases every day (200 crashes for the past 35 days):
https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
So once it reaches the tested tree, we should have an answer within a day.

2022-09-06 15:23:49

by Catalin Marinas

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

On Tue, Sep 06, 2022 at 03:40:59PM +0200, Dmitry Vyukov wrote:
> On Tue, 6 Sept 2022 at 15:24, Catalin Marinas <[email protected]> wrote:
> > On Mon, Sep 05, 2022 at 11:39:24PM +0200, Andrey Konovalov wrote:
> > > Syzbot reported an issue with MTE tagging of user pages, see the report below.
> > >
> > > Possibly, it's related to your "mm: kasan: Skip unpoisoning of user
> > > pages" series. However, I'm not sure what the issue is.
> > [...]
> > > On Sat, Aug 6, 2022 at 3:31 AM syzbot
> > > <[email protected]> wrote:
> > > > BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
> > > > Read at addr f5ff000017f2e000 by task syz-executor.1/2218
> > > > Pointer tag: [f5], memory tag: [f2]
> > [...]
> > > > The buggy address belongs to the physical page:
> > > > page:000000003e6672be refcount:3 mapcount:2 mapping:0000000000000000 index:0xffffffffe pfn:0x57f2e
> > > > memcg:fbff00001ded8000
> > > > anon flags: 0x1ffc2800208001c(uptodate|dirty|lru|swapbacked|arch_2|node=0|zone=0|lastcpupid=0x7ff|kasantag=0xa)
> >
> > It looks like a copy-on-write where the source page is tagged
> > (PG_mte_tagged set) but page_kasan_tag() != 0xff (kasantag == 0xa). The
> > page is also swap-backed. Our current assumption is that
> > page_kasan_tag_reset() should be called on page allocation and we should
> > never end up with a user page without the kasan tag reset.
[...]
> > Does it take long to reproduce this kasan warning?
>
> syzbot finds several such cases every day (200 crashes for the past 35 days):
> https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
> So once it reaches the tested tree, we should have an answer within a day.

That's good to know. BTW, does syzkaller write tags in mmap'ed pages or
does it only issue random syscalls? I'm trying to figure out whether tag
0xf2 was written by the kernel without updating the corresponding
page_kasan_tag() or whether it was syzkaller recolouring the page.

--
Catalin

2022-09-06 15:46:32

by Andrey Konovalov

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

On Tue, Sep 6, 2022 at 4:29 PM Catalin Marinas <[email protected]> wrote:
>
> > > Does it take long to reproduce this kasan warning?
> >
> > syzbot finds several such cases every day (200 crashes for the past 35 days):
> > https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
> > So once it reaches the tested tree, we should have an answer within a day.

To be specific, this syzkaller instance fuzzes the mainline, so the
patch with the WARN_ON needs to end up there.

If this is unacceptable, perhaps, we could switch the MTE syzkaller
instance to the arm64 testing tree.

> That's good to know. BTW, does syzkaller write tags in mmap'ed pages or
> only issues random syscalls?

syzkaller doesn't write tags. Or, at least, shouldn't. Theoretically
it could come up with some way to generate instructions that write
tags, but this is unlikely.

> I'm trying to figure out whether tag 0xf2
> was written by the kernel without updating the corresponding
> page_kasan_tag() or it was syzkaller recolouring the page.

Just in case, I want to point out that the kasantag == 0xa from the
page flags matches the pointer tag 0xf5 in the report. The tag value
is stored bitwise-inverted in the page flags. Not that this matters in
this case though.
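
For reference, the inversion described above can be checked with a few lines
of plain user-space C. This is only a sketch, not kernel code: the helper
names are hypothetical, and it assumes the "kasantag=" value in the page dump
is an 8-bit field holding the tag XORed with 0xff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: recover the KASAN tag from the value printed as
 * "kasantag=" in a page dump, assuming the field stores the tag
 * bitwise-inverted in 8 bits.
 */
static inline uint8_t kasan_tag_from_page_flags(uint8_t stored)
{
	return (uint8_t)(stored ^ 0xff);
}

/* Check the helper against values that appear in this thread. */
static inline void check_thread_reports(void)
{
	/* syzbot report: kasantag=0xa, pointer tag f5 */
	assert(kasan_tag_from_page_flags(0x0a) == 0xf5);
	/* a stored value of 0 corresponds to the "reset" tag 0xff */
	assert(kasan_tag_from_page_flags(0x00) == 0xff);
}
```

Under this assumption, a page whose flags field is all zeroes reads back as
tag 0xff, which is the value the WARN_ON_ONCE() in the proposed diff expects
for user pages.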

2022-09-06 17:17:40

by Catalin Marinas

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

On Tue, Sep 06, 2022 at 04:39:57PM +0200, Andrey Konovalov wrote:
> On Tue, Sep 6, 2022 at 4:29 PM Catalin Marinas <[email protected]> wrote:
> > > > Does it take long to reproduce this kasan warning?
> > >
> > > syzbot finds several such cases every day (200 crashes for the past 35 days):
> > > https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
> > > So once it reaches the tested tree, we should have an answer within a day.
>
> To be specific, this syzkaller instance fuzzes the mainline, so the
> patch with the WARN_ON needs to end up there.
>
> If this is unacceptable, perhaps, we could switch the MTE syzkaller
> instance to the arm64 testing tree.

It needs some more digging first. My first guess was that a PROT_MTE
page was mapped into the user address space and the task repainted it
but I don't think that's the case.

> > That's good to know. BTW, does syzkaller write tags in mmap'ed pages or
> > only issues random syscalls?
>
> syzkaller doesn't write tags. Or, at least, shouldn't. Theoretically
> it could come up with same way to generate instructions that write
> tags, but this is unlikely.

Yeah. And colouring an entire page with the same tag is even less
likely.

> > I'm trying to figure out whether tag 0xf2
> > was written by the kernel without updating the corresponding
> > page_kasan_tag() or it was syzkaller recolouring the page.
>
> Just in case, I want to point out that the kasantag == 0xa from the
> page flags matches the pointer tag 0xf5 in the report. The tag value
> is stored bitwise-inverted in the page flags. Not that this matters in
> this case though.

Yes, I'm aware of this. So copy_page() tries to read from
page_address(src) with kasantag == 0xa (real tag 0xf5) while the
in-memory tag is 0xf2. Since the user didn't repaint the page, I'm
trying to figure out what set the tags to 0xf2 while leaving the
page_kasan_tag() to 0xf5. Some of the page_kasan_tag_reset() calls in
the past could have hidden a different issue.
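
The mismatch described above can be restated as a small user-space C sketch.
It is illustrative only: the function names are hypothetical, and it compares
the whole top byte as KASAN prints it in the report rather than modelling
what the hardware actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the pointer tag as printed by KASAN is the top byte. */
static inline uint8_t pointer_tag(uint64_t addr)
{
	return (uint8_t)(addr >> 56);
}

/* True when the pointer tag equals the tag reported for the memory. */
static inline bool tags_match(uint64_t addr, uint8_t mem_tag)
{
	return pointer_tag(addr) == mem_tag;
}

/* Values from the syzbot report: addr f5ff000017f2e000, memory tag f2. */
static inline void check_syzbot_report(void)
{
	assert(pointer_tag(0xf5ff000017f2e000ULL) == 0xf5);
	assert(!tags_match(0xf5ff000017f2e000ULL, 0xf2));
}
```

With the values from the report, the pointer carries 0xf5 while the memory
carries 0xf2, so the read in copy_page() faults.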

Since I can't find the kernel boot log for these runs, is there any kind
of swap enabled? I'm trying to narrow down where the problem may be.

--
Catalin

2022-09-27 17:24:16

by Andrey Konovalov

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

On Tue, Sep 6, 2022 at 6:23 PM Catalin Marinas <[email protected]> wrote:
>
> On Tue, Sep 06, 2022 at 04:39:57PM +0200, Andrey Konovalov wrote:
> > On Tue, Sep 6, 2022 at 4:29 PM Catalin Marinas <[email protected]> wrote:
> > > > > Does it take long to reproduce this kasan warning?
> > > >
> > > > syzbot finds several such cases every day (200 crashes for the past 35 days):
> > > > https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
> > > > So once it reaches the tested tree, we should have an answer within a day.
> >
> > To be specific, this syzkaller instance fuzzes the mainline, so the
> > patch with the WARN_ON needs to end up there.
> >
> > If this is unacceptable, perhaps, we could switch the MTE syzkaller
> > instance to the arm64 testing tree.
>
> It needs some more digging first. My first guess was that a PROT_MTE
> page was mapped into the user address space and the task repainted it
> but I don't think that's the case.

Hi Catalin,

syzkaller still keeps hitting this issue and I was wondering if you
have any ideas of what could be wrong here?

> Since I can't find the kernel boot log for these runs, is there any kind
> of swap enabled? I'm trying to narrow down where the problem may be.

I don't think there is.

Thanks!

2022-10-05 13:02:53

by James Morse

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

Hi guys,

On 27/09/2022 17:55, Andrey Konovalov wrote:
> On Tue, Sep 6, 2022 at 6:23 PM Catalin Marinas <[email protected]> wrote:
>>
>> On Tue, Sep 06, 2022 at 04:39:57PM +0200, Andrey Konovalov wrote:
>>> On Tue, Sep 6, 2022 at 4:29 PM Catalin Marinas <[email protected]> wrote:
>>>>>> Does it take long to reproduce this kasan warning?
>>>>>
>>>>> syzbot finds several such cases every day (200 crashes for the past 35 days):
>>>>> https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
>>>>> So once it reaches the tested tree, we should have an answer within a day.
>>>
>>> To be specific, this syzkaller instance fuzzes the mainline, so the
>>> patch with the WARN_ON needs to end up there.
>>>
>>> If this is unacceptable, perhaps, we could switch the MTE syzkaller
>>> instance to the arm64 testing tree.
>>
>> It needs some more digging first. My first guess was that a PROT_MTE
>> page was mapped into the user address space and the task repainted it
>> but I don't think that's the case.

> syzkaller still keeps hitting this issue and I was wondering if you
> have any ideas of what could be wrong here?
>
>> Since I can't find the kernel boot log for these runs, is there any kind
>> of swap enabled? I'm trying to narrow down where the problem may be.
>
> I don't think there is.


I've reproduced this with the latest QEMU and a v6.0 kernel using Ubuntu 15.04 user-space.

The reproducer is just to log in once it's booted. The VM has swap, and I've turned the
memory down low enough to force it to swap. The round trip time is about 15 minutes.

I've not managed to reproduce it without swap, or with more memory (but it may be a
timing thing).


Below is one example of tag corruption that affected page-cache memory that wouldn't be
swapped:
-------------------%<-------------------
[49488.484420] BUG: KASAN: invalid-access in __arch_copy_to_user+0x180/0x240
[49488.487122] Read at addr f1ff00000ad48000 by task apt-config/5041
[49488.488614] Pointer tag: [f1], memory tag: [fe]

[49488.490921] CPU: 1 PID: 5041 Comm: apt-config Not tainted 6.0.0 #14546
[49488.492364] Hardware name: linux,dummy-virt (DT)
[49488.493790] Call trace:
[49488.494640] dump_backtrace.part.0+0xd0/0xe0
[49488.495811] show_stack+0x18/0x50
[49488.496785] dump_stack_lvl+0x68/0x84
[49488.497781] print_report+0x104/0x604
[49488.498790] kasan_report+0x8c/0xb0
[49488.499758] __do_kernel_fault+0x11c/0x1bc
[49488.500801] do_tag_check_fault+0x78/0x90
[49488.501830] do_mem_abort+0x44/0x9c
[49488.502813] el1_abort+0x40/0x60
[49488.503839] el1h_64_sync_handler+0xb0/0xd0
[49488.504880] el1h_64_sync+0x64/0x68
[49488.505847] __arch_copy_to_user+0x180/0x240
[49488.506917] _copy_to_iter+0x68/0x5c0
[49488.507918] copy_page_to_iter+0xac/0x33c
[49488.508943] filemap_read+0x1b4/0x3b0
[49488.509936] generic_file_read_iter+0x108/0x1a0
[49488.511033] ext4_file_read_iter+0x58/0x1f0
[49488.512078] vfs_read+0x1f8/0x2a0
[49488.513031] ksys_read+0x68/0xf4
[49488.513978] __arm64_sys_read+0x1c/0x2c
[49488.514998] invoke_syscall+0x48/0x114
[49488.516046] el0_svc_common.constprop.0+0x44/0xec
[49488.517153] do_el0_svc+0x2c/0xc0
[49488.518120] el0_svc+0x2c/0xb4
[49488.519041] el0t_64_sync_handler+0xb8/0xc0
[49488.520080] el0t_64_sync+0x198/0x19c

[49488.522268] The buggy address belongs to the physical page:
[49488.523778] page:00000000db6e19d9 refcount:20 mapcount:18 mapping:0000000052573be9 index:0x0 pfn:0x4ad48
[49488.524938] memcg:faff000002c70000
[49488.525430] aops:ext4_da_aops ino:8061 dentry name:"libc-2.21.so"
[49488.526289] flags: 0x1ffc38002020876(referenced|uptodate|lru|active|workingset|arch_1|mappedtodisk|arch_2|node=0|zone=0|lastcpupid=0x7ff|kasantag=0xe) CMA
[49488.527947] raw: 01ffc38002020876 fffffc00002b5248 fffffc00002b51c8 f8ff00000335c760
[49488.528325] raw: 0000000000000000 0000000000000000 0000001400000011 faff000002c70000
[49488.528669] page dumped because: kasan: bad access detected

[49488.529615] Memory state around the buggy address:
[49488.531027] ffff00000ad47e00: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
[49488.532442] ffff00000ad47f00: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
[49488.533922] >ffff00000ad48000: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
[49488.535259] ^
[49488.536292] ffff00000ad48100: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
[49488.537628] ffff00000ad48200: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
[49488.539015] ==================================================================
[49488.603970] Disabling lock debugging due to kernel taint
-------------------%<-------------------


Thanks,

James

2022-10-06 10:29:45

by Catalin Marinas

Subject: Re: [syzbot] KASAN: invalid-access Read in copy_page

On Wed, Oct 05, 2022 at 01:38:55PM +0100, James Morse wrote:
> On 27/09/2022 17:55, Andrey Konovalov wrote:
> > On Tue, Sep 6, 2022 at 6:23 PM Catalin Marinas <[email protected]> wrote:
> >> On Tue, Sep 06, 2022 at 04:39:57PM +0200, Andrey Konovalov wrote:
> >>> On Tue, Sep 6, 2022 at 4:29 PM Catalin Marinas <[email protected]> wrote:
> >>>>>> Does it take long to reproduce this kasan warning?
> >>>>>
> >>>>> syzbot finds several such cases every day (200 crashes for the past 35 days):
> >>>>> https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
> >>>>> So once it reaches the tested tree, we should have an answer within a day.
[...]
> I've reproduced this with the latest qemu and v6.0 kernel using ubuntu 15.04 user-space.
>
> The reproducer is just to log in once its booted. The vm has swap, and I've turned the
> memory down low enough to force it to swap. The round trip time is about 15 minutes.
>
> I've not managed to reproduce it without swap, or with more memory. (but it may be a
> timing thing)

Thanks James. I got the error without swap enabled. Just booted Debian
under Qemu with 256MB of RAM (no graphics), did an 'ls -lR /' and it
triggered shortly after. There's no MTE used in user-space.

==================================================================
BUG: KASAN: invalid-access in copy_page+0x10/0xd0
Read at addr f9ff0000050ba000 by task kcompactd0/28
Pointer tag: [f9], memory tag: [f8]

CPU: 0 PID: 28 Comm: kcompactd0 Tainted: G W 6.0.0-rc3-dirty #1
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace.part.0+0xdc/0xf0
show_stack+0x1c/0x4c
dump_stack_lvl+0x68/0x84
print_report+0x104/0x610
kasan_report+0x90/0xb0
__do_kernel_fault+0x70/0x194
do_tag_check_fault+0x7c/0x90
do_mem_abort+0x48/0xa0
el1_abort+0x40/0x60
el1h_64_sync_handler+0xdc/0xec
el1h_64_sync+0x64/0x68
copy_page+0x10/0xd0
folio_copy+0x50/0xb0
migrate_folio+0x50/0x9c
move_to_new_folio+0xc0/0x1d4
migrate_pages+0x16b4/0x1740
compact_zone+0x66c/0xb0c
proactive_compact_node+0x70/0xac
kcompactd+0x1b4/0x370
kthread+0x110/0x114
ret_from_fork+0x10/0x20

The buggy address belongs to the physical page:
page:000000007339140a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff90019 pfn:0x450ba
memcg:f9ff0000052e4000
anon flags: 0x3fffc180088000d(locked|uptodate|dirty|swapbacked|arch_2|node=0|zone=0|lastcpupid=0xffff|kasantag=0x6)
raw: 03fffc180088000d fffffc0000142e48 ffff80000815bd68 fdff000001c738c1
raw: 0000000ffff90019 0000000000000000 00000001ffffffff f9ff0000052e4000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff0000050b9e00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
ffff0000050b9f00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
>ffff0000050ba000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
^
ffff0000050ba100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
ffff0000050ba200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================

It looks like it always happens on read. Something updated the tag in
page->flags for an existing page (or repainted the page, though less
likely as I think the page is in use).

I'm surprised that even without MTE in user-space, we still get
PG_mte_tagged (arch_2) set. Time for more printks.

--
Catalin