LinuxLists.cc - Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end

2023-09-18 12:25:07

Subject: Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

On 9/8/23 17:25, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 03:52:49PM +0100, Matthew Wilcox wrote:
>>> read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
>>> folio_batch_move_lru (./include/linux/mm.h:1814 ./include/linux/mm.h:1824 ./include/linux/memcontrol.h:1636 ./include/linux/memcontrol.h:1659 mm/swap.c:216)
>>> folio_batch_add_and_move (mm/swap.c:235)
>>> folio_add_lru (./arch/x86/include/asm/preempt.h:95 mm/swap.c:518)
>>> folio_add_lru_vma (mm/swap.c:538)
>>> do_anonymous_page (mm/memory.c:4146)
>>
>> This is the part I don't understand. The path to calling
>> folio_add_lru_vma() comes directly from vma_alloc_zeroed_movable_folio():
>>
> [snip]
>>
>> (sorry that's a lot of lines). But there's _nowhere_ there that sets
>> PG_locked. It's a freshly allocated page; all page flags (that are
>> actually flags; ignore the stuff up at the top) should be clear. We
>> even check that with PAGE_FLAGS_CHECK_AT_PREP. Plus, it doesn't
>> make sense that we'd start I/O; the page is freshly allocated, full of
>> zeroes; there's no backing store to read the page from.
>>
>> It really feels like this page was freed while it was still under I/O
>> and it's been reallocated to this victim process.
>>
>> I'm going to try a few things and see if I can figure this out.
>
> I'm having trouble reproducing this. Can you get it to happen reliably?
>
> This is what I'm currently running with, and it doesn't trigger.
> I'd expect it to if we were going to hit the KCSAN bug.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0c5be12f9336..d22e8798c326 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
> page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
>
> out:
> + VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
> if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
> unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
> __free_pages(page, order);

Hi,

Caught another instance of this bug involving folio_batch_move_lru: I don't seem that I can make it
happen reliably by the nature of the data racing conditions if I understood them well.

I have only found them in dmesg, I cannot determine what exactly the system was doing at that
spurious moment ...

Hope this will get some more light on the issue (6.6-rc2 vanilla torvalds tree kernel):

[ 114.557937] ==================================================================
[ 114.558262] BUG: KCSAN: data-race in btrfs_page_set_uptodate [btrfs] / folio_batch_move_lru

[ 114.558902] write (marked) to 0xffffea0006f68f00 of 8 bytes by task 2678 on cpu 19:
[ 114.558912] btrfs_page_set_uptodate (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/bitops.h:55 /home/marvin/linux/kernel/torvalds2/./include/asm-generic/bitops/instrumented-atomic.h:29 /home/marvin/linux/kernel/torvalds2/./include/linux/page-flags.h:741 /home/marvin/linux/kernel/torvalds2/./include/linux/page-flags.h:751 /home/marvin/linux/kernel/torvalds2/fs/btrfs/subpage.c:642) btrfs
[ 114.559539] end_page_read (/home/marvin/linux/kernel/torvalds2/fs/btrfs/extent_io.c:445) btrfs
[ 114.560166] end_bio_extent_readpage (/home/marvin/linux/kernel/torvalds2/fs/btrfs/extent_io.c:660) btrfs
[ 114.560796] __btrfs_bio_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:120) btrfs
[ 114.561434] btrfs_orig_bbio_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:164) btrfs
[ 114.562079] btrfs_check_read_bio (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:324) btrfs
[ 114.562724] btrfs_end_bio_work (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:359) btrfs
[ 114.563360] process_one_work (/home/marvin/linux/kernel/torvalds2/kernel/workqueue.c:2630)
[ 114.563371] worker_thread (/home/marvin/linux/kernel/torvalds2/kernel/workqueue.c:2697 /home/marvin/linux/kernel/torvalds2/kernel/workqueue.c:2784)
[ 114.563381] kthread (/home/marvin/linux/kernel/torvalds2/kernel/kthread.c:388)
[ 114.563390] ret_from_fork (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/process.c:147)
[ 114.563401] ret_from_fork_asm (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:312)

[ 114.563416] read to 0xffffea0006f68f00 of 8 bytes by task 3540 on cpu 20:
[ 114.563426] folio_batch_move_lru (/home/marvin/linux/kernel/torvalds2/./include/linux/mm.h:1849 /home/marvin/linux/kernel/torvalds2/./include/linux/mm.h:1859 /home/marvin/linux/kernel/torvalds2/./include/linux/memcontrol.h:1639 /home/marvin/linux/kernel/torvalds2/./include/linux/memcontrol.h:1662 /home/marvin/linux/kernel/torvalds2/mm/swap.c:216)
[ 114.563436] folio_batch_add_and_move (/home/marvin/linux/kernel/torvalds2/mm/swap.c:235)
[ 114.563446] folio_add_lru (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/preempt.h:95 /home/marvin/linux/kernel/torvalds2/mm/swap.c:518)
[ 114.563455] filemap_add_folio (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:957)
[ 114.563464] page_cache_ra_unbounded (/home/marvin/linux/kernel/torvalds2/mm/readahead.c:250)
[ 114.563477] page_cache_ra_order (/home/marvin/linux/kernel/torvalds2/mm/readahead.c:547)
[ 114.563490] ondemand_readahead (/home/marvin/linux/kernel/torvalds2/mm/readahead.c:669)
[ 114.563499] page_cache_async_ra (/home/marvin/linux/kernel/torvalds2/mm/readahead.c:718)
[ 114.563507] filemap_fault (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:3227 /home/marvin/linux/kernel/torvalds2/mm/filemap.c:3281)
[ 114.563518] __do_fault (/home/marvin/linux/kernel/torvalds2/mm/memory.c:4204)
[ 114.563528] do_fault (/home/marvin/linux/kernel/torvalds2/mm/memory.c:4568 /home/marvin/linux/kernel/torvalds2/mm/memory.c:4705)
[ 114.563538] __handle_mm_fault (/home/marvin/linux/kernel/torvalds2/mm/memory.c:3669 /home/marvin/linux/kernel/torvalds2/mm/memory.c:4978 /home/marvin/linux/kernel/torvalds2/mm/memory.c:5119)
[ 114.563549] handle_mm_fault (/home/marvin/linux/kernel/torvalds2/mm/memory.c:5284)
[ 114.563560] do_user_addr_fault (/home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1413)
[ 114.563572] exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/paravirt.h:695 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1513 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1561)
[ 114.563582] asm_exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:570)

[ 114.563597] value changed: 0x0017ffffc0008101 -> 0x0017ffffc0008108

[ 114.563612] Reported by Kernel Concurrency Sanitizer on:
[ 114.563619] CPU: 20 PID: 3540 Comm: chrome Not tainted 6.6.0-rc2-kcsan-00003-g16819584c239-dirty #11
[ 114.563630] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 114.563636] ==================================================================

Best regards,
Mirsad Todorovac

2023-09-18 23:05:18

by Matthew Wilcox

[permalink] [raw]

Subject: Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

On Mon, Sep 18, 2023 at 02:15:05PM +0200, Mirsad Todorovac wrote:
> > This is what I'm currently running with, and it doesn't trigger.
> > I'd expect it to if we were going to hit the KCSAN bug.
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0c5be12f9336..d22e8798c326 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
> > page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
> > out:
> > + VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
> > if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
> > unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
> > __free_pages(page, order);
>
> Hi,
>
> Caught another instance of this bug involving folio_batch_move_lru: I don't seem that I can make it
> happen reliably by the nature of the data racing conditions if I understood them well.

Were you running with this patch at the time, or was this actually
vanilla? The problem is that, if my diagnosis is correct, both of the
tasks mentioned are victims; we have a prematurely freed page. While
btrfs is clearly a user, it may not be btrfs's fault that the
page was also allocated as an anon page.

I'm trying to gather more data, and running with this patch will give
us more -- because it'll dump the entire struct page instead of just
the page->flags, like KCSAN is currently doing.

2023-09-19 11:56:35

by Mirsad Todorovac

[permalink] [raw]

Subject: Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

On 9/18/2023 4:53 PM, Matthew Wilcox wrote:

> On Mon, Sep 18, 2023 at 02:15:05PM +0200, Mirsad Todorovac wrote:
>>> This is what I'm currently running with, and it doesn't trigger.
>>> I'd expect it to if we were going to hit the KCSAN bug.
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 0c5be12f9336..d22e8798c326 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
>>> page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
>>> out:
>>> + VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
>>> if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>>> unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
>>> __free_pages(page, order);
>> Hi,
>>
>> Caught another instance of this bug involving folio_batch_move_lru: I don't seem that I can make it
>> happen reliably by the nature of the data racing conditions if I understood them well.
> Were you running with this patch at the time, or was this actually
> vanilla? The problem is that, if my diagnosis is correct, both of the
> tasks mentioned are victims; we have a prematurely freed page. While
> btrfs is clearly a user, it may not be btrfs's fault that the
> page was also allocated as an anon page.
>
> I'm trying to gather more data, and running with this patch will give
> us more -- because it'll dump the entire struct page instead of just
> the page->flags, like KCSAN is currently doing.

Hi, Mr. Matthew,

Yes, I am using "vanilla with your VM_BUG_ON_PAGE()" patch all the time,
as it seems non-disruptive and I am hoping to catch this spurious page
alloc.

Best regards, Mirsad Todorovac

2023-10-03 20:12:52

by Mirsad Todorovac

[permalink] [raw]

Subject: Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

On 9/18/23 16:53, Matthew Wilcox wrote:
> On Mon, Sep 18, 2023 at 02:15:05PM +0200, Mirsad Todorovac wrote:
>>> This is what I'm currently running with, and it doesn't trigger.
>>> I'd expect it to if we were going to hit the KCSAN bug.
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 0c5be12f9336..d22e8798c326 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
>>> page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
>>> out:
>>> + VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
>>> if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>>> unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
>>> __free_pages(page, order);
>>
>> Hi,
>>
>> Caught another instance of this bug involving folio_batch_move_lru: I don't seem that I can make it
>> happen reliably by the nature of the data racing conditions if I understood them well.
>
> Were you running with this patch at the time, or was this actually
> vanilla? The problem is that, if my diagnosis is correct, both of the
> tasks mentioned are victims; we have a prematurely freed page. While
> btrfs is clearly a user, it may not be btrfs's fault that the
> page was also allocated as an anon page.
>
> I'm trying to gather more data, and running with this patch will give
> us more -- because it'll dump the entire struct page instead of just
> the page->flags, like KCSAN is currently doing.

As my learning curve adapts, I seem to be more aware of what you are talking about.

I still have to learn to cope with patches, diffs, fixes and pulls all together and
consistent.

Sometimes I feel like in the BORG maturation chamber when I try to learn the Linux kernel,
and I wonder if this is the Author of my story trying to make up "for the years that locust
had eaten". Or is it that I am just losing the plot.

I learn that I was conceited and not respecting the work you guys have done in thirty years
I wasted for one reason or another: objective difficulties and personal weaknesses.

Forgive me this moment of truth.

I certainly feel more motivated to catch the real culprit, rather than just the symptoms.

I will rebuild with your patch again and try to reproduce the problem.

Best regards
Mirsad Todorovac