2023-12-30 15:33:15

by Genes Lists

Subject: 6.6.8 stable: crash in folio_mark_dirty


Apologies in advance, but I cannot git bisect this since machine was
running for 10 days on 6.6.8 before this happened.

Reporting in case it's useful (and not a hardware failure).

There is nothing interesting in the journal ahead of the crash - the
previous entry, 2 minutes prior, was from the user-space dhcp server.

- Root, efi is on nvme
- Spare root,efi is on sdg
- md raid6 on sda-sd with lvmcache from one partition on nvme drive.
- all filesystems are ext4 (other than efi).
- 32 GB mem.


regards

gene

Details attached, which show:

Dec 30 07:00:36 s6 kernel: <TASK>
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: ? __warn+0x81/0x130
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: ? report_bug+0x171/0x1a0
Dec 30 07:00:36 s6 kernel: ? handle_bug+0x3c/0x80
Dec 30 07:00:36 s6 kernel: ? exc_invalid_op+0x17/0x70
Dec 30 07:00:36 s6 kernel: ? asm_exc_invalid_op+0x1a/0x20
Dec 30 07:00:36 s6 kernel: ? __folio_mark_dirty+0x21c/0x2a0
Dec 30 07:00:36 s6 kernel: block_dirty_folio+0x8a/0xb0
Dec 30 07:00:36 s6 kernel: unmap_page_range+0xd17/0x1120
Dec 30 07:00:36 s6 kernel: unmap_vmas+0xb5/0x190
Dec 30 07:00:36 s6 kernel: exit_mmap+0xec/0x340
Dec 30 07:00:36 s6 kernel: __mmput+0x3e/0x130
Dec 30 07:00:36 s6 kernel: do_exit+0x31c/0xb20
Dec 30 07:00:36 s6 kernel: do_group_exit+0x31/0x80
Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group+0x18/0x20
Dec 30 07:00:36 s6 kernel: do_syscall_64+0x5d/0x90
Dec 30 07:00:36 s6 kernel: ? count_memcg_events.constprop.0+0x1a/0x30
Dec 30 07:00:36 s6 kernel: ? handle_mm_fault+0xa2/0x360
Dec 30 07:00:36 s6 kernel: ? do_user_addr_fault+0x30f/0x660
Dec 30 07:00:36 s6 kernel: ? exc_page_fault+0x7f/0x180
Dec 30 07:00:36 s6 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Dec 30 07:00:36 s6 kernel: RIP: 0033:0x7fb3c581ee2d
Dec 30 07:00:36 s6 kernel: Code: Unable to access opcode bytes at 0x7fb3c581ee03.
Dec 30 07:00:36 s6 kernel: RSP: 002b:00007fff620541e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Dec 30 07:00:36 s6 kernel: RAX: ffffffffffffffda RBX: 00007fb3c591efa8 RCX: 00007fb3c581ee2d
Dec 30 07:00:36 s6 kernel: RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
Dec 30 07:00:36 s6 kernel: RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fb3c5924920
Dec 30 07:00:36 s6 kernel: R10: 00005650f2e615f0 R11: 0000000000000206 R12: 0000000000000000
Dec 30 07:00:36 s6 kernel: R13: 0000000000000000 R14: 00007fb3c591d680 R15: 00007fb3c591efc0
Dec 30 07:00:36 s6 kernel: </TASK>


Attachments:
s6-crash (7.71 kB)

2023-12-30 18:02:50

by Matthew Wilcox

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Sat, Dec 30, 2023 at 10:23:26AM -0500, Genes Lists wrote:
> Apologies in advance, but I cannot git bisect this since machine was
> running for 10 days on 6.6.8 before this happened.

Thanks for the report. Apologies, I'm on holiday until the middle of
the week so this will be extremely terse.

> - Root, efi is on nvme
> - Spare root,efi is on sdg
> - md raid6 on sda-sd with lvmcache from one partition on nvme drive.
> - all filesystems are ext4 (other than efi).
> - 32 GB mem.

> Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)

This is:

WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
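
For context, the surrounding code in mm/page-writeback.c looks roughly
like this in 6.6 (hand-trimmed, so treat it as a sketch rather than an
exact copy):

void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
			     int warn)
{
	unsigned long flags;

	xa_lock_irqsave(&mapping->i_pages, flags);
	if (folio->mapping) {		/* Race with truncate? */
		WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
		folio_account_dirtied(folio, mapping);
		__xa_set_mark(&mapping->i_pages, folio_index(folio),
				PAGECACHE_TAG_DIRTY);
	}
	xa_unlock_irqrestore(&mapping->i_pages, flags);
}

In other words, we only warn when the folio is still in the page cache
(folio->mapping is set), the caller asked for the warning, and the folio
is not uptodate.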

> Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df

So rsync is exiting. Do you happen to know what rsync is doing?

> Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)

It looks like rsync has a page from the block device mmapped?  I'll have
to investigate this properly when I'm back.  If you haven't heard from
me in a week, please ping me.

(I don't think I caused this, but I think I stand a fighting chance of
tracking down what the problem is, just not right now).

2023-12-30 19:16:57

by Genes Lists

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Sat, 2023-12-30 at 18:02 +0000, Matthew Wilcox wrote:
>
> Thanks for the report.  Apologies, I'm on holiday until the middle of
> the week so this will be extremely terse.
>

Enjoy!

> > Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted
>
> So rsync is exiting.  Do you happen to know what rsync is doing?

There are 2 rsyncs I can think of:

(a) rsync from another server (s8) pushing files over the local
network to this machine (s6). rsync writes to the raid drives on s6.

s8 says the rsync completed successfully at 3:04 am (about 4 hours
prior to this error at 7:00 am).

(b) There is also a script running inotify which uses rsync to keep
the spare root drive synced. The system had an update of a few packages
at 5:48 am, which would have caused an rsync from root on nvme to the
spare on sdg. Most likely this is the one that triggered around 7 am.

This one runs:

/usr/bin/rsync --open-noatime --no-specials --delete --atimes -axHAX --times <src> <dst>



> It looks like rsync has a page from the block device mmapped?  I'll
> have to investigate this properly when I'm back.  If you haven't heard
> from me in a week, please ping me.

Thank you.

>
> (I don't think I caused this, but I think I stand a fighting chance of
> tracking down what the problem is, just not right now).


This may or may not be related, but this same machine crashed during an
rsync like (a) above (i.e. s8 pushing files to the raid6 disks on s6)
about 3 weeks ago - it was on the 6.6.4 kernel then. In that case the
error was in md code.

https://lore.kernel.org/lkml/[email protected]/T/

Thank you again,


gene





2023-12-31 01:29:23

by Hillf Danton

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Sat, Dec 30, 2023 at 10:23:26AM -0500 Genes Lists <[email protected]>
> Apologies in advance, but I cannot git bisect this since machine was
> running for 10 days on 6.6.8 before this happened.
>
> Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
> Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
> Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)

See what comes out if the race is handled.
Only for thoughts.

--- x/mm/page-writeback.c
+++ y/mm/page-writeback.c
@@ -2661,12 +2661,19 @@ void __folio_mark_dirty(struct folio *fo
 {
 	unsigned long flags;
 
+again:
 	xa_lock_irqsave(&mapping->i_pages, flags);
-	if (folio->mapping) { /* Race with truncate? */
+	if (folio->mapping && mapping == folio->mapping) {
 		WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
 		folio_account_dirtied(folio, mapping);
 		__xa_set_mark(&mapping->i_pages, folio_index(folio),
 				PAGECACHE_TAG_DIRTY);
+	} else if (folio->mapping) { /* Race with truncate? */
+		struct address_space *tmp = folio->mapping;
+
+		xa_unlock_irqrestore(&mapping->i_pages, flags);
+		mapping = tmp;
+		goto again;
 	}
 	xa_unlock_irqrestore(&mapping->i_pages, flags);
 }
--

2023-12-31 13:07:32

by Matthew Wilcox

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Sun, Dec 31, 2023 at 09:28:46AM +0800, Hillf Danton wrote:
> On Sat, Dec 30, 2023 at 10:23:26AM -0500 Genes Lists <[email protected]>
> > Apologies in advance, but I cannot git bisect this since machine was
> > running for 10 days on 6.6.8 before this happened.
> >
> > Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> > Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
> > Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
> > Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> > Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> > Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> > Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> > Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> > Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> > Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> > Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> > Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
>
> See what comes out if race is handled.
> Only for thoughts.

I don't think this can happen. Look at the call trace:
block_dirty_folio() is called from unmap_page_range(). That means the
page is in the page tables. We unmap the pages in a folio from the
page tables before we set folio->mapping to NULL. Look at
invalidate_inode_pages2_range() for example:

		unmap_mapping_pages(mapping, indices[i],
				(1 + end - indices[i]), false);
		folio_lock(folio);
		folio_wait_writeback(folio);
		if (folio_mapped(folio))
			unmap_mapping_folio(folio);
		BUG_ON(folio_mapped(folio));
		if (!invalidate_complete_folio2(mapping, folio))

... and invalidate_complete_folio2() is where we set ->mapping to NULL
in __filemap_remove_folio -> page_cache_delete().
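
For completeness, page_cache_delete() is where the field actually gets
cleared; condensed by hand from mm/filemap.c (details elided, so treat
as approximate):

static void page_cache_delete(struct address_space *mapping,
				struct folio *folio, void *shadow)
{
	XA_STATE(xas, &mapping->i_pages, folio->index);
	...
	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);

	xas_store(&xas, shadow);
	xas_init_marks(&xas);

	folio->mapping = NULL;
	/* Leave page->index set: truncation lookup relies upon it */
	mapping->nrpages -= nr;
}

Note the VM_BUG_ON: we only get here with the folio locked, after the
unmap above has already happened.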


2023-12-31 20:59:18

by Matthew Wilcox

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Sat, Dec 30, 2023 at 10:23:26AM -0500, Genes Lists wrote:
> Apologies in advance, but I cannot git bisect this since machine was
> running for 10 days on 6.6.8 before this happened.

This problem simply doesn't make sense. There's just no way we should be
able to get a not-uptodate folio into the page tables. We do have one
pending patch which fixes a situation in which we can get some very
odd-looking situations due to reusing a page which has been freed.
I appreciate your ability to reproduce this is likely nil, but if you
could add

https://lore.kernel.org/all/[email protected]/

to your kernel, it might make things more stable for you.


2023-12-31 21:16:04

by Genes Lists

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Sun, 2023-12-31 at 20:59 +0000, Matthew Wilcox wrote:
> On Sat, Dec 30, 2023 at 10:23:26AM -0500, Genes Lists wrote:
> > Apologies in advance, but I cannot git bisect this since machine was
> > running for 10 days on 6.6.8 before this happened.
>
> This problem simply doesn't make sense.  There's just no way we should be
> able to get a not-uptodate folio into the page tables.  We do have one
> pending patch which fixes a situation in which we can get some very
> odd-looking situations due to reusing a page which has been freed.
> I appreciate your ability to reproduce this is likely nil, but if you
> could add
>
> https://lore.kernel.org/all/[email protected]/
>
> to your kernel, it might make things more stable for you.
>

Ok, looks like that's in mainline - the machine is now running 6.7.0-rc7 -
unless you'd prefer I patch 6.6.8 with the above and change to that.

Thanks, and sorry about bringing up a wacky problem.

gene

2024-01-01 01:55:37

by Hillf Danton

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Sun, 31 Dec 2023 13:07:03 +0000 Matthew Wilcox <[email protected]>
> On Sun, Dec 31, 2023 at 09:28:46AM +0800, Hillf Danton wrote:
> > On Sat, Dec 30, 2023 at 10:23:26AM -0500 Genes Lists <[email protected]>
> > > Apologies in advance, but I cannot git bisect this since machine was
> > > running for 10 days on 6.6.8 before this happened.
> > >
> > > Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> > > Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
> > > Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
> > > Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> > > Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> > > Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> > > Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> > > Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> > > Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> > > Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> > > Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> > > Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
> >
> > See what comes out if race is handled.
> > Only for thoughts.
>
> I don't think this can happen. Look at the call trace;
> block_dirty_folio() is called from unmap_page_range(). That means the
> page is in the page tables. We unmap the pages in a folio from the
> page tables before we set folio->mapping to NULL. Look at
> invalidate_inode_pages2_range() for example:
>
> unmap_mapping_pages(mapping, indices[i],
> (1 + end - indices[i]), false);
> folio_lock(folio);
> folio_wait_writeback(folio);
> if (folio_mapped(folio))
> unmap_mapping_folio(folio);
> BUG_ON(folio_mapped(folio));
> if (!invalidate_complete_folio2(mapping, folio))
>
What is missing in the excerpt above is the same check [1] that already
exists in invalidate_inode_pages2_range(), so I built no new wheel.

	folio_lock(folio);
	if (unlikely(folio->mapping != mapping)) {
		folio_unlock(folio);
		continue;
	}

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/truncate.c#n658

2024-01-01 09:08:15

by Matthew Wilcox

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Mon, Jan 01, 2024 at 09:55:04AM +0800, Hillf Danton wrote:
> On Sun, 31 Dec 2023 13:07:03 +0000 Matthew Wilcox <[email protected]>
> > On Sun, Dec 31, 2023 at 09:28:46AM +0800, Hillf Danton wrote:
> > > On Sat, Dec 30, 2023 at 10:23:26AM -0500 Genes Lists <[email protected]>
> > > > Apologies in advance, but I cannot git bisect this since machine was
> > > > running for 10 days on 6.6.8 before this happened.
> > > >
> > > > Dec 30 07:00:36 s6 kernel: ------------[ cut here ]------------
> > > > Dec 30 07:00:36 s6 kernel: WARNING: CPU: 0 PID: 521524 at mm/page-writeback.c:2668 __folio_mark_dirty (??:?)
> > > > Dec 30 07:00:36 s6 kernel: CPU: 0 PID: 521524 Comm: rsync Not tainted 6.6.8-stable-1 #13 d238f5ab6a206cdb0cc5cd72f8688230f23d58df
> > > > Dec 30 07:00:36 s6 kernel: block_dirty_folio (??:?)
> > > > Dec 30 07:00:36 s6 kernel: unmap_page_range (??:?)
> > > > Dec 30 07:00:36 s6 kernel: unmap_vmas (??:?)
> > > > Dec 30 07:00:36 s6 kernel: exit_mmap (??:?)
> > > > Dec 30 07:00:36 s6 kernel: __mmput (??:?)
> > > > Dec 30 07:00:36 s6 kernel: do_exit (??:?)
> > > > Dec 30 07:00:36 s6 kernel: do_group_exit (??:?)
> > > > Dec 30 07:00:36 s6 kernel: __x64_sys_exit_group (??:?)
> > > > Dec 30 07:00:36 s6 kernel: do_syscall_64 (??:?)
> > >
> > > See what comes out if race is handled.
> > > Only for thoughts.
> >
> > I don't think this can happen. Look at the call trace;
> > block_dirty_folio() is called from unmap_page_range(). That means the
> > page is in the page tables. We unmap the pages in a folio from the
> > page tables before we set folio->mapping to NULL. Look at
> > invalidate_inode_pages2_range() for example:
> >
> > unmap_mapping_pages(mapping, indices[i],
> > (1 + end - indices[i]), false);
> > folio_lock(folio);
> > folio_wait_writeback(folio);
> > if (folio_mapped(folio))
> > unmap_mapping_folio(folio);
> > BUG_ON(folio_mapped(folio));
> > if (!invalidate_complete_folio2(mapping, folio))
> >
> What is missed here is the same check [1] in invalidate_inode_pages2_range(),
> so I built no wheel.
>
> folio_lock(folio);
> if (unlikely(folio->mapping != mapping)) {
> folio_unlock(folio);
> continue;
> }
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/truncate.c#n658

That's entirely different. That's checking in the truncate path whether
somebody else already truncated this page. What I was showing was why
a page found through a page table walk cannot have been truncated (which
is actually quite interesting, because it's the page table lock that
prevents the race).

2024-01-01 11:33:59

by Hillf Danton

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Mon, 1 Jan 2024 09:07:52 +0000 Matthew Wilcox
> On Mon, Jan 01, 2024 at 09:55:04AM +0800, Hillf Danton wrote:
> > On Sun, 31 Dec 2023 13:07:03 +0000 Matthew Wilcox <[email protected]>
> > > I don't think this can happen. Look at the call trace;
> > > block_dirty_folio() is called from unmap_page_range(). That means the
> > > page is in the page tables. We unmap the pages in a folio from the
> > > page tables before we set folio->mapping to NULL. Look at
> > > invalidate_inode_pages2_range() for example:
> > >
> > > unmap_mapping_pages(mapping, indices[i],
> > > (1 + end - indices[i]), false);
> > > folio_lock(folio);
> > > folio_wait_writeback(folio);
> > > if (folio_mapped(folio))
> > > unmap_mapping_folio(folio);
> > > BUG_ON(folio_mapped(folio));
> > > if (!invalidate_complete_folio2(mapping, folio))
> > >
> > What is missed here is the same check [1] in invalidate_inode_pages2_range(),
> > so I built no wheel.
> >
> > folio_lock(folio);
> > if (unlikely(folio->mapping != mapping)) {
> > folio_unlock(folio);
> > continue;
> > }
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/truncate.c#n658
>
> That's entirely different. That's checking in the truncate path whether
> somebody else already truncated this page. What I was showing was why
> a page found through a page table walk cannot have been truncated (which
> is actually quite interesting, because it's the page table lock that
> prevents the race).
>
Feel free to shed light on how ptl protects folio->mapping.

2024-01-01 14:11:31

by Matthew Wilcox

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Mon, Jan 01, 2024 at 07:33:16PM +0800, Hillf Danton wrote:
> On Mon, 1 Jan 2024 09:07:52 +0000 Matthew Wilcox
> > On Mon, Jan 01, 2024 at 09:55:04AM +0800, Hillf Danton wrote:
> > > On Sun, 31 Dec 2023 13:07:03 +0000 Matthew Wilcox <[email protected]>
> > > > I don't think this can happen. Look at the call trace;
> > > > block_dirty_folio() is called from unmap_page_range(). That means the
> > > > page is in the page tables. We unmap the pages in a folio from the
> > > > page tables before we set folio->mapping to NULL. Look at
> > > > invalidate_inode_pages2_range() for example:
> > > >
> > > > unmap_mapping_pages(mapping, indices[i],
> > > > (1 + end - indices[i]), false);
> > > > folio_lock(folio);
> > > > folio_wait_writeback(folio);
> > > > if (folio_mapped(folio))
> > > > unmap_mapping_folio(folio);
> > > > BUG_ON(folio_mapped(folio));
> > > > if (!invalidate_complete_folio2(mapping, folio))
> > > >
> > > What is missed here is the same check [1] in invalidate_inode_pages2_range(),
> > > so I built no wheel.
> > >
> > > folio_lock(folio);
> > > if (unlikely(folio->mapping != mapping)) {
> > > folio_unlock(folio);
> > > continue;
> > > }
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/truncate.c#n658
> >
> > That's entirely different. That's checking in the truncate path whether
> > somebody else already truncated this page. What I was showing was why
> > a page found through a page table walk cannot have been truncated (which
> > is actually quite interesting, because it's the page table lock that
> > prevents the race).
> >
> Feel free to shed light on how ptl protects folio->mapping.

The documentation for __folio_mark_dirty() hints at it:

* The caller must hold folio_memcg_lock(). Most callers have the folio
* locked. A few have the folio blocked from truncation through other
* means (eg zap_vma_pages() has it mapped and is holding the page table
* lock). This can also be called from mark_buffer_dirty(), which I
* cannot prove is always protected against truncate.

Re-reading that now, I _think_ mark_buffer_dirty() always has to be
called with a reference on the bufferhead, which means that a racing
truncate will fail due to

invalidate_inode_pages2_range -> invalidate_complete_folio2 ->
filemap_release_folio -> try_to_free_buffers -> drop_buffers -> buffer_busy
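
(buffer_busy() is, roughly, quoting fs/buffer.c from memory so treat as
approximate:

	static inline int buffer_busy(struct buffer_head *bh)
	{
		return atomic_read(&bh->b_count) |
			(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
	}

so an elevated b_count alone is enough to make drop_buffers() back off.)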


From an mm point of view, what is implicit is that truncate calls
unmap_mapping_folio -> unmap_mapping_range_tree ->
unmap_mapping_range_vma -> zap_page_range_single -> unmap_single_vma ->
unmap_page_range -> zap_p4d_range -> zap_pud_range -> zap_pmd_range ->
zap_pte_range -> pte_offset_map_lock()

So a truncate will take the page lock, then spin on the pte lock
until the racing munmap() has finished (ok, this was an exit(), not
a munmap(), but exit() does an implicit munmap()).
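
Very roughly, the exiting side looks like this (a hand-condensed sketch
of zap_pte_range(), not the literal code):

	start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	do {
		...
		if (!folio_test_anon(folio) && pte_dirty(ptent))
			folio_mark_dirty(folio);	/* -> block_dirty_folio() */
		...
	} while (pte++, addr += PAGE_SIZE, addr != end);
	...
	pte_unmap_unlock(start_pte, ptl);

so the folio is dirtied with the pte lock held, and the truncate side
cannot get past pte_offset_map_lock() until that lock is dropped.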

2024-01-03 10:49:44

by Hillf Danton

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Mon, 1 Jan 2024 14:11:02 +0000 Matthew Wilcox
>
> From an mm point of view, what is implicit is that truncate calls
> unmap_mapping_folio -> unmap_mapping_range_tree ->
> unmap_mapping_range_vma -> zap_page_range_single -> unmap_single_vma ->
> unmap_page_range -> zap_p4d_range -> zap_pud_range -> zap_pmd_range ->
> zap_pte_range -> pte_offset_map_lock()
>
> So a truncate will take the page lock, then spin on the pte lock
> until the racing munmap() has finished (ok, this was an exit(), not
> a munmap(), but exit() does an implicit munmap()).
>
But the ptl fails to explain the reported warning, while the sequence in
__block_commit_write()

	mark_buffer_dirty();
	folio_mark_uptodate();

hints that the warning is bogus.
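
Roughly, condensed from fs/buffer.c (trimmed by hand, for illustration
only):

	do {
		if (block_end <= from || block_start >= to) {
			if (!buffer_uptodate(bh))
				partial = true;
		} else {
			set_buffer_uptodate(bh);
			mark_buffer_dirty(bh);	/* buffers (and folio) dirtied here */
		}
		...
	} while (bh != head);

	if (!partial)
		folio_mark_uptodate(folio);	/* folio only marked uptodate here */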

2024-01-03 17:54:27

by Matthew Wilcox

Subject: Re: 6.6.8 stable: crash in folio_mark_dirty

On Wed, Jan 03, 2024 at 06:49:07PM +0800, Hillf Danton wrote:
> On Mon, 1 Jan 2024 14:11:02 +0000 Matthew Wilcox
> >
> > From an mm point of view, what is implicit is that truncate calls
> > unmap_mapping_folio -> unmap_mapping_range_tree ->
> > unmap_mapping_range_vma -> zap_page_range_single -> unmap_single_vma ->
> > unmap_page_range -> zap_p4d_range -> zap_pud_range -> zap_pmd_range ->
> > zap_pte_range -> pte_offset_map_lock()
> >
> > So a truncate will take the page lock, then spin on the pte lock
> > until the racing munmap() has finished (ok, this was an exit(), not
> > a munmap(), but exit() does an implicit munmap()).
> >
> But ptl fails to explain the warning reported, while the sequence in
> __block_commit_write()
>
> mark_buffer_dirty();
> folio_mark_uptodate();
>
> hints the warning is bogus.

The folio is locked when filesystems call __block_commit_write().

Nothing explains the reported warning, IMO - other than data corruption,
and I'm not sure that we've found the last data corrupter.