2024-01-25 21:16:25

by Matthew Wilcox

Subject: Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()

On Thu, Jan 25, 2024 at 01:09:46PM +0000, Roman Smirnov wrote:
> Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
> stable releases. It happens because invalidate_inode_page() frees pages
> that are needed for the system. To fix this we need to add additional
> checks to the function. page_mapped() checks if a page exists in the
> page tables, but this is not enough. The page can be used in other places:
> https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L71
>
> Kernel outputs an error line related to direct I/O:
> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000

OK, this is making a lot more sense.

The invalidate_inode_page() path (after the page_mapped check) calls
try_to_release_page() which strips the buffers from the page.
__remove_mapping() tries to freeze the page and presumably fails.

ext4 is checking there are still buffer heads attached to the page.
I'm not sure why it's doing that; it's legitimate to strip the
bufferheads from a page and then reattach them later (if they're
attached to a dirty page, they are created dirty).

So the only question in my mind is whether ext4 is right to have this
assert in the first place. It seems wrong to me, but perhaps someone
from ext4 can explain why it's correct.
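
The sequence described above can be sketched as a small userspace model. This is only a toy under stated assumptions: `struct toy_page` and the `toy_*` helpers are hypothetical names, not kernel code; the expected count of 2 (one reference for the page cache, one for the caller) and the extra reference held by some other user such as direct I/O are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; not the kernel's definition. */
struct toy_page {
	int refcount;        /* all references, including the page cache's */
	bool has_buffers;    /* models attached buffer heads */
	bool in_page_cache;
};

/* Models try_to_release_page(): strip the buffers and drop their reference. */
static bool toy_release_buffers(struct toy_page *p)
{
	if (p->has_buffers) {
		p->has_buffers = false;
		p->refcount--;
	}
	return true;
}

/* Models the refcount freeze in __remove_mapping(): it only succeeds
 * when nobody besides the expected holders has a reference. */
static bool toy_remove_mapping(struct toy_page *p, int expected)
{
	if (p->refcount != expected)
		return false;
	p->refcount = 0;
	p->in_page_cache = false;
	return true;
}

/* Models the invalidate path after the page_mapped() check. */
static bool toy_invalidate(struct toy_page *p)
{
	if (!toy_release_buffers(p))
		return false;
	/* Assumed: one reference from the page cache, one from the caller. */
	return toy_remove_mapping(p, 2);
}
```

With an extra reference (a starting refcount of 4: cache, caller, buffers, one more user), `toy_invalidate()` fails, yet the page is left in the cache with its buffers already gone, which is the state the ext4 check would then complain about.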

> The problem can be fixed in 5.10 and 5.15 stable releases by the
> following patch.
>
> The patch replaces page_mapped() call with check that finds additional
> references to the page excluding page cache and filesystem private data.
> If additional references exist, the page cannot be freed.
>
> This version does not include the first patch from the first version.
> The problem can be fixed without it.
>
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
>
> Link: https://syzkaller.appspot.com/bug?extid=02f21431b65c214aa1d6
>
> Matthew Wilcox (Oracle) (1):
>   mm/truncate: Replace page_mapped() call in invalidate_inode_page()
>
>  mm/truncate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> --
> 2.34.1
>


2024-01-29 09:11:43

by Jan Kara

Subject: Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()

On Thu 25-01-24 14:06:58, Matthew Wilcox wrote:
> On Thu, Jan 25, 2024 at 01:09:46PM +0000, Roman Smirnov wrote:
> > Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
> > stable releases. It happens because invalidate_inode_page() frees pages
> > that are needed for the system. To fix this we need to add additional
> > checks to the function. page_mapped() checks if a page exists in the
> > page tables, but this is not enough. The page can be used in other places:
> > https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L71
> >
> > Kernel outputs an error line related to direct I/O:
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000
>
> OK, this is making a lot more sense.
>
> The invalidate_inode_page() path (after the page_mapped check) calls
> try_to_release_page() which strips the buffers from the page.
> __remove_mapping() tries to freeze the page and presumably fails.

Yep, likely.

> ext4 is checking there are still buffer heads attached to the page.
> I'm not sure why it's doing that; it's legitimate to strip the
> bufferheads from a page and then reattach them later (if they're
> attached to a dirty page, they are created dirty).

Well, we really need to track dirtiness on per fs-block basis in ext4
(which makes a difference when blocksize < page size). For example for
delayed block allocation we reserve exactly as many blocks as we need
(which need not be all the blocks in the page e.g. when writing just one
block in the middle of a large hole). So when all buffers would be marked
as dirty we would overrun our reservation. Hence at the moment of dirtying
we really need buffers to be attached to the page and stay there until the
page is written back.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
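
The reservation argument above can be illustrated with a toy model. It is a sketch, not ext4 code: `struct blk_page`, `struct toy_inode` and the `toy_*` helpers are hypothetical, and four filesystem blocks per page (for example 1 KiB blocks in a 4 KiB page) is an assumed geometry. The reattach helper follows the behaviour mentioned earlier in the thread, where buffers attached to a dirty page are created dirty.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_PAGE 4   /* assumed: 4 KiB page, 1 KiB fs blocks */

struct blk_page {
	bool page_dirty;
	bool has_buffers;
	bool buf_dirty[BLOCKS_PER_PAGE];   /* per-block dirty state */
};

struct toy_inode {
	int reserved_blocks;   /* delayed-allocation reservation */
};

/* Delalloc write of one block: reserve exactly one block for it. */
static void toy_delalloc_write(struct toy_inode *ino, struct blk_page *p, int blk)
{
	p->has_buffers = true;
	if (!p->buf_dirty[blk]) {
		ino->reserved_blocks++;
		p->buf_dirty[blk] = true;
	}
	p->page_dirty = true;
}

/* Stripping the buffers loses the per-block dirty information. */
static void toy_strip_buffers(struct blk_page *p)
{
	p->has_buffers = false;
	for (int i = 0; i < BLOCKS_PER_PAGE; i++)
		p->buf_dirty[i] = false;
}

/* Reattaching to a dirty page: every buffer inherits the page's dirty bit. */
static void toy_reattach_buffers(struct blk_page *p)
{
	p->has_buffers = true;
	for (int i = 0; i < BLOCKS_PER_PAGE; i++)
		p->buf_dirty[i] = p->page_dirty;
}

/* Blocks that writeback would have to allocate. */
static int toy_blocks_needed(const struct blk_page *p)
{
	int n = 0;
	for (int i = 0; i < BLOCKS_PER_PAGE; i++)
		n += p->buf_dirty[i];
	return n;
}
```

Writing a single block reserves one block and leaves one dirty buffer. After a strip and reattach on the still-dirty page, all four buffers look dirty while the reservation still covers only one, which is the overrun described above.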

2024-01-29 19:30:36

by Matthew Wilcox

Subject: Re: [PATCH 5.10/5.15 v2 0/1 RFC] mm/truncate: fix WARNING in ext4_set_page_dirty()

On Mon, Jan 29, 2024 at 10:11:24AM +0100, Jan Kara wrote:
> On Thu 25-01-24 14:06:58, Matthew Wilcox wrote:
> > On Thu, Jan 25, 2024 at 01:09:46PM +0000, Roman Smirnov wrote:
> > > Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
> > > stable releases. It happens because invalidate_inode_page() frees pages
> > > that are needed for the system. To fix this we need to add additional
> > > checks to the function. page_mapped() checks if a page exists in the
> > > page tables, but this is not enough. The page can be used in other places:
> > > https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L71
> > >
> > > Kernel outputs an error line related to direct I/O:
> > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000
> >
> > OK, this is making a lot more sense.
> >
> > The invalidate_inode_page() path (after the page_mapped check) calls
> > try_to_release_page() which strips the buffers from the page.
> > __remove_mapping() tries to freeze the page and presumably fails.
>
> Yep, likely.
>
> > ext4 is checking there are still buffer heads attached to the page.
> > I'm not sure why it's doing that; it's legitimate to strip the
> > bufferheads from a page and then reattach them later (if they're
> > attached to a dirty page, they are created dirty).
>
> Well, we really need to track dirtiness on per fs-block basis in ext4
> (which makes a difference when blocksize < page size). For example for
> delayed block allocation we reserve exactly as many blocks as we need
> (which need not be all the blocks in the page e.g. when writing just one
> block in the middle of a large hole). So when all buffers would be marked
> as dirty we would overrun our reservation. Hence at the moment of dirtying
> we really need buffers to be attached to the page and stay there until the
> page is written back.

Thanks for the clear explanation!

Isn't the correct place to ensure that this is true in
ext4_release_folio()? I think all paths to remove buffer_heads from a
folio go through ext4_release_folio() and so it can be prohibited here
if the folio is part of a delalloc extent?

I worry that the proposed fix here cuts off only one path to hitting
this WARN_ON and we need a more comprehensive fix.
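
The shape of that suggestion can be sketched in a toy model. This is not the actual ext4_release_folio() implementation: `struct toy_folio`, `toy_release_folio()` and the per-block `delay` flag are hypothetical stand-ins, and the assumption is simply that a release hook returning false keeps the buffers attached whichever caller asked for them to be dropped.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_FOLIO 4   /* assumed geometry for the sketch */

struct toy_folio {
	bool has_buffers;
	bool delay[BLOCKS_PER_FOLIO];   /* block belongs to a delalloc extent */
};

/* Models a release_folio-style hook: returning false refuses to strip
 * the buffers, so every path that goes through the hook is covered. */
static bool toy_release_folio(struct toy_folio *f)
{
	for (int i = 0; i < BLOCKS_PER_FOLIO; i++) {
		if (f->delay[i])
			return false;   /* still needed until writeback */
	}
	f->has_buffers = false;
	return true;
}
```

The point of placing the check in the hook rather than in one caller is that any other path trying to drop the buffers would be refused in the same way.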