On Sun, Mar 06, 2022 at 02:59:05PM -0800, Hugh Dickins wrote:
> Mikulas asked in
> https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/
> Do we still need a0ee5ec520ed ("tmpfs: allocate on read when stacked")?
>
> Lukas noticed this unusual behavior of loop device backed by tmpfs in
> https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/
>
> Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading holes;
> but if it looks like it might be a read for "a stacking filesystem", it
> allocates actual pages to the page cache, and even marks them as dirty.
> And reads from the loop device do satisfy the test that is used.
>
> This oddity was added for an old version of unionfs, to help to limit
> its usage to the limited size of the tmpfs mount involved; but about
> the same time as the tmpfs mod went in (2.6.25), unionfs was reworked
> to proceed differently; and the mod kept just in case others needed it.
>
> Do we still need it? I cannot answer with more certainty than "Probably
> not". It's nasty enough that we really should try to delete it; but if
> a regression is reported somewhere, then we might have to revert later.
>
> It's not quite as simple as just removing the test (as Mikulas did):
> xfstests generic/013 hung because splice from tmpfs failed on page not
> up-to-date and page mapping unset. That can be fixed just by marking
> the ZERO_PAGE as Uptodate, which of course it is: do so in
> pagecache_init() - it might be useful to others than tmpfs.
>
> My intention, though, was to stop using the ZERO_PAGE here altogether:
> surely iov_iter_zero() is better for this case? Sadly not: it relies
> on clear_user(), and the x86 clear_user() is slower than its copy_user():
> https://lore.kernel.org/lkml/[email protected]/
>
> But while we are still using the ZERO_PAGE, let's stop dirtying its
> struct page cacheline with unnecessary get_page() and put_page().
>
> Reported-by: Mikulas Patocka <[email protected]>
> Reported-by: Lukas Czerner <[email protected]>
> Signed-off-by: Hugh Dickins <[email protected]>
I would have split the uptodate setting of ZERO_PAGE into a separate,
clearly documented patch, but otherwise this looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On Mon, Mar 07, 2022 at 07:44:34AM +0100, Christoph Hellwig wrote:
> On Sun, Mar 06, 2022 at 02:59:05PM -0800, Hugh Dickins wrote:
> > Mikulas asked in
> > https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/
> > Do we still need a0ee5ec520ed ("tmpfs: allocate on read when stacked")?
> >
> > Lukas noticed this unusual behavior of loop device backed by tmpfs in
> > https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/
> >
> > Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading holes;
> > but if it looks like it might be a read for "a stacking filesystem", it
> > allocates actual pages to the page cache, and even marks them as dirty.
> > And reads from the loop device do satisfy the test that is used.
> >
> > This oddity was added for an old version of unionfs, to help to limit
> > its usage to the limited size of the tmpfs mount involved; but about
> > the same time as the tmpfs mod went in (2.6.25), unionfs was reworked
> > to proceed differently; and the mod kept just in case others needed it.
> >
> > Do we still need it? I cannot answer with more certainty than "Probably
> > not". It's nasty enough that we really should try to delete it; but if
> > a regression is reported somewhere, then we might have to revert later.
> >
> > It's not quite as simple as just removing the test (as Mikulas did):
> > xfstests generic/013 hung because splice from tmpfs failed on page not
> > up-to-date and page mapping unset. That can be fixed just by marking
> > the ZERO_PAGE as Uptodate, which of course it is: do so in
> > pagecache_init() - it might be useful to others than tmpfs.
> >
> > My intention, though, was to stop using the ZERO_PAGE here altogether:
> > surely iov_iter_zero() is better for this case? Sadly not: it relies
> > on clear_user(), and the x86 clear_user() is slower than its copy_user():
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > But while we are still using the ZERO_PAGE, let's stop dirtying its
> > struct page cacheline with unnecessary get_page() and put_page().
> >
> > Reported-by: Mikulas Patocka <[email protected]>
> > Reported-by: Lukas Czerner <[email protected]>
> > Signed-off-by: Hugh Dickins <[email protected]>
>
> I would have split the uptodate setting of ZERO_PAGE into a separate,
> clearly documented patch, but otherwise this looks good:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
I've long wondered (for my own nefarious purposes) why tmpfs files
didn't just grab the zero page, so:
Acked-by: Darrick J. Wong <[email protected]>
--D
On Tue, 8 Mar 2022, Darrick J. Wong wrote:
>
> I've long wondered (for my own nefarious purposes) why tmpfs files
> didn't just grab the zero page,
(tmpfs files have been using the zero page for reads for many years:
it was just this odd internal "could it be for a stacking filesystem?"
case, which /dev/loop also fell into, which was doing allocation on read.
I wonder what your nefarious purposes are ;) Maybe related to pages
faulted into an mmap: those pages tmpfs has always allocated for, then
they're freed up later by page reclaim if still undirtied. We may change
that in future, and use the zero page even there: there are advantages
of course, but some care and code needed - never been a priority.)
> so:
>
> Acked-by: Darrick J. Wong <[email protected]>
Thanks,
Hugh