2020-04-15 22:15:15

by Michal Hocko

Subject: implicit AOP_FLAG_NOFS for grab_cache_page_write_begin

Hi,
I have just received a bug report about memcg OOM [1]. The underlying
issue is memcg specific but the stack trace made me look at the write(2)
path and I have noticed that iomap_write_begin enforces AOP_FLAG_NOFS
which means that all the page cache that has to be allocated is
GFP_NOFS. What is the reason for this? Do all filesystems really need
the reclaim protection? I was hoping that those filesystems which really
need NOFS context would be using the scope API
(memalloc_nofs_{save,restore}).

Could you clarify please?

[1] http://lkml.kernel.org/r/[email protected]
--
Michal Hocko
SUSE Labs


2020-04-17 07:33:23

by Christoph Hellwig

Subject: Re: implicit AOP_FLAG_NOFS for grab_cache_page_write_begin

On Wed, Apr 15, 2020 at 09:02:28AM +0200, Michal Hocko wrote:
> Hi,
> I have just received a bug report about memcg OOM [1]. The underlying
> issue is memcg specific but the stack trace made me look at the write(2)
> path and I have noticed that iomap_write_begin enforces AOP_FLAG_NOFS
> which means that all the page cache that has to be allocated is
> GFP_NOFS. What is the reason for this? Do all filesystems really need
> the reclaim protection? I was hoping that those filesystems which really
> need NOFS context would be using the scope API
> (memalloc_nofs_{save,restore}).

This comes from the historic XFS code, and this commit from Dave
in particular:

commit aea1b9532143218f8599ecedbbd6bfbf812385e1
Author: Dave Chinner <[email protected]>
Date: Tue Jul 20 17:54:12 2010 +1000

xfs: use GFP_NOFS for page cache allocation

Avoid a lockdep warning by preventing page cache allocation from
recursing back into the filesystem during memory reclaim.

2020-04-17 08:03:52

by Michal Hocko

Subject: Re: implicit AOP_FLAG_NOFS for grab_cache_page_write_begin

On Fri 17-04-20 00:29:31, Christoph Hellwig wrote:
> On Wed, Apr 15, 2020 at 09:02:28AM +0200, Michal Hocko wrote:
> > Hi,
> > I have just received a bug report about memcg OOM [1]. The underlying
> > issue is memcg specific but the stack trace made me look at the write(2)
> > path and I have noticed that iomap_write_begin enforces AOP_FLAG_NOFS
> > which means that all the page cache that has to be allocated is
> > GFP_NOFS. What is the reason for this? Do all filesystems really need
> > the reclaim protection? I was hoping that those filesystems which really
> > need NOFS context would be using the scope API
> > (memalloc_nofs_{save,restore}).
>
> This comes from the historic XFS code, and this commit from Dave
> in particular:
>
> commit aea1b9532143218f8599ecedbbd6bfbf812385e1
> Author: Dave Chinner <[email protected]>
> Date: Tue Jul 20 17:54:12 2010 +1000
>
> xfs: use GFP_NOFS for page cache allocation
>
> Avoid a lockdep warning by preventing page cache allocation from
> recursing back into the filesystem during memory reclaim.

Thanks for digging this up! The changelog is not really clear whether
NOFS is there to avoid false positive lockdep warnings or real ones. If
the former, then we have grown the __GFP_NOLOCKDEP flag to work around
the problem; if the latter, then can we use memalloc_nofs_{save,restore}
in the xfs specific code please?
--
Michal Hocko
SUSE Labs

2020-04-17 08:07:46

by Christoph Hellwig

Subject: Re: implicit AOP_FLAG_NOFS for grab_cache_page_write_begin

On Fri, Apr 17, 2020 at 10:00:03AM +0200, Michal Hocko wrote:
> > commit aea1b9532143218f8599ecedbbd6bfbf812385e1
> > Author: Dave Chinner <[email protected]>
> > Date: Tue Jul 20 17:54:12 2010 +1000
> >
> > xfs: use GFP_NOFS for page cache allocation
> >
> > Avoid a lockdep warning by preventing page cache allocation from
> > recursing back into the filesystem during memory reclaim.
>
> Thanks for digging this up! The changelog is not really clear whether
> NOFS is there to avoid false positive lockdep warnings or real ones. If
> the former, then we have grown the __GFP_NOLOCKDEP flag to work around
> the problem; if the latter, then can we use memalloc_nofs_{save,restore}
> in the xfs specific code please?

As far as I can tell we are never in a file system transaction in XFS
when allocating page cache pages. We do, however, usually have i_rwsem
locked (or, back in the day, the XFS-specific predecessor). I'm not
sure what the current issues are, but maybe Dave remembers. If in doubt,
we should try removing the flag and run heavy stress testing with
lockdep enabled and see if it screams.
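
For readers following along, the scope API discussed in this thread can be
illustrated with a small userspace model. This is only a sketch and not
kernel code: the function names mirror the kernel's
memalloc_nofs_{save,restore} and current_gfp_context, but the flag values,
the current_task variable, and the fs_alloc_in_nofs_scope helper are
assumptions made up for illustration. The point it tries to show is that a
filesystem can mark a section as NOFS once, and every allocation inside
that section loses __GFP_FS without the caller passing GFP_NOFS explicitly.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; the real kernel values differ. */
#define __GFP_IO          0x1u
#define __GFP_FS          0x2u
#define GFP_KERNEL        (__GFP_IO | __GFP_FS)
#define GFP_NOFS          (__GFP_IO)

#define PF_MEMALLOC_NOFS  0x1u

/* Assumption: a single global stands in for the kernel's "current" task. */
struct task { unsigned int flags; };
static struct task current_task;

/* Enter a NOFS scope; returns the previous state so scopes can nest. */
static inline unsigned int memalloc_nofs_save(void)
{
	unsigned int old = current_task.flags & PF_MEMALLOC_NOFS;

	current_task.flags |= PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope by putting back the state returned by save. */
static inline void memalloc_nofs_restore(unsigned int old)
{
	assert((old & ~PF_MEMALLOC_NOFS) == 0);
	current_task.flags = (current_task.flags & ~PF_MEMALLOC_NOFS) | old;
}

/* Allocator side: drop __GFP_FS while the task is inside a NOFS scope. */
static inline gfp_t current_gfp_context(gfp_t gfp)
{
	if (current_task.flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}

/*
 * Hypothetical filesystem path that needs reclaim protection. It returns
 * the mask an allocation made inside the scope would effectively use.
 */
static inline gfp_t fs_alloc_in_nofs_scope(gfp_t requested)
{
	unsigned int nofs_flags = memalloc_nofs_save();
	gfp_t effective = current_gfp_context(requested);

	memalloc_nofs_restore(nofs_flags);
	return effective;
}
```

In this model a GFP_KERNEL request made inside the scope is treated as
GFP_NOFS, while the same request outside the scope keeps __GFP_FS. That is
the behaviour the thread suggests filesystems opt into where they need it,
instead of the generic write path forcing it on everyone.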