From: Trond Myklebust <[email protected]>
Allowing kswapd to do GFP_KERNEL memory allocations (or any blocking memory
allocations) is wrong and can cause deadlocks in try_to_release_page(), as
the filesystem believes it is safe to allocate new memory and block,
whereas kswapd is there specifically to clear a low-memory situation...
Set the gfp_mask to GFP_IOFS instead.
Signed-off-by: Trond Myklebust <[email protected]>
---
mm/vmscan.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ec5ddcc..716dd16 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2095,7 +2095,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
unsigned long total_scanned;
struct reclaim_state *reclaim_state = current->reclaim_state;
struct scan_control sc = {
- .gfp_mask = GFP_KERNEL,
+ .gfp_mask = GFP_IOFS,
.may_unmap = 1,
.may_swap = 1,
/*
On Wed, 2010-08-18 at 12:24 -0700, Ram Pai wrote:
>
>
> On Wed, Aug 18, 2010 at 12:04 PM, Trond Myklebust
> <[email protected]> wrote:
> From: Trond Myklebust <[email protected]>
>
> Allowing kswapd to do GFP_KERNEL memory allocations (or any
> blocking memory
> allocations) is wrong and can cause deadlocks in
> try_to_release_page(), as
> the filesystem believes it is safe to allocate new memory and
> block,
> whereas kswapd is there specifically to clear a low-memory
> situation...
>
> Set the gfp_mask to GFP_IOFS instead.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> mm/vmscan.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ec5ddcc..716dd16 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2095,7 +2095,7 @@ static unsigned long
> balance_pgdat(pg_data_t *pgdat, int order)
> unsigned long total_scanned;
> struct reclaim_state *reclaim_state =
> current->reclaim_state;
> struct scan_control sc = {
> - .gfp_mask = GFP_KERNEL,
> + .gfp_mask = GFP_IOFS,
> .may_unmap = 1,
> .may_swap = 1,
> /*
>
> Trond,
>
> Has anyone hit this issue? Or is this based on code
> inspection?
>
> The reason I ask is we are seeing a problem, similar to
> the symptom described, on RH based kernel but have not been able to
> reproduce on 2.6.35.
Hi Ram,
I was seeing it on NFS until I put in the following kswapd-specific hack
into nfs_release_page():
/* Only do I/O if gfp is a superset of GFP_KERNEL */
if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
int how = FLUSH_SYNC;
/* Don't let kswapd deadlock waiting for OOM RPC calls */
if (current_is_kswapd())
how = 0;
nfs_commit_inode(mapping->host, how);
}
Remove the 'if (current_is_kswapd())' line, and run an mmap() write
intensive workload, and it should hang pretty much every time.
Cheers
Trond
On Wed, Aug 18, 2010 at 03:04:01PM -0400, Trond Myklebust wrote:
> From: Trond Myklebust <[email protected]>
>
> Allowing kswapd to do GFP_KERNEL memory allocations (or any blocking memory
> allocations) is wrong and can cause deadlocks in try_to_release_page(), as
> the filesystem believes it is safe to allocate new memory and block,
> whereas kswapd is there specifically to clear a low-memory situation...
>
> Set the gfp_mask to GFP_IOFS instead.
I always thought releasepage was supposed to do almost zero work. It
could release an instantly freeable page but it wasn't supposed to dive
in and solve world hunger or anything.
I thought the VM would be using writepage for that.
-chris
On Wed, 2010-08-18 at 15:34 -0400, Chris Mason wrote:
> On Wed, Aug 18, 2010 at 03:04:01PM -0400, Trond Myklebust wrote:
> > From: Trond Myklebust <[email protected]>
> >
> > Allowing kswapd to do GFP_KERNEL memory allocations (or any blocking memory
> > allocations) is wrong and can cause deadlocks in try_to_release_page(), as
> > the filesystem believes it is safe to allocate new memory and block,
> > whereas kswapd is there specifically to clear a low-memory situation...
> >
> > Set the gfp_mask to GFP_IOFS instead.
>
> I always thought releasepage was supposed to do almost zero work. It
> could release an instantly freeable page but it wasn't supposed to dive
> in and solve world hunger or anything.
>
> I thought the VM would be using writepage for that.
writepage isn't sufficient for the NFS case: the page may be in the
'clean but unstable' state, in which case the NFS client needs to send a
COMMIT rpc call before the page can finally be released.
That is why we need the gfp_flag to tell us when it is safe to do this,
and when it is not.
The main case where it is safe and necessary for try_to_release_page()
to initiate a COMMIT call is in the invalidate_inode_pages2(). We might
want to do it in the kswapd case too, but in that case, we definitely
should tell the filesystem that it is unsafe to block.
Trond
On Wed, Aug 18, 2010 at 03:04:01PM -0400, Trond Myklebust wrote:
> From: Trond Myklebust <[email protected]>
>
> Allowing kswapd to do GFP_KERNEL memory allocations (or any blocking memory
> allocations) is wrong and can cause deadlocks in try_to_release_page(), as
> the filesystem believes it is safe to allocate new memory and block,
> whereas kswapd is there specifically to clear a low-memory situation...
>
> Set the gfp_mask to GFP_IOFS instead.
It would be more descriptive to say "remove the __GFP_WAIT bit".
The change looks reasonable _in itself_, since we always prefer to
avoid unnecessary waits for kswapd. So
Acked-by: Wu Fengguang <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> mm/vmscan.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ec5ddcc..716dd16 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2095,7 +2095,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
> unsigned long total_scanned;
> struct reclaim_state *reclaim_state = current->reclaim_state;
> struct scan_control sc = {
> - .gfp_mask = GFP_KERNEL,
> + .gfp_mask = GFP_IOFS,
> .may_unmap = 1,
> .may_swap = 1,
> /*
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> Hi Ram,
>
> I was seeing it on NFS until I put in the following kswapd-specific hack
> into nfs_release_page():
>
> /* Only do I/O if gfp is a superset of GFP_KERNEL */
> if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> int how = FLUSH_SYNC;
>
> /* Don't let kswapd deadlock waiting for OOM RPC calls */
> if (current_is_kswapd())
> how = 0;
So the patch can remove the above workaround together, and add comment
that NFS exploits the gfp mask to avoid complex operations involving
recursive memory allocation and hence deadlock?
Thanks,
Fengguang
> nfs_commit_inode(mapping->host, how);
> }
>
> Remove the 'if (current_is_kswapd())' line, and run an mmap() write
> intensive workload, and it should hang pretty much every time.
>
> Cheers
> Trond
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Fri, 2010-08-20 at 13:45 +0800, Wu Fengguang wrote:
> > Hi Ram,
> >
> > I was seeing it on NFS until I put in the following kswapd-specific hack
> > into nfs_release_page():
> >
> > /* Only do I/O if gfp is a superset of GFP_KERNEL */
> > if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
> > int how = FLUSH_SYNC;
> >
> > /* Don't let kswapd deadlock waiting for OOM RPC calls */
> > if (current_is_kswapd())
> > how = 0;
>
> So the patch can remove the above workaround together, and add comment
> that NFS exploits the gfp mask to avoid complex operations involving
> recursive memory allocation and hence deadlock?
I thought I'd send that as a separate patch, but yes, that is my
intention next.
Cheers
Trond