2010-08-18 19:04:21

by Myklebust, Trond

[permalink] [raw]
Subject: [PATCH] VM: kswapd should not do blocking memory allocations

From: Trond Myklebust <[email protected]>

Allowing kswapd to do GFP_KERNEL memory allocations (or any blocking memory
allocations) is wrong and can cause deadlocks in try_to_release_page(), as
the filesystem believes it is safe to allocate new memory and block,
whereas kswapd is there specifically to clear a low-memory situation...

Set the gfp_mask to GFP_IOFS instead.

Signed-off-by: Trond Myklebust <[email protected]>
---

mm/vmscan.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/mm/vmscan.c b/mm/vmscan.c
index ec5ddcc..716dd16 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2095,7 +2095,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
unsigned long total_scanned;
struct reclaim_state *reclaim_state = current->reclaim_state;
struct scan_control sc = {
- .gfp_mask = GFP_KERNEL,
+ .gfp_mask = GFP_IOFS,
.may_unmap = 1,
.may_swap = 1,
/*



2010-08-20 05:40:20

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH] VM: kswapd should not do blocking memory allocations

On Wed, Aug 18, 2010 at 03:04:01PM -0400, Trond Myklebust wrote:
> From: Trond Myklebust <[email protected]>
>
> Allowing kswapd to do GFP_KERNEL memory allocations (or any blocking memory
> allocations) is wrong and can cause deadlocks in try_to_release_page(), as
> the filesystem believes it is safe to allocate new memory and block,
> whereas kswapd is there specifically to clear a low-memory situation...
>
> Set the gfp_mask to GFP_IOFS instead.

It would be more descriptive to say "remove the __GFP_WAIT bit".

The change looks reasonable _in itself_, since we always prefer to
avoid unnecessary waits for kswapd. So

Acked-by: Wu Fengguang <[email protected]>

> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> mm/vmscan.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ec5ddcc..716dd16 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2095,7 +2095,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
> unsigned long total_scanned;
> struct reclaim_state *reclaim_state = current->reclaim_state;
> struct scan_control sc = {
> - .gfp_mask = GFP_KERNEL,
> + .gfp_mask = GFP_IOFS,
> .may_unmap = 1,
> .may_swap = 1,
> /*
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2010-08-18 19:34:57

by Chris Mason

[permalink] [raw]
Subject: Re: [PATCH] VM: kswapd should not do blocking memory allocations

On Wed, Aug 18, 2010 at 03:04:01PM -0400, Trond Myklebust wrote:
> From: Trond Myklebust <[email protected]>
>
> Allowing kswapd to do GFP_KERNEL memory allocations (or any blocking memory
> allocations) is wrong and can cause deadlocks in try_to_release_page(), as
> the filesystem believes it is safe to allocate new memory and block,
> whereas kswapd is there specifically to clear a low-memory situation...
>
> Set the gfp_mask to GFP_IOFS instead.

I always thought releasepage was supposed to do almost zero work. It
could release an instantly freeable page but it wasn't supposed to dive
in and solve world hunger or anything.

I thought the VM would be using writepage for that.

-chris

2010-08-18 20:10:14

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH] VM: kswapd should not do blocking memory allocations

On Wed, 2010-08-18 at 15:34 -0400, Chris Mason wrote:
> On Wed, Aug 18, 2010 at 03:04:01PM -0400, Trond Myklebust wrote:
> > From: Trond Myklebust <[email protected]>
> >
> > Allowing kswapd to do GFP_KERNEL memory allocations (or any blocking memory
> > allocations) is wrong and can cause deadlocks in try_to_release_page(), as
> > the filesystem believes it is safe to allocate new memory and block,
> > whereas kswapd is there specifically to clear a low-memory situation...
> >
> > Set the gfp_mask to GFP_IOFS instead.
>
> I always thought releasepage was supposed to do almost zero work. It
> could release an instantly freeable page but it wasn't supposed to dive
> in and solve world hunger or anything.
>
> I thought the VM would be using writepage for that.

writepage isn't sufficient for the NFS case: the page may be in the
'clean but unstable' state, in which case the NFS client needs to send a
COMMIT rpc call before the page can finally be released.

That is why we need the gfp_flag to tell us when it is safe to do this,
and when it is not.
The main case where it is safe and necessary for try_to_release_page()
to initiate a COMMIT call is in the invalidate_inode_pages2(). We might
want to do it in the kswapd case too, but in that case, we definitely
should tell the filesystem that it is unsafe to block.

Trond