Processes that have been OOM killed set the thread flag TIF_MEMDIE. A process
such as this is expected to exit the page allocator but potentially, it
loops forever. This patch checks TIF_MEMDIE when deciding whether to loop
again in the page allocator. If set, and __GFP_NOFAIL is not specified
then the loop will exit on the assumption it's no longer important for the
process to make forward progress. Note that a process that has just been
OOM-killed will still loop at least one more time retrying the allocation
before the thread flag is checked.
Signed-off-by: Mel Gorman <[email protected]>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8902e7..5c98d02 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1547,6 +1547,14 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
if (gfp_mask & __GFP_NORETRY)
return 0;
+ /* Do not loop if OOM-killed unless __GFP_NOFAIL is specified */
+ if (test_thread_flag(TIF_MEMDIE)) {
+ if (gfp_mask & __GFP_NOFAIL)
+ WARN(1, "Potential infinite loop with __GFP_NOFAIL");
+ else
+ return 0;
+ }
+
/*
* In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
* means __GFP_NOFAIL, but that may not be true in other
On Wed, 15 Jul 2009, Mel Gorman wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f8902e7..5c98d02 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1547,6 +1547,14 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> if (gfp_mask & __GFP_NORETRY)
> return 0;
>
> + /* Do not loop if OOM-killed unless __GFP_NOFAIL is specified */
> + if (test_thread_flag(TIF_MEMDIE)) {
> + if (gfp_mask & __GFP_NOFAIL)
> + WARN(1, "Potential infinite loop with __GFP_NOFAIL");
> + else
> + return 0;
> + }
> +
> /*
> * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
> * means __GFP_NOFAIL, but that may not be true in other
>
This only works for GFP_ATOMIC since the next iteration of the page
allocator will (probably) fail reclaim and simply invoke the oom killer
again, which will notice current has TIF_MEMDIE set and choose to do
nothing, at which time the allocator simply loops again.
On Wed, 15 Jul 2009 11:49:45 +0100
Mel Gorman <[email protected]> wrote:
> Processes that have been OOM killed set the thread flag TIF_MEMDIE. A process
> such as this is expected to exit the page allocator but potentially, it
> loops forever. This patch checks TIF_MEMDIE when deciding whether to loop
> again in the page allocator. If set, and __GFP_NOFAIL is not specified
> then the loop will exit on the assumption it's no longer important for the
> process to make forward progress. Note that a process that has just been
> OOM-killed will still loop at least one more time retrying the allocation
> before the thread flag is checked.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/page_alloc.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f8902e7..5c98d02 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1547,6 +1547,14 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> if (gfp_mask & __GFP_NORETRY)
> return 0;
>
> + /* Do not loop if OOM-killed unless __GFP_NOFAIL is specified */
> + if (test_thread_flag(TIF_MEMDIE)) {
> + if (gfp_mask & __GFP_NOFAIL)
> + WARN(1, "Potential infinite loop with __GFP_NOFAIL");
> + else
> + return 0;
> + }
> +
> /*
> * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
> * means __GFP_NOFAIL, but that may not be true in other
This fixes a post-2.6.30 regression, yes?
I dug out the commit ID a while back but lost it. Ho hum.
On Wed, 15 Jul 2009, David Rientjes wrote:
> This only works for GFP_ATOMIC since the next iteration of the page
> allocator will (probably) fail reclaim and simply invoke the oom killer
> again, which will notice current has TIF_MEMDIE set and choose to do
> nothing, at which time the allocator simply loops again.
>
In other words, I'd propose this as an alternative patch.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1789,6 +1789,10 @@ rebalance:
if (p->flags & PF_MEMALLOC)
goto nopage;
+ /* Avoid allocations with no watermarks from looping endlessly */
+ if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+ goto nopage;
+
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,
zonelist, high_zoneidx,
On Wed, Jul 15, 2009 at 01:29:33PM -0700, David Rientjes wrote:
> On Wed, 15 Jul 2009, Mel Gorman wrote:
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f8902e7..5c98d02 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1547,6 +1547,14 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> > if (gfp_mask & __GFP_NORETRY)
> > return 0;
> >
> > + /* Do not loop if OOM-killed unless __GFP_NOFAIL is specified */
> > + if (test_thread_flag(TIF_MEMDIE)) {
> > + if (gfp_mask & __GFP_NOFAIL)
> > + WARN(1, "Potential infinite loop with __GFP_NOFAIL");
> > + else
> > + return 0;
> > + }
> > +
> > /*
> > * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
> > * means __GFP_NOFAIL, but that may not be true in other
> >
>
> This only works for GFP_ATOMIC since the next iteration of the page
> allocator will (probably) fail reclaim and simply invoke the oom killer
> again,
GFP_ATOMIC should not be calling the OOM killer. It has already
exited. Immeditely after an OOM kill, I would expect the allocation to
succeed. However, in the event that the task selected for OOM killing is
the current one and no other task exits, it could loop.
> which will notice current has TIF_MEMDIE set and choose to do
> nothing, at which time the allocator simply loops again.
>
So, we should unconditionally check if we should loop again whether we
have OOM killed or not which the following should do.
==== CUT HERE ====
page-allocator: Check after an OOM kill if the allocator should loop
Currently, the allocator loops unconditionally after an OOM kill on the
assumption that the allocation will succeed. However, if the task
selected for OOM-kill is the current task, it could in theory loop
forever and always entering the OOM killer. This patch checks as normal
after an OOM kill if the allocator should loop again.
Signed-off-by: Mel Gorman <[email protected]>
--
mm/page_alloc.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4b8552e..b381a6b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1830,8 +1830,6 @@ rebalance:
if (order > PAGE_ALLOC_COSTLY_ORDER &&
!(gfp_mask & __GFP_NOFAIL))
goto nopage;
-
- goto restart;
}
}
On Wed, Jul 15, 2009 at 01:30:14PM -0700, Andrew Morton wrote:
> On Wed, 15 Jul 2009 11:49:45 +0100
> Mel Gorman <[email protected]> wrote:
>
> > Processes that have been OOM killed set the thread flag TIF_MEMDIE. A process
> > such as this is expected to exit the page allocator but potentially, it
> > loops forever. This patch checks TIF_MEMDIE when deciding whether to loop
> > again in the page allocator. If set, and __GFP_NOFAIL is not specified
> > then the loop will exit on the assumption it's no longer important for the
> > process to make forward progress. Note that a process that has just been
> > OOM-killed will still loop at least one more time retrying the allocation
> > before the thread flag is checked.
> >
> > Signed-off-by: Mel Gorman <[email protected]>
> > ---
> > mm/page_alloc.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f8902e7..5c98d02 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1547,6 +1547,14 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
> > if (gfp_mask & __GFP_NORETRY)
> > return 0;
> >
> > + /* Do not loop if OOM-killed unless __GFP_NOFAIL is specified */
> > + if (test_thread_flag(TIF_MEMDIE)) {
> > + if (gfp_mask & __GFP_NOFAIL)
> > + WARN(1, "Potential infinite loop with __GFP_NOFAIL");
> > + else
> > + return 0;
> > + }
> > +
> > /*
> > * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
> > * means __GFP_NOFAIL, but that may not be true in other
>
> This fixes a post-2.6.30 regression, yes?
>
> I dug out the commit ID a while back but lost it. Ho hum.
>
You made a note at the time "Offending commit 341ce06 handled the PF_MEMALLOC
case but forgot about the TIF_MEMDIE case."
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Thu, 16 Jul 2009, Mel Gorman wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4b8552e..b381a6b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1830,8 +1830,6 @@ rebalance:
> if (order > PAGE_ALLOC_COSTLY_ORDER &&
> !(gfp_mask & __GFP_NOFAIL))
> goto nopage;
> -
> - goto restart;
> }
> }
>
>
This isn't right (and not only because it'll add a compiler warning
because `restart' is now unused).
This would immediately fail any allocation that triggered the oom killer
and ended up being selected that isn't __GFP_NOFAIL, even if it would have
succeeded without even killing any task simply because it allocates
without watermarks.
It will also, coupled with your earlier patch, inappropriately warn about
an infinite loop with __GFP_NOFAIL even though it hasn't even attempted to
loop once since that decision is now handled by should_alloc_retry().
The liklihood of such an infinite loop, considering only one thread per
system (or cpuset) can be TIF_MEMDIE at a time, is very low. I've never
seen memory reserves completely depleted such that the next high-priority
allocation wouldn't succeed so that current could handle its pending
SIGKILL.
You get the same behavior with my patch, but are allowed to try the high
priority allocation again for the attempt that triggered the oom killer
(and not only subsequent ones).
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1789,6 +1789,10 @@ rebalance:
if (p->flags & PF_MEMALLOC)
goto nopage;
+ /* Avoid allocations with no watermarks from looping endlessly */
+ if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+ goto nopage;
+
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,
zonelist, high_zoneidx,
On Thu, Jul 16, 2009 at 12:14:13PM -0700, David Rientjes wrote:
> On Thu, 16 Jul 2009, Mel Gorman wrote:
>
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 4b8552e..b381a6b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1830,8 +1830,6 @@ rebalance:
> > if (order > PAGE_ALLOC_COSTLY_ORDER &&
> > !(gfp_mask & __GFP_NOFAIL))
> > goto nopage;
> > -
> > - goto restart;
> > }
> > }
> >
> >
>
> This isn't right (and not only because it'll add a compiler warning
> because `restart' is now unused).
>
> This would immediately fail any allocation that triggered the oom killer
> and ended up being selected that isn't __GFP_NOFAIL, even if it would have
> succeeded without even killing any task simply because it allocates
> without watermarks.
>
> It will also, coupled with your earlier patch, inappropriately warn about
> an infinite loop with __GFP_NOFAIL even though it hasn't even attempted to
> loop once since that decision is now handled by should_alloc_retry().
>
> The liklihood of such an infinite loop, considering only one thread per
> system (or cpuset) can be TIF_MEMDIE at a time, is very low. I've never
> seen memory reserves completely depleted such that the next high-priority
> allocation wouldn't succeed so that current could handle its pending
> SIGKILL.
>
> You get the same behavior with my patch, but are allowed to try the high
> priority allocation again for the attempt that triggered the oom killer
> (and not only subsequent ones).
Ok, lets go with this patch then. Thanks
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1789,6 +1789,10 @@ rebalance:
> if (p->flags & PF_MEMALLOC)
> goto nopage;
>
> + /* Avoid allocations with no watermarks from looping endlessly */
> + if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> + goto nopage;
> +
> /* Try direct reclaim and then allocating */
> page = __alloc_pages_direct_reclaim(gfp_mask, order,
> zonelist, high_zoneidx,
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
On Fri, 17 Jul 2009, Mel Gorman wrote:
> Ok, lets go with this patch then. Thanks
>
Ok, thanks, I'll add that as your acked-by and I'll write a formal patch
description for it.
mm: avoid endless looping for oom killed tasks
If a task is oom killed and still cannot find memory when trying with no
watermarks, it's better to fail the allocation attempt than to loop
endlessly. Direct reclaim has already failed and the oom killer will be a
no-op since current has yet to die, so there is no other alternative for
allocations that are not __GFP_NOFAIL.
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1789,6 +1789,10 @@ rebalance:
if (p->flags & PF_MEMALLOC)
goto nopage;
+ /* Avoid allocations with no watermarks from looping endlessly */
+ if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+ goto nopage;
+
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,
zonelist, high_zoneidx,
On Fri, 17 Jul 2009, David Rientjes wrote:
> On Fri, 17 Jul 2009, Mel Gorman wrote:
>
> > Ok, lets go with this patch then. Thanks
> >
>
> Ok, thanks, I'll add that as your acked-by and I'll write a formal patch
> description for it.
>
>
> mm: avoid endless looping for oom killed tasks
>
> If a task is oom killed and still cannot find memory when trying with no
> watermarks, it's better to fail the allocation attempt than to loop
> endlessly. Direct reclaim has already failed and the oom killer will be a
> no-op since current has yet to die, so there is no other alternative for
> allocations that are not __GFP_NOFAIL.
>
> Acked-by: Mel Gorman <[email protected]>
> Signed-off-by: David Rientjes <[email protected]>
This works much better for me than earlier variants (I'm needing to worry
about OOM when KSM has a lot of pages to break COW on; but a large mlock
is a good test) - thanks.
Acked-by: Hugh Dickins <[email protected]>
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1789,6 +1789,10 @@ rebalance:
> if (p->flags & PF_MEMALLOC)
> goto nopage;
>
> + /* Avoid allocations with no watermarks from looping endlessly */
> + if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> + goto nopage;
> +
> /* Try direct reclaim and then allocating */
> page = __alloc_pages_direct_reclaim(gfp_mask, order,
> zonelist, high_zoneidx,