From: Michal Hocko <[email protected]>
Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
detection") has subtly changed semantic for costly high order requests
with __GFP_NOFAIL and withtout __GFP_REPEAT and those can fail right now.
My code inspection didn't reveal any such users in the tree but it is
true that this might lead to unexpected allocation failures and
subsequent OOPs.
__alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
There are few special cases but we are lacking a catch all place to be
sure we will not miss any case where the non failing allocation might
fail. This patch reorganizes the code a bit and puts all those special
cases under nopage label which is the generic go-to-fail path. Non
failing allocations are retried or those that cannot retry like
non-sleeping allocation go to the failure point directly. This should
make the code flow much easier to follow and make it less error prone
for future changes.
While we are there we have to move the stall check up to catch
potentially looping non-failing allocations.
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
Hi Andrew,
this has been posted previously as a 2 patch series [1]. This is the first patch.
The second one has generated a lot of discussion and Tetsuo has naked it based
because he is worried about a potential lockups. I have argued [2] that there
are other aspects to consider but then later realized that there is a different
risk in place which hasn't been considered before. There are some users who are
performing a lot of __GFP_NOFAIL|GFP_NOFS requests and we certainly do not want to
give them full access to memory reserves without invoking the OOM killer [3].
For that reason I have dropped the second patch for now and think about
this some more. The first patch still makes some sense and I find it as
a useful cleanup so I would ask you to merge it before I find a better
solution for the other issue. There was no opposition this this patch so I guess
it should be good to go.
[1] http://lkml.kernel.org/r/[email protected]
[2] http://lkml.kernel.org/r/[email protected]
[3] http://lkml.kernel.org/r/[email protected]
mm/page_alloc.c | 68 ++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 41 insertions(+), 27 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f2c9e535f7f..79b327d9c9a6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3640,32 +3640,23 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto got_pg;
/* Caller is not willing to reclaim, we can't balance anything */
- if (!can_direct_reclaim) {
- /*
- * All existing users of the __GFP_NOFAIL are blockable, so warn
- * of any new users that actually allow this type of allocation
- * to fail.
- */
- WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+ if (!can_direct_reclaim)
goto nopage;
+
+ /* Make sure we know about allocations which stall for too long */
+ if (time_after(jiffies, alloc_start + stall_timeout)) {
+ warn_alloc(gfp_mask,
+ "page alloction stalls for %ums, order:%u",
+ jiffies_to_msecs(jiffies-alloc_start), order);
+ stall_timeout += 10 * HZ;
}
/* Avoid recursion of direct reclaim */
- if (current->flags & PF_MEMALLOC) {
- /*
- * __GFP_NOFAIL request from this context is rather bizarre
- * because we cannot reclaim anything and only can loop waiting
- * for somebody to do a work for us.
- */
- if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
- cond_resched();
- goto retry;
- }
+ if (current->flags & PF_MEMALLOC)
goto nopage;
- }
/* Avoid allocations with no watermarks from looping endlessly */
- if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+ if (test_thread_flag(TIF_MEMDIE))
goto nopage;
@@ -3692,14 +3683,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
goto nopage;
- /* Make sure we know about allocations which stall for too long */
- if (time_after(jiffies, alloc_start + stall_timeout)) {
- warn_alloc(gfp_mask,
- "page allocation stalls for %ums, order:%u",
- jiffies_to_msecs(jiffies-alloc_start), order);
- stall_timeout += 10 * HZ;
- }
-
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
did_some_progress > 0, &no_progress_loops))
goto retry;
@@ -3728,6 +3711,37 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
}
nopage:
+ /*
+ * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
+ * we always retry
+ */
+ if (gfp_mask & __GFP_NOFAIL) {
+ /*
+ * All existing users of the __GFP_NOFAIL are blockable, so warn
+ * of any new users that actually require GFP_NOWAIT
+ */
+ if (WARN_ON_ONCE(!can_direct_reclaim))
+ goto fail;
+
+ /*
+ * PF_MEMALLOC request from this context is rather bizarre
+ * because we cannot reclaim anything and only can loop waiting
+ * for somebody to do a work for us
+ */
+ WARN_ON_ONCE(current->flags & PF_MEMALLOC);
+
+ /*
+ * non failing costly orders are a hard requirement which we
+ * are not prepared for much so let's warn about these users
+ * so that we can identify them and convert them to something
+ * else.
+ */
+ WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
+
+ cond_resched();
+ goto retry;
+ }
+fail:
warn_alloc(gfp_mask,
"page allocation failure: order:%u", order);
got_pg:
--
2.10.2
On Wed, Dec 14, 2016 at 04:07:06PM +0100, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
> detection") has subtly changed semantic for costly high order requests
> with __GFP_NOFAIL and withtout __GFP_REPEAT and those can fail right now.
> My code inspection didn't reveal any such users in the tree but it is
> true that this might lead to unexpected allocation failures and
> subsequent OOPs.
>
> __alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
> There are few special cases but we are lacking a catch all place to be
> sure we will not miss any case where the non failing allocation might
> fail. This patch reorganizes the code a bit and puts all those special
> cases under nopage label which is the generic go-to-fail path. Non
> failing allocations are retried or those that cannot retry like
> non-sleeping allocation go to the failure point directly. This should
> make the code flow much easier to follow and make it less error prone
> for future changes.
>
> While we are there we have to move the stall check up to catch
> potentially looping non-failing allocations.
>
> Signed-off-by: Michal Hocko <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
It's not the nicest thing that we have to duplicate all the conditions
to warn on, but it's preferable over unreliable GFP_NOFAIL handling.
Consolidating the handling of this flag makes a lot of sense to me.
Acked-by: Johannes Weiner <[email protected]>
On Wednesday, December 14, 2016 11:07 PM Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
> detection") has subtly changed semantic for costly high order requests
> with __GFP_NOFAIL and withtout __GFP_REPEAT and those can fail right now.
> My code inspection didn't reveal any such users in the tree but it is
> true that this might lead to unexpected allocation failures and
> subsequent OOPs.
>
> __alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
> There are few special cases but we are lacking a catch all place to be
> sure we will not miss any case where the non failing allocation might
> fail. This patch reorganizes the code a bit and puts all those special
> cases under nopage label which is the generic go-to-fail path. Non
> failing allocations are retried or those that cannot retry like
> non-sleeping allocation go to the failure point directly. This should
> make the code flow much easier to follow and make it less error prone
> for future changes.
>
> While we are there we have to move the stall check up to catch
> potentially looping non-failing allocations.
>
> Signed-off-by: Michal Hocko <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
> ---
> Hi Andrew,
> this has been posted previously as a 2 patch series [1]. This is the first patch.
> The second one has generated a lot of discussion and Tetsuo has naked it based
> because he is worried about a potential lockups. I have argued [2] that there
> are other aspects to consider but then later realized that there is a different
> risk in place which hasn't been considered before. There are some users who are
> performing a lot of __GFP_NOFAIL|GFP_NOFS requests and we certainly do not want to
> give them full access to memory reserves without invoking the OOM killer [3].
>
> For that reason I have dropped the second patch for now and think about
> this some more. The first patch still makes some sense and I find it as
> a useful cleanup so I would ask you to merge it before I find a better
> solution for the other issue. There was no opposition this this patch so I guess
> it should be good to go.
>
> [1] http://lkml.kernel.org/r/[email protected]
> [2] http://lkml.kernel.org/r/[email protected]
> [3] http://lkml.kernel.org/r/[email protected]
>
> mm/page_alloc.c | 68 ++++++++++++++++++++++++++++++++++-----------------------
> 1 file changed, 41 insertions(+), 27 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3f2c9e535f7f..79b327d9c9a6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3640,32 +3640,23 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> goto got_pg;
>
> /* Caller is not willing to reclaim, we can't balance anything */
> - if (!can_direct_reclaim) {
> - /*
> - * All existing users of the __GFP_NOFAIL are blockable, so warn
> - * of any new users that actually allow this type of allocation
> - * to fail.
> - */
> - WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> + if (!can_direct_reclaim)
> goto nopage;
> +
> + /* Make sure we know about allocations which stall for too long */
> + if (time_after(jiffies, alloc_start + stall_timeout)) {
> + warn_alloc(gfp_mask,
> + "page alloction stalls for %ums, order:%u",
> + jiffies_to_msecs(jiffies-alloc_start), order);
> + stall_timeout += 10 * HZ;
> }
>
> /* Avoid recursion of direct reclaim */
> - if (current->flags & PF_MEMALLOC) {
> - /*
> - * __GFP_NOFAIL request from this context is rather bizarre
> - * because we cannot reclaim anything and only can loop waiting
> - * for somebody to do a work for us.
> - */
> - if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> - cond_resched();
> - goto retry;
> - }
> + if (current->flags & PF_MEMALLOC)
> goto nopage;
> - }
>
> /* Avoid allocations with no watermarks from looping endlessly */
> - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> + if (test_thread_flag(TIF_MEMDIE))
> goto nopage;
>
Nit: currently we allow TIF_MEMDIE & __GFP_NOFAIL request to
try direct reclaim. Are you intentionally reclaiming that chance?
Other than that, feel free to add
Acked-by: Hillf Danton <[email protected]>
>
> @@ -3692,14 +3683,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
> goto nopage;
>
> - /* Make sure we know about allocations which stall for too long */
> - if (time_after(jiffies, alloc_start + stall_timeout)) {
> - warn_alloc(gfp_mask,
> - "page allocation stalls for %ums, order:%u",
> - jiffies_to_msecs(jiffies-alloc_start), order);
> - stall_timeout += 10 * HZ;
> - }
> -
> if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
> did_some_progress > 0, &no_progress_loops))
> goto retry;
> @@ -3728,6 +3711,37 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> }
>
> nopage:
> + /*
> + * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
> + * we always retry
> + */
> + if (gfp_mask & __GFP_NOFAIL) {
> + /*
> + * All existing users of the __GFP_NOFAIL are blockable, so warn
> + * of any new users that actually require GFP_NOWAIT
> + */
> + if (WARN_ON_ONCE(!can_direct_reclaim))
> + goto fail;
> +
> + /*
> + * PF_MEMALLOC request from this context is rather bizarre
> + * because we cannot reclaim anything and only can loop waiting
> + * for somebody to do a work for us
> + */
> + WARN_ON_ONCE(current->flags & PF_MEMALLOC);
> +
> + /*
> + * non failing costly orders are a hard requirement which we
> + * are not prepared for much so let's warn about these users
> + * so that we can identify them and convert them to something
> + * else.
> + */
> + WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
> +
> + cond_resched();
> + goto retry;
> + }
> +fail:
> warn_alloc(gfp_mask,
> "page allocation failure: order:%u", order);
> got_pg:
> --
> 2.10.2
On Thu 15-12-16 15:54:37, Hillf Danton wrote:
> On Wednesday, December 14, 2016 11:07 PM Michal Hocko wrote:
[...]
> > /* Avoid allocations with no watermarks from looping endlessly */
> > - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > + if (test_thread_flag(TIF_MEMDIE))
> > goto nopage;
> >
> Nit: currently we allow TIF_MEMDIE & __GFP_NOFAIL request to
> try direct reclaim. Are you intentionally reclaiming that chance?
That is definitely not a nit! Thanks for catching that. We definitely
shouldn't bypass the direct reclaim because that would mean we rely on
somebody else makes progress for us.
Updated patch below:
---
>From cebd2d933f245a59504fdce31312b67186311e50 Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Tue, 22 Nov 2016 07:52:58 +0100
Subject: [PATCH] mm: consolidate GFP_NOFAIL checks in the allocator slowpath
Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
detection") has subtly changed semantic for costly high order requests
with __GFP_NOFAIL and withtout __GFP_REPEAT and those can fail right now.
My code inspection didn't reveal any such users in the tree but it is
true that this might lead to unexpected allocation failures and
subsequent OOPs.
__alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
There are few special cases but we are lacking a catch all place to be
sure we will not miss any case where the non failing allocation might
fail. This patch reorganizes the code a bit and puts all those special
cases under nopage label which is the generic go-to-fail path. Non
failing allocations are retried or those that cannot retry like
non-sleeping allocation go to the failure point directly. This should
make the code flow much easier to follow and make it less error prone
for future changes.
While we are there we have to move the stall check up to catch
potentially looping non-failing allocations.
Changes since v1
- do not skip direct reclaim for TIF_MEMDIE && GFP_NOFAIL as per Hillf
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Hillf Danton <[email protected]>
---
mm/page_alloc.c | 75 +++++++++++++++++++++++++++++++++------------------------
1 file changed, 44 insertions(+), 31 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f2c9e535f7f..3f44a5115b4c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3640,35 +3640,21 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto got_pg;
/* Caller is not willing to reclaim, we can't balance anything */
- if (!can_direct_reclaim) {
- /*
- * All existing users of the __GFP_NOFAIL are blockable, so warn
- * of any new users that actually allow this type of allocation
- * to fail.
- */
- WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+ if (!can_direct_reclaim)
goto nopage;
- }
- /* Avoid recursion of direct reclaim */
- if (current->flags & PF_MEMALLOC) {
- /*
- * __GFP_NOFAIL request from this context is rather bizarre
- * because we cannot reclaim anything and only can loop waiting
- * for somebody to do a work for us.
- */
- if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
- cond_resched();
- goto retry;
- }
- goto nopage;
+ /* Make sure we know about allocations which stall for too long */
+ if (time_after(jiffies, alloc_start + stall_timeout)) {
+ warn_alloc(gfp_mask,
+ "page alloction stalls for %ums, order:%u",
+ jiffies_to_msecs(jiffies-alloc_start), order);
+ stall_timeout += 10 * HZ;
}
- /* Avoid allocations with no watermarks from looping endlessly */
- if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+ /* Avoid recursion of direct reclaim */
+ if (current->flags & PF_MEMALLOC)
goto nopage;
-
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
&did_some_progress);
@@ -3681,6 +3667,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
+ /* Avoid allocations with no watermarks from looping endlessly */
+ if (test_thread_flag(TIF_MEMDIE))
+ goto nopage;
+
/* Do not loop if specifically requested */
if (gfp_mask & __GFP_NORETRY)
goto nopage;
@@ -3692,14 +3682,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
goto nopage;
- /* Make sure we know about allocations which stall for too long */
- if (time_after(jiffies, alloc_start + stall_timeout)) {
- warn_alloc(gfp_mask,
- "page allocation stalls for %ums, order:%u",
- jiffies_to_msecs(jiffies-alloc_start), order);
- stall_timeout += 10 * HZ;
- }
-
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
did_some_progress > 0, &no_progress_loops))
goto retry;
@@ -3728,6 +3710,37 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
}
nopage:
+ /*
+ * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
+ * we always retry
+ */
+ if (gfp_mask & __GFP_NOFAIL) {
+ /*
+ * All existing users of the __GFP_NOFAIL are blockable, so warn
+ * of any new users that actually require GFP_NOWAIT
+ */
+ if (WARN_ON_ONCE(!can_direct_reclaim))
+ goto fail;
+
+ /*
+ * PF_MEMALLOC request from this context is rather bizarre
+ * because we cannot reclaim anything and only can loop waiting
+ * for somebody to do a work for us
+ */
+ WARN_ON_ONCE(current->flags & PF_MEMALLOC);
+
+ /*
+ * non failing costly orders are a hard requirement which we
+ * are not prepared for much so let's warn about these users
+ * so that we can identify them and convert them to something
+ * else.
+ */
+ WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
+
+ cond_resched();
+ goto retry;
+ }
+fail:
warn_alloc(gfp_mask,
"page allocation failure: order:%u", order);
got_pg:
--
2.10.2
--
Michal Hocko
SUSE Labs
Michal Hocko wrote:
> On Thu 15-12-16 15:54:37, Hillf Danton wrote:
> > On Wednesday, December 14, 2016 11:07 PM Michal Hocko wrote:
> [...]
> > > /* Avoid allocations with no watermarks from looping endlessly */
> > > - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > > + if (test_thread_flag(TIF_MEMDIE))
> > > goto nopage;
> > >
> > Nit: currently we allow TIF_MEMDIE & __GFP_NOFAIL request to
> > try direct reclaim. Are you intentionally reclaiming that chance?
>
> That is definitely not a nit! Thanks for catching that. We definitely
> shouldn't bypass the direct reclaim because that would mean we rely on
> somebody else makes progress for us.
>
> Updated patch below:
> ---
> From cebd2d933f245a59504fdce31312b67186311e50 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Tue, 22 Nov 2016 07:52:58 +0100
> Subject: [PATCH] mm: consolidate GFP_NOFAIL checks in the allocator slowpath
>
> Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
> detection") has subtly changed semantic for costly high order requests
> with __GFP_NOFAIL and withtout __GFP_REPEAT and those can fail right now.
> My code inspection didn't reveal any such users in the tree but it is
> true that this might lead to unexpected allocation failures and
> subsequent OOPs.
>
> __alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
> There are few special cases but we are lacking a catch all place to be
> sure we will not miss any case where the non failing allocation might
> fail. This patch reorganizes the code a bit and puts all those special
> cases under nopage label which is the generic go-to-fail path. Non
> failing allocations are retried or those that cannot retry like
> non-sleeping allocation go to the failure point directly. This should
> make the code flow much easier to follow and make it less error prone
> for future changes.
>
> While we are there we have to move the stall check up to catch
> potentially looping non-failing allocations.
Currently we allow TIF_MEMDIE && __GFP_NOFAIL threads to call
__alloc_pages_may_oom() after !__alloc_pages_direct_reclaim() &&
!__alloc_pages_direct_compact() && !should_reclaim_retry() &&
!should_compact_retry().
But this patch changes TIF_MEMDIE && __GFP_NOFAIL threads not to call
__alloc_pages_may_oom(). If this is intentional, please describe it
(i.e. this patch adds a location which currently does not cause OOM
livelock) in change log. (I don't trust your assumption that __GFP_FS
allocations are running in parallel and they will call out_of_memory()
on behalf of TIF_MEMDIE && __GFP_NOFAIL threads. Surprising things
(e.g. all __GFP_FS allocations get stuck due to kswapd v.s.
shrink_inactive_list() trap) can happen if TIF_MEMDIE && __GFP_NOFAIL
has to loop forever. The OOM reaper allows selecting next OOM victim
by setting MMF_OOM_REAPED does not help if we hit surprising traps.
>
> Changes since v1
> - do not skip direct reclaim for TIF_MEMDIE && GFP_NOFAIL as per Hillf
>
> Signed-off-by: Michal Hocko <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Acked-by: Hillf Danton <[email protected]>
> ---
> mm/page_alloc.c | 75 +++++++++++++++++++++++++++++++++------------------------
> 1 file changed, 44 insertions(+), 31 deletions(-)
>
On Fri 16-12-16 20:39:12, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Thu 15-12-16 15:54:37, Hillf Danton wrote:
> > > On Wednesday, December 14, 2016 11:07 PM Michal Hocko wrote:
> > [...]
> > > > /* Avoid allocations with no watermarks from looping endlessly */
> > > > - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > > > + if (test_thread_flag(TIF_MEMDIE))
> > > > goto nopage;
> > > >
> > > Nit: currently we allow TIF_MEMDIE & __GFP_NOFAIL request to
> > > try direct reclaim. Are you intentionally reclaiming that chance?
> >
> > That is definitely not a nit! Thanks for catching that. We definitely
> > shouldn't bypass the direct reclaim because that would mean we rely on
> > somebody else makes progress for us.
> >
> > Updated patch below:
> > ---
> > From cebd2d933f245a59504fdce31312b67186311e50 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <[email protected]>
> > Date: Tue, 22 Nov 2016 07:52:58 +0100
> > Subject: [PATCH] mm: consolidate GFP_NOFAIL checks in the allocator slowpath
> >
> > Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
> > detection") has subtly changed semantic for costly high order requests
> > with __GFP_NOFAIL and withtout __GFP_REPEAT and those can fail right now.
> > My code inspection didn't reveal any such users in the tree but it is
> > true that this might lead to unexpected allocation failures and
> > subsequent OOPs.
> >
> > __alloc_pages_slowpath wrt. GFP_NOFAIL is hard to follow currently.
> > There are few special cases but we are lacking a catch all place to be
> > sure we will not miss any case where the non failing allocation might
> > fail. This patch reorganizes the code a bit and puts all those special
> > cases under nopage label which is the generic go-to-fail path. Non
> > failing allocations are retried or those that cannot retry like
> > non-sleeping allocation go to the failure point directly. This should
> > make the code flow much easier to follow and make it less error prone
> > for future changes.
> >
> > While we are there we have to move the stall check up to catch
> > potentially looping non-failing allocations.
>
> Currently we allow TIF_MEMDIE && __GFP_NOFAIL threads to call
> __alloc_pages_may_oom() after !__alloc_pages_direct_reclaim() &&
> !__alloc_pages_direct_compact() && !should_reclaim_retry() &&
> !should_compact_retry().
>
> But this patch changes TIF_MEMDIE && __GFP_NOFAIL threads not to call
> __alloc_pages_may_oom(). If this is intentional, please describe it
> (i.e. this patch adds a location which currently does not cause OOM
> livelock) in change log.
No, it's not intentional. And you have a point, we shouldn't bypass
__alloc_pages_may_oom. Does the following on top look any better?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f44a5115b4c..095e2fa286de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3667,10 +3667,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
- /* Avoid allocations with no watermarks from looping endlessly */
- if (test_thread_flag(TIF_MEMDIE))
- goto nopage;
-
/* Do not loop if specifically requested */
if (gfp_mask & __GFP_NORETRY)
goto nopage;
@@ -3703,6 +3699,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
+ /* Avoid allocations with no watermarks from looping endlessly */
+ if (test_thread_flag(TIF_MEMDIE))
+ goto nopage;
+
/* Retry as long as the OOM killer is making progress */
if (did_some_progress) {
no_progress_loops = 0;
--
Michal Hocko
SUSE Labs