It does not make sense to retry compaction when a fatal signal is
pending.
In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
returned; albeit, not every zone, on the zone list, would be considered
in the case a fatal signal is found to be pending.
Yet, in should_compact_retry(), given the last known compaction result,
each zone, on the zone list, can be considered/or checked
(see compaction_zonelist_suitable()). For example, if a zone was found
to succeed, then reclaim/compaction would be tried again
(notwithstanding the above).
This patch ensures that compaction is not needlessly retried
irrespective of the last known compaction result e.g. if it was skipped,
in the unlikely case a fatal signal is found pending.
So, OOM is at least attempted.
Signed-off-by: Aaron Tomlin <[email protected]>
---
mm/page_alloc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..b317057ac186 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
if (!order)
return false;
+ if (fatal_signal_pending(current))
+ return false;
+
if (compaction_made_progress(compact_result))
(*compaction_retries)++;
--
2.26.3
On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <[email protected]> wrote:
> It does not make sense to retry compaction when a fatal signal is
> pending.
Well, it might make sense. Presumably it is beneficial to other tasks.
> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
> returned; albeit, not every zone, on the zone list, would be considered
> in the case a fatal signal is found to be pending.
> Yet, in should_compact_retry(), given the last known compaction result,
> each zone, on the zone list, can be considered/or checked
> (see compaction_zonelist_suitable()). For example, if a zone was found
> to succeed, then reclaim/compaction would be tried again
> (notwithstanding the above).
>
> This patch ensures that compaction is not needlessly retried
> irrespective of the last known compaction result e.g. if it was skipped,
> in the unlikely case a fatal signal is found pending.
> So, OOM is at least attempted.
What observed problems motivated this change?
What were the observed runtime effects of this change?
On 5/20/21 6:34 AM, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <[email protected]> wrote:
>
>> It does not make sense to retry compaction when a fatal signal is
>> pending.
>
> Well, it might make sense. Presumably it is beneficial to other tasks.
Yeah but the compaction won't happen. compact_zone() will immediately detect it
via __compact_finished() and bail out. So in that sense it does not make sense
to retry :)
>> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
>> returned; albeit, not every zone, on the zone list, would be considered
>> in the case a fatal signal is found to be pending.
>> Yet, in should_compact_retry(), given the last known compaction result,
>> each zone, on the zone list, can be considered/or checked
>> (see compaction_zonelist_suitable()). For example, if a zone was found
>> to succeed, then reclaim/compaction would be tried again
>> (notwithstanding the above).
>>
>> This patch ensures that compaction is not needlessly retried
>> irrespective of the last known compaction result e.g. if it was skipped,
>> in the unlikely case a fatal signal is found pending.
>> So, OOM is at least attempted.
>
> What observed problems motivated this change?
>
> What were the observed runtime effects of this change?
Yep those details from the previous thread should be included here.
On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> On 5/20/21 6:34 AM, Andrew Morton wrote:
> >
> > What observed problems motivated this change?
> >
> > What were the observed runtime effects of this change?
>
> Yep those details from the previous thread should be included here.
Fair enough.
During kernel crash dump/or vmcore analysis: I discovered in the context of
__alloc_pages_slowpath() the value stored in the no_progress_loops variable
was found to be 31,611,688 i.e. well above MAX_RECLAIM_RETRIES; and a fatal
signal was pending against current.
#6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
#7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
#8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130
// w28 = *(sp + 148) /* no_progress_loops */
0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>: ldr w0, [sp,#148]
// w0 = w0 + 0x1
0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>: add w0, w0, #0x1
// *(sp + 148) = w0
0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>: str w0, [sp,#148]
// if (w0 >= 0x10)
// goto __alloc_pages_nodemask+0x904
0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>: cmp w0, #0x10
0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>: b.gt 0xffff0000102cd534
- The stack pointer was 0xffff00002e78f900
crash> p *(int *)(0xffff00002e78f900+148)
$1 = 31611688
crash> ps 521171
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 521171 1 36 ffff8080e2128800 RU 0.0 34789440 18624 special
crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
$2 = (struct sigpending *) 0xffff80809a416e40
crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
$3 = 0x804100
crash> sig -s 0x804100
SIGKILL SIGTERM SIGXCPU
crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
$4 = 0x100
Unfortunately, this incident was not reproduced, to date.
Kind regards,
--
Aaron Tomlin
On Wed, May 19, 2021 at 09:34:55PM -0700, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <[email protected]> wrote:
>
> > It does not make sense to retry compaction when a fatal signal is
> > pending.
>
> Well, it might make sense. Presumably it is beneficial to other tasks.
Apart from Vlastimil's point, if I hit ^C, I want the task to die,
as soon as possible. I don't want it to do things which are beneficial
to other tasks, I want my shell prompt back, not spending seconds
trying to compact memory. Some other task can do that if _it_ needs
large contiguous chunks.
On Thu, May 20, 2021 at 12:42:57PM +0100, Aaron Tomlin wrote:
> On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> > On 5/20/21 6:34 AM, Andrew Morton wrote:
> > >
> > > What observed problems motivated this change?
> > >
> > > What were the observed runtime effects of this change?
> >
> > Yep those details from the previous thread should be included here.
>
> Fair enough.
>
> During kernel crash dump/or vmcore analysis: I discovered in the context of
> __alloc_pages_slowpath() the value stored in the no_progress_loops variable
> was found to be 31,611,688 i.e. well above MAX_RECLAIM_RETRIES; and a fatal
> signal was pending against current.
While this is true, it's not really answering Andrew's question.
What we want as part of the commit message is something like:
"A customer experienced a low memory situation and sent their task a
fatal signal. Instead of dying promptly, it looped in the page
allocator failing to make progress because ..."
>
> #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
> #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
> #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130
>
> // w28 = *(sp + 148) /* no_progress_loops */
> 0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>: ldr w0, [sp,#148]
> // w0 = w0 + 0x1
> 0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>: add w0, w0, #0x1
> // *(sp + 148) = w0
> 0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>: str w0, [sp,#148]
> // if (w0 >= 0x10)
> // goto __alloc_pages_nodemask+0x904
> 0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>: cmp w0, #0x10
> 0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>: b.gt 0xffff0000102cd534
>
> - The stack pointer was 0xffff00002e78f900
>
> crash> p *(int *)(0xffff00002e78f900+148)
> $1 = 31611688
>
> crash> ps 521171
> PID PPID CPU TASK ST %MEM VSZ RSS COMM
> > 521171 1 36 ffff8080e2128800 RU 0.0 34789440 18624 special
>
> crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
> $2 = (struct sigpending *) 0xffff80809a416e40
>
> crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
> $3 = 0x804100
>
> crash> sig -s 0x804100
> SIGKILL SIGTERM SIGXCPU
>
> crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
> $4 = 0x100
>
>
> Unfortunately, this incident was not reproduced, to date.
>
>
>
>
>
> Kind regards,
>
> --
> Aaron Tomlin
A customer experienced a low-memory situation and decided to issue a
SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
would expect, the aforementioned task remained unresponsive.
Further investigation indicated that the task was "stuck" in the
reclaim/compaction retry loop. Now, it does not make sense to retry
compaction when a fatal signal is pending.
In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
returned; albeit, not every zone, on the zone list, would be considered
in the case a fatal signal is found to be pending.
Yet, in should_compact_retry(), given the last known compaction result,
each zone, on the zone list, can be considered/or checked
(see compaction_zonelist_suitable()). For example, if a zone was found
to succeed, then reclaim/compaction would be tried again
(notwithstanding the above).
This patch ensures that compaction is not needlessly retried
irrespective of the last known compaction result e.g. if it was skipped,
in the unlikely case a fatal signal is found pending.
So, OOM is at least attempted.
Signed-off-by: Aaron Tomlin <[email protected]>
---
mm/page_alloc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..b317057ac186 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
if (!order)
return false;
+ if (fatal_signal_pending(current))
+ return false;
+
if (compaction_made_progress(compact_result))
(*compaction_retries)++;
--
2.26.3
Matthew,
On Thu 2021-05-20 12:56 +0100, Matthew Wilcox wrote:
> While this is true, it's not really answering Andrew's question.
> What we want as part of the commit message is something like:
>
> "A customer experienced a low memory situation and sent their task a
> fatal signal. Instead of dying promptly, it looped in the page
> allocator failing to make progress because ..."
Apologies, I did not understand - I will follow up with a v4.
Kind regards,
--
Aaron Tomlin
On 5/20/21 4:29 PM, Aaron Tomlin wrote:
> A customer experienced a low-memory situation and decided to issue a
> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> would expect, the aforementioned task remained unresponsive.
>
> Further investigation indicated that the task was "stuck" in the
> reclaim/compaction retry loop. Now, it does not make sense to retry
> compaction when a fatal signal is pending.
>
> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
> returned; albeit, not every zone, on the zone list, would be considered
> in the case a fatal signal is found to be pending.
> Yet, in should_compact_retry(), given the last known compaction result,
> each zone, on the zone list, can be considered/or checked
> (see compaction_zonelist_suitable()). For example, if a zone was found
> to succeed, then reclaim/compaction would be tried again
> (notwithstanding the above).
>
> This patch ensures that compaction is not needlessly retried
> irrespective of the last known compaction result e.g. if it was skipped,
> in the unlikely case a fatal signal is found pending.
> So, OOM is at least attempted.
>
> Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
> ---
> mm/page_alloc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index aaa1655cf682..b317057ac186 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
> if (!order)
> return false;
>
> + if (fatal_signal_pending(current))
> + return false;
> +
> if (compaction_made_progress(compact_result))
> (*compaction_retries)++;
>
>
On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
> A customer experienced a low-memory situation and decided to issue a
> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> would expect, the aforementioned task remained unresponsive.
>
> Further investigation indicated that the task was "stuck" in the
> reclaim/compaction retry loop. Now, it does not make sense to retry
> compaction when a fatal signal is pending.
Is this really true in general? The memory reclaim is retried even when
fatal signals are pending. Why should be compaction different? I do
agree that retrying way too much is bad but is there any reason why this
special case doesn't follow the max retry logic?
--
Michal Hocko
SUSE Labs
On 5/31/21 1:33 PM, Michal Hocko wrote:
> On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
>> A customer experienced a low-memory situation and decided to issue a
>> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
>> would expect, the aforementioned task remained unresponsive.
>>
>> Further investigation indicated that the task was "stuck" in the
>> reclaim/compaction retry loop. Now, it does not make sense to retry
>> compaction when a fatal signal is pending.
>
> Is this really true in general? The memory reclaim is retried even when
> fatal signals are pending. Why should be compaction different? I do
> agree that retrying way too much is bad but is there any reason why this
> special case doesn't follow the max retry logic?
Compaction doesn't do anything if fatal signal is pending, it bails out
immediately and the checks are rather frequent. So why retry?
On Mon 31-05-21 13:35:31, Vlastimil Babka wrote:
> On 5/31/21 1:33 PM, Michal Hocko wrote:
> > On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
> >> A customer experienced a low-memory situation and decided to issue a
> >> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> >> would expect, the aforementioned task remained unresponsive.
> >>
> >> Further investigation indicated that the task was "stuck" in the
> >> reclaim/compaction retry loop. Now, it does not make sense to retry
> >> compaction when a fatal signal is pending.
> >
> > Is this really true in general? The memory reclaim is retried even when
> > fatal signals are pending. Why should be compaction different? I do
> > agree that retrying way too much is bad but is there any reason why this
> > special case doesn't follow the max retry logic?
>
> Compaction doesn't do anything if fatal signal is pending, it bails out
> immediately and the checks are rather frequent. So why retry?
OK, I was not aware of that and it would be helpful to have that
mentioned in the changelog.
--
Michal Hocko
SUSE Labs