2021-05-19 20:24:14

by Aaron Tomlin

[permalink] [raw]
Subject: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

It does not make sense to retry compaction when a fatal signal is
pending.

In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
returned; albeit, not every zone, on the zone list, would be considered
in the case a fatal signal is found to be pending.
Yet, in should_compact_retry(), given the last known compaction result,
each zone, on the zone list, can be considered/or checked
(see compaction_zonelist_suitable()). For example, if a zone was found
to succeed, then reclaim/compaction would be tried again
(notwithstanding the above).

This patch ensures that compaction is not needlessly retried
irrespective of the last known compaction result e.g. if it was skipped,
in the unlikely case a fatal signal is found pending.
So, OOM is at least attempted.

Signed-off-by: Aaron Tomlin <[email protected]>
---
mm/page_alloc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..b317057ac186 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
if (!order)
return false;

+ if (fatal_signal_pending(current))
+ return false;
+
if (compaction_made_progress(compact_result))
(*compaction_retries)++;

--
2.26.3



2021-05-20 04:39:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <[email protected]> wrote:

> It does not make sense to retry compaction when a fatal signal is
> pending.

Well, it might make sense. Presumably it is beneficial to other tasks.

> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
> returned; albeit, not every zone, on the zone list, would be considered
> in the case a fatal signal is found to be pending.
> Yet, in should_compact_retry(), given the last known compaction result,
> each zone, on the zone list, can be considered/or checked
> (see compaction_zonelist_suitable()). For example, if a zone was found
> to succeed, then reclaim/compaction would be tried again
> (notwithstanding the above).
>
> This patch ensures that compaction is not needlessly retried
> irrespective of the last known compaction result e.g. if it was skipped,
> in the unlikely case a fatal signal is found pending.
> So, OOM is at least attempted.

What observed problems motivated this change?

What were the observed runtime effects of this change?

2021-05-20 10:36:28

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On 5/20/21 6:34 AM, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <[email protected]> wrote:
>
>> It does not make sense to retry compaction when a fatal signal is
>> pending.
>
> Well, it might make sense. Presumably it is beneficial to other tasks.

Yeah but the compaction won't happen. compact_zone() will immediately detect it
via __compact_finished() and bail out. So in that sense it does not make sense
to retry :)

>> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
>> returned; albeit, not every zone, on the zone list, would be considered
>> in the case a fatal signal is found to be pending.
>> Yet, in should_compact_retry(), given the last known compaction result,
>> each zone, on the zone list, can be considered/or checked
>> (see compaction_zonelist_suitable()). For example, if a zone was found
>> to succeed, then reclaim/compaction would be tried again
>> (notwithstanding the above).
>>
>> This patch ensures that compaction is not needlessly retried
>> irrespective of the last known compaction result e.g. if it was skipped,
>> in the unlikely case a fatal signal is found pending.
>> So, OOM is at least attempted.
>
> What observed problems motivated this change?
>
> What were the observed runtime effects of this change?

Yep those details from the previous thread should be included here.

2021-05-20 11:54:20

by Aaron Tomlin

[permalink] [raw]
Subject: Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> On 5/20/21 6:34 AM, Andrew Morton wrote:
> >
> > What observed problems motivated this change?
> >
> > What were the observed runtime effects of this change?
>
> Yep those details from the previous thread should be included here.

Fair enough.

During kernel crash dump/or vmcore analysis: I discovered in the context of
__alloc_pages_slowpath() the value stored in the no_progress_loops variable
was found to be 31,611,688 i.e. well above MAX_RECLAIM_RETRIES; and a fatal
signal was pending against current.


#6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
#7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
#8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130

// w28 = *(sp + 148) /* no_progress_loops */
0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>: ldr w0, [sp,#148]
// w0 = w0 + 0x1
0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>: add w0, w0, #0x1
// *(sp + 148) = w0
0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>: str w0, [sp,#148]
// if (w0 >= 0x10)
// goto __alloc_pages_nodemask+0x904
0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>: cmp w0, #0x10
0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>: b.gt 0xffff0000102cd534

- The stack pointer was 0xffff00002e78f900

crash> p *(int *)(0xffff00002e78f900+148)
$1 = 31611688

crash> ps 521171
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 521171 1 36 ffff8080e2128800 RU 0.0 34789440 18624 special

crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
$2 = (struct sigpending *) 0xffff80809a416e40

crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
$3 = 0x804100

crash> sig -s 0x804100
SIGKILL SIGTERM SIGXCPU

crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
$4 = 0x100


Unfortunately, this incident was not reproduced, to date.





Kind regards,

--
Aaron Tomlin


Attachments:
(No filename) (2.46 kB)
signature.asc (849.00 B)
Download all attachments

2021-05-20 12:24:09

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On Wed, May 19, 2021 at 09:34:55PM -0700, Andrew Morton wrote:
> On Wed, 19 May 2021 21:17:43 +0100 Aaron Tomlin <[email protected]> wrote:
>
> > It does not make sense to retry compaction when a fatal signal is
> > pending.
>
> Well, it might make sense. Presumably it is beneficial to other tasks.

Apart from Vlastimil's point, if I hit ^C, I want the task to die,
as soon as possible. I don't want it to do things which are beneficial
to other tasks, I want my shell prompt back, not spending seconds
trying to compact memory. Some other task can do that if _it_ needs
large contiguous chunks.

2021-05-20 12:42:15

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On Thu, May 20, 2021 at 12:42:57PM +0100, Aaron Tomlin wrote:
> On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> > On 5/20/21 6:34 AM, Andrew Morton wrote:
> > >
> > > What observed problems motivated this change?
> > >
> > > What were the observed runtime effects of this change?
> >
> > Yep those details from the previous thread should be included here.
>
> Fair enough.
>
> During kernel crash dump/or vmcore analysis: I discovered in the context of
> __alloc_pages_slowpath() the value stored in the no_progress_loops variable
> was found to be 31,611,688 i.e. well above MAX_RECLAIM_RETRIES; and a fatal
> signal was pending against current.

While this is true, it's not really answering Andrew's question.
What we want as part of the commit message is something like:

"A customer experienced a low memory situation and sent their task a
fatal signal. Instead of dying promptly, it looped in the page
allocator failing to make progress because ..."

>
> #6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
> #7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
> #8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130
>
> // w28 = *(sp + 148) /* no_progress_loops */
> 0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>: ldr w0, [sp,#148]
> // w0 = w0 + 0x1
> 0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>: add w0, w0, #0x1
> // *(sp + 148) = w0
> 0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>: str w0, [sp,#148]
> // if (w0 >= 0x10)
> // goto __alloc_pages_nodemask+0x904
> 0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>: cmp w0, #0x10
> 0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>: b.gt 0xffff0000102cd534
>
> - The stack pointer was 0xffff00002e78f900
>
> crash> p *(int *)(0xffff00002e78f900+148)
> $1 = 31611688
>
> crash> ps 521171
> PID PPID CPU TASK ST %MEM VSZ RSS COMM
> > 521171 1 36 ffff8080e2128800 RU 0.0 34789440 18624 special
>
> crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
> $2 = (struct sigpending *) 0xffff80809a416e40
>
> crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
> $3 = 0x804100
>
> crash> sig -s 0x804100
> SIGKILL SIGTERM SIGXCPU
>
> crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
> $4 = 0x100
>
>
> Unfortunately, this incident was not reproduced, to date.
>
>
>
>
>
> Kind regards,
>
> --
> Aaron Tomlin


2021-05-20 14:29:56

by Aaron Tomlin

[permalink] [raw]
Subject: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

A customer experienced a low-memory situation and decided to issue a
SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
would expect, the aforementioned task remained unresponsive.

Further investigation indicated that the task was "stuck" in the
reclaim/compaction retry loop. Now, it does not make sense to retry
compaction when a fatal signal is pending.

In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
returned; albeit, not every zone, on the zone list, would be considered
in the case a fatal signal is found to be pending.
Yet, in should_compact_retry(), given the last known compaction result,
each zone, on the zone list, can be considered/or checked
(see compaction_zonelist_suitable()). For example, if a zone was found
to succeed, then reclaim/compaction would be tried again
(notwithstanding the above).

This patch ensures that compaction is not needlessly retried
irrespective of the last known compaction result e.g. if it was skipped,
in the unlikely case a fatal signal is found pending.
So, OOM is at least attempted.

Signed-off-by: Aaron Tomlin <[email protected]>
---
mm/page_alloc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..b317057ac186 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
if (!order)
return false;

+ if (fatal_signal_pending(current))
+ return false;
+
if (compaction_made_progress(compact_result))
(*compaction_retries)++;

--
2.26.3

2021-05-21 05:53:35

by Aaron Tomlin

[permalink] [raw]
Subject: Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

Matthew,

On Thu 2021-05-20 12:56 +0100, Matthew Wilcox wrote:
> While this is true, it's not really answering Andrew's question.
> What we want as part of the commit message is something like:
>
> "A customer experienced a low memory situation and sent their task a
> fatal signal. Instead of dying promptly, it looped in the page
> allocator failing to make progress because ..."

Apologies, I did not understand - I will follow up with a v4.


Kind regards,

--
Aaron Tomlin

2021-05-28 12:59:11

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On 5/20/21 4:29 PM, Aaron Tomlin wrote:
> A customer experienced a low-memory situation and decided to issue a
> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> would expect, the aforementioned task remained unresponsive.
>
> Further investigation indicated that the task was "stuck" in the
> reclaim/compaction retry loop. Now, it does not make sense to retry
> compaction when a fatal signal is pending.
>
> In the context of try_to_compact_pages(), indeed COMPACT_SKIPPED can be
> returned; albeit, not every zone, on the zone list, would be considered
> in the case a fatal signal is found to be pending.
> Yet, in should_compact_retry(), given the last known compaction result,
> each zone, on the zone list, can be considered/or checked
> (see compaction_zonelist_suitable()). For example, if a zone was found
> to succeed, then reclaim/compaction would be tried again
> (notwithstanding the above).
>
> This patch ensures that compaction is not needlessly retried
> irrespective of the last known compaction result e.g. if it was skipped,
> in the unlikely case a fatal signal is found pending.
> So, OOM is at least attempted.
>
> Signed-off-by: Aaron Tomlin <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> ---
> mm/page_alloc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index aaa1655cf682..b317057ac186 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4252,6 +4252,9 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
> if (!order)
> return false;
>
> + if (fatal_signal_pending(current))
> + return false;
> +
> if (compaction_made_progress(compact_result))
> (*compaction_retries)++;
>
>

2021-05-31 11:34:27

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
> A customer experienced a low-memory situation and decided to issue a
> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> would expect, the aforementioned task remained unresponsive.
>
> Further investigation indicated that the task was "stuck" in the
> reclaim/compaction retry loop. Now, it does not make sense to retry
> compaction when a fatal signal is pending.

Is this really true in general? The memory reclaim is retried even when
fatal signals are pending. Why should be compaction different? I do
agree that retrying way too much is bad but is there any reason why this
special case doesn't follow the max retry logic?
--
Michal Hocko
SUSE Labs

2021-05-31 11:37:56

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On 5/31/21 1:33 PM, Michal Hocko wrote:
> On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
>> A customer experienced a low-memory situation and decided to issue a
>> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
>> would expect, the aforementioned task remained unresponsive.
>>
>> Further investigation indicated that the task was "stuck" in the
>> reclaim/compaction retry loop. Now, it does not make sense to retry
>> compaction when a fatal signal is pending.
>
> Is this really true in general? The memory reclaim is retried even when
> fatal signals are pending. Why should be compaction different? I do
> agree that retrying way too much is bad but is there any reason why this
> special case doesn't follow the max retry logic?

Compaction doesn't do anything if fatal signal is pending, it bails out
immediately and the checks are rather frequent. So why retry?

2021-05-31 13:31:05

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

On Mon 31-05-21 13:35:31, Vlastimil Babka wrote:
> On 5/31/21 1:33 PM, Michal Hocko wrote:
> > On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
> >> A customer experienced a low-memory situation and decided to issue a
> >> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> >> would expect, the aforementioned task remained unresponsive.
> >>
> >> Further investigation indicated that the task was "stuck" in the
> >> reclaim/compaction retry loop. Now, it does not make sense to retry
> >> compaction when a fatal signal is pending.
> >
> > Is this really true in general? The memory reclaim is retried even when
> > fatal signals are pending. Why should be compaction different? I do
> > agree that retrying way too much is bad but is there any reason why this
> > special case doesn't follow the max retry logic?
>
> Compaction doesn't do anything if fatal signal is pending, it bails out
> immediately and the checks are rather frequent. So why retry?

OK, I was not aware of that and it would be helpful to have that
mentioned in the changelog.

--
Michal Hocko
SUSE Labs