LinuxLists.cc - [PATCH v1] mm/memory_hotplug: document the signal_pending() check in offline

2023-07-11 17:51:06

Subject: [PATCH v1] mm/memory_hotplug: document the signal_pending() check in offline_pages()

Let's update the documentation to state that any signal is sufficient, and
add a comment that checking for any signal, not only for fatal signals, is
historical baggage: changing it now could break existing user space, although
that is unlikely.

For example, when an app provides a custom SIGALRM handler and triggers
memory offlining, the timeout cmd would no longer stop memory offlining,
because SIGALRM would no longer be considered a fatal signal.

Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/admin-guide/mm/memory-hotplug.rst | 2 +-
mm/memory_hotplug.c | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..bd77841041af 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -669,7 +669,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
(-> BUG), memory offlining will keep retrying until it eventually succeeds.

When offlining is triggered from user space, the offlining context can be
-terminated by sending a fatal signal. A timeout based offlining can easily be
+terminated by sending a signal. A timeout based offlining can easily be
implemented via::

% timeout $TIMEOUT offline_block | failure_handling
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3f231cf1b410..7cfd13c91568 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1843,6 +1843,11 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	do {
 		pfn = start_pfn;
 		do {
+			/*
+			 * Historically we always checked for any signal and
+			 * can't limit it to fatal signals without eventually
+			 * breaking user space.
+			 */
 			if (signal_pending(current)) {
 				ret = -EINTR;
 				reason = "signal backoff";
--
2.41.0
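
The SIGALRM scenario from the commit message can be sketched from user space as follows. This is only an illustration under stated assumptions, not code from the patch: write_with_alarm() and on_alarm() are hypothetical names, and the sketch assumes that the -EINTR set in offline_pages() reaches the caller as a failed write() with errno == EINTR. The handler makes SIGALRM a handled, non-fatal signal, which is exactly the case that only the signal_pending() check catches.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: SIGALRM is now handled, so it is no longer a fatal signal. */
static void on_alarm(int sig)
{
	(void)sig;
}

/*
 * Hypothetical helper: write `val` to `path` with a SIGALRM timeout of
 * `timeout_sec` seconds. Returns 0 on success or a negative errno value;
 * a timeout that interrupts a blocking open()/write() shows up as -EINTR.
 */
static int write_with_alarm(const char *path, const char *val,
			    unsigned int timeout_sec)
{
	struct sigaction sa, old;
	size_t len = strlen(val);
	ssize_t n;
	int fd, ret = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;	/* deliberately no SA_RESTART */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, &old) < 0)
		return -errno;

	alarm(timeout_sec);		/* arm the timeout */

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		ret = -errno;
	} else {
		n = write(fd, val, len);
		if (n < 0)
			ret = -errno;
		else if ((size_t)n != len)
			ret = -EIO;	/* short write: treat as failure */
		close(fd);
	}

	alarm(0);			/* disarm the timeout */
	sigaction(SIGALRM, &old, NULL);	/* restore the previous handler */
	return ret;
}
```

A caller with sufficient privileges might use it as write_with_alarm("/sys/devices/system/memory/memory42/state", "offline", 30), where the block number 42 is made up for illustration. With the current signal_pending() check, such a caller would be expected to get -EINTR once the alarm fires; with a fatal_signal_pending() check, the handled SIGALRM would no longer stop offlining.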

2023-07-11 20:59:08

by Michal Hocko

Subject: Re: [PATCH v1] mm/memory_hotplug: document the signal_pending() check in offline_pages()

On Tue 11-07-23 19:40:50, David Hildenbrand wrote:
> Let's update the documentation that any signal is sufficient, and
> add a comment that not only checking for fatal signals is historical
> baggage: changing it now could break existing user space, although
> unlikely.
>
> For example, when an app provides a custom SIGALRM handler and triggers
> memory offlining, the timeout cmd would no longer stop memory offlining,
> because SIGALRM would no longer be considered a fatal signal.

Yes, and it is likely good to mention here that this is an antipattern
for many other kernel operations like IO (e.g. write), but it is a
long-term behavior that somebody might depend on, and it is safer to make
the documentation reflect reality rather than the other way around (which
would be imho better).

> Cc: Michal Hocko <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> Documentation/admin-guide/mm/memory-hotplug.rst | 2 +-
> mm/memory_hotplug.c | 5 +++++
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index 1b02fe5807cc..bd77841041af 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -669,7 +669,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
> (-> BUG), memory offlining will keep retrying until it eventually succeeds.
>
> When offlining is triggered from user space, the offlining context can be
> -terminated by sending a fatal signal. A timeout based offlining can easily be
> +terminated by sending a signal. A timeout based offlining can easily be
> implemented via::
>
> % timeout $TIMEOUT offline_block | failure_handling
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3f231cf1b410..7cfd13c91568 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1843,6 +1843,11 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> do {
> pfn = start_pfn;
> do {
> + /*
> + * Historically we always checked for any signal and
> + * can't limit it to fatal signals without eventually
> + * breaking user space.
> + */
> if (signal_pending(current)) {
> ret = -EINTR;
> reason = "signal backoff";
> --
> 2.41.0

--
Michal Hocko
SUSE Labs

2023-07-12 07:12:48

by Anshuman Khandual

Subject: Re: [PATCH v1] mm/memory_hotplug: document the signal_pending() check in offline_pages()

On 7/11/23 23:10, David Hildenbrand wrote:
> Let's update the documentation that any signal is sufficient, and
> add a comment that not only checking for fatal signals is historical
> baggage: changing it now could break existing user space, although
> unlikely.
>
> For example, when an app provides a custom SIGALRM handler and triggers
> memory offlining, the timeout cmd would no longer stop memory offlining,
> because SIGALRM would no longer be considered a fatal signal.
>
> Cc: Michal Hocko <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> Documentation/admin-guide/mm/memory-hotplug.rst | 2 +-
> mm/memory_hotplug.c | 5 +++++
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index 1b02fe5807cc..bd77841041af 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -669,7 +669,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
> (-> BUG), memory offlining will keep retrying until it eventually succeeds.
>
> When offlining is triggered from user space, the offlining context can be
> -terminated by sending a fatal signal. A timeout based offlining can easily be
> +terminated by sending a signal. A timeout based offlining can easily be
> implemented via::
>
> % timeout $TIMEOUT offline_block | failure_handling
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3f231cf1b410..7cfd13c91568 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1843,6 +1843,11 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> do {
> pfn = start_pfn;
> do {
> + /*
> + * Historically we always checked for any signal and
> + * can't limit it to fatal signals without eventually
> + * breaking user space.
> + */

Just curious: is the 'signal type' that stops the memory offline process
considered ABI, so that it can never be changed in the kernel, even if
required? Just wondering whether an additional '!fatal_signal_pending()'
check could be introduced to warn about the support being deprecated, before
finally replacing it with fatal_signal_pending().

> if (signal_pending(current)) {
> ret = -EINTR;
> reason = "signal backoff";

2023-07-12 19:43:10

by David Hildenbrand

Subject: Re: [PATCH v1] mm/memory_hotplug: document the signal_pending() check in offline_pages()

On 11.07.23 22:47, Michal Hocko wrote:
> On Tue 11-07-23 19:40:50, David Hildenbrand wrote:
>> Let's update the documentation that any signal is sufficient, and
>> add a comment that not only checking for fatal signals is historical
>> baggage: changing it now could break existing user space, although
>> unlikely.
>>
>> For example, when an app provides a custom SIGALRM handler and triggers
>> memory offlining, the timeout cmd would no longer stop memory offlining,
>> because SIGALRM would no longer be considered a fatal signal.
>
> Yes, and it is likely good to mention here that this is an antipattern
> for many other kernel operations like IO (e.g. write), but it is a
> long-term behavior that somebody might depend on, and it is safer to make
> the documentation reflect reality rather than the other way around (which
> would be imho better).
>

You mean adding something like

"Note that using signal_pending() instead of fatal_signal_pending() is
an anti-pattern, but slowly deprecating that behavior to eventually
change it in the far future is probably not worth the effort. If this
ever becomes relevant for user-space, we might want to rethink."

Thanks!

--
Cheers,

David / dhildenb

2023-07-12 19:50:50

by David Hildenbrand

Subject: Re: [PATCH v1] mm/memory_hotplug: document the signal_pending() check in offline_pages()

On 12.07.23 08:47, Anshuman Khandual wrote:
>
>
> On 7/11/23 23:10, David Hildenbrand wrote:
>> Let's update the documentation that any signal is sufficient, and
>> add a comment that not only checking for fatal signals is historical
>> baggage: changing it now could break existing user space, although
>> unlikely.
>>
>> For example, when an app provides a custom SIGALRM handler and triggers
>> memory offlining, the timeout cmd would no longer stop memory offlining,
>> because SIGALRM would no longer be considered a fatal signal.
>>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Oscar Salvador <[email protected]>
>> Cc: Jonathan Corbet <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> Documentation/admin-guide/mm/memory-hotplug.rst | 2 +-
>> mm/memory_hotplug.c | 5 +++++
>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
>> index 1b02fe5807cc..bd77841041af 100644
>> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
>> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
>> @@ -669,7 +669,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
>> (-> BUG), memory offlining will keep retrying until it eventually succeeds.
>>
>> When offlining is triggered from user space, the offlining context can be
>> -terminated by sending a fatal signal. A timeout based offlining can easily be
>> +terminated by sending a signal. A timeout based offlining can easily be
>> implemented via::
>>
>> % timeout $TIMEOUT offline_block | failure_handling
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 3f231cf1b410..7cfd13c91568 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1843,6 +1843,11 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>> do {
>> pfn = start_pfn;
>> do {
>> + /*
>> + * Historically we always checked for any signal and
>> + * can't limit it to fatal signals without eventually
>> + * breaking user space.
>> + */
>
> Just curious: is the 'signal type' that stops the memory offline process
> considered ABI, so that it can never be changed in the kernel, even if
> required? Just wondering whether an additional '!fatal_signal_pending()'
> check could be introduced to warn about the support being deprecated, before
> finally replacing it with fatal_signal_pending().

See my reply to Michal: while that would be doable, it is probably not
worth the effort, and we'd still have to stick with the existing
handling for quite a while.

Thanks!

--
Cheers,

David / dhildenb
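
For reference, the deprecation idea discussed above (keep aborting on any signal, but warn when the signal is not fatal) can be modelled in user space roughly as follows. This is a toy model and not kernel code: struct model_task and the model_*() functions are made-up stand-ins for current, signal_pending() and fatal_signal_pending(), and the warning text is hypothetical. Nothing like this is part of the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the kernel's task state; NOT the real task_struct. */
struct model_task {
	bool any_signal;	/* some signal is pending */
	bool fatal_signal;	/* the pending signal will kill the task */
};

/* Models signal_pending(): true for any pending signal. */
static bool model_signal_pending(const struct model_task *t)
{
	return t->any_signal;
}

/* Models fatal_signal_pending(): true only if the task is being killed. */
static bool model_fatal_signal_pending(const struct model_task *t)
{
	return t->any_signal && t->fatal_signal;
}

/* Counts how often the hypothetical deprecation warning was emitted. */
static int deprecation_warnings;

/*
 * One iteration of the signal check in the offlining loop. The behavior is
 * unchanged (any pending signal aborts with -EINTR), but an abort caused by
 * a non-fatal signal is additionally reported as deprecated.
 */
static int model_offline_signal_check(const struct model_task *t)
{
	if (!model_signal_pending(t))
		return 0;

	if (!model_fatal_signal_pending(t)) {
		deprecation_warnings++;
		fprintf(stderr,
			"offlining interrupted by a non-fatal signal (hypothetical deprecation warning)\n");
	}
	return -EINTR;
}
```

In this model, a task with a handled SIGALRM pending (any_signal set, fatal_signal clear) still gets -EINTR and triggers one warning, while a task that is being killed gets -EINTR without a warning. As noted in the reply above, the existing handling would still have to stay in place for quite a while even with such a warning.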

2023-07-13 09:01:13

by Michal Hocko

Subject: Re: [PATCH v1] mm/memory_hotplug: document the signal_pending() check in offline_pages()

On Wed 12-07-23 21:09:25, David Hildenbrand wrote:
> On 11.07.23 22:47, Michal Hocko wrote:
> > On Tue 11-07-23 19:40:50, David Hildenbrand wrote:
> > > Let's update the documentation that any signal is sufficient, and
> > > add a comment that not only checking for fatal signals is historical
> > > baggage: changing it now could break existing user space, although
> > > unlikely.
> > >
> > > For example, when an app provides a custom SIGALRM handler and triggers
> > > memory offlining, the timeout cmd would no longer stop memory offlining,
> > > because SIGALRM would no longer be considered a fatal signal.
> >
> > Yes, and it is likely good to mention here that this is an antipattern
> > for many other kernel operations like IO (e.g. write), but it is a
> > long-term behavior that somebody might depend on, and it is safer to make
> > the documentation reflect reality rather than the other way around (which
> > would be imho better).
> >
>
> You mean adding something like
>
> "Note that using signal_pending() instead of fatal_signal_pending() is an
> anti-pattern, but slowly deprecating that behavior to eventually change it
> in the far future is probably not worth the effort. If this ever becomes
> relevant for user-space, we might want to rethink."

Yes, something like that. Thanks!

--
Michal Hocko
SUSE Labs