2020-01-22 23:40:22

by Yang Shi

Subject: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"), the
semantics of move_pages() have changed: it returns the number of
non-migrated pages (pages that failed to migrate), and the call is
aborted immediately if migrate_pages() returns a positive value. But it
does not report the pages that we have not even attempted to migrate.
So, fix it by including the non-attempted pages in the return value.

Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Suggested-by: Michal Hocko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: <[email protected]> [4.17+]
Signed-off-by: Yang Shi <[email protected]>
---
v2: Rebased on top of the latest mainline kernel per Andrew

mm/migrate.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 86873b6..9b8eb5d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
start = i;
} else if (node != current_node) {
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ /*
+ * Positive err means the number of pages that
+ * failed to migrate. Since we are going to
+ * abort and return the number of non-migrated
+ * pages, we need to include the rest of the
+ * nr_pages that have not been attempted as well.
+ */
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
err = store_status(status, start, current_node, i - start);
if (err)
goto out;
@@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
goto out_flush;

err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
if (i > start) {
err = store_status(status, start, current_node, i - start);
if (err)
@@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,

/* Make sure we do not overwrite the existing error */
err1 = do_move_pages_to_node(mm, &pagelist, current_node);
+ /*
+ * Don't have to report non-attempted pages here since:
+ * - If the above loop is done gracefully, there are no
+ * non-attempted pages.
+ * - If the above loop is aborted, it means a more fatal error
+ * happened, so err should be returned.
+ */
if (!err1)
err1 = store_status(status, start, current_node, i - start);
if (!err)
--
1.8.3.1
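
As a worked illustration of the accounting in the patch above, here is a
minimal userspace model, not kernel code. The helper name
report_not_migrated() is hypothetical, and the sketch assumes that i is the
zero-based index of the page being processed when the loop aborts, as in the
"err += nr_pages - i - 1" lines of the diff.

```c
/*
 * Hypothetical userspace model of the return-value accounting in the
 * patch; this is not kernel code.
 *
 * err      - result of the batch migration: a negative errno, 0, or the
 *            number of pages in the batch that failed to migrate
 * nr_pages - total number of pages passed to move_pages()
 * i        - zero-based index of the page being processed at abort time
 */
long report_not_migrated(long err, long nr_pages, long i)
{
	/* Only a positive count is extended; errors and success pass through. */
	if (err > 0)
		err += nr_pages - i - 1;
	return err;
}
```

For example, with nr_pages = 10, an abort at i = 4 and 2 failed pages in the
batch, the value reported would be 2 + (10 - 4 - 1) = 7. A negative errno is
returned unchanged.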


2020-01-23 03:28:39

by Wei Yang

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>the semantic of move_pages() was changed to return the number of
>non-migrated pages (failed to migration) and the call would be aborted
>immediately if migrate_pages() returns positive value. But it didn't
>report the number of pages that we even haven't attempted to migrate.
>So, fix it by including non-attempted pages in the return value.
>

First, do we want to change the semantics of move_pages(2) so that the return
value indicates the number of pages we didn't manage to migrate?

Second, the return value from migrate_pages() doesn't mean the number of pages
we failed to migrate. For example, if -ENOMEM is returned on the first page,
migrate_pages() would return 1. But actually, no page was successfully migrated.

Third, even if migrate_pages() returns the exact number of non-migrated pages,
we are not sure those non-migrated pages are at the tail of the list, because
in the last case in migrate_pages() it just removes the page from the list. It
could be a page in the middle of the list. Then, in userspace, how can the
return value be leveraged to determine which status entries are valid? Any
page in the list could be the victim.

Sounds like we need to think about this carefully.

>Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>Suggested-by: Michal Hocko <[email protected]>
>Cc: Wei Yang <[email protected]>
>Cc: <[email protected]> [4.17+]
>Signed-off-by: Yang Shi <[email protected]>
>---
>v2: Rebased on top of the latest mainline kernel per Andrew
>
> mm/migrate.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
>diff --git a/mm/migrate.c b/mm/migrate.c
>index 86873b6..9b8eb5d 100644
>--- a/mm/migrate.c
>+++ b/mm/migrate.c
>@@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> start = i;
> } else if (node != current_node) {
> err = do_move_pages_to_node(mm, &pagelist, current_node);
>- if (err)
>+ if (err) {
>+ /*
>+ * Positive err means the number of failed
>+ * pages to migrate. Since we are going to
>+ * abort and return the number of non-migrated
>+ * pages, so need incude the rest of the
>+ * nr_pages that have not attempted as well.
>+ */
>+ if (err > 0)
>+ err += nr_pages - i - 1;
> goto out;
>+ }
> err = store_status(status, start, current_node, i - start);
> if (err)
> goto out;
>@@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> goto out_flush;
>
> err = do_move_pages_to_node(mm, &pagelist, current_node);
>- if (err)
>+ if (err) {
>+ if (err > 0)
>+ err += nr_pages - i - 1;
> goto out;
>+ }
> if (i > start) {
> err = store_status(status, start, current_node, i - start);
> if (err)
>@@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>
> /* Make sure we do not overwrite the existing error */
> err1 = do_move_pages_to_node(mm, &pagelist, current_node);
>+ /*
>+ * Don't have to report non-attempted pages here since:
>+ * - If the above loop is done gracefully there is not non-attempted
>+ * page.
>+ * - If the above loop is aborted to it means more fatal error
>+ * happened, should return err.
>+ */
> if (!err1)
> err1 = store_status(status, start, current_node, i - start);
> if (!err)
>--
>1.8.3.1

--
Wei Yang
Help you, Help me

2020-01-23 03:58:37

by Yang Shi

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages



On 1/22/20 7:27 PM, Wei Yang wrote:
> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>> Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> the semantic of move_pages() was changed to return the number of
>> non-migrated pages (failed to migration) and the call would be aborted
>> immediately if migrate_pages() returns positive value. But it didn't
>> report the number of pages that we even haven't attempted to migrate.
>> So, fix it by including non-attempted pages in the return value.
>>
> First, we want to change the semantic of move_pages(2). The return value
> indicates the number of pages we didn't managed to migrate?

This is my understanding.

>
> Second, the return value from migrate_pages() doesn't mean the number of pages
> we failed to migrate. For example, one -ENOMEM is returned on the first page,
> migrate_pages() would return 1. But actually, no page successfully migrated.

This would not happen at all since migrate_pages() would just return
-ENOMEM instead of a positive value, right?

>
> Third, even the migrate_pages() return the exact non-migrate page, we are not
> sure those non-migrated pages are at the tail of the list. Because in the last
> case in migrate_pages(), it just remove the page from list. It could be a page
> in the middle of the list. Then, in userspace, how the return value be
> leveraged to determine the valid status? Any page in the list could be the
> victim.

I think this problem has been discussed in another thread. Yes, the
status may hold invalid values, but it is supposed to hold valid values
iff move_pages() returns 0. A positive value is an error case, so the
validity of status is not guaranteed.

>
> Sounds we need to think about this carefully.
>
>> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>> Suggested-by: Michal Hocko <[email protected]>
>> Cc: Wei Yang <[email protected]>
>> Cc: <[email protected]> [4.17+]
>> Signed-off-by: Yang Shi <[email protected]>
>> ---
>> v2: Rebased on top of the latest mainline kernel per Andrew
>>
>> mm/migrate.c | 24 ++++++++++++++++++++++--
>> 1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 86873b6..9b8eb5d 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> start = i;
>> } else if (node != current_node) {
>> err = do_move_pages_to_node(mm, &pagelist, current_node);
>> - if (err)
>> + if (err) {
>> + /*
>> + * Positive err means the number of failed
>> + * pages to migrate. Since we are going to
>> + * abort and return the number of non-migrated
>> + * pages, so need incude the rest of the
>> + * nr_pages that have not attempted as well.
>> + */
>> + if (err > 0)
>> + err += nr_pages - i - 1;
>> goto out;
>> + }
>> err = store_status(status, start, current_node, i - start);
>> if (err)
>> goto out;
>> @@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> goto out_flush;
>>
>> err = do_move_pages_to_node(mm, &pagelist, current_node);
>> - if (err)
>> + if (err) {
>> + if (err > 0)
>> + err += nr_pages - i - 1;
>> goto out;
>> + }
>> if (i > start) {
>> err = store_status(status, start, current_node, i - start);
>> if (err)
>> @@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>
>> /* Make sure we do not overwrite the existing error */
>> err1 = do_move_pages_to_node(mm, &pagelist, current_node);
>> + /*
>> + * Don't have to report non-attempted pages here since:
>> + * - If the above loop is done gracefully there is not non-attempted
>> + * page.
>> + * - If the above loop is aborted to it means more fatal error
>> + * happened, should return err.
>> + */
>> if (!err1)
>> err1 = store_status(status, start, current_node, i - start);
>> if (!err)
>> --
>> 1.8.3.1

2020-01-23 08:57:37

by Michal Hocko

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Thu 23-01-20 11:27:36, Wei Yang wrote:
> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
> >Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
> >the semantic of move_pages() was changed to return the number of
> >non-migrated pages (failed to migration) and the call would be aborted
> >immediately if migrate_pages() returns positive value. But it didn't
> >report the number of pages that we even haven't attempted to migrate.
> >So, fix it by including non-attempted pages in the return value.
> >
>
> First, we want to change the semantic of move_pages(2). The return value
> indicates the number of pages we didn't managed to migrate?
>
> Second, the return value from migrate_pages() doesn't mean the number of pages
> we failed to migrate. For example, one -ENOMEM is returned on the first page,
> migrate_pages() would return 1. But actually, no page successfully migrated.

ENOMEM is considered a permanent failure and as such it is returned by
migrate_pages() (see goto out).

> Third, even the migrate_pages() return the exact non-migrate page, we are not
> sure those non-migrated pages are at the tail of the list. Because in the last
> case in migrate_pages(), it just remove the page from list. It could be a page
> in the middle of the list. Then, in userspace, how the return value be
> leveraged to determine the valid status? Any page in the list could be the
> victim.

Yes, I was wrong when stating that the caller would know better which
status to check. I misremembered the original patch as it was quite some
time ago. While storing the error code would be possible after some
massaging of migrate_pages(), is this really something we deeply care
about? The caller can achieve the same by initializing the status array
to a non-node number - e.g. -1 - and checking based on that.
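
The sentinel approach described above could be sketched in userspace roughly
as follows. This is only an illustration under stated assumptions:
STATUS_UNTOUCHED, status_init(), struct status_summary and
status_summarize() are hypothetical names, and the actual move_pages(2) call
between the two helpers is left out.

```c
#include <stddef.h>

/* Hypothetical sentinel: not a valid node id; follows the -1 suggested above. */
#define STATUS_UNTOUCHED (-1)

/* Fill the status array with the sentinel before calling move_pages(2). */
void status_init(int *status, size_t count)
{
	for (size_t i = 0; i < count; i++)
		status[i] = STATUS_UNTOUCHED;
}

/* What the status array looks like once the call has returned. */
struct status_summary {
	size_t on_target;  /* entries equal to the requested node */
	size_t untouched;  /* entries still holding the sentinel */
	size_t other;      /* another node id, or a negative errno */
};

/* Classify every entry by comparing it with the sentinel and the target. */
struct status_summary status_summarize(const int *status, size_t count,
				       int target_node)
{
	struct status_summary s = { 0, 0, 0 };

	for (size_t i = 0; i < count; i++) {
		if (status[i] == STATUS_UNTOUCHED)
			s.untouched++;
		else if (status[i] == target_node)
			s.on_target++;
		else
			s.other++;
	}
	return s;
}
```

Entries that still hold the sentinel after the call were never written by the
kernel, so the caller should treat those pages as not migrated.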

This system call has quite complex semantics and I am not 100% sure
what the right thing to do here is. Maybe we do want to continue and try
to migrate as much as possible on non-fatal migration failures and
accumulate the number of failed pages while doing so.

The main problem is that we can have an academic discussion but
the primary question is what actual users want. A lack of real
bug reports suggests that nobody has actually noticed this. So I
would rather keep returning the correct number of non-migrated
pages. Why? Because new users could have started depending on it. It
is not all that unlikely that the current implementation would just
work for them because they are migrating a set of pages to the same
node, so the batch would be a single list throughout the whole given
page set.
--
Michal Hocko
SUSE Labs

2020-01-23 22:58:49

by Wei Yang

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Thu, Jan 23, 2020 at 09:55:26AM +0100, Michal Hocko wrote:
>On Thu 23-01-20 11:27:36, Wei Yang wrote:
>> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>> >Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> >the semantic of move_pages() was changed to return the number of
>> >non-migrated pages (failed to migration) and the call would be aborted
>> >immediately if migrate_pages() returns positive value. But it didn't
>> >report the number of pages that we even haven't attempted to migrate.
>> >So, fix it by including non-attempted pages in the return value.
>> >
>>
>> First, we want to change the semantic of move_pages(2). The return value
>> indicates the number of pages we didn't managed to migrate?
>>
>> Second, the return value from migrate_pages() doesn't mean the number of pages
>> we failed to migrate. For example, one -ENOMEM is returned on the first page,
>> migrate_pages() would return 1. But actually, no page successfully migrated.
>
>ENOMEM is considered a permanent failure and as such it is returned by
>migrate pages (see goto out).
>
>> Third, even the migrate_pages() return the exact non-migrate page, we are not
>> sure those non-migrated pages are at the tail of the list. Because in the last
>> case in migrate_pages(), it just remove the page from list. It could be a page
>> in the middle of the list. Then, in userspace, how the return value be
>> leveraged to determine the valid status? Any page in the list could be the
>> victim.
>
>Yes, I was wrong when stating that the caller would know better which
>status to check. I misremembered the original patch as it was quite some
>time ago. While storing the error code would be possible after some
>massaging of migrate_pages is this really something we deeply care
>about. The caller can achieve the same by initializing the status array
>to a non-node number - e.g. -1 - and check based on that.
>

So for a user, the best practice is to initialize the status array to -1 and
check each status to see whether the page was migrated successfully?

Then do we need to return the number of non-migrated pages? What benefit could
a user get from the number? How about just returning an error code to indicate
the failure? I may be missing some point; would you mind giving me a hint?

>This system call has quite a complex semantic and I am not 100% sure
>what is the right thing to do here. Maybe we do want to continue and try
>to migrate as much as possible on non-fatal migration failures and
>accumulate the number of failed pages while doing so.
>
>The main problem is that we can have an academic discussion but
>the primary question is what do actual users want. A lack of real
>bug reports suggests that nobody has actually noticed this. So I
>would rather keep returning the correct number of non-migrated
>pages. Why? Because new users could have started depending on it. It
>is not all that unlikely that the current implementation would just
>work for them because they are migrating a set of pages on to the same
>node so the batch would be a single list throughout the whole given
>page set.
>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me

2020-01-23 23:03:30

by Wei Yang

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>the semantic of move_pages() was changed to return the number of
>non-migrated pages (failed to migration) and the call would be aborted
>immediately if migrate_pages() returns positive value. But it didn't
>report the number of pages that we even haven't attempted to migrate.
>So, fix it by including non-attempted pages in the return value.
>
>Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>Suggested-by: Michal Hocko <[email protected]>
>Cc: Wei Yang <[email protected]>
>Cc: <[email protected]> [4.17+]
>Signed-off-by: Yang Shi <[email protected]>
>---
>v2: Rebased on top of the latest mainline kernel per Andrew
>
> mm/migrate.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
>diff --git a/mm/migrate.c b/mm/migrate.c
>index 86873b6..9b8eb5d 100644
>--- a/mm/migrate.c
>+++ b/mm/migrate.c
>@@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> start = i;
> } else if (node != current_node) {
> err = do_move_pages_to_node(mm, &pagelist, current_node);
>- if (err)
>+ if (err) {
>+ /*
>+ * Positive err means the number of failed
>+ * pages to migrate. Since we are going to
>+ * abort and return the number of non-migrated
>+ * pages, so need incude the rest of the
>+ * nr_pages that have not attempted as well.
>+ */
>+ if (err > 0)
>+ err += nr_pages - i - 1;
> goto out;
>+ }
> err = store_status(status, start, current_node, i - start);
> if (err)
> goto out;
>@@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> goto out_flush;
>
> err = do_move_pages_to_node(mm, &pagelist, current_node);
>- if (err)
>+ if (err) {
>+ if (err > 0)
>+ err += nr_pages - i - 1;
> goto out;
>+ }
> if (i > start) {
> err = store_status(status, start, current_node, i - start);
> if (err)
>@@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>
> /* Make sure we do not overwrite the existing error */
> err1 = do_move_pages_to_node(mm, &pagelist, current_node);
>+ /*
>+ * Don't have to report non-attempted pages here since:

In the previous comment you use "non-migrated". Here it is "non-attempted".
What's the difference?

>+ * - If the above loop is done gracefully there is not non-attempted
>+ * page.
>+ * - If the above loop is aborted to it means more fatal error
>+ * happened, should return err.
>+ */
> if (!err1)
> err1 = store_status(status, start, current_node, i - start);
> if (!err)
>--
>1.8.3.1

--
Wei Yang
Help you, Help me

2020-01-23 23:38:22

by Yang Shi

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages



On 1/23/20 2:59 PM, Wei Yang wrote:
> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>> Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> the semantic of move_pages() was changed to return the number of
>> non-migrated pages (failed to migration) and the call would be aborted
>> immediately if migrate_pages() returns positive value. But it didn't
>> report the number of pages that we even haven't attempted to migrate.
>> So, fix it by including non-attempted pages in the return value.
>>
>> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>> Suggested-by: Michal Hocko <[email protected]>
>> Cc: Wei Yang <[email protected]>
>> Cc: <[email protected]> [4.17+]
>> Signed-off-by: Yang Shi <[email protected]>
>> ---
>> v2: Rebased on top of the latest mainline kernel per Andrew
>>
>> mm/migrate.c | 24 ++++++++++++++++++++++--
>> 1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 86873b6..9b8eb5d 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> start = i;
>> } else if (node != current_node) {
>> err = do_move_pages_to_node(mm, &pagelist, current_node);
>> - if (err)
>> + if (err) {
>> + /*
>> + * Positive err means the number of failed
>> + * pages to migrate. Since we are going to
>> + * abort and return the number of non-migrated
>> + * pages, so need incude the rest of the
>> + * nr_pages that have not attempted as well.
>> + */
>> + if (err > 0)
>> + err += nr_pages - i - 1;
>> goto out;
>> + }
>> err = store_status(status, start, current_node, i - start);
>> if (err)
>> goto out;
>> @@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> goto out_flush;
>>
>> err = do_move_pages_to_node(mm, &pagelist, current_node);
>> - if (err)
>> + if (err) {
>> + if (err > 0)
>> + err += nr_pages - i - 1;
>> goto out;
>> + }
>> if (i > start) {
>> err = store_status(status, start, current_node, i - start);
>> if (err)
>> @@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>
>> /* Make sure we do not overwrite the existing error */
>> err1 = do_move_pages_to_node(mm, &pagelist, current_node);
>> + /*
>> + * Don't have to report non-attempted pages here since:
> In previous comment, you use "non-migrated". Here is "non-attempted". What's
> the difference?

In that comment, "non-migrated" includes both the pages reported by
migrate_pages() and the non-attempted ones.

>
>> + * - If the above loop is done gracefully there is not non-attempted
>> + * page.
>> + * - If the above loop is aborted to it means more fatal error
>> + * happened, should return err.
>> + */
>> if (!err1)
>> err1 = store_status(status, start, current_node, i - start);
>> if (!err)
>> --
>> 1.8.3.1

2020-01-23 23:45:52

by Wei Yang

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Wed, Jan 22, 2020 at 07:56:50PM -0800, Yang Shi wrote:
>
>
>On 1/22/20 7:27 PM, Wei Yang wrote:
>> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>> > Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> > the semantic of move_pages() was changed to return the number of
>> > non-migrated pages (failed to migration) and the call would be aborted
>> > immediately if migrate_pages() returns positive value. But it didn't
>> > report the number of pages that we even haven't attempted to migrate.
>> > So, fix it by including non-attempted pages in the return value.
>> >
>> First, we want to change the semantic of move_pages(2). The return value
>> indicates the number of pages we didn't managed to migrate?
>
>This is my understanding.
>
>>
>> Second, the return value from migrate_pages() doesn't mean the number of pages
>> we failed to migrate. For example, one -ENOMEM is returned on the first page,
>> migrate_pages() would return 1. But actually, no page successfully migrated.
>
>This would not happen at all since migrate_pages() would just return -ENOMEM
>instead of a positive value, right?
>

Oh, you are right.


--
Wei Yang
Help you, Help me

2020-01-24 00:03:29

by Wei Yang

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Thu, Jan 23, 2020 at 03:36:58PM -0800, Yang Shi wrote:
>
>
>On 1/23/20 2:59 PM, Wei Yang wrote:
>> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>> > Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> > the semantic of move_pages() was changed to return the number of
>> > non-migrated pages (failed to migration) and the call would be aborted
>> > immediately if migrate_pages() returns positive value. But it didn't
>> > report the number of pages that we even haven't attempted to migrate.
>> > So, fix it by including non-attempted pages in the return value.
>> >
>> > Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>> > Suggested-by: Michal Hocko <[email protected]>
>> > Cc: Wei Yang <[email protected]>
>> > Cc: <[email protected]> [4.17+]
>> > Signed-off-by: Yang Shi <[email protected]>
>> > ---
>> > v2: Rebased on top of the latest mainline kernel per Andrew
>> >
>> > mm/migrate.c | 24 ++++++++++++++++++++++--
>> > 1 file changed, 22 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/mm/migrate.c b/mm/migrate.c
>> > index 86873b6..9b8eb5d 100644
>> > --- a/mm/migrate.c
>> > +++ b/mm/migrate.c
>> > @@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> > start = i;
>> > } else if (node != current_node) {
>> > err = do_move_pages_to_node(mm, &pagelist, current_node);
>> > - if (err)
>> > + if (err) {
>> > + /*
>> > + * Positive err means the number of failed
>> > + * pages to migrate. Since we are going to
>> > + * abort and return the number of non-migrated
>> > + * pages, so need incude the rest of the
>> > + * nr_pages that have not attempted as well.
>> > + */
>> > + if (err > 0)
>> > + err += nr_pages - i - 1;
>> > goto out;
>> > + }
>> > err = store_status(status, start, current_node, i - start);
>> > if (err)
>> > goto out;
>> > @@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> > goto out_flush;
>> >
>> > err = do_move_pages_to_node(mm, &pagelist, current_node);
>> > - if (err)
>> > + if (err) {
>> > + if (err > 0)
>> > + err += nr_pages - i - 1;
>> > goto out;
>> > + }
>> > if (i > start) {
>> > err = store_status(status, start, current_node, i - start);
>> > if (err)
>> > @@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> >
>> > /* Make sure we do not overwrite the existing error */
>> > err1 = do_move_pages_to_node(mm, &pagelist, current_node);
>> > + /*
>> > + * Don't have to report non-attempted pages here since:
>> In previous comment, you use "non-migrated". Here is "non-attempted". What's
>> the difference?
>
>In that comment "non-migrated" includes both reported by migrate_pages() and
>the non-attempted.
>

ok, I see the difference.

>>
>> > + * - If the above loop is done gracefully there is not non-attempted
>> > + * page.
>> > + * - If the above loop is aborted to it means more fatal error
>> > + * happened, should return err.
>> > + */
>> > if (!err1)
>> > err1 = store_status(status, start, current_node, i - start);
>> > if (!err)
>> > --
>> > 1.8.3.1

--
Wei Yang
Help you, Help me

2020-01-24 06:52:39

by Michal Hocko

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Fri 24-01-20 06:56:47, Wei Yang wrote:
> On Thu, Jan 23, 2020 at 09:55:26AM +0100, Michal Hocko wrote:
> >On Thu 23-01-20 11:27:36, Wei Yang wrote:
> >> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
> >> >Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
> >> >the semantic of move_pages() was changed to return the number of
> >> >non-migrated pages (failed to migration) and the call would be aborted
> >> >immediately if migrate_pages() returns positive value. But it didn't
> >> >report the number of pages that we even haven't attempted to migrate.
> >> >So, fix it by including non-attempted pages in the return value.
> >> >
> >>
> >> First, we want to change the semantic of move_pages(2). The return value
> >> indicates the number of pages we didn't managed to migrate?
> >>
> >> Second, the return value from migrate_pages() doesn't mean the number of pages
> >> we failed to migrate. For example, one -ENOMEM is returned on the first page,
> >> migrate_pages() would return 1. But actually, no page successfully migrated.
> >
> >ENOMEM is considered a permanent failure and as such it is returned by
> >migrate pages (see goto out).
> >
> >> Third, even the migrate_pages() return the exact non-migrate page, we are not
> >> sure those non-migrated pages are at the tail of the list. Because in the last
> >> case in migrate_pages(), it just remove the page from list. It could be a page
> >> in the middle of the list. Then, in userspace, how the return value be
> >> leveraged to determine the valid status? Any page in the list could be the
> >> victim.
> >
> >Yes, I was wrong when stating that the caller would know better which
> >status to check. I misremembered the original patch as it was quite some
> >time ago. While storing the error code would be possible after some
> >massaging of migrate_pages is this really something we deeply care
> >about. The caller can achieve the same by initializing the status array
> >to a non-node number - e.g. -1 - and check based on that.
> >
>
> So for a user, the best practice is to initialize the status array to -1 and
> check each status to see whether the page is migrated successfully?

Yes IMO. Just consider a -errno return value. You have no way to find out
which pages had been migrated before we reached that error. The
positive return value would fall into the same case.
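
To make the three cases concrete, a caller could classify the raw result
roughly like this. It is a hedged sketch: enum move_result,
classify_move_result() and must_scan_status() are hypothetical names, and
through a libc or libnuma wrapper an error shows up as -1 with errno set
rather than as -errno.

```c
/* Hypothetical classification of a raw move_pages() result. */
enum move_result {
	MOVE_ALL_DONE,  /* 0: the status array is expected to be valid */
	MOVE_PARTIAL,   /* > 0: some pages were not migrated */
	MOVE_ERROR      /* < 0: the call failed */
};

enum move_result classify_move_result(long ret)
{
	if (ret == 0)
		return MOVE_ALL_DONE;
	if (ret > 0)
		return MOVE_PARTIAL;
	return MOVE_ERROR;
}

/*
 * Nonzero when the caller cannot rely on the return value alone and has
 * to inspect each status entry (for example against a sentinel value).
 */
int must_scan_status(long ret)
{
	return classify_move_result(ret) != MOVE_ALL_DONE;
}
```

In both the positive and the error case the return value alone does not say
which pages moved, which is why the per-entry check is needed.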

> Then do we need to return the number of non-migrated page? What benefit could
> user get from the number. How about just return an error code to indicate the
> failure? I may miss some point, would you mind giving me a hint?

This is certainly possible. We can return -EAGAIN if some pages couldn't
be migrated because they are pinned. But please read my previous email
to the very end for arguments why this might cause more problems than it
actually solves.

> >This system call has quite a complex semantic and I am not 100% sure
> >what is the right thing to do here. Maybe we do want to continue and try
> >to migrate as much as possible on non-fatal migration failures and
> >accumulate the number of failed pages while doing so.
> >
> >The main problem is that we can have an academic discussion but
> >the primary question is what do actual users want. A lack of real
> >bug reports suggests that nobody has actually noticed this. So I
> >would rather keep returning the correct number of non-migrated
> >pages. Why? Because new users could have started depending on it. It
> >is not all that unlikely that the current implementation would just
> >work for them because they are migrating a set of pages on to the same
> >node so the batch would be a single list throughout the whole given
> >page set.

--
Michal Hocko
SUSE Labs

2020-01-24 20:53:23

by Wei Yang

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Fri, Jan 24, 2020 at 07:46:49AM +0100, Michal Hocko wrote:
>On Fri 24-01-20 06:56:47, Wei Yang wrote:
>> On Thu, Jan 23, 2020 at 09:55:26AM +0100, Michal Hocko wrote:
>> >On Thu 23-01-20 11:27:36, Wei Yang wrote:
>> >> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>> >> >Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> >> >the semantic of move_pages() was changed to return the number of
>> >> >non-migrated pages (failed to migration) and the call would be aborted
>> >> >immediately if migrate_pages() returns positive value. But it didn't
>> >> >report the number of pages that we even haven't attempted to migrate.
>> >> >So, fix it by including non-attempted pages in the return value.
>> >> >
>> >>
>> >> First, we want to change the semantic of move_pages(2). The return value
>> >> indicates the number of pages we didn't managed to migrate?
>> >>
>> >> Second, the return value from migrate_pages() doesn't mean the number of pages
>> >> we failed to migrate. For example, one -ENOMEM is returned on the first page,
>> >> migrate_pages() would return 1. But actually, no page successfully migrated.
>> >
>> >ENOMEM is considered a permanent failure and as such it is returned by
>> >migrate pages (see goto out).
>> >
>> >> Third, even the migrate_pages() return the exact non-migrate page, we are not
>> >> sure those non-migrated pages are at the tail of the list. Because in the last
>> >> case in migrate_pages(), it just remove the page from list. It could be a page
>> >> in the middle of the list. Then, in userspace, how the return value be
>> >> leveraged to determine the valid status? Any page in the list could be the
>> >> victim.
>> >
>> >Yes, I was wrong when stating that the caller would know better which
>> >status to check. I misremembered the original patch as it was quite some
>> >time ago. While storing the error code would be possible after some
>> >massaging of migrate_pages is this really something we deeply care
>> >about. The caller can achieve the same by initializing the status array
>> >to a non-node number - e.g. -1 - and check based on that.
>> >
>>
>> So for a user, the best practice is to initialize the status array to -1 and
>> check each status to see whether the page is migrated successfully?
>
>Yes IMO. Just consider -errno return value. You have no way to find out
>which pages have been migrated until we reached that error. The
>possitive return value would fall into the same case.
>
>> Then do we need to return the number of non-migrated page? What benefit could
>> user get from the number. How about just return an error code to indicate the
>> failure? I may miss some point, would you mind giving me a hint?
>
>This is certainly possible. We can return -EAGAIN if some pages couldn't
>be migrated because they are pinned. But please read my previous email
>to the very end for arguments why this might cause more problems than it
>actually solves.
>

Let me put your comment here:

Because new users could have started depending on it. It
is not all that unlikely that the current implementation would just
work for them because they are migrating a set of pages on to the same
node so the batch would be a single list throughout the whole given
page set.

Your idea is to preserve the current semantic and return the number of
non-migrated pages to userspace.

And the reasons are:

1. Users may have started depending on it.
2. No real bug has been reported yet.
3. Users always migrate pages to the same node. (If my understanding is
correct)

I think this is reasonable, since we want to minimize the impact on
userland.

But let's look at how users probably use this syscall. Since the man page
never said the return value could be positive (the number of non-migrated
pages), users would think only 0 means a successful migration and all other
cases are failures. So users probably handle negative and positive return
values the same way, like (!err).

If my guess is right, returning a negative error value for this case could
minimize the impact on userland:
1. Preserve the semantic of move_pages(2): 0 means success, negative means
some error and needs extra handling.
2. Trivial change to the man page.
3. Presumably no change for users.

Well, in case I missed your point, sorry about that.

>> >This system call has quite a complex semantic and I am not 100% sure
>> >what is the right thing to do here. Maybe we do want to continue and try
>> >to migrate as much as possible on non-fatal migration failures and
>> >accumulate the number of failed pages while doing so.
>> >
>> >The main problem is that we can have an academic discussion but
>> >the primary question is what do actual users want. A lack of real
>> >bug reports suggests that nobody has actually noticed this. So I
>> >would rather keep returning the correct number of non-migrated
>> >pages. Why? Because new users could have started depending on it. It
>> >is not all that unlikely that the current implementation would just
>> >work for them because they are migrating a set of pages on to the same
>> >node so the batch would be a single list throughout the whole given
>> >page set.
>
>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me

2020-01-24 20:54:25

by Michal Hocko

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Fri 24-01-20 23:26:42, Wei Yang wrote:
> On Fri, Jan 24, 2020 at 07:46:49AM +0100, Michal Hocko wrote:
> >On Fri 24-01-20 06:56:47, Wei Yang wrote:
> >> On Thu, Jan 23, 2020 at 09:55:26AM +0100, Michal Hocko wrote:
> >> >On Thu 23-01-20 11:27:36, Wei Yang wrote:
> >> >> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
> >> >> >Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
> >> >> >the semantic of move_pages() was changed to return the number of
> >> >> >non-migrated pages (failed to migration) and the call would be aborted
> >> >> >immediately if migrate_pages() returns positive value. But it didn't
> >> >> >report the number of pages that we even haven't attempted to migrate.
> >> >> >So, fix it by including non-attempted pages in the return value.
> >> >> >
> >> >>
> >> >> First, we want to change the semantic of move_pages(2). The return value
> >> >> indicates the number of pages we didn't managed to migrate?
> >> >>
> >> >> Second, the return value from migrate_pages() doesn't mean the number of pages
> >> >> we failed to migrate. For example, one -ENOMEM is returned on the first page,
> >> >> migrate_pages() would return 1. But actually, no page successfully migrated.
> >> >
> >> >ENOMEM is considered a permanent failure and as such it is returned by
> >> >migrate pages (see goto out).
> >> >
> >> >> Third, even the migrate_pages() return the exact non-migrate page, we are not
> >> >> sure those non-migrated pages are at the tail of the list. Because in the last
> >> >> case in migrate_pages(), it just remove the page from list. It could be a page
> >> >> in the middle of the list. Then, in userspace, how the return value be
> >> >> leveraged to determine the valid status? Any page in the list could be the
> >> >> victim.
> >> >
> >> >Yes, I was wrong when stating that the caller would know better which
> >> >status to check. I misremembered the original patch as it was quite some
> >> >time ago. While storing the error code would be possible after some
> >> >massaging of migrate_pages is this really something we deeply care
> >> >about. The caller can achieve the same by initializing the status array
> >> >to a non-node number - e.g. -1 - and check based on that.
> >> >
> >>
> >> So for a user, the best practice is to initialize the status array to -1 and
> >> check each status to see whether the page is migrated successfully?
> >
> >Yes IMO. Just consider -errno return value. You have no way to find out
> >which pages have been migrated until we reached that error. The
> >possitive return value would fall into the same case.
> >
> >> Then do we need to return the number of non-migrated page? What benefit could
> >> user get from the number. How about just return an error code to indicate the
> >> failure? I may miss some point, would you mind giving me a hint?
> >
> >This is certainly possible. We can return -EAGAIN if some pages couldn't
> >be migrated because they are pinned. But please read my previous email
> >to the very end for arguments why this might cause more problems than it
> >actually solves.
> >
>
> Let me put your comment here:
>
> Because new users could have started depending on it. It
> is not all that unlikely that the current implementation would just
> work for them because they are migrating a set of pages on to the same
> node so the batch would be a single list throughout the whole given
> page set.
>
> Your idea is to preserve current semantic, return non-migrated pages number to
> userspace.
>
> And the reason is:
>
> 1. Users have started depending on it.
> 2. No real bug reported yet.
> 3. User always migrate page to the same node. (If my understanding is
> correct)
>
> I think this gets some reason, since we want to minimize the impact to
> userland.
>
> While let's see what user probably use this syscall. Since from the man page,
> we never told the return value could be positive, the number of non-migrated
> pages, user would think only 0 means a successful migration and all other
> cases are failure. Then user probably handle negative and positive return
> value the same way, like (!err).
>
> If my guess is true, return a negative error value for this case could
> minimize the impact to userland here.
> 1. Preserve the semantic of move_pages(2): 0 means success, negative means
> some error and needs extra handling.
> 2. Trivial change to the man page.
> 3. Suppose no change to users.

Do you have any actual proposal we can discuss? I suspect we are going
in circles here. Sure, both ways are possible. The discussion we are
having here is about which behavior makes more sense. The interface is,
and has been in the past, very awkward. Some corner cases have been fixed,
some new ones created. While I am not happy about the latter, we should
finally land on some decision.
--
Michal Hocko
SUSE Labs

2020-01-24 20:59:04

by Yang Shi

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages



On 1/24/20 7:26 AM, Wei Yang wrote:
> On Fri, Jan 24, 2020 at 07:46:49AM +0100, Michal Hocko wrote:
>> On Fri 24-01-20 06:56:47, Wei Yang wrote:
>>> On Thu, Jan 23, 2020 at 09:55:26AM +0100, Michal Hocko wrote:
>>>> On Thu 23-01-20 11:27:36, Wei Yang wrote:
>>>>> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>>>>>> Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>>>>>> the semantic of move_pages() was changed to return the number of
>>>>>> non-migrated pages (failed to migration) and the call would be aborted
>>>>>> immediately if migrate_pages() returns positive value. But it didn't
>>>>>> report the number of pages that we even haven't attempted to migrate.
>>>>>> So, fix it by including non-attempted pages in the return value.
>>>>>>
>>>>> First, we want to change the semantic of move_pages(2). The return value
>>>>> indicates the number of pages we didn't managed to migrate?
>>>>>
>>>>> Second, the return value from migrate_pages() doesn't mean the number of pages
>>>>> we failed to migrate. For example, one -ENOMEM is returned on the first page,
>>>>> migrate_pages() would return 1. But actually, no page successfully migrated.
>>>> ENOMEM is considered a permanent failure and as such it is returned by
>>>> migrate pages (see goto out).
>>>>
>>>>> Third, even the migrate_pages() return the exact non-migrate page, we are not
>>>>> sure those non-migrated pages are at the tail of the list. Because in the last
>>>>> case in migrate_pages(), it just remove the page from list. It could be a page
>>>>> in the middle of the list. Then, in userspace, how the return value be
>>>>> leveraged to determine the valid status? Any page in the list could be the
>>>>> victim.
>>>> Yes, I was wrong when stating that the caller would know better which
>>>> status to check. I misremembered the original patch as it was quite some
>>>> time ago. While storing the error code would be possible after some
>>>> massaging of migrate_pages is this really something we deeply care
>>>> about. The caller can achieve the same by initializing the status array
>>>> to a non-node number - e.g. -1 - and check based on that.
>>>>
>>> So for a user, the best practice is to initialize the status array to -1 and
>>> check each status to see whether the page is migrated successfully?
>> Yes IMO. Just consider -errno return value. You have no way to find out
>> which pages have been migrated until we reached that error. The
>> possitive return value would fall into the same case.
>>
>>> Then do we need to return the number of non-migrated page? What benefit could
>>> user get from the number. How about just return an error code to indicate the
>>> failure? I may miss some point, would you mind giving me a hint?
>> This is certainly possible. We can return -EAGAIN if some pages couldn't
>> be migrated because they are pinned. But please read my previous email
>> to the very end for arguments why this might cause more problems than it
>> actually solves.
>>
> Let me put your comment here:
>
> Because new users could have started depending on it. It
> is not all that unlikely that the current implementation would just
> work for them because they are migrating a set of pages on to the same
> node so the batch would be a single list throughout the whole given
> page set.
>
> Your idea is to preserve current semantic, return non-migrated pages number to
> userspace.
>
> And the reason is:
>
> 1. Users have started depending on it.
> 2. No real bug reported yet.
> 3. User always migrate page to the same node. (If my understanding is
> correct)
>
> I think this gets some reason, since we want to minimize the impact to
> userland.
>
> While let's see what user probably use this syscall. Since from the man page,
> we never told the return value could be positive, the number of non-migrated
> pages, user would think only 0 means a successful migration and all other
> cases are failure. Then user probably handle negative and positive return
> value the same way, like (!err).
>
> If my guess is true, return a negative error value for this case could
> minimize the impact to userland here.
> 1. Preserve the semantic of move_pages(2): 0 means success, negative means
> some error and needs extra handling.
> 2. Trivial change to the man page.
> 3. Suppose no change to users.
>
> Well, in case I missed your point, sorry about that.

I think we should compare the new semantic with the old one. With the
old semantic, move_pages() returns 0 for both success *and* migration
failure. So I suppose (I don't have any real use case) the user may
do one of the following with the old semantic:
    - Just check whether the call failed (ignoring migration failure);
      "!err" is good enough. This use case is fine with the new semantic
      as well, since migration failure is also a kind of error.
    - Care about migration failure; the user needs to traverse all
      entries in the status array. With the new semantic they just need
      to check "err > 0", and if they want to know which specific pages
      failed to migrate, traverse the status array (initialized to -1, as
      Michal suggested in an earlier email).

So, if we returned an errno for migration failure, userspace would need
to do the following to see whether migration failed:
    1. Check "!err"
    2. Read errno if #1 is false
    3. Traverse the status array to see how many pages failed to migrate

But with the new semantic they just need to check "err > 0"; one step is
enough for most cases. That is why I said this approach seems more
straightforward for userspace and makes more sense IMHO.
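
As a rough illustration of the one-step check described above, here is a
minimal userspace-side sketch in C. It does not call move_pages(2) itself;
it only shows how a caller might interpret the return value and the status
array afterwards. The names classify_result(), init_status() and
count_unconfirmed() are hypothetical helpers, not part of any library, and
the -1 sentinel follows the suggestion made earlier in this thread. The
sketch treats every negative status entry as "not confirmed migrated",
whether it is the untouched sentinel or a negative error code.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel that is not a valid node number (assumption taken from the thread). */
#define STATUS_UNSET (-1)

enum move_outcome { MOVE_OK, MOVE_PARTIAL, MOVE_ERROR };

/* Fill the status array with the sentinel before issuing the call. */
static void init_status(int *status, size_t count)
{
    assert(status != NULL);
    for (size_t i = 0; i < count; i++)
        status[i] = STATUS_UNSET;
}

/* Interpret the return value under the semantic discussed here. */
static enum move_outcome classify_result(long ret)
{
    if (ret < 0)
        return MOVE_ERROR;    /* failure, reason is in errno */
    if (ret > 0)
        return MOVE_PARTIAL;  /* ret pages were not migrated */
    return MOVE_OK;           /* everything was migrated */
}

/* Count entries that do not hold a node number after the call. */
static size_t count_unconfirmed(const int *status, size_t count)
{
    assert(status != NULL);
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        if (status[i] < 0)
            n++;
    return n;
}
```

A caller would run init_status() first, issue the call, and only walk the
status array with count_unconfirmed() when classify_result() reports
MOVE_PARTIAL or MOVE_ERROR.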

>>>> This system call has quite a complex semantic and I am not 100% sure
>>>> what is the right thing to do here. Maybe we do want to continue and try
>>>> to migrate as much as possible on non-fatal migration failures and
>>>> accumulate the number of failed pages while doing so.
>>>>
>>>> The main problem is that we can have an academic discussion but
>>>> the primary question is what do actual users want. A lack of real
>>>> bug reports suggests that nobody has actually noticed this. So I
>>>> would rather keep returning the correct number of non-migrated
>>>> pages. Why? Because new users could have started depending on it. It
>>>> is not all that unlikely that the current implementation would just
>>>> work for them because they are migrating a set of pages on to the same
>>>> node so the batch would be a single list throughout the whole given
>>>> page set.
>> --
>> Michal Hocko
>> SUSE Labs

2020-01-24 23:20:22

by Wei Yang

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Fri, Jan 24, 2020 at 04:40:15PM +0100, Michal Hocko wrote:
>On Fri 24-01-20 23:26:42, Wei Yang wrote:
>> On Fri, Jan 24, 2020 at 07:46:49AM +0100, Michal Hocko wrote:
>> >On Fri 24-01-20 06:56:47, Wei Yang wrote:
>> >> On Thu, Jan 23, 2020 at 09:55:26AM +0100, Michal Hocko wrote:
>> >> >On Thu 23-01-20 11:27:36, Wei Yang wrote:
>> >> >> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>> >> >> >Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> >> >> >the semantic of move_pages() was changed to return the number of
>> >> >> >non-migrated pages (failed to migration) and the call would be aborted
>> >> >> >immediately if migrate_pages() returns positive value. But it didn't
>> >> >> >report the number of pages that we even haven't attempted to migrate.
>> >> >> >So, fix it by including non-attempted pages in the return value.
>> >> >> >
>> >> >>
>> >> >> First, we want to change the semantic of move_pages(2). The return value
>> >> >> indicates the number of pages we didn't managed to migrate?
>> >> >>
>> >> >> Second, the return value from migrate_pages() doesn't mean the number of pages
>> >> >> we failed to migrate. For example, one -ENOMEM is returned on the first page,
>> >> >> migrate_pages() would return 1. But actually, no page successfully migrated.
>> >> >
>> >> >ENOMEM is considered a permanent failure and as such it is returned by
>> >> >migrate pages (see goto out).
>> >> >
>> >> >> Third, even the migrate_pages() return the exact non-migrate page, we are not
>> >> >> sure those non-migrated pages are at the tail of the list. Because in the last
>> >> >> case in migrate_pages(), it just remove the page from list. It could be a page
>> >> >> in the middle of the list. Then, in userspace, how the return value be
>> >> >> leveraged to determine the valid status? Any page in the list could be the
>> >> >> victim.
>> >> >
>> >> >Yes, I was wrong when stating that the caller would know better which
>> >> >status to check. I misremembered the original patch as it was quite some
>> >> >time ago. While storing the error code would be possible after some
>> >> >massaging of migrate_pages is this really something we deeply care
>> >> >about. The caller can achieve the same by initializing the status array
>> >> >to a non-node number - e.g. -1 - and check based on that.
>> >> >
>> >>
>> >> So for a user, the best practice is to initialize the status array to -1 and
>> >> check each status to see whether the page is migrated successfully?
>> >
>> >Yes IMO. Just consider -errno return value. You have no way to find out
>> >which pages have been migrated until we reached that error. The
>> >possitive return value would fall into the same case.
>> >
>> >> Then do we need to return the number of non-migrated page? What benefit could
>> >> user get from the number. How about just return an error code to indicate the
>> >> failure? I may miss some point, would you mind giving me a hint?
>> >
>> >This is certainly possible. We can return -EAGAIN if some pages couldn't
>> >be migrated because they are pinned. But please read my previous email
>> >to the very end for arguments why this might cause more problems than it
>> >actually solves.
>> >
>>
>> Let me put your comment here:
>>
>> Because new users could have started depending on it. It
>> is not all that unlikely that the current implementation would just
>> work for them because they are migrating a set of pages on to the same
>> node so the batch would be a single list throughout the whole given
>> page set.
>>
>> Your idea is to preserve current semantic, return non-migrated pages number to
>> userspace.
>>
>> And the reason is:
>>
>> 1. Users have started depending on it.
>> 2. No real bug reported yet.
>> 3. User always migrate page to the same node. (If my understanding is
>> correct)
>>
>> I think this gets some reason, since we want to minimize the impact to
>> userland.
>>
>> While let's see what user probably use this syscall. Since from the man page,
>> we never told the return value could be positive, the number of non-migrated
>> pages, user would think only 0 means a successful migration and all other
>> cases are failure. Then user probably handle negative and positive return
>> value the same way, like (!err).
>>
>> If my guess is true, return a negative error value for this case could
>> minimize the impact to userland here.
>> 1. Preserve the semantic of move_pages(2): 0 means success, negative means
>> some error and needs extra handling.
>> 2. Trivial change to the man page.
>> 3. Suppose no change to users.
>
>Do you have any actual proposal we can discuss? I suspect we are going
>in circles here. Sure both ways are possible. The disucssion we are
>having here is which behavior makes more sense. The interface is and has
>been in the past very awkward. Some corner cases have been fixed some
>new created. While I am not happy about the later we should finally land
>with some decision.

OK, I realize I may have missed part of the mechanism for reporting errors
from the kernel to userland.

If do_pages_move() returns a negative err, is the value stored in errno, so
that the user actually sees a return value of -1?

So userland sees just two kinds of return value if the kernel complies with
the man page:

    0: success
   -1: failure, with the reason set in errno

Is my understanding correct? I tried to read the syscall path, but could not
find where the negative value is stored in errno.

Since our kernel already returns a positive value on migration failure, the
actual return values of the move_pages() syscall are:

  > 0: number of non-migrated pages
    0: success
   -1: failure, with the reason set in errno

Since everything looks good to userland now, we just extend the semantic of
move_pages() to make a positive return value an explicit error case.

Is my understanding correct here?

If this is the case, I agree with this fix. It looks like the minimal change
to the current real world.
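
For reference, the conversion asked about above is not done in the kernel
syscall path: the kernel hands back the raw value (a negative errno on
failure), and the C library's syscall wrapper turns it into -1 plus errno.
The following is only a sketch of that convention, assuming the usual Linux
rule that raw values from -4095 to -1 denote errors; wrap_syscall_result()
and check_examples() are hypothetical names, and the details of how the raw
value is obtained differ per architecture.

```c
#include <assert.h>
#include <errno.h>

/* Largest errno value reserved by the convention (4095 on Linux). */
#define MAX_ERRNO 4095

/*
 * Hypothetical stand-in for what a libc syscall wrapper does with the
 * raw value returned by the kernel.
 */
static long wrap_syscall_result(long raw)
{
    if (raw < 0 && raw >= -MAX_ERRNO) {
        errno = (int)-raw;  /* e.g. -ENOMEM becomes errno = ENOMEM */
        return -1;
    }
    return raw;             /* 0 and positive counts pass through unchanged */
}

/* Small self-check showing the three cases discussed in this thread. */
static void check_examples(void)
{
    assert(wrap_syscall_result(0) == 0);        /* success */
    assert(wrap_syscall_result(5) == 5);        /* 5 pages not migrated */
    assert(wrap_syscall_result(-EBUSY) == -1);  /* failure ... */
    assert(errno == EBUSY);                     /* ... with reason in errno */
}
```

So a positive count from do_pages_move() reaches the caller as is, while any
-errno is seen as -1 with errno set.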

>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me

2020-01-24 23:22:44

by Wei Yang

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Fri, Jan 24, 2020 at 09:48:30AM -0800, Yang Shi wrote:
>
>
>On 1/24/20 7:26 AM, Wei Yang wrote:
>> On Fri, Jan 24, 2020 at 07:46:49AM +0100, Michal Hocko wrote:
>> > On Fri 24-01-20 06:56:47, Wei Yang wrote:
>> > > On Thu, Jan 23, 2020 at 09:55:26AM +0100, Michal Hocko wrote:
>> > > > On Thu 23-01-20 11:27:36, Wei Yang wrote:
>> > > > > On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
>> > > > > > Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> > > > > > the semantic of move_pages() was changed to return the number of
>> > > > > > non-migrated pages (failed to migration) and the call would be aborted
>> > > > > > immediately if migrate_pages() returns positive value. But it didn't
>> > > > > > report the number of pages that we even haven't attempted to migrate.
>> > > > > > So, fix it by including non-attempted pages in the return value.
>> > > > > >
>> > > > > First, we want to change the semantic of move_pages(2). The return value
>> > > > > indicates the number of pages we didn't managed to migrate?
>> > > > >
>> > > > > Second, the return value from migrate_pages() doesn't mean the number of pages
>> > > > > we failed to migrate. For example, one -ENOMEM is returned on the first page,
>> > > > > migrate_pages() would return 1. But actually, no page successfully migrated.
>> > > > ENOMEM is considered a permanent failure and as such it is returned by
>> > > > migrate pages (see goto out).
>> > > >
>> > > > > Third, even the migrate_pages() return the exact non-migrate page, we are not
>> > > > > sure those non-migrated pages are at the tail of the list. Because in the last
>> > > > > case in migrate_pages(), it just remove the page from list. It could be a page
>> > > > > in the middle of the list. Then, in userspace, how the return value be
>> > > > > leveraged to determine the valid status? Any page in the list could be the
>> > > > > victim.
>> > > > Yes, I was wrong when stating that the caller would know better which
>> > > > status to check. I misremembered the original patch as it was quite some
>> > > > time ago. While storing the error code would be possible after some
>> > > > massaging of migrate_pages is this really something we deeply care
>> > > > about. The caller can achieve the same by initializing the status array
>> > > > to a non-node number - e.g. -1 - and check based on that.
>> > > >
>> > > So for a user, the best practice is to initialize the status array to -1 and
>> > > check each status to see whether the page is migrated successfully?
>> > Yes IMO. Just consider -errno return value. You have no way to find out
>> > which pages have been migrated until we reached that error. The
>> > possitive return value would fall into the same case.
>> >
>> > > Then do we need to return the number of non-migrated page? What benefit could
>> > > user get from the number. How about just return an error code to indicate the
>> > > failure? I may miss some point, would you mind giving me a hint?
>> > This is certainly possible. We can return -EAGAIN if some pages couldn't
>> > be migrated because they are pinned. But please read my previous email
>> > to the very end for arguments why this might cause more problems than it
>> > actually solves.
>> >
>> Let me put your comment here:
>>
>> Because new users could have started depending on it. It
>> is not all that unlikely that the current implementation would just
>> work for them because they are migrating a set of pages on to the same
>> node so the batch would be a single list throughout the whole given
>> page set.
>>
>> Your idea is to preserve current semantic, return non-migrated pages number to
>> userspace.
>>
>> And the reason is:
>>
>> 1. Users have started depending on it.
>> 2. No real bug reported yet.
>> 3. User always migrate page to the same node. (If my understanding is
>> correct)
>>
>> I think this gets some reason, since we want to minimize the impact to
>> userland.
>>
>> While let's see what user probably use this syscall. Since from the man page,
>> we never told the return value could be positive, the number of non-migrated
>> pages, user would think only 0 means a successful migration and all other
>> cases are failure. Then user probably handle negative and positive return
>> value the same way, like (!err).
>>
>> If my guess is true, return a negative error value for this case could
>> minimize the impact to userland here.
>> 1. Preserve the semantic of move_pages(2): 0 means success, negative means
>> some error and needs extra handling.
>> 2. Trivial change to the man page.
>> 3. Suppose no change to users.
>>
>> Well, in case I missed your point, sorry about that.
>
>I think we should compare the new semantic with the old one. With the old
>semantic the move_pages() return 0 for both success *and* migration failure.
>So, I'm supposed (I don't have any real usecase) the user may do the below
>with the old semantic:
>    - Just check if it is failed (ignore migration failure), "!err" is good
>enough. This usecase is fine as well with the new semantic since migration
>failure is also a kind of error cases.
>    - Care about migration failure, the user needs traverse all bits in the
>status array. With the new semantic they just need check if "err > 0", if
>they want to know what specific pages are failed to migrate, then traverse
>the status array (with initialized as -1 as Michal suggested in earlier
>email).
>
>So, with returning errno for migration failure if the userspace wants to see
>if migration is failed, they need do:
>    1. Check "!err"
>    2. Read errno if #1 returns false
>    3. Traverse status array to see how many pages are failed to migrate
>

You are right. I misunderstood how errors are reported through err and
errno.

>But with the new semantic they just need check if "err > 0", one step is fine
>for the most cases. So I said this approach seems more straightforward to the
>userspace and makes more sense IMHO.
>
>> > > > This system call has quite a complex semantic and I am not 100% sure
>> > > > what is the right thing to do here. Maybe we do want to continue and try
>> > > > to migrate as much as possible on non-fatal migration failures and
>> > > > accumulate the number of failed pages while doing so.
>> > > >
>> > > > The main problem is that we can have an academic discussion but
>> > > > the primary question is what do actual users want. A lack of real
>> > > > bug reports suggests that nobody has actually noticed this. So I
>> > > > would rather keep returning the correct number of non-migrated
>> > > > pages. Why? Because new users could have started depending on it. It
>> > > > is not all that unlikely that the current implementation would just
>> > > > work for them because they are migrating a set of pages on to the same
>> > > > node so the batch would be a single list throughout the whole given
>> > > > page set.
>> > --
>> > Michal Hocko
>> > SUSE Labs

--
Wei Yang
Help you, Help me

2020-01-27 09:59:16

by Michal Hocko

[permalink] [raw]
Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages

On Thu 23-01-20 07:38:51, Yang Shi wrote:
> Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
> the semantic of move_pages() was changed to return the number of
> non-migrated pages (failed to migration) and the call would be aborted
> immediately if migrate_pages() returns positive value. But it didn't
> report the number of pages that we even haven't attempted to migrate.
> So, fix it by including non-attempted pages in the return value.

I would rephrase the changelog like this
"
Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
the semantic of move_pages() has changed to return the number of
non-migrated pages if they were the result of non-fatal reasons (usually a
busy page). This was an unintentional change that hasn't been noticed
except by LTP tests which checked for the documented behavior.

There are two ways to go around this change. We can even get back to the
original behavior and return -EAGAIN whenever migrate_pages is not able
to migrate pages due to non-fatal reasons. Another option would be to
simply continue with the changed semantic and extend move_pages
documentation to clarify that -errno is returned on an invalid input or
when migration simply cannot succeed (e.g. -ENOMEM, -EBUSY) or,
otherwise, the number of pages that couldn't be migrated due to ephemeral
reasons (e.g. page is pinned or locked for other reasons).

This patch implements the second option because this behavior has been in
place for some time without anybody complaining and possibly new users
depending on it. It also allows slightly easier error handling,
as the caller knows that it is worth retrying when err > 0.
"

> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
> Suggested-by: Michal Hocko <[email protected]>
> Cc: Wei Yang <[email protected]>
> Cc: <[email protected]> [4.17+]
> Signed-off-by: Yang Shi <[email protected]>

With more clarification, feel free to add
Acked-by: Michal Hocko <[email protected]>

> ---
> v2: Rebased on top of the latest mainline kernel per Andrew
>
> mm/migrate.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 86873b6..9b8eb5d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> start = i;
> } else if (node != current_node) {
> err = do_move_pages_to_node(mm, &pagelist, current_node);
> - if (err)
> + if (err) {
> + /*
> + * Positive err means the number of failed
> + * pages to migrate. Since we are going to
> + * abort and return the number of non-migrated
> + * pages, so need incude the rest of the
> + * nr_pages that have not attempted as well.
> + */
> + if (err > 0)
> + err += nr_pages - i - 1;
> goto out;
> + }
> err = store_status(status, start, current_node, i - start);
> if (err)
> goto out;
> @@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> goto out_flush;
>
> err = do_move_pages_to_node(mm, &pagelist, current_node);
> - if (err)
> + if (err) {
> + if (err > 0)
> + err += nr_pages - i - 1;
> goto out;
> + }
> if (i > start) {
> err = store_status(status, start, current_node, i - start);
> if (err)
> @@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>
> /* Make sure we do not overwrite the existing error */
> err1 = do_move_pages_to_node(mm, &pagelist, current_node);
> + /*
> + * Don't have to report non-attempted pages here since:
> + * - If the above loop is done gracefully, there are no non-attempted
> + * pages.
> + * - If the above loop is aborted, it means a fatal error
> + * happened, so err should be returned.
> + */
> if (!err1)
> err1 = store_status(status, start, current_node, i - start);
> if (!err)
> --
> 1.8.3.1
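
The adjustment made in the first two hunks can be tried out in isolation. The
sketch below only mirrors the expression as posted in this patch; the helper
name is hypothetical and it is not part of mm/migrate.c. Like the patch, it
assumes that the page at index i is already accounted for, so only the pages
after i are added.

```c
#include <assert.h>

/*
 * Hypothetical stand-alone helper, for illustration only.
 * err:      return value of the failed batch (positive: number of pages
 *           that failed to migrate, negative: -errno)
 * nr_pages: total number of pages passed to the call
 * i:        loop index at which the call is aborted
 */
static long report_on_abort(long err, unsigned long nr_pages, unsigned long i)
{
	assert(i < nr_pages);

	/* Only a positive count is extended; -errno is passed through as-is. */
	if (err > 0)
		err += (long)(nr_pages - i - 1);
	return err;
}
```

With nr_pages = 10 and an abort at i = 4 after 2 pages failed, the reported
value is 2 + (10 - 4 - 1) = 7, while a negative error such as -12 is returned
unchanged.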

--
Michal Hocko
SUSE Labs

2020-01-27 16:35:45

by Yang Shi

Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted pages



On 1/27/20 1:55 AM, Michal Hocko wrote:
> On Thu 23-01-20 07:38:51, Yang Shi wrote:
>> Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
>> the semantics of move_pages() were changed to return the number of
>> non-migrated pages (pages that failed to migrate), and the call is aborted
>> immediately if migrate_pages() returns a positive value. But it didn't
>> report the number of pages that we haven't even attempted to migrate.
>> So, fix it by including non-attempted pages in the return value.
> I would rephrase the changelog like this:
> "
> Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
> the semantics of move_pages() have changed to return the number of
> non-migrated pages if they were the result of non-fatal reasons (usually a
> busy page). This was an unintentional change that hasn't been noticed
> except for LTP tests which checked for the documented behavior.
>
> There are two ways to deal with this change. We can either go back to the
> original behavior and return -EAGAIN whenever migrate_pages is not able
> to migrate pages due to non-fatal reasons. Another option would be to
> simply continue with the changed semantic and extend move_pages
> documentation to clarify that -errno is returned on an invalid input or
> when migration simply cannot succeed (e.g. -ENOMEM, -EBUSY) or the
> number of pages that couldn't have been migrated due to ephemeral
> reasons (e.g. page is pinned or locked for other reasons).
>
> This patch implements the second option because this behavior has been in
> place for some time without anybody complaining and possibly new users
> depending on it. It also allows slightly easier error handling,
> as the caller knows that it is worth retrying when err > 0.
> "
>
>> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>> Suggested-by: Michal Hocko <[email protected]>
>> Cc: Wei Yang <[email protected]>
>> Cc: <[email protected]> [4.17+]
>> Signed-off-by: Yang Shi <[email protected]>
> With more clarification, feel free to add
> Acked-by: Michal Hocko <[email protected]>

Thanks. Will post v3 with the rephrased commit log.

>
>> ---
>> v2: Rebased on top of the latest mainline kernel per Andrew
>>
>> mm/migrate.c | 24 ++++++++++++++++++++++--
>> 1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 86873b6..9b8eb5d 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1627,8 +1627,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> start = i;
>> } else if (node != current_node) {
>> err = do_move_pages_to_node(mm, &pagelist, current_node);
>> - if (err)
>> + if (err) {
>> + /*
>> + * Positive err means the number of failed
>> + * pages to migrate. Since we are going to
>> + * abort and return the number of non-migrated
>> + * pages, so need to include the rest of the
>> + * nr_pages that have not been attempted as well.
>> + */
>> + if (err > 0)
>> + err += nr_pages - i - 1;
>> goto out;
>> + }
>> err = store_status(status, start, current_node, i - start);
>> if (err)
>> goto out;
>> @@ -1659,8 +1669,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> goto out_flush;
>>
>> err = do_move_pages_to_node(mm, &pagelist, current_node);
>> - if (err)
>> + if (err) {
>> + if (err > 0)
>> + err += nr_pages - i - 1;
>> goto out;
>> + }
>> if (i > start) {
>> err = store_status(status, start, current_node, i - start);
>> if (err)
>> @@ -1674,6 +1687,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>
>> /* Make sure we do not overwrite the existing error */
>> err1 = do_move_pages_to_node(mm, &pagelist, current_node);
>> + /*
>> + * Don't have to report non-attempted pages here since:
>> + * - If the above loop is done gracefully, there are no non-attempted
>> + * pages.
>> + * - If the above loop is aborted, it means a fatal error
>> + * happened, so err should be returned.
>> + */
>> if (!err1)
>> err1 = store_status(status, start, current_node, i - start);
>> if (!err)
>> --
>> 1.8.3.1