In the -ENOMEM case, some subpages of THPs that failed to migrate might
be left on the thp_split_pages list. Move them back to the migration
list so that the caller can put them back on the right list; otherwise
the page refcount is leaked. Also adjust nr_failed and nr_thp_failed
accordingly to make the vm event accounting more accurate.

Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
---
mm/migrate.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/migrate.c b/mm/migrate.c
index 63a87ef0996f..97dfd1f4870d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1438,6 +1438,14 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 				}

 				nr_failed_pages += nr_subpages;
+				/*
+				 * There might be some subpages of fail-to-migrate THPs
+				 * left in thp_split_pages list. Move them back to migration
+				 * list so that they could be put back to the right list by
+				 * the caller otherwise the page refcnt will be leaked.
+				 */
+				list_splice_init(&thp_split_pages, from);
+				nr_thp_failed += thp_retry;
 				goto out;
 			case -EAGAIN:
 				if (is_thp)
--
2.23.0
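
For readers less familiar with list_splice_init(), here is a minimal
user-space sketch of why splicing matters on this error path. It is not
kernel code: struct list_node, splice_and_reinit() and
pages_visible_to_caller() are hypothetical stand-ins written for this
illustration, and the page counts (two pages still on the migration list,
three stranded subpages) are made up. It only models the list handling,
not the refcounting itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_to_tail(struct list_node *entry, struct list_node *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Move every entry of @src to the front of @dst and leave @src empty,
 * mirroring the behaviour of list_splice_init() in the kernel.
 */
static void splice_and_reinit(struct list_node *src, struct list_node *dst)
{
	if (list_is_empty(src))
		return;

	struct list_node *first = src->next;
	struct list_node *last = src->prev;
	struct list_node *at = dst->next;

	first->prev = dst;
	dst->next = first;
	last->next = at;
	at->prev = last;

	list_init(src);
}

static size_t list_count(const struct list_node *head)
{
	size_t n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/*
 * Hypothetical scenario: two pages remain on the migration list and three
 * subpages of a split THP sit on a separate list when -ENOMEM is hit.
 * Returns how many entries the caller can see on "from" afterwards.
 */
static size_t pages_visible_to_caller(int splice_back)
{
	struct list_node from, thp_split_pages, pages[5];

	list_init(&from);
	list_init(&thp_split_pages);
	for (int i = 0; i < 2; i++)
		list_add_to_tail(&pages[i], &from);
	for (int i = 2; i < 5; i++)
		list_add_to_tail(&pages[i], &thp_split_pages);

	if (splice_back) {
		splice_and_reinit(&thp_split_pages, &from);
		assert(list_is_empty(&thp_split_pages));
	}
	return list_count(&from);
}
```

Calling pages_visible_to_caller(0) models the path without the splice:
the caller only ever sees the entries still on "from", so anything left
on the local list is never put back. With pages_visible_to_caller(1),
all entries end up on "from" and the local list is empty again.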
On 2022/3/18 15:08, Feng Tang wrote:
>> In -ENOMEM case, there might be some subpages of fail-to-migrate THPs
>> left in thp_split_pages list. We should move them back to migration
>> list so that they could be put back to the right list by the caller
>> otherwise the page refcnt will be leaked here. Also adjust nr_failed
>> and nr_thp_failed accordingly to make vm events account more accurate.
>
> We just hit a real-world case of this while checking a malloc-oom
> issue, and our fix is similar to yours :).
>
Oh, what a coincidence! :)
> So I think you can remove the 'potential' from the patch subject.
> Feel free to add
>
> Tested-by: Feng Tang <[email protected]>
> Reviewed-by: Feng Tang <[email protected]>
Many thanks for your test and comment!
>
> Thanks,
> Feng
>
>> Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
>> Signed-off-by: Miaohe Lin <[email protected]>
>> Reviewed-by: Zi Yan <[email protected]>
>> Reviewed-by: "Huang, Ying" <[email protected]>
>> Reviewed-by: Baolin Wang <[email protected]>
>> ---
>> mm/migrate.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 63a87ef0996f..97dfd1f4870d 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1438,6 +1438,14 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>> }
>>
>> nr_failed_pages += nr_subpages;
>> + /*
>> + * There might be some subpages of fail-to-migrate THPs
>> + * left in thp_split_pages list. Move them back to migration
>> + * list so that they could be put back to the right list by
>> + * the caller otherwise the page refcnt will be leaked.
>> + */
>> + list_splice_init(&thp_split_pages, from);
>> + nr_thp_failed += thp_retry;
>> goto out;
>> case -EAGAIN:
>> if (is_thp)
>> --
>> 2.23.0
> In -ENOMEM case, there might be some subpages of fail-to-migrate THPs
> left in thp_split_pages list. We should move them back to migration
> list so that they could be put back to the right list by the caller
> otherwise the page refcnt will be leaked here. Also adjust nr_failed
> and nr_thp_failed accordingly to make vm events account more accurate.
We just hit a real-world case of this while checking a malloc-oom
issue, and our fix is similar to yours :).
So I think you can remove the 'potential' from the patch subject.
Feel free to add
Tested-by: Feng Tang <[email protected]>
Reviewed-by: Feng Tang <[email protected]>
Thanks,
Feng
> Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
> Signed-off-by: Miaohe Lin <[email protected]>
> Reviewed-by: Zi Yan <[email protected]>
> Reviewed-by: "Huang, Ying" <[email protected]>
> Reviewed-by: Baolin Wang <[email protected]>
> ---
> mm/migrate.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 63a87ef0996f..97dfd1f4870d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1438,6 +1438,14 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> }
>
> nr_failed_pages += nr_subpages;
> + /*
> + * There might be some subpages of fail-to-migrate THPs
> + * left in thp_split_pages list. Move them back to migration
> + * list so that they could be put back to the right list by
> + * the caller otherwise the page refcnt will be leaked.
> + */
> + list_splice_init(&thp_split_pages, from);
> + nr_thp_failed += thp_retry;
> goto out;
> case -EAGAIN:
> if (is_thp)
> --
> 2.23.0
On Thu, Mar 17, 2022 at 10:41 PM Miaohe Lin <[email protected]> wrote:
>
> In -ENOMEM case, there might be some subpages of fail-to-migrate THPs
> left in thp_split_pages list. We should move them back to migration
> list so that they could be put back to the right list by the caller
> otherwise the page refcnt will be leaked here. Also adjust nr_failed
> and nr_thp_failed accordingly to make vm events account more accurate.
>
> Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
> Signed-off-by: Miaohe Lin <[email protected]>
> Reviewed-by: Zi Yan <[email protected]>
> Reviewed-by: "Huang, Ying" <[email protected]>
> Reviewed-by: Baolin Wang <[email protected]>
Reviewed-by: Muchun Song <[email protected]>