2020-07-23 03:23:36

by Baoquan He

Subject: [PATCH v2 0/4] mm/hugetlb: Small cleanup and improvement

v1 is here:
https://lore.kernel.org/linux-mm/[email protected]/

Patches 1-3 are small cleanups.

Patch 4 adds a warning message when the number of persistent huge
pages is not changed to the exact value written to the sysfs or proc
nr_hugepages file.

v1->v2:
Dropped the old patch 1/5 from the v1 post. What was thought to be a
typo is actually another kind of abbreviation.

Updated the log of patch 4, as rephrased by Mike, and moved the added
message logging code to after hugetlb_lock is dropped, as suggested
by Mike.


Baoquan He (4):
mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool
mm/hugetlb.c: Remove the unnecessary non_swap_entry()
doc/vm: fix typo in the hugetlb admin documentation
mm/hugetl.c: warn out if expected count of huge pages adjustment is
not achieved

Documentation/admin-guide/mm/hugetlbpage.rst | 2 +-
mm/hugetlb.c | 27 +++++++++++++++-----
2 files changed, 21 insertions(+), 8 deletions(-)

--
2.17.2


2020-07-23 03:23:50

by Baoquan He

Subject: [PATCH v2 2/4] mm/hugetlb.c: Remove the unnecessary non_swap_entry()

The is_migration_entry() and is_hwpoison_entry() checks are stricter
than non_swap_entry(), which means they already cover the conditional
check that non_swap_entry() performs.

Hence remove the unnecessary non_swap_entry() in is_hugetlb_entry_migration()
and is_hugetlb_entry_hwpoisoned() to simplify code.

Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
---
mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3569e731e66b..c14837854392 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3748,7 +3748,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
 	if (huge_pte_none(pte) || pte_present(pte))
 		return false;
 	swp = pte_to_swp_entry(pte);
-	if (non_swap_entry(swp) && is_migration_entry(swp))
+	if (is_migration_entry(swp))
 		return true;
 	else
 		return false;
@@ -3761,7 +3761,7 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 	if (huge_pte_none(pte) || pte_present(pte))
 		return false;
 	swp = pte_to_swp_entry(pte);
-	if (non_swap_entry(swp) && is_hwpoison_entry(swp))
+	if (is_hwpoison_entry(swp))
 		return true;
 	else
 		return false;
--
2.17.2

2020-07-23 03:24:02

by Baoquan He

Subject: [PATCH v2 3/4] doc/vm: fix typo in the hugetlb admin documentation

Change 'pecify' to 'Specify'.

Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
---
Documentation/admin-guide/mm/hugetlbpage.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 015a5f7d7854..f7b1c7462991 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -131,7 +131,7 @@ hugepages
parameter is preceded by an invalid hugepagesz parameter, it will
be ignored.
default_hugepagesz
- pecify the default huge page size. This parameter can
+ Specify the default huge page size. This parameter can
only be specified once on the command line. default_hugepagesz can
optionally be followed by the hugepages parameter to preallocate a
specific number of huge pages of default size. The number of default
--
2.17.2

2020-07-23 03:25:31

by Baoquan He

Subject: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

A customer complained that no message is logged when the number of
persistent huge pages is not changed to the exact value written to
the sysfs or proc nr_hugepages file.

In the current code, a best effort is made to satisfy requests made
via the nr_hugepages file. However, requests may be only partially
satisfied.

Log a message if the code was unsuccessful in fully satisfying a
request. This includes both increasing and decreasing the number
of persistent huge pages.

Signed-off-by: Baoquan He <[email protected]>
---
mm/hugetlb.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c14837854392..b5aa32a13569 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2661,7 +2661,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			      nodemask_t *nodes_allowed)
 {
-	unsigned long min_count, ret;
+	unsigned long min_count, ret, old_max, new_max;
 	NODEMASK_ALLOC(nodemask_t, node_alloc_noretry, GFP_KERNEL);

 	/*
@@ -2723,6 +2723,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * pool might be one hugepage larger than it needs to be, but
 	 * within all the constraints specified by the sysctls.
 	 */
+	old_max = persistent_huge_pages(h);
 	while (h->surplus_huge_pages && count > persistent_huge_pages(h)) {
 		if (!adjust_pool_surplus(h, nodes_allowed, -1))
 			break;
@@ -2779,8 +2780,20 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	}
 out:
 	h->max_huge_pages = persistent_huge_pages(h);
+	new_max = h->max_huge_pages;
 	spin_unlock(&hugetlb_lock);

+	if (count != new_max) {
+		char buf[32];
+
+		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+		pr_warn("HugeTLB: %s %lu of page size %s failed. Only %s %lu hugepages.\n",
+			count > old_max ? "increasing" : "decreasing",
+			abs(count - old_max), buf,
+			count > old_max ? "increased" : "decreased",
+			abs(old_max - new_max));
+	}
+
 	NODEMASK_FREE(node_alloc_noretry);

 	return 0;
--
2.17.2

2020-07-23 05:07:20

by Anshuman Khandual

Subject: Re: [PATCH v2 2/4] mm/hugetlb.c: Remove the unnecessary non_swap_entry()



On 07/23/2020 08:52 AM, Baoquan He wrote:
> The checking is_migration_entry() and is_hwpoison_entry() are stricter
> than non_swap_entry(), means they have covered the conditional check
> which non_swap_entry() is doing.

They are no stricter as such, but each implicitly contains the non_swap_entry() check.
If a swap entry tests positive for either is_[migration|hwpoison]_entry(), then
its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and SWP_HWPOISON.
All these types >= MAX_SWAPFILES, exactly what is asserted with non_swap_entry().

>
> Hence remove the unnecessary non_swap_entry() in is_hugetlb_entry_migration()
> and is_hugetlb_entry_hwpoisoned() to simplify code.
>
> Signed-off-by: Baoquan He <[email protected]>
> Reviewed-by: Mike Kravetz <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
> ---
> mm/hugetlb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3569e731e66b..c14837854392 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3748,7 +3748,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
> if (huge_pte_none(pte) || pte_present(pte))
> return false;
> swp = pte_to_swp_entry(pte);
> - if (non_swap_entry(swp) && is_migration_entry(swp))
> + if (is_migration_entry(swp))
> return true;
> else
> return false;
> @@ -3761,7 +3761,7 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
> if (huge_pte_none(pte) || pte_present(pte))
> return false;
> swp = pte_to_swp_entry(pte);
> - if (non_swap_entry(swp) && is_hwpoison_entry(swp))
> + if (is_hwpoison_entry(swp))
> return true;
> else
> return false;
>

It would be better if the commit message contains details about
the existing redundant check. But either way.

Reviewed-by: Anshuman Khandual <[email protected]>

2020-07-23 05:20:51

by Anshuman Khandual

Subject: Re: [PATCH v2 3/4] doc/vm: fix typo in the hugetlb admin documentation



On 07/23/2020 08:52 AM, Baoquan He wrote:
> Change 'pecify' to 'Specify'.
>
> Signed-off-by: Baoquan He <[email protected]>
> Reviewed-by: Mike Kravetz <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
> ---
> Documentation/admin-guide/mm/hugetlbpage.rst | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index 015a5f7d7854..f7b1c7462991 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -131,7 +131,7 @@ hugepages
> parameter is preceded by an invalid hugepagesz parameter, it will
> be ignored.
> default_hugepagesz
> - pecify the default huge page size. This parameter can
> + Specify the default huge page size. This parameter can
> only be specified once on the command line. default_hugepagesz can
> optionally be followed by the hugepages parameter to preallocate a
> specific number of huge pages of default size. The number of default
>

This does not apply on 5.8-rc6 and the original typo seems to be missing
there as well. This section was introduced recently with the following commit.

282f4214384e ("hugetlbfs: clean up command line processing")

2020-07-23 05:59:18

by Baoquan He

Subject: Re: [PATCH v2 3/4] doc/vm: fix typo in the hugetlb admin documentation

On 07/23/20 at 10:47am, Anshuman Khandual wrote:
>
>
> On 07/23/2020 08:52 AM, Baoquan He wrote:
> > Change 'pecify' to 'Specify'.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > Reviewed-by: Mike Kravetz <[email protected]>
> > Reviewed-by: David Hildenbrand <[email protected]>
> > ---
> > Documentation/admin-guide/mm/hugetlbpage.rst | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> > index 015a5f7d7854..f7b1c7462991 100644
> > --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> > +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> > @@ -131,7 +131,7 @@ hugepages
> > parameter is preceded by an invalid hugepagesz parameter, it will
> > be ignored.
> > default_hugepagesz
> > - pecify the default huge page size. This parameter can
> > + Specify the default huge page size. This parameter can
> > only be specified once on the command line. default_hugepagesz can
> > optionally be followed by the hugepages parameter to preallocate a
> > specific number of huge pages of default size. The number of default
> >
>
> This does not apply on 5.8-rc6 and the original typo seems to be missing
> there as well. This section was introduced recently with following commit.
>
> 282f4214384e ("hugetlbfs: clean up command line processing")

Thanks a lot for reviewing. This patchset is based on the latest
next/master branch. It seems the commit below, which is later than
commit 282f4214384e, introduced the typo. It hasn't been merged into
the mainline tree yet.

commit 72a3e3e25a5142284c6bc76ecf170c2a18dcdf6e
Author: Mauro Carvalho Chehab <[email protected]>
Date: Tue Jun 23 09:09:06 2020 +0200

docs: hugetlbpage.rst: fix some warnings

2020-07-23 06:15:20

by Baoquan He

Subject: Re: [PATCH v2 2/4] mm/hugetlb.c: Remove the unnecessary non_swap_entry()

On 07/23/20 at 10:36am, Anshuman Khandual wrote:
>
>
> On 07/23/2020 08:52 AM, Baoquan He wrote:
> > The checking is_migration_entry() and is_hwpoison_entry() are stricter
> > than non_swap_entry(), means they have covered the conditional check
> > which non_swap_entry() is doing.
>
> They are no stricter as such but implicitly contains non_swap_entry() in itself.
> If a swap entry tests positive for either is_[migration|hwpoison]_entry(), then
> its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and SWP_HWPOISON.
> All these types >= MAX_SWAPFILES, exactly what is asserted with non_swap_entry().
>
> >
> > Hence remove the unnecessary non_swap_entry() in is_hugetlb_entry_migration()
> > and is_hugetlb_entry_hwpoisoned() to simplify code.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > Reviewed-by: Mike Kravetz <[email protected]>
> > Reviewed-by: David Hildenbrand <[email protected]>
> > ---
> > mm/hugetlb.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 3569e731e66b..c14837854392 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -3748,7 +3748,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
> > if (huge_pte_none(pte) || pte_present(pte))
> > return false;
> > swp = pte_to_swp_entry(pte);
> > - if (non_swap_entry(swp) && is_migration_entry(swp))
> > + if (is_migration_entry(swp))
> > return true;
> > else
> > return false;
> > @@ -3761,7 +3761,7 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
> > if (huge_pte_none(pte) || pte_present(pte))
> > return false;
> > swp = pte_to_swp_entry(pte);
> > - if (non_swap_entry(swp) && is_hwpoison_entry(swp))
> > + if (is_hwpoison_entry(swp))
> > return true;
> > else
> > return false;
> >
>
> It would be better if the commit message contains details about
> the existing redundant check. But either way.

Thanks for your advice. Do you think updating the log as below is OK?

~~~~~~~~
If a swap entry tests positive for either is_[migration|hwpoison]_entry(), then
its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and SWP_HWPOISON.
All these types >= MAX_SWAPFILES, exactly what is asserted with non_swap_entry().

So the non_swap_entry() check in is_hugetlb_entry_migration() and
is_hugetlb_entry_hwpoisoned() is redundant.

Let's remove it to optimize the code.
~~~~~~~~

Thanks
Baoquan

2020-07-23 06:19:27

by Anshuman Khandual

Subject: Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved



On 07/23/2020 08:52 AM, Baoquan He wrote:
> A customer complained that no message is logged when the number of
> persistent huge pages is not changed to the exact value written to
> the sysfs or proc nr_hugepages file.
>
> In the current code, a best effort is made to satisfy requests made
> via the nr_hugepages file. However, requests may be only partially
> satisfied.
>
> Log a message if the code was unsuccessful in fully satisfying a
> request. This includes both increasing and decreasing the number
> of persistent huge pages.

But is the kernel expected to warn in all such situations where the
user-requested resources could not be allocated completely? Otherwise, it
does not make sense to add a warning for just one such situation.

2020-07-23 08:54:36

by Anshuman Khandual

Subject: Re: [PATCH v2 2/4] mm/hugetlb.c: Remove the unnecessary non_swap_entry()



On 07/23/2020 11:44 AM, Baoquan He wrote:
> On 07/23/20 at 10:36am, Anshuman Khandual wrote:
>>
>>
>> On 07/23/2020 08:52 AM, Baoquan He wrote:
>>> The checking is_migration_entry() and is_hwpoison_entry() are stricter
>>> than non_swap_entry(), means they have covered the conditional check
>>> which non_swap_entry() is doing.
>>
>> They are no stricter as such but implicitly contains non_swap_entry() in itself.
>> If a swap entry tests positive for either is_[migration|hwpoison]_entry(), then
>> its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and SWP_HWPOISON.
>> All these types >= MAX_SWAPFILES, exactly what is asserted with non_swap_entry().
>>
>>>
>>> Hence remove the unnecessary non_swap_entry() in is_hugetlb_entry_migration()
>>> and is_hugetlb_entry_hwpoisoned() to simplify code.
>>>
>>> Signed-off-by: Baoquan He <[email protected]>
>>> Reviewed-by: Mike Kravetz <[email protected]>
>>> Reviewed-by: David Hildenbrand <[email protected]>
>>> ---
>>> mm/hugetlb.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 3569e731e66b..c14837854392 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -3748,7 +3748,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
>>> if (huge_pte_none(pte) || pte_present(pte))
>>> return false;
>>> swp = pte_to_swp_entry(pte);
>>> - if (non_swap_entry(swp) && is_migration_entry(swp))
>>> + if (is_migration_entry(swp))
>>> return true;
>>> else
>>> return false;
>>> @@ -3761,7 +3761,7 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
>>> if (huge_pte_none(pte) || pte_present(pte))
>>> return false;
>>> swp = pte_to_swp_entry(pte);
>>> - if (non_swap_entry(swp) && is_hwpoison_entry(swp))
>>> + if (is_hwpoison_entry(swp))
>>> return true;
>>> else
>>> return false;
>>>
>>
>> It would be better if the commit message contains details about
>> the existing redundant check. But either way.
>
> Thanks for your advice. Do you think updating the log as below is OK?
>
> ~~~~~~~~
> If a swap entry tests positive for either is_[migration|hwpoison]_entry(), then
> its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and SWP_HWPOISON.
> All these types >= MAX_SWAPFILES, exactly what is asserted with non_swap_entry().
>
> So the checking non_swap_entry() in is_hugetlb_entry_migration() and
> is_hugetlb_entry_hwpoisoned() is redundant.
>
> Let's remove it to optimize code.
> ~~~~~~~~

Something like above would be good.

2020-07-23 09:12:29

by Baoquan He

Subject: Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

On 07/23/20 at 11:46am, Anshuman Khandual wrote:
>
>
> On 07/23/2020 08:52 AM, Baoquan He wrote:
> > A customer complained that no message is logged when the number of
> > persistent huge pages is not changed to the exact value written to
> > the sysfs or proc nr_hugepages file.
> >
> > In the current code, a best effort is made to satisfy requests made
> > via the nr_hugepages file. However, requests may be only partially
> > satisfied.
> >
> > Log a message if the code was unsuccessful in fully satisfying a
> > request. This includes both increasing and decreasing the number
> > of persistent huge pages.
>
> But is kernel expected to warn for all such situations where the user
> requested resources could not be allocated completely ? Otherwise, it
> does not make sense to add an warning for just one such situation.

It's not just this one situation. We already have a warning of this kind
in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().

As Mike said, for a single setting of the persistent huge page number,
comparing the old value with the new value is good enough for the customer
to get the information. However, if the customer wants to detect and
analyze an earlier setting failure, a logged message will be helpful. So I
think logging the failure or partial success makes sense.

Thanks
Baoquan

2020-07-23 10:49:20

by Baoquan He

Subject: [PATCH v3 2/4] mm/hugetlb.c: Remove the unnecessary non_swap_entry()

If a swap entry tests positive for either is_[migration|hwpoison]_entry(),
then its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and
SWP_HWPOISON. All these types >= MAX_SWAPFILES, exactly what is asserted
with non_swap_entry().

So the non_swap_entry() check in is_hugetlb_entry_migration() and
is_hugetlb_entry_hwpoisoned() is redundant.

Let's remove it to optimize the code.

Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
---
v2->v3:
Updated patch log according to Anshuman's comment.

mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3569e731e66b..c14837854392 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3748,7 +3748,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
 	if (huge_pte_none(pte) || pte_present(pte))
 		return false;
 	swp = pte_to_swp_entry(pte);
-	if (non_swap_entry(swp) && is_migration_entry(swp))
+	if (is_migration_entry(swp))
 		return true;
 	else
 		return false;
@@ -3761,7 +3761,7 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 	if (huge_pte_none(pte) || pte_present(pte))
 		return false;
 	swp = pte_to_swp_entry(pte);
-	if (non_swap_entry(swp) && is_hwpoison_entry(swp))
+	if (is_hwpoison_entry(swp))
 		return true;
 	else
 		return false;
--
2.17.2

2020-07-23 18:24:49

by Mike Kravetz

Subject: Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

On 7/23/20 2:11 AM, Baoquan He wrote:
> On 07/23/20 at 11:46am, Anshuman Khandual wrote:
>>
>>
>> On 07/23/2020 08:52 AM, Baoquan He wrote:
>>> A customer complained that no message is logged when the number of
>>> persistent huge pages is not changed to the exact value written to
>>> the sysfs or proc nr_hugepages file.
>>>
>>> In the current code, a best effort is made to satisfy requests made
>>> via the nr_hugepages file. However, requests may be only partially
>>> satisfied.
>>>
>>> Log a message if the code was unsuccessful in fully satisfying a
>>> request. This includes both increasing and decreasing the number
>>> of persistent huge pages.
>>
>> But is kernel expected to warn for all such situations where the user
>> requested resources could not be allocated completely ? Otherwise, it
>> does not make sense to add an warning for just one such situation.
>
> It's not for just one such situation, we have already had one to warn
> out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().

Those are a little different in that they are warnings based on kernel
command line parameters.

> As Mike said, in one time of persistent huge page number setting,
> comparing the old value with the new value is good enough for customer
> to get the information. However, if customer want to detect and analyze
> previous setting failure, logging message will be helpful. So I think
> logging the failure or partial success makes sense.

I can understand the argument against adding a new warning for this.
You could even argue that this condition has existed since the time
hugetlb was added to the kernel which was long ago. And, nobody has
complained enough to add a warning. I have even heard of a sysadmin
practice of asking for a ridiculously large amount of hugetlb pages
just so that the kernel will allocate as many as possible. They do
not 'expect' to get the ridiculous amount they asked for. In such
cases, this will be a new warning in their log.

As mentioned in a previous e-mail, when one sets nr_hugepages by writing
to the sysfs or proc file, one needs to read the file to determine if the
number of requested pages were actually allocated. Anyone who does not
do this is just asking for trouble. Yet, I imagine that it may happen.

To be honest, I do not see this log message as something that would be
helpful to end users. Rather, I could see this as being useful to support
people. Support always asks for system logs and this could point out a
possible issue with hugetlb usage.

I do not feel strongly one way or another about adding the warning. Since
it is fairly trivial and could help diagnose issues I am in favor of adding
it. If people feel strongly that it should not be added, I am open to
those arguments.
--
Mike Kravetz

2020-07-24 15:03:12

by Baoquan He

Subject: Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

On 07/23/20 at 11:21am, Mike Kravetz wrote:
> On 7/23/20 2:11 AM, Baoquan He wrote:
> > On 07/23/20 at 11:46am, Anshuman Khandual wrote:
> >>
> >>
> >> On 07/23/2020 08:52 AM, Baoquan He wrote:
> >>> A customer complained that no message is logged when the number of
> >>> persistent huge pages is not changed to the exact value written to
> >>> the sysfs or proc nr_hugepages file.
> >>>
> >>> In the current code, a best effort is made to satisfy requests made
> >>> via the nr_hugepages file. However, requests may be only partially
> >>> satisfied.
> >>>
> >>> Log a message if the code was unsuccessful in fully satisfying a
> >>> request. This includes both increasing and decreasing the number
> >>> of persistent huge pages.
> >>
> >> But is kernel expected to warn for all such situations where the user
> >> requested resources could not be allocated completely ? Otherwise, it
> >> does not make sense to add an warning for just one such situation.
> >
> > It's not for just one such situation, we have already had one to warn
> > out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
>
> Those are a little different in that they are warnings based on kernel
> command line parameters.
>
> > As Mike said, in one time of persistent huge page number setting,
> > comparing the old value with the new value is good enough for customer
> > to get the information. However, if customer want to detect and analyze
> > previous setting failure, logging message will be helpful. So I think
> > logging the failure or partial success makes sense.
>
> I can understand the argument against adding a new warning for this.
> You could even argue that this condition has existed since the time
> hugetlb was added to the kernel which was long ago. And, nobody has
> complained enough to add a warning. I have even heard of a sysadmin
> practice of asking for a ridiculously large amount of hugetlb pages
> just so that the kernel will allocate as many as possible. They do
> not 'expect' to get the ridiculous amount they asked for. In such
> cases, this will be a new warning in their log.
>
> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
> to the sysfs or proc file, one needs to read the file to determine if the
> number of requested pages were actually allocated. Anyone who does not
> do this is just asking for trouble. Yet, I imagine that it may happen.
>
> To be honest, I do not see this log message as something that would be
> helpful to end users. Rather, I could see this as being useful to support
> people. Support always asks for system logs and this could point out a
> possible issue with hugetlb usage.
>
> I do not feel strongly one way or another about adding the warning. Since
> it is fairly trivial and could help diagnose issues I am in favor of adding
> it. If people feel strongly that it should not be added, I am open to
> those arguments.

Seems it's all done, and very fair. I appreciate your understanding on
this issue. Will see if any strong concern is raised on the log adding.

2020-08-11 02:15:52

by Baoquan He

Subject: Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

Hi Mike,

On 07/23/20 at 11:21am, Mike Kravetz wrote:
> On 7/23/20 2:11 AM, Baoquan He wrote:
...
> >> But is kernel expected to warn for all such situations where the user
> >> requested resources could not be allocated completely ? Otherwise, it
> >> does not make sense to add an warning for just one such situation.
> >
> > It's not for just one such situation, we have already had one to warn
> > out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
>
> Those are a little different in that they are warnings based on kernel
> command line parameters.
>
> > As Mike said, in one time of persistent huge page number setting,
> > comparing the old value with the new value is good enough for customer
> > to get the information. However, if customer want to detect and analyze
> > previous setting failure, logging message will be helpful. So I think
> > logging the failure or partial success makes sense.
>
> I can understand the argument against adding a new warning for this.
> You could even argue that this condition has existed since the time
> hugetlb was added to the kernel which was long ago. And, nobody has
> complained enough to add a warning. I have even heard of a sysadmin
> practice of asking for a ridiculously large amount of hugetlb pages
> just so that the kernel will allocate as many as possible. They do
> not 'expect' to get the ridiculous amount they asked for. In such
> cases, this will be a new warning in their log.
>
> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
> to the sysfs or proc file, one needs to read the file to determine if the
> number of requested pages were actually allocated. Anyone who does not
> do this is just asking for trouble. Yet, I imagine that it may happen.
>
> To be honest, I do not see this log message as something that would be
> helpful to end users. Rather, I could see this as being useful to support
> people. Support always asks for system logs and this could point out a
> possible issue with hugetlb usage.
>
> I do not feel strongly one way or another about adding the warning. Since
> it is fairly trivial and could help diagnose issues I am in favor of adding
> it. If people feel strongly that it should not be added, I am open to
> those arguments.

Ping!

It's been a while, and there seems to be no objection to logging the
message. Would you consider accepting this patch or offering an Ack?

Thanks
Baoquan

2020-08-11 03:39:07

by Mike Kravetz

Subject: Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

Cc: Michal

On 8/10/20 7:11 PM, Baoquan He wrote:
> Hi Mike,
>
> On 07/23/20 at 11:21am, Mike Kravetz wrote:
>> On 7/23/20 2:11 AM, Baoquan He wrote:
> ...
>>>> But is kernel expected to warn for all such situations where the user
>>>> requested resources could not be allocated completely ? Otherwise, it
>>>> does not make sense to add an warning for just one such situation.
>>>
>>> It's not for just one such situation, we have already had one to warn
>>> out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
>>
>> Those are a little different in that they are warnings based on kernel
>> command line parameters.
>>
>>> As Mike said, in one time of persistent huge page number setting,
>>> comparing the old value with the new value is good enough for customer
>>> to get the information. However, if customer want to detect and analyze
>>> previous setting failure, logging message will be helpful. So I think
>>> logging the failure or partial success makes sense.
>>
>> I can understand the argument against adding a new warning for this.
>> You could even argue that this condition has existed since the time
>> hugetlb was added to the kernel which was long ago. And, nobody has
>> complained enough to add a warning. I have even heard of a sysadmin
>> practice of asking for a ridiculously large amount of hugetlb pages
>> just so that the kernel will allocate as many as possible. They do
>> not 'expect' to get the ridiculous amount they asked for. In such
>> cases, this will be a new warning in their log.
>>
>> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
>> to the sysfs or proc file, one needs to read the file to determine if the
>> number of requested pages were actually allocated. Anyone who does not
>> do this is just asking for trouble. Yet, I imagine that it may happen.
>>
>> To be honest, I do not see this log message as something that would be
>> helpful to end users. Rather, I could see this as being useful to support
>> people. Support always asks for system logs and this could point out a
>> possible issue with hugetlb usage.
>>
>> I do not feel strongly one way or another about adding the warning. Since
>> it is fairly trivial and could help diagnose issues I am in favor of adding
>> it. If people feel strongly that it should not be added, I am open to
>> those arguments.
>
> Ping!
>
> It's been a while, seems no objection to log the message. Do you
> consider accepting this patch or offering an Ack?
>
> Thanks
> Baoquan

Adding Michal as he has had opinions about hugetlbfs log messages in the past.

--
Mike Kravetz

2020-08-11 07:26:02

by Michal Hocko

Subject: Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

On Mon 10-08-20 20:35:25, Mike Kravetz wrote:
> Cc: Michal
>
> On 8/10/20 7:11 PM, Baoquan He wrote:
> > Hi Mike,
> >
> > On 07/23/20 at 11:21am, Mike Kravetz wrote:
> >> On 7/23/20 2:11 AM, Baoquan He wrote:
> > ...
> >>>> But is kernel expected to warn for all such situations where the user
> >>>> requested resources could not be allocated completely ? Otherwise, it
> >>>> does not make sense to add a warning for just one such situation.
> >>>
> >>> It's not for just one such situation, we have already had one to warn
> >>> out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
> >>
> >> Those are a little different in that they are warnings based on kernel
> >> command line parameters.
> >>
> >>> As Mike said, in one time of persistent huge page number setting,
> >>> comparing the old value with the new value is good enough for customer
> >>> to get the information. However, if customer want to detect and analyze
> >>> previous setting failure, logging message will be helpful. So I think
> >>> logging the failure or partial success makes sense.
> >>
> >> I can understand the argument against adding a new warning for this.
> >> You could even argue that this condition has existed since the time
> >> hugetlb was added to the kernel which was long ago. And, nobody has
> >> complained enough to add a warning. I have even heard of a sysadmin
> >> practice of asking for a ridiculously large amount of hugetlb pages
> >> just so that the kernel will allocate as many as possible. They do
> >> not 'expect' to get the ridiculous amount they asked for. In such
> >> cases, this will be a new warning in their log.
> >>
> >> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
> >> to the sysfs or proc file, one needs to read the file to determine if the
> >> number of requested pages were actually allocated. Anyone who does not
> >> do this is just asking for trouble. Yet, I imagine that it may happen.
> >>
> >> To be honest, I do not see this log message as something that would be
> >> helpful to end users. Rather, I could see this as being useful to support
> >> people. Support always asks for system logs and this could point out a
> >> possible issue with hugetlb usage.
> >>
> >> I do not feel strongly one way or another about adding the warning. Since
> >> it is fairly trivial and could help diagnose issues I am in favor of adding
> >> it. If people feel strongly that it should not be added, I am open to
> >> those arguments.
> >
> > Ping!
> >
> > It's been a while, seems no objection to log the message. Do you
> > consider accepting this patch or offering an Ack?
> >
> > Thanks
> > Baoquan
>
> Adding Michal as he has had opinions about hugetlbfs log messages in the past.

My opinion is that the warning is too late to add at this stage. It
would have been much better if the user interface had provided
reasonable feedback on how much of the request was successful. But this
is not the case (except for a few error cases) and we have to live with
the interface where the caller has to read the value after writing to
it. Lame but a reality.

I have heard about people making an opportunistic attempt to grab as
many hugetlb pages as possible and they do expect the failure and scale
the request size down. I do not think those would appreciate warnings.

That being said I would rather keep the existing behavior even though it
is suboptimal. It is just trivial to add the check in the userspace
without risking complaints by other users. Besides the warning is not
really telling us much more than a subsequent read anyway. You are not
going to learn why the allocation has failed because that one is done
(intentionally) as __GFP_NOWARN.
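
For illustration only, the userspace check mentioned above could look
roughly like the sketch below. It is not taken from any existing tool;
the helper name set_nr_hugepages() and its return convention are
assumptions made up for this example. It writes the requested count to
the given nr_hugepages file and then reads the file back, since the
read is the only way to learn how many pages were actually set up.

```c
#include <stdio.h>

/*
 * Hypothetical helper (name and return convention are assumptions):
 * write `requested` to the nr_hugepages file at `path`, then read the
 * file back. Returns the count reported after the write, or -1 on an
 * I/O or parse error.
 */
static long set_nr_hugepages(const char *path, unsigned long requested)
{
	unsigned long actual;
	FILE *f;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%lu\n", requested) < 0) {
		fclose(f);
		return -1;
	}
	/* Buffered data is flushed here, so a failed write shows up now. */
	if (fclose(f) != 0)
		return -1;

	/* The write itself reports nothing useful, so re-read the value. */
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%lu", &actual) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return (long)actual;
}
```

A caller with sufficient privileges could pass /proc/sys/vm/nr_hugepages
as the path and compare the return value with the count it asked for. A
smaller value means only part of the request was satisfied, and the
caller can decide whether that is acceptable or whether to scale down.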
--
Michal Hocko
SUSE Labs

2020-08-11 23:13:05

by Mike Kravetz

Subject: Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

On 8/11/20 12:24 AM, Michal Hocko wrote:
>
> My opinion is that the warning is too late to add at this stage. It
> would have been much better if the user interface had provided
> reasonable feedback on how much of the request was successful. But this
> is not the case (except for a few error cases) and we have to live with
> the interface where the caller has to read the value after writing to
> it. Lame but a reality.
>
> I have heard about people making an opportunistic attempt to grab as
> many hugetlb pages as possible and they do expect the failure and scale
> the request size down. I do not think those would appreciate warnings.
>
> That being said I would rather keep the existing behavior even though it
> is suboptimal. It is just trivial to add the check in the userspace
> without risking complaints by other users. Besides the warning is not
> really telling us much more than a subsequent read anyway. You are not
> going to learn why the allocation has failed because that one is done
> (intentionally) as __GFP_NOWARN.
>

Thanks Michal.

As previously stated, I do not have a strong opinion about this. Because of
this, let's just leave things as they are and not add the message.

It is pretty clear that a user needs to read the value after writing to
determine if all pages were allocated. The log message would add little
benefit to the end user.
--
Mike Kravetz

2020-08-24 21:59:31

by Mike Kravetz

Subject: Re: [PATCH v2 0/4] mm/hugetlb: Small cleanup and improvement

Hello Andrew,

Unless someone objects, can you add patches 1-3 of this series to your tree?
They have been reviewed and are fairly simple cleanups.
--
Mike Kravetz

On 7/22/20 8:22 PM, Baoquan He wrote:
> v1 is here:
> https://lore.kernel.org/linux-mm/[email protected]/
>
> Patch 1~3 are small clean up.
>
> Patch 4 is adding warning message when the number of persistent huge
> pages is not changed to the exact value written to the sysfs or proc
> nr_hugepages file.
>
> v1->v2:
> Drop the old patch 1/5 in v1 post, which was thought as typo, while
> actually another kind of abbreviation.
>
> Updated patch log of patch 4 which is rephrased by Mike. And move the
> added message logging code after the hugetlb_lock dropping, this is
> suggested by Mike.
>
>
> Baoquan He (4):
> mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool
> mm/hugetlb.c: Remove the unnecessary non_swap_entry()
> doc/vm: fix typo in the hugetlb admin documentation
> mm/hugetl.c: warn out if expected count of huge pages adjustment is
> not achieved
>
> Documentation/admin-guide/mm/hugetlbpage.rst | 2 +-
> mm/hugetlb.c | 27 +++++++++++++++-----
> 2 files changed, 21 insertions(+), 8 deletions(-)
>