2020-09-17 06:43:04

by Vijay Balakrishna

Subject: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user

set_recommended_min_free_kbytes needs to honor the min_free_kbytes set by the
user. Post start-of-day THP enable or memory hotplug operations can
lose a user-specified min_free_kbytes, in particular when it is higher than
the calculated recommended value.

Signed-off-by: Vijay Balakrishna <[email protected]>
Cc: [email protected]
Reviewed-by: Pavel Tatashin <[email protected]>
---
mm/khugepaged.c | 3 ++-
mm/page_alloc.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4f7107476a6f..888802e85162 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2283,7 +2283,8 @@ static void set_recommended_min_free_kbytes(void)
(unsigned long) nr_free_buffer_pages() / 20);
recommended_min <<= (PAGE_SHIFT-10);

- if (recommended_min > min_free_kbytes) {
+ if (recommended_min > min_free_kbytes ||
+ recommended_min > user_min_free_kbytes) {
if (user_min_free_kbytes >= 0)
pr_info("raising min_free_kbytes from %d to %lu to help transparent hugepage allocations\n",
min_free_kbytes, recommended_min);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97dc9ca..472646aef477 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -315,7 +315,7 @@ compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
};

int min_free_kbytes = 1024;
-int user_min_free_kbytes = -1;
+int user_min_free_kbytes;
#ifdef CONFIG_DISCONTIGMEM
/*
* DiscontigMem defines memory ranges as separate pg_data_t even if the ranges
--
2.28.0


2020-09-17 09:38:03

by Michal Hocko

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user

On Thu 17-09-20 11:28:06, Michal Hocko wrote:
> On Wed 16-09-20 23:39:39, Vijay Balakrishna wrote:
> > set_recommended_min_free_kbytes needs to honor the min_free_kbytes set by the
> > user. Post start-of-day THP enable or memory hotplug operations can
> > lose a user-specified min_free_kbytes, in particular when it is higher than
> > the calculated recommended value.
>
> I was about to recommend a more detailed explanation when I
> realized that this patch is not really needed after all. Unless I am
> missing something.
>
> init_per_zone_wmark_min ignores the newly calculated min_free_kbytes if
> it is lower than user_min_free_kbytes. So calculated min_free_kbytes >=
> user_min_free_kbytes.
>
> Except for value clamping when the value is reduced, which likely
> needs fixing. But set_recommended_min_free_kbytes should be fine.

Something like
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97dc9ca..69731b19d9bf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7875,9 +7875,9 @@ int __meminit init_per_zone_wmark_min(void)
if (new_min_free_kbytes > user_min_free_kbytes) {
min_free_kbytes = new_min_free_kbytes;
if (min_free_kbytes < 128)
- min_free_kbytes = 128;
+ min_free_kbytes = max(128, user_min_free_kbytes);
if (min_free_kbytes > 262144)
- min_free_kbytes = 262144;
+ min_free_kbytes = max(262144, user_min_free_kbytes);
} else {
pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
new_min_free_kbytes, user_min_free_kbytes);
--
Michal Hocko
SUSE Labs

2020-09-17 09:45:32

by Michal Hocko

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user

On Wed 16-09-20 23:39:39, Vijay Balakrishna wrote:
> set_recommended_min_free_kbytes needs to honor the min_free_kbytes set by the
> user. Post start-of-day THP enable or memory hotplug operations can
> lose a user-specified min_free_kbytes, in particular when it is higher than
> the calculated recommended value.

I was about to recommend a more detailed explanation when I
realized that this patch is not really needed after all. Unless I am
missing something.

init_per_zone_wmark_min ignores the newly calculated min_free_kbytes if
it is lower than user_min_free_kbytes. So calculated min_free_kbytes >=
user_min_free_kbytes.

Except for value clamping when the value is reduced, which likely
needs fixing. But set_recommended_min_free_kbytes should be fine.
--
Michal Hocko
SUSE Labs

2020-09-17 17:31:12

by Vijay Balakrishna

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user



On 9/17/2020 2:28 AM, Michal Hocko wrote:
> On Wed 16-09-20 23:39:39, Vijay Balakrishna wrote:
>> set_recommended_min_free_kbytes needs to honor the min_free_kbytes set by the
>> user. Post start-of-day THP enable or memory hotplug operations can
>> lose a user-specified min_free_kbytes, in particular when it is higher than
>> the calculated recommended value.
>
> I was about to recommend a more detailed explanation when I
> realized that this patch is not really needed after all. Unless I am
> missing something.
>
> init_per_zone_wmark_min ignores the newly calculated min_free_kbytes if
> it is lower than user_min_free_kbytes. So calculated min_free_kbytes >=
> user_min_free_kbytes.
>
> Except for value clamping when the value is reduced, which likely
> needs fixing. But set_recommended_min_free_kbytes should be fine.
>

IIUC, if a user performs this sequence after start-of-day:
- disable THP
- modify min_free_kbytes
- enable THP
it currently wouldn't result in a call to init_per_zone_wmark_min.

Thanks,
Vijay

2020-09-17 17:54:22

by Michal Hocko

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user

On Thu 17-09-20 10:27:16, Vijay Balakrishna wrote:
>
>
> On 9/17/2020 2:28 AM, Michal Hocko wrote:
> > On Wed 16-09-20 23:39:39, Vijay Balakrishna wrote:
> > > set_recommended_min_free_kbytes needs to honor the min_free_kbytes set by the
> > > user. Post start-of-day THP enable or memory hotplug operations can
> > > lose a user-specified min_free_kbytes, in particular when it is higher than
> > > the calculated recommended value.
> >
> > I was about to recommend a more detailed explanation when I
> > realized that this patch is not really needed after all. Unless I am
> > missing something.
> >
> > init_per_zone_wmark_min ignores the newly calculated min_free_kbytes if
> > it is lower than user_min_free_kbytes. So calculated min_free_kbytes >=
> > user_min_free_kbytes.
> >
> > Except for value clamping when the value is reduced, which likely
> > needs fixing. But set_recommended_min_free_kbytes should be fine.
> >
>
> IIUC, if a user performs this sequence after start-of-day:
> - disable THP
> - modify min_free_kbytes
> - enable THP
> it currently wouldn't result in a call to init_per_zone_wmark_min.

It will not, but why do you think this matters? All we should care about
is that auto-tuning shouldn't reduce a user-provided value [1] and that
memory hotplug should be consistent with the boot-time heuristic.
init_per_zone_wmark_min should make sure that the user value is not
reduced, and the THP heuristic makes sure it will not reduce this value.
So the property should be transitive with the existing code (modulo the
problem I have highlighted).

[1] one could argue that, strictly speaking, it shouldn't even increase the
value, because an admin might have a very good reason to decrease it, but
this has never been the semantic and changing it now might be
problematic.
--
Michal Hocko
SUSE Labs

2020-09-17 18:24:26

by Vijay Balakrishna

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user



On 9/17/2020 10:52 AM, Michal Hocko wrote:
> On Thu 17-09-20 10:27:16, Vijay Balakrishna wrote:
>>
>>
>> On 9/17/2020 2:28 AM, Michal Hocko wrote:
>>> On Wed 16-09-20 23:39:39, Vijay Balakrishna wrote:
>>>> set_recommended_min_free_kbytes needs to honor the min_free_kbytes set by the
>>>> user. Post start-of-day THP enable or memory hotplug operations can
>>>> lose a user-specified min_free_kbytes, in particular when it is higher than
>>>> the calculated recommended value.
>>>
>>> I was about to recommend a more detailed explanation when I
>>> realized that this patch is not really needed after all. Unless I am
>>> missing something.
>>>
>>> init_per_zone_wmark_min ignores the newly calculated min_free_kbytes if
>>> it is lower than user_min_free_kbytes. So calculated min_free_kbytes >=
>>> user_min_free_kbytes.
>>>
>>> Except for value clamping when the value is reduced, which likely
>>> needs fixing. But set_recommended_min_free_kbytes should be fine.
>>>
>>
>> IIUC, if a user performs this sequence after start-of-day:
>> - disable THP
>> - modify min_free_kbytes
>> - enable THP
>> it currently wouldn't result in a call to init_per_zone_wmark_min.
>
> It will not, but why do you think this matters? All we should care about
> is that auto-tuning shouldn't reduce a user-provided value [1] and that
> memory hotplug should be consistent with the boot-time heuristic.
> init_per_zone_wmark_min should make sure that the user value is not
> reduced, and the THP heuristic makes sure it will not reduce this value.
> So the property should be transitive with the existing code (modulo the
> problem I have highlighted).
>
> [1] one could argue that, strictly speaking, it shouldn't even increase the
> value, because an admin might have a very good reason to decrease it, but
> this has never been the semantic and changing it now might be
> problematic.
>

I made an attempt to address Kirill A. Shutemov's comment. And I increased
min_free_kbytes to see the issue in my testing and attempted a fix. I'm
OK leaving it as it is. I do not want to introduce any changes that may cause
a regression.

Thanks,
Vijay

2020-09-18 06:00:09

by Michal Hocko

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user

On Thu 17-09-20 11:16:55, Vijay Balakrishna wrote:
>
>
> On 9/17/2020 10:52 AM, Michal Hocko wrote:
> > On Thu 17-09-20 10:27:16, Vijay Balakrishna wrote:
> > >
> > >
> > > On 9/17/2020 2:28 AM, Michal Hocko wrote:
> > > > On Wed 16-09-20 23:39:39, Vijay Balakrishna wrote:
> > > > > set_recommended_min_free_kbytes needs to honor the min_free_kbytes set by the
> > > > > user. Post start-of-day THP enable or memory hotplug operations can
> > > > > lose a user-specified min_free_kbytes, in particular when it is higher than
> > > > > the calculated recommended value.
> > > >
> > > > I was about to recommend a more detailed explanation when I
> > > > realized that this patch is not really needed after all. Unless I am
> > > > missing something.
> > > >
> > > > init_per_zone_wmark_min ignores the newly calculated min_free_kbytes if
> > > > it is lower than user_min_free_kbytes. So calculated min_free_kbytes >=
> > > > user_min_free_kbytes.
> > > >
> > > > Except for value clamping when the value is reduced, which likely
> > > > needs fixing. But set_recommended_min_free_kbytes should be fine.
> > > >
> > >
> > > IIUC, if a user performs this sequence after start-of-day:
> > > - disable THP
> > > - modify min_free_kbytes
> > > - enable THP
> > > it currently wouldn't result in a call to init_per_zone_wmark_min.
> >
> > It will not, but why do you think this matters? All we should care about
> > is that auto-tuning shouldn't reduce a user-provided value [1] and that
> > memory hotplug should be consistent with the boot-time heuristic.
> > init_per_zone_wmark_min should make sure that the user value is not
> > reduced, and the THP heuristic makes sure it will not reduce this value.
> > So the property should be transitive with the existing code (modulo the
> > problem I have highlighted).
> >
> > [1] one could argue that, strictly speaking, it shouldn't even increase the
> > value, because an admin might have a very good reason to decrease it, but
> > this has never been the semantic and changing it now might be
> > problematic.
> >
>
> I made an attempt to address Kirill A. Shutemov's comment.

This is for Kirill to comment on, but my take would be that memory
hotplug really has to alter the user-defined min_free_kbytes because it
is manipulating the amount of memory. There are use cases which
add a lot of memory.

We are trying not to decrease the value, which is arguably a weird semantic,
but this is what we've been doing for years. We would need to hear about a
specific use case where this matters (e.g. a memory-hotremove-heavy
workload with a manually tuned min_free_kbytes) that misbehaves.

> And I increased
> min_free_kbytes to see the issue in my testing and attempted a fix. I'm OK
> leaving it as it is. I do not want to introduce any changes that may cause
> a regression.

I would recommend reposting the patch which adds the THP heuristic (if
THP is enabled) to the hotplug path, justifying it by consistency and by
the surprising result that adding memory decreases the value. Your initial
problem is in sizing, as mentioned in the other email thread, and you should
be investigating it more, but this inconsistency might really come as a
surprise.

All that, of course, only if Kirill reconsiders his initial position.
--
Michal Hocko
SUSE Labs

2020-09-21 19:08:46

by Vijay Balakrishna

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user



On 9/17/2020 10:56 PM, Michal Hocko wrote:
> On Thu 17-09-20 11:16:55, Vijay Balakrishna wrote:
>>
>>
>> On 9/17/2020 10:52 AM, Michal Hocko wrote:
>>> On Thu 17-09-20 10:27:16, Vijay Balakrishna wrote:
>>>>
>>>>
>>>> On 9/17/2020 2:28 AM, Michal Hocko wrote:
>>>>> On Wed 16-09-20 23:39:39, Vijay Balakrishna wrote:
>>>>>> set_recommended_min_free_kbytes needs to honor the min_free_kbytes set by the
>>>>>> user. Post start-of-day THP enable or memory hotplug operations can
>>>>>> lose a user-specified min_free_kbytes, in particular when it is higher than
>>>>>> the calculated recommended value.
>>>>>
>>>>> I was about to recommend a more detailed explanation when I
>>>>> realized that this patch is not really needed after all. Unless I am
>>>>> missing something.
>>>>>
>>>>> init_per_zone_wmark_min ignores the newly calculated min_free_kbytes if
>>>>> it is lower than user_min_free_kbytes. So calculated min_free_kbytes >=
>>>>> user_min_free_kbytes.
>>>>>
>>>>> Except for value clamping when the value is reduced, which likely
>>>>> needs fixing. But set_recommended_min_free_kbytes should be fine.
>>>>>
>>>>
>>>> IIUC, if a user performs this sequence after start-of-day:
>>>> - disable THP
>>>> - modify min_free_kbytes
>>>> - enable THP
>>>> it currently wouldn't result in a call to init_per_zone_wmark_min.
>>>
>>> It will not, but why do you think this matters? All we should care about
>>> is that auto-tuning shouldn't reduce a user-provided value [1] and that
>>> memory hotplug should be consistent with the boot-time heuristic.
>>> init_per_zone_wmark_min should make sure that the user value is not
>>> reduced, and the THP heuristic makes sure it will not reduce this value.
>>> So the property should be transitive with the existing code (modulo the
>>> problem I have highlighted).
>>>
>>> [1] one could argue that, strictly speaking, it shouldn't even increase the
>>> value, because an admin might have a very good reason to decrease it, but
>>> this has never been the semantic and changing it now might be
>>> problematic.
>>>
>>
>> I made an attempt to address Kirill A. Shutemov's comment.
>
> This is for Kirill to comment on, but my take would be that memory
> hotplug really has to alter the user-defined min_free_kbytes because it
> is manipulating the amount of memory. There are use cases which
> add a lot of memory.
>
> We are trying not to decrease the value, which is arguably a weird semantic,
> but this is what we've been doing for years. We would need to hear about a
> specific use case where this matters (e.g. a memory-hotremove-heavy
> workload with a manually tuned min_free_kbytes) that misbehaves.

In our use case, memory hotremove is normally done during shutdown, and we
aren't manually tuning min_free_kbytes.

>
>> And I increased
>> min_free_kbytes to see the issue in my testing and attempted a fix. I'm OK
>> leaving it as it is. I do not want to introduce any changes that may cause
>> a regression.
>
> I would recommend reposting the patch which adds the THP heuristic (if
> THP is enabled) to the hotplug path, justifying it by consistency and by
> the surprising result that adding memory decreases the value.

I hope the change log of my reposted patch
([v3 1/2] mm: khugepaged: recalculate min_free_kbytes after memory hotplug
as expected by khugepaged)
is OK:

When memory is hot-added or hot-removed, min_free_kbytes must be
recalculated based on what is expected by khugepaged. Currently,
after hotplug, min_free_kbytes is set to a lower default, and the higher
default set when THP is enabled is lost. This change restores
min_free_kbytes as expected for THP consumers.


> Your initial
> problem is in sizing, as mentioned in the other email thread, and you should
> be investigating it more, but this inconsistency might really come as a
> surprise.
>
> All that, of course, only if Kirill reconsiders his initial position.
>

Kirill, can you comment or share your opinion?

Thanks,
Vijay

2020-09-22 08:32:04

by Kirill A. Shutemov

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user

On Mon, Sep 21, 2020 at 12:07:23PM -0700, Vijay Balakrishna wrote:
> >
> > I would recommend reposting the patch which adds the THP heuristic (if
> > THP is enabled) to the hotplug path, justifying it by consistency and by
> > the surprising result that adding memory decreases the value.
>
> I hope the change log of my reposted patch
> ([v3 1/2] mm: khugepaged: recalculate min_free_kbytes after memory hotplug
> as expected by khugepaged)
> is OK:
>
> When memory is hot-added or hot-removed, min_free_kbytes must be
> recalculated based on what is expected by khugepaged. Currently,
> after hotplug, min_free_kbytes is set to a lower default, and the higher
> default set when THP is enabled is lost. This change restores
> min_free_kbytes as expected for THP consumers.

Is there any scenario where hotremove would result in changing min_free_kbytes?

> > Your initial
> > problem is in sizing, as mentioned in the other email thread, and you should
> > be investigating it more, but this inconsistency might really come as a
> > surprise.
> >
> > All that, of course, only if Kirill reconsiders his initial position.
> >
>
> Kirill, can you comment or share your opinion?

Looking again, never decreasing min_free_kbytes is the most reasonable
policy. Sorry for the noise.

But I would like to see a scenario where hotremove ends up changing
min_free_kbytes. It's not obvious to me.

--
Kirill A. Shutemov

2020-09-22 10:08:31

by Michal Hocko

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user

On Tue 22-09-20 10:07:26, Kirill A. Shutemov wrote:
> On Mon, Sep 21, 2020 at 12:07:23PM -0700, Vijay Balakrishna wrote:
> > >
> > > I would recommend reposting the patch which adds the THP heuristic (if
> > > THP is enabled) to the hotplug path, justifying it by consistency and by
> > > the surprising result that adding memory decreases the value.
> >
> > I hope the change log of my reposted patch
> > ([v3 1/2] mm: khugepaged: recalculate min_free_kbytes after memory hotplug
> > as expected by khugepaged)
> > is OK:
> >
> > When memory is hot-added or hot-removed, min_free_kbytes must be
> > recalculated based on what is expected by khugepaged. Currently,
> > after hotplug, min_free_kbytes is set to a lower default, and the higher
> > default set when THP is enabled is lost. This change restores
> > min_free_kbytes as expected for THP consumers.
>
> Is there any scenario where hotremove would result in changing min_free_kbytes?

init_per_zone_wmark_min is called from both the online and offline paths. But
I believe the problem is not in the offlining path: a decrease wrt the
previous auto-tuned value is to be expected there. The primary problem is that
hot-adding memory after boot (without any user-configured value) will
effectively decrease the value because the khugepaged tuning
(set_recommended_min_free_kbytes) is not called.

--
Michal Hocko
SUSE Labs

2020-09-22 16:14:33

by Vijay Balakrishna

Subject: Re: [v4] mm: khugepaged: avoid overriding min_free_kbytes set by user



On 9/22/2020 3:07 AM, Michal Hocko wrote:
> On Tue 22-09-20 10:07:26, Kirill A. Shutemov wrote:
>> On Mon, Sep 21, 2020 at 12:07:23PM -0700, Vijay Balakrishna wrote:
>>>>
>>>> I would recommend reposting the patch which adds the THP heuristic (if
>>>> THP is enabled) to the hotplug path, justifying it by consistency and by
>>>> the surprising result that adding memory decreases the value.
>>>
>>> I hope the change log of my reposted patch
>>> ([v3 1/2] mm: khugepaged: recalculate min_free_kbytes after memory hotplug
>>> as expected by khugepaged)
>>> is OK:
>>>
>>> When memory is hot-added or hot-removed, min_free_kbytes must be
>>> recalculated based on what is expected by khugepaged. Currently,
>>> after hotplug, min_free_kbytes is set to a lower default, and the higher
>>> default set when THP is enabled is lost. This change restores
>>> min_free_kbytes as expected for THP consumers.
>>
>> Is there any scenario where hotremove would result in changing min_free_kbytes?
>
> init_per_zone_wmark_min is called from both the online and offline paths. But
> I believe the problem is not in the offlining path: a decrease wrt the
> previous auto-tuned value is to be expected there. The primary problem is that
> hot-adding memory after boot (without any user-configured value) will
> effectively decrease the value because the khugepaged tuning
> (set_recommended_min_free_kbytes) is not called.

Thank you Michal and Kirill.

Vijay