2014-01-02 18:05:29

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On 12/31/2013 04:29 PM, Han Pingtian wrote:
> min_free_kbytes may be updated during thp's initialization. Sometimes,
> this will change the value being set by user. Showing message will
> clarify this confusion.
...
> - if (recommended_min > min_free_kbytes)
> + if (recommended_min > min_free_kbytes) {
> min_free_kbytes = recommended_min;
> + pr_info("min_free_kbytes is updated to %d by enabling transparent hugepage.\n",
> + min_free_kbytes);
> + }

"updated" doesn't tell us much. It's also kinda nasty that if we enable
then disable THP, we end up with an elevated min_free_kbytes. Maybe we
should at least put something in that tells the user how to get back
where they were if they care:

"raising min_free_kbytes from %d to %d to help transparent hugepage
allocations"


2014-01-02 21:58:57

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Thu, 2 Jan 2014, Dave Hansen wrote:

> > min_free_kbytes may be updated during thp's initialization. Sometimes,
> > this will change the value being set by user. Showing message will
> > clarify this confusion.
> ...
> > - if (recommended_min > min_free_kbytes)
> > + if (recommended_min > min_free_kbytes) {
> > min_free_kbytes = recommended_min;
> > + pr_info("min_free_kbytes is updated to %d by enabling transparent hugepage.\n",
> > + min_free_kbytes);
> > + }
>
> "updated" doesn't tell us much. It's also kinda nasty that if we enable
> then disable THP, we end up with an elevated min_free_kbytes. Maybe we
> should at least put something in that tells the user how to get back
> where they were if they care:
>

The default value of min_free_kbytes depends on the implementation of the
VM regardless of any config options that you may have enabled. We don't
specify what the non-thp default is in the kernel log, so why do we need
to specify what the thp default is?

2014-01-02 22:10:17

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On 01/02/2014 01:58 PM, David Rientjes wrote:
> On Thu, 2 Jan 2014, Dave Hansen wrote:
>
>>> min_free_kbytes may be updated during thp's initialization. Sometimes,
>>> this will change the value being set by user. Showing message will
>>> clarify this confusion.
>> ...
>>> - if (recommended_min > min_free_kbytes)
>>> + if (recommended_min > min_free_kbytes) {
>>> min_free_kbytes = recommended_min;
>>> + pr_info("min_free_kbytes is updated to %d by enabling transparent hugepage.\n",
>>> + min_free_kbytes);
>>> + }
>>
>> "updated" doesn't tell us much. It's also kinda nasty that if we enable
>> then disable THP, we end up with an elevated min_free_kbytes. Maybe we
>> should at least put something in that tells the user how to get back
>> where they were if they care:
>
> The default value of min_free_kbytes depends on the implementation of the
> VM regardless of any config options that you may have enabled. We don't
> specify what the non-thp default is in the kernel log, so why do we need
> to specify what the thp default is?

Let's say enabling THP made my system behave badly. How do I get it
back to the state before I enabled THP? The user has to have gone and
recorded what their min_free_kbytes was before turning THP on in order
to get it back to where it was. Folks also have to either plan in
advance (archiving *ALL* the sysctl settings), somehow *know* somehow
that THP can affect min_free_kbytes, or just plain be clairvoyant.

This seems like a pretty straightforward way to be transparent about
what the kernel mucked with, and exactly how it did it instead of
requiring clairvoyant sysadmins.

2014-01-02 23:36:50

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Thu, 2 Jan 2014, Dave Hansen wrote:

> > The default value of min_free_kbytes depends on the implementation of the
> > VM regardless of any config options that you may have enabled. We don't
> > specify what the non-thp default is in the kernel log, so why do we need
> > to specify what the thp default is?
>
> Let's say enabling THP made my system behave badly. How do I get it
> back to the state before I enabled THP? The user has to have gone and
> recorded what their min_free_kbytes was before turning THP on in order
> to get it back to where it was. Folks also have to either plan in
> advance (archiving *ALL* the sysctl settings), somehow *know* somehow
> that THP can affect min_free_kbytes, or just plain be clairvoyant.
>

How is this different from some initscript changing the value? We should
either specify that min_free_kbytes changed from its default, which may
change from kernel version to kernel version itself, in all cases or just
leave it as it currently is. There's no reason to special-case thp in
this way if there are other ways to change the value.

2014-01-02 23:48:46

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On 01/02/2014 03:36 PM, David Rientjes wrote:
> On Thu, 2 Jan 2014, Dave Hansen wrote:
>> Let's say enabling THP made my system behave badly. How do I get it
>> back to the state before I enabled THP? The user has to have gone and
>> recorded what their min_free_kbytes was before turning THP on in order
>> to get it back to where it was. Folks also have to either plan in
>> advance (archiving *ALL* the sysctl settings), somehow *know* somehow
>> that THP can affect min_free_kbytes, or just plain be clairvoyant.
>>
> How is this different from some initscript changing the value? We should
> either specify that min_free_kbytes changed from its default, which may
> change from kernel version to kernel version itself, in all cases or just
> leave it as it currently is. There's no reason to special-case thp in
> this way if there are other ways to change the value.

Ummm.... It's different because one is the kernel changing it and the
other is userspace. If I wonder how the heck this got set:

kernel.core_pattern = |/usr/share/apport/apport %p %s %c

I do:

$ grep -r /usr/share/apport/apport /etc/
/etc/init/apport.conf: /usr/share/apport/apportcheckresume || true
/etc/init/apport.conf: echo "|/usr/share/apport/apport %p %s %c" >
/proc/sys/kernel/core_pattern

There's usually a record of how it got set, somewhere, if it happened
from userspace. Printing messages like this in the kernel does the
same: it gives the sysadmin a _chance_ of finding out what happened.
Doing it silently (like it's done today) isn't very nice.

You're arguing that "if userspace can set it arbitrarily, then the
kernel should be able to do it silently too." That's nonsense.

It would be nice to have tracepoints explicitly for tracing who messed
with sysctl values, too.

2014-01-03 03:33:13

by Han Pingtian

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Thu, Jan 02, 2014 at 10:05:21AM -0800, Dave Hansen wrote:
> On 12/31/2013 04:29 PM, Han Pingtian wrote:
> > min_free_kbytes may be updated during thp's initialization. Sometimes,
> > this will change the value being set by user. Showing message will
> > clarify this confusion.
> ...
> > - if (recommended_min > min_free_kbytes)
> > + if (recommended_min > min_free_kbytes) {
> > min_free_kbytes = recommended_min;
> > + pr_info("min_free_kbytes is updated to %d by enabling transparent hugepage.\n",
> > + min_free_kbytes);
> > + }
>
> "updated" doesn't tell us much. It's also kinda nasty that if we enable
> then disable THP, we end up with an elevated min_free_kbytes. Maybe we
> should at least put something in that tells the user how to get back
> where they were if they care:
>
> "raising min_free_kbytes from %d to %d to help transparent hugepage
> allocations"
>
Thanks. I have updated it according to your suggestion.


>From f9902b16ff0c326349e72eca9facef2c98f8595d Mon Sep 17 00:00:00 2001
From: Han Pingtian <[email protected]>
Date: Fri, 3 Jan 2014 11:10:49 +0800
Subject: [PATCH] mm: show message when raising min_free_kbytes in THP

min_free_kbytes may be raised during THP's initialization. Sometimes,
this will change the value being set by user. Showing message will
clarify this confusion.

Showing the old value of min_free_kbytes according to Dave Hansen's
suggestion. This will give user the chance to restore old value of
min_free_kbytes.

Signed-off-by: Han Pingtian <[email protected]>
---
mm/huge_memory.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7de1bf8..1f0356d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -130,8 +130,11 @@ static int set_recommended_min_free_kbytes(void)
(unsigned long) nr_free_buffer_pages() / 20);
recommended_min <<= (PAGE_SHIFT-10);

- if (recommended_min > min_free_kbytes)
+ if (recommended_min > min_free_kbytes) {
+ pr_info("raising min_free_kbytes from %d to %d to help transparent hugepage allocations\n",
+ min_free_kbytes, recommended_min);
min_free_kbytes = recommended_min;
+ }
setup_per_zone_wmarks();
return 0;
}
--
1.7.7.6

2014-01-03 18:18:05

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On 01/02/2014 07:33 PM, Han Pingtian wrote:
> @@ -130,8 +130,11 @@ static int set_recommended_min_free_kbytes(void)
> (unsigned long) nr_free_buffer_pages() / 20);
> recommended_min <<= (PAGE_SHIFT-10);
>
> - if (recommended_min > min_free_kbytes)
> + if (recommended_min > min_free_kbytes) {
> + pr_info("raising min_free_kbytes from %d to %d to help transparent hugepage allocations\n",
> + min_free_kbytes, recommended_min);
> min_free_kbytes = recommended_min;
> + }
> setup_per_zone_wmarks();
> return 0;
> }

I know I gave you that big bloated string, but 108 columns is a _wee_
bit over 80. :)

Otherwise, I do like the new message

2014-01-05 00:35:10

by Han Pingtian

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Fri, Jan 03, 2014 at 10:17:54AM -0800, Dave Hansen wrote:
> On 01/02/2014 07:33 PM, Han Pingtian wrote:
> > @@ -130,8 +130,11 @@ static int set_recommended_min_free_kbytes(void)
> > (unsigned long) nr_free_buffer_pages() / 20);
> > recommended_min <<= (PAGE_SHIFT-10);
> >
> > - if (recommended_min > min_free_kbytes)
> > + if (recommended_min > min_free_kbytes) {
> > + pr_info("raising min_free_kbytes from %d to %d to help transparent hugepage allocations\n",
> > + min_free_kbytes, recommended_min);
> > min_free_kbytes = recommended_min;
> > + }
> > setup_per_zone_wmarks();
> > return 0;
> > }
>
> I know I gave you that big bloated string, but 108 columns is a _wee_
> bit over 80. :)
>
> Otherwise, I do like the new message

Thanks. This is the new version:

>From f4d085a880dfae7638b33c242554efb0afc0852b Mon Sep 17 00:00:00 2001
From: Han Pingtian <[email protected]>
Date: Fri, 3 Jan 2014 11:10:49 +0800
Subject: [PATCH] mm: show message when raising min_free_kbytes in THP

min_free_kbytes may be raised during THP's initialization. Sometimes,
this will change the value being set by user. Showing message will
clarify this confusion.

Showing the old value of min_free_kbytes according to Dave Hansen's
suggestion. This will give user the chance to restore old value of
min_free_kbytes.

Signed-off-by: Han Pingtian <[email protected]>
---
mm/huge_memory.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7de1bf8..7910360 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -130,8 +130,12 @@ static int set_recommended_min_free_kbytes(void)
(unsigned long) nr_free_buffer_pages() / 20);
recommended_min <<= (PAGE_SHIFT-10);

- if (recommended_min > min_free_kbytes)
+ if (recommended_min > min_free_kbytes) {
+ pr_info("raising min_free_kbytes from %d to %d "
+ "to help transparent hugepage allocations\n",
+ min_free_kbytes, recommended_min);
min_free_kbytes = recommended_min;
+ }
setup_per_zone_wmarks();
return 0;
}
--
1.7.7.6

2014-01-06 16:46:11

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Sun 05-01-14 08:35:01, Han Pingtian wrote:
[...]
> From f4d085a880dfae7638b33c242554efb0afc0852b Mon Sep 17 00:00:00 2001
> From: Han Pingtian <[email protected]>
> Date: Fri, 3 Jan 2014 11:10:49 +0800
> Subject: [PATCH] mm: show message when raising min_free_kbytes in THP
>
> min_free_kbytes may be raised during THP's initialization. Sometimes,
> this will change the value being set by user. Showing message will
> clarify this confusion.

I do not have anything against informing about changing value
set by user but this will inform also when the default value is
updated. Is this what you want? Don't you want to check against
user_min_free_kbytes? (0 if not set by user)

Btw. Do we want to restore the original value when khugepaged is
disabled?

> Showing the old value of min_free_kbytes according to Dave Hansen's
> suggestion. This will give user the chance to restore old value of
> min_free_kbytes.
>
> Signed-off-by: Han Pingtian <[email protected]>
> ---
> mm/huge_memory.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7de1bf8..7910360 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -130,8 +130,12 @@ static int set_recommended_min_free_kbytes(void)
> (unsigned long) nr_free_buffer_pages() / 20);
> recommended_min <<= (PAGE_SHIFT-10);
>
> - if (recommended_min > min_free_kbytes)
> + if (recommended_min > min_free_kbytes) {
> + pr_info("raising min_free_kbytes from %d to %d "
> + "to help transparent hugepage allocations\n",
> + min_free_kbytes, recommended_min);
> min_free_kbytes = recommended_min;
> + }
> setup_per_zone_wmarks();
> return 0;
> }
> --
> 1.7.7.6
>

--
Michal Hocko
SUSE Labs

2014-01-08 04:00:01

by Han Pingtian

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Mon, Jan 06, 2014 at 05:46:04PM +0100, Michal Hocko wrote:
> On Sun 05-01-14 08:35:01, Han Pingtian wrote:
> [...]
> > From f4d085a880dfae7638b33c242554efb0afc0852b Mon Sep 17 00:00:00 2001
> > From: Han Pingtian <[email protected]>
> > Date: Fri, 3 Jan 2014 11:10:49 +0800
> > Subject: [PATCH] mm: show message when raising min_free_kbytes in THP
> >
> > min_free_kbytes may be raised during THP's initialization. Sometimes,
> > this will change the value being set by user. Showing message will
> > clarify this confusion.
>
> I do not have anything against informing about changing value
> set by user but this will inform also when the default value is
> updated. Is this what you want? Don't you want to check against
> user_min_free_kbytes? (0 if not set by user)
>
But looks like the user can set min_free_kbytes to 0 by

echo 0 > /proc/sys/vm/min_free_kbytes

and even set it to -1 the same way. So I think we need to restrict the
value of min_free_kbytes > 0 first?

> Btw. Do we want to restore the original value when khugepaged is
> disabled?
>

2014-01-08 08:20:11

by Han Pingtian

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Mon, Jan 06, 2014 at 05:46:04PM +0100, Michal Hocko wrote:
> On Sun 05-01-14 08:35:01, Han Pingtian wrote:
> [...]
> > From f4d085a880dfae7638b33c242554efb0afc0852b Mon Sep 17 00:00:00 2001
> > From: Han Pingtian <[email protected]>
> > Date: Fri, 3 Jan 2014 11:10:49 +0800
> > Subject: [PATCH] mm: show message when raising min_free_kbytes in THP
> >
> > min_free_kbytes may be raised during THP's initialization. Sometimes,
> > this will change the value being set by user. Showing message will
> > clarify this confusion.
>
> I do not have anything against informing about changing value
> set by user but this will inform also when the default value is
> updated. Is this what you want? Don't you want to check against
> user_min_free_kbytes? (0 if not set by user)
>

To use user_min_free_kbytes in mm/huge_memory.c, we need a

extern int user_min_free_kbytes;

in somewhere? Where should we put it? I guess it is mm/internal.h,
right?

Thanks.

2014-01-08 10:16:18

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Wed 08-01-14 16:20:01, Han Pingtian wrote:
> On Mon, Jan 06, 2014 at 05:46:04PM +0100, Michal Hocko wrote:
> > On Sun 05-01-14 08:35:01, Han Pingtian wrote:
> > [...]
> > > From f4d085a880dfae7638b33c242554efb0afc0852b Mon Sep 17 00:00:00 2001
> > > From: Han Pingtian <[email protected]>
> > > Date: Fri, 3 Jan 2014 11:10:49 +0800
> > > Subject: [PATCH] mm: show message when raising min_free_kbytes in THP
> > >
> > > min_free_kbytes may be raised during THP's initialization. Sometimes,
> > > this will change the value being set by user. Showing message will
> > > clarify this confusion.
> >
> > I do not have anything against informing about changing value
> > set by user but this will inform also when the default value is
> > updated. Is this what you want? Don't you want to check against
> > user_min_free_kbytes? (0 if not set by user)
> >
>
> To use user_min_free_kbytes in mm/huge_memory.c, we need a
>
> extern int user_min_free_kbytes;

The variable is not defined as static so you can use it outside of
mm/page_alloc.c.

> in somewhere? Where should we put it? I guess it is mm/internal.h,
> right?

I do not think this has to be globaly visible though. Why not just
extern declaration in mm/huge_memory.c?

--
Michal Hocko
SUSE Labs

2014-01-09 07:33:09

by Han Pingtian

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Wed, Jan 08, 2014 at 11:16:11AM +0100, Michal Hocko wrote:
> On Wed 08-01-14 16:20:01, Han Pingtian wrote:
> > On Mon, Jan 06, 2014 at 05:46:04PM +0100, Michal Hocko wrote:
> > > On Sun 05-01-14 08:35:01, Han Pingtian wrote:
> > > [...]
> > > > From f4d085a880dfae7638b33c242554efb0afc0852b Mon Sep 17 00:00:00 2001
> > > > From: Han Pingtian <[email protected]>
> > > > Date: Fri, 3 Jan 2014 11:10:49 +0800
> > > > Subject: [PATCH] mm: show message when raising min_free_kbytes in THP
> > > >
> > > > min_free_kbytes may be raised during THP's initialization. Sometimes,
> > > > this will change the value being set by user. Showing message will
> > > > clarify this confusion.
> > >
> > > I do not have anything against informing about changing value
> > > set by user but this will inform also when the default value is
> > > updated. Is this what you want? Don't you want to check against
> > > user_min_free_kbytes? (0 if not set by user)
> > >
> >
> > To use user_min_free_kbytes in mm/huge_memory.c, we need a
> >
> > extern int user_min_free_kbytes;
>
> The variable is not defined as static so you can use it outside of
> mm/page_alloc.c.
>
> > in somewhere? Where should we put it? I guess it is mm/internal.h,
> > right?
>
> I do not think this has to be globaly visible though. Why not just
> extern declaration in mm/huge_memory.c?
>

This is the new patch, please review. Thanks.


>From b8db4f157a17d6d8652cc9cff024a192c3ee0779 Mon Sep 17 00:00:00 2001
From: Han Pingtian <[email protected]>
Date: Thu, 9 Jan 2014 15:24:26 +0800
Subject: [PATCH] mm: show message when raising min_free_kbytes in THP

min_free_kbytes may be raised during THP's initialization. Sometimes,
this will change the value being set by user. Showing message will
clarify this confusion.

Only show this message when changing the value set by user according to
Michal Hocko's suggestion.

Showing the old value of min_free_kbytes according to Dave Hansen's
suggestion. This will give user the chance to restore old value of
min_free_kbytes.

Signed-off-by: Han Pingtian <[email protected]>
---
mm/huge_memory.c | 9 ++++++++-
mm/page_alloc.c | 2 +-
2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7de1bf8..e0e4e29 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -100,6 +100,7 @@ static struct khugepaged_scan khugepaged_scan = {
.mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
};

+extern int user_min_free_kbytes;

static int set_recommended_min_free_kbytes(void)
{
@@ -130,8 +131,14 @@ static int set_recommended_min_free_kbytes(void)
(unsigned long) nr_free_buffer_pages() / 20);
recommended_min <<= (PAGE_SHIFT-10);

- if (recommended_min > min_free_kbytes)
+ if (recommended_min > min_free_kbytes) {
+ if (user_min_free_kbytes >= 0)
+ pr_info("raising min_free_kbytes from %d to %lu "
+ "to help transparent hugepage allocations\n",
+ min_free_kbytes, recommended_min);
+
min_free_kbytes = recommended_min;
+ }
setup_per_zone_wmarks();
return 0;
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ea62b2..a9dcfd8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -205,7 +205,7 @@ static char * const zone_names[MAX_NR_ZONES] = {
};

int min_free_kbytes = 1024;
-int user_min_free_kbytes;
+int user_min_free_kbytes = -1;

static unsigned long __meminitdata nr_kernel_pages;
static unsigned long __meminitdata nr_all_pages;
--
1.7.7.6

2014-01-09 09:02:25

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Thu 09-01-14 15:32:59, Han Pingtian wrote:
[...]
> From b8db4f157a17d6d8652cc9cff024a192c3ee0779 Mon Sep 17 00:00:00 2001
> From: Han Pingtian <[email protected]>
> Date: Thu, 9 Jan 2014 15:24:26 +0800
> Subject: [PATCH] mm: show message when raising min_free_kbytes in THP
>
> min_free_kbytes may be raised during THP's initialization. Sometimes,
> this will change the value being set by user. Showing message will
> clarify this confusion.
>
> Only show this message when changing the value set by user according to
> Michal Hocko's suggestion.
>
> Showing the old value of min_free_kbytes according to Dave Hansen's
> suggestion. This will give user the chance to restore old value of
> min_free_kbytes.
>
> Signed-off-by: Han Pingtian <[email protected]>

Looks good to me
Reviewed-by: Michal Hocko <[email protected]>

Thanks!

> ---
> mm/huge_memory.c | 9 ++++++++-
> mm/page_alloc.c | 2 +-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7de1bf8..e0e4e29 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -100,6 +100,7 @@ static struct khugepaged_scan khugepaged_scan = {
> .mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
> };
>
> +extern int user_min_free_kbytes;
>
> static int set_recommended_min_free_kbytes(void)
> {
> @@ -130,8 +131,14 @@ static int set_recommended_min_free_kbytes(void)
> (unsigned long) nr_free_buffer_pages() / 20);
> recommended_min <<= (PAGE_SHIFT-10);
>
> - if (recommended_min > min_free_kbytes)
> + if (recommended_min > min_free_kbytes) {
> + if (user_min_free_kbytes >= 0)
> + pr_info("raising min_free_kbytes from %d to %lu "
> + "to help transparent hugepage allocations\n",
> + min_free_kbytes, recommended_min);
> +
> min_free_kbytes = recommended_min;
> + }
> setup_per_zone_wmarks();
> return 0;
> }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9ea62b2..a9dcfd8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -205,7 +205,7 @@ static char * const zone_names[MAX_NR_ZONES] = {
> };
>
> int min_free_kbytes = 1024;
> -int user_min_free_kbytes;
> +int user_min_free_kbytes = -1;
>
> static unsigned long __meminitdata nr_kernel_pages;
> static unsigned long __meminitdata nr_all_pages;
> --
> 1.7.7.6
>

--
Michal Hocko
SUSE Labs

2014-01-09 21:16:05

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Thu, 9 Jan 2014, Han Pingtian wrote:

> min_free_kbytes may be raised during THP's initialization. Sometimes,
> this will change the value being set by user. Showing message will
> clarify this confusion.
>
> Only show this message when changing the value set by user according to
> Michal Hocko's suggestion.
>
> Showing the old value of min_free_kbytes according to Dave Hansen's
> suggestion. This will give user the chance to restore old value of
> min_free_kbytes.
>
> Signed-off-by: Han Pingtian <[email protected]>
> ---
> mm/huge_memory.c | 9 ++++++++-
> mm/page_alloc.c | 2 +-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7de1bf8..e0e4e29 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -100,6 +100,7 @@ static struct khugepaged_scan khugepaged_scan = {
> .mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
> };
>
> +extern int user_min_free_kbytes;
>

We don't add extern declarations to .c files. How many other examples of
this can you find in mm/?

> static int set_recommended_min_free_kbytes(void)
> {
> @@ -130,8 +131,14 @@ static int set_recommended_min_free_kbytes(void)
> (unsigned long) nr_free_buffer_pages() / 20);
> recommended_min <<= (PAGE_SHIFT-10);
>
> - if (recommended_min > min_free_kbytes)
> + if (recommended_min > min_free_kbytes) {
> + if (user_min_free_kbytes >= 0)
> + pr_info("raising min_free_kbytes from %d to %lu "
> + "to help transparent hugepage allocations\n",
> + min_free_kbytes, recommended_min);
> +
> min_free_kbytes = recommended_min;
> + }
> setup_per_zone_wmarks();
> return 0;
> }

Does this even ever trigger since set_recommended_min_free_kbytes() is
called via late_initcall()?

2014-01-10 08:05:11

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Thu 09-01-14 13:15:54, David Rientjes wrote:
> On Thu, 9 Jan 2014, Han Pingtian wrote:
>
> > min_free_kbytes may be raised during THP's initialization. Sometimes,
> > this will change the value being set by user. Showing message will
> > clarify this confusion.
> >
> > Only show this message when changing the value set by user according to
> > Michal Hocko's suggestion.
> >
> > Showing the old value of min_free_kbytes according to Dave Hansen's
> > suggestion. This will give user the chance to restore old value of
> > min_free_kbytes.
> >
> > Signed-off-by: Han Pingtian <[email protected]>
> > ---
> > mm/huge_memory.c | 9 ++++++++-
> > mm/page_alloc.c | 2 +-
> > 2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 7de1bf8..e0e4e29 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -100,6 +100,7 @@ static struct khugepaged_scan khugepaged_scan = {
> > .mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
> > };
> >
> > +extern int user_min_free_kbytes;
> >
>
> We don't add extern declarations to .c files. How many other examples of
> this can you find in mm/?

I have suggested this because general visibility is not needed. But if
you think that it should then include/linux/mm.h sounds like a proper
place.

> > static int set_recommended_min_free_kbytes(void)
> > {
> > @@ -130,8 +131,14 @@ static int set_recommended_min_free_kbytes(void)
> > (unsigned long) nr_free_buffer_pages() / 20);
> > recommended_min <<= (PAGE_SHIFT-10);
> >
> > - if (recommended_min > min_free_kbytes)
> > + if (recommended_min > min_free_kbytes) {
> > + if (user_min_free_kbytes >= 0)
> > + pr_info("raising min_free_kbytes from %d to %lu "
> > + "to help transparent hugepage allocations\n",
> > + min_free_kbytes, recommended_min);
> > +
> > min_free_kbytes = recommended_min;
> > + }
> > setup_per_zone_wmarks();
> > return 0;
> > }
>
> Does this even ever trigger since set_recommended_min_free_kbytes() is
> called via late_initcall()?

This is called whenever THP is enabled so it might be called later after
boot. The point is AFAIU to warn user that the admin decision about
min_free_kbytes is overridden.

--
Michal Hocko
SUSE Labs

2014-01-10 08:12:33

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Fri, 10 Jan 2014 09:05:04 +0100 Michal Hocko <[email protected]> wrote:

> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -100,6 +100,7 @@ static struct khugepaged_scan khugepaged_scan = {
> > > .mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
> > > };
> > >
> > > +extern int user_min_free_kbytes;
> > >
> >
> > We don't add extern declarations to .c files. How many other examples of
> > this can you find in mm/?
>
> I have suggested this because general visibility is not needed.

It's best to use a common declaration which is seen by the definition
site and all references, so everyone agrees on the variable's type.
Otherwise we could have "long foo;" in one file and "extern char foo;"
in another and the compiler won't tell us. I think the linker could
tell us, but it doesn't, afaik. Perhaps there's an option...

> But if
> you think that it should then include/linux/mm.h sounds like a proper
> place.

mm/internal.h might suit.

2014-01-10 08:17:47

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Fri 10-01-14 00:13:44, Andrew Morton wrote:
> On Fri, 10 Jan 2014 09:05:04 +0100 Michal Hocko <[email protected]> wrote:
>
> > > > --- a/mm/huge_memory.c
> > > > +++ b/mm/huge_memory.c
> > > > @@ -100,6 +100,7 @@ static struct khugepaged_scan khugepaged_scan = {
> > > > .mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
> > > > };
> > > >
> > > > +extern int user_min_free_kbytes;
> > > >
> > >
> > > We don't add extern declarations to .c files. How many other examples of
> > > this can you find in mm/?
> >
> > I have suggested this because general visibility is not needed.
>
> It's best to use a common declaration which is seen by the definition
> site and all references, so everyone agrees on the variable's type.
> Otherwise we could have "long foo;" in one file and "extern char foo;"
> in another and the compiler won't tell us. I think the linker could
> tell us, but it doesn't, afaik. Perhaps there's an option...
>
> > But if
> > you think that it should then include/linux/mm.h sounds like a proper
> > place.
>
> mm/internal.h might suit.

min_free_kbytes is in mm.h so I thought having them together would be
appropriate.

--
Michal Hocko
SUSE Labs

2014-01-11 03:27:20

by Han Pingtian

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Fri, Jan 10, 2014 at 09:17:44AM +0100, Michal Hocko wrote:
> On Fri 10-01-14 00:13:44, Andrew Morton wrote:
> > On Fri, 10 Jan 2014 09:05:04 +0100 Michal Hocko <[email protected]> wrote:
> >
> > > > > --- a/mm/huge_memory.c
> > > > > +++ b/mm/huge_memory.c
> > > > > @@ -100,6 +100,7 @@ static struct khugepaged_scan khugepaged_scan = {
> > > > > .mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
> > > > > };
> > > > >
> > > > > +extern int user_min_free_kbytes;
> > > > >
> > > >
> > > > We don't add extern declarations to .c files. How many other examples of
> > > > this can you find in mm/?
> > >
> > > I have suggested this because general visibility is not needed.
> >
> > It's best to use a common declaration which is seen by the definition
> > site and all references, so everyone agrees on the variable's type.
> > Otherwise we could have "long foo;" in one file and "extern char foo;"
> > in another and the compiler won't tell us. I think the linker could
> > tell us, but it doesn't, afaik. Perhaps there's an option...
> >
> > > But if
> > > you think that it should then include/linux/mm.h sounds like a proper
> > > place.
> >
> > mm/internal.h might suit.
>
> min_free_kbytes is in mm.h so I thought having them together would be
> appropriate.
>

At present, we only use user_min_free_kbytes in memory subsystem. So I
think mm/internal.h is suit.

Thanks.

2014-01-14 20:07:31

by Han Pingtian

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Fri, Jan 10, 2014 at 09:17:44AM +0100, Michal Hocko wrote:
> On Fri 10-01-14 00:13:44, Andrew Morton wrote:
> > On Fri, 10 Jan 2014 09:05:04 +0100 Michal Hocko <[email protected]> wrote:
> >
> > > > > --- a/mm/huge_memory.c
> > > > > +++ b/mm/huge_memory.c
> > > > > @@ -100,6 +100,7 @@ static struct khugepaged_scan khugepaged_scan = {
> > > > > .mm_head = LIST_HEAD_INIT(khugepaged_scan.mm_head),
> > > > > };
> > > > >
> > > > > +extern int user_min_free_kbytes;
> > > > >
> > > >
> > > > We don't add extern declarations to .c files. How many other examples of
> > > > this can you find in mm/?
> > >
> > > I have suggested this because general visibility is not needed.
> >
> > It's best to use a common declaration which is seen by the definition
> > site and all references, so everyone agrees on the variable's type.
> > Otherwise we could have "long foo;" in one file and "extern char foo;"
> > in another and the compiler won't tell us. I think the linker could
> > tell us, but it doesn't, afaik. Perhaps there's an option...
> >
> > > But if
> > > you think that it should then include/linux/mm.h sounds like a proper
> > > place.
> >
> > mm/internal.h might suit.
>
> min_free_kbytes is in mm.h so I thought having them together would be
> appropriate.
>

This is the latest version which put user_min_free_kbytes in
mm/internal.h. Please have a look. Thanks.


>From 0d2583bea1f8ffa919e2cee3ee8ed08ec547284a Mon Sep 17 00:00:00 2001
From: Han Pingtian <[email protected]>
Date: Thu, 9 Jan 2014 15:24:26 +0800
Subject: [PATCH] mm: show message when raising min_free_kbytes in THP

min_free_kbytes may be raised during THP's initialization. Sometimes,
this will change the value being set by user. Showing message will
clarify this confusion.

Only show this message when changing the value set by user according to
Michal Hocko's suggestion.

Showing the old value of min_free_kbytes according to Dave Hansen's
suggestion. This will give user the chance to restore old value of
min_free_kbytes.

Signed-off-by: Han Pingtian <[email protected]>
---
mm/huge_memory.c | 8 +++++++-
mm/internal.h | 1 +
mm/page_alloc.c | 2 +-
3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7de1bf8..2ca526b8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -130,8 +130,14 @@ static int set_recommended_min_free_kbytes(void)
(unsigned long) nr_free_buffer_pages() / 20);
recommended_min <<= (PAGE_SHIFT-10);

- if (recommended_min > min_free_kbytes)
+ if (recommended_min > min_free_kbytes) {
+ if (user_min_free_kbytes >= 0)
+ pr_info("raising min_free_kbytes from %d to %lu "
+ "to help transparent hugepage allocations\n",
+ min_free_kbytes, recommended_min);
+
min_free_kbytes = recommended_min;
+ }
setup_per_zone_wmarks();
return 0;
}
diff --git a/mm/internal.h b/mm/internal.h
index 684f7aa..110d8da 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -101,6 +101,7 @@ extern void prep_compound_page(struct page *page, unsigned long order);
#ifdef CONFIG_MEMORY_FAILURE
extern bool is_free_buddy_page(struct page *page);
#endif
+extern int user_min_free_kbytes;

#if defined CONFIG_COMPACTION || defined CONFIG_CMA

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ea62b2..a9dcfd8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -205,7 +205,7 @@ static char * const zone_names[MAX_NR_ZONES] = {
};

int min_free_kbytes = 1024;
-int user_min_free_kbytes;
+int user_min_free_kbytes = -1;

static unsigned long __meminitdata nr_kernel_pages;
static unsigned long __meminitdata nr_all_pages;
--
1.7.7.6

2014-01-14 23:52:46

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Wed, 15 Jan 2014 04:07:20 +0800 Han Pingtian <[email protected]> wrote:

> min_free_kbytes may be raised during THP's initialization. Sometimes,
> this will change the value being set by user. Showing message will
> clarify this confusion.
>
> Only show this message when changing the value set by user according to
> Michal Hocko's suggestion.
>
> Showing the old value of min_free_kbytes according to Dave Hansen's
> suggestion. This will give user the chance to restore old value of
> min_free_kbytes.
>

This is all a bit nasty, isn't it? THP goes and alters min_free_kbytes
to improve its own reliability, but min_free_kbytes is also
user-modifiable. And over many years we have trained a *lot* of users
to alter min_free_kbytes. Often to prevent nasty page allocation
failure warnings from net drivers.

So there are probably quite a lot of people out there who are manually
rubbing out THP's efforts. And there may also be people who are
setting min_free_kbytes to a value which is unnecessarily high for more
recent kernels.

I don't know what to do about this mess though :(

> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -130,8 +130,14 @@ static int set_recommended_min_free_kbytes(void)
> (unsigned long) nr_free_buffer_pages() / 20);
> recommended_min <<= (PAGE_SHIFT-10);
>
> - if (recommended_min > min_free_kbytes)
> + if (recommended_min > min_free_kbytes) {
> + if (user_min_free_kbytes >= 0)
> + pr_info("raising min_free_kbytes from %d to %lu "
> + "to help transparent hugepage allocations\n",
> + min_free_kbytes, recommended_min);

hm, recommended_min shouldn't have had long type. Oh well, we've done
worse things.

> min_free_kbytes = recommended_min;
> + }
> setup_per_zone_wmarks();
> return 0;
> }

2014-01-15 00:25:17

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Tue, 14 Jan 2014, Andrew Morton wrote:

> This is all a bit nasty, isn't it? THP goes and alters min_free_kbytes
> to improve its own reliability, but min_free_kbytes is also
> user-modifiable. And over many years we have trained a *lot* of users
> to alter min_free_kbytes. Often to prevent nasty page allocation
> failure warnings from net drivers.
>

I can vouch for kernel logs that are spammed with tons of net page
allocation failure warnings, in fact we're going through and adding
__GFP_NOWARN to most of these.

> So there are probably quite a lot of people out there who are manually
> rubbing out THP's efforts. And there may also be people who are
> setting min_free_kbytes to a value which is unnecessarily high for more
> recent kernels.
>

Indeed, we have initscripts that modified min_free_kbytes before thp was
introduced but luckily they were comparing their newly computed value to
the existing value and only writing if the new value is greater.
Hopefully most users are doing the same thing.

Would it be overkill to save the kernel default both with and without thp
and then doing a WARN_ON_ONCE() if a user-written value is ever less?

2014-01-15 00:35:36

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Tue, 14 Jan 2014 16:25:10 -0800 (PST) David Rientjes <[email protected]> wrote:

> On Tue, 14 Jan 2014, Andrew Morton wrote:
>
> > This is all a bit nasty, isn't it? THP goes and alters min_free_kbytes
> > to improve its own reliability, but min_free_kbytes is also
> > user-modifiable. And over many years we have trained a *lot* of users
> > to alter min_free_kbytes. Often to prevent nasty page allocation
> > failure warnings from net drivers.
> >
>
> I can vouch for kernel logs that are spammed with tons of net page
> allocation failure warnings, in fact we're going through and adding
> __GFP_NOWARN to most of these.
>
> > So there are probably quite a lot of people out there who are manually
> > rubbing out THP's efforts. And there may also be people who are
> > setting min_free_kbytes to a value which is unnecessarily high for more
> > recent kernels.
> >
>
> Indeed, we have initscripts that modified min_free_kbytes before thp was
> introduced but luckily they were comparing their newly computed value to
> the existing value and only writing if the new value is greater.
> Hopefully most users are doing the same thing.

I've been waiting 10+ years for us to decide to delete that warning due
to the false positives. Hasn't happened yet, and the warning does
find bugs/issues/misconfigurations/etc.

But I do worry this has led to users unnecessarily increasing
min_free_kbytes just to shut the warnings up. Net result: they have
less memory available for cache, etc.

> Would it be overkill to save the kernel default both with and without thp
> and then doing a WARN_ON_ONCE() if a user-written value is ever less?

Well, min_free_kbytes is a userspace thing, not a kernel thing - maybe
THP shouldn't be dinking with it. What effect is THP trying to achieve
and can we achieve it by other/better means?

2014-01-15 00:53:03

by David Rientjes

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Tue, 14 Jan 2014, Andrew Morton wrote:

> I've been waiting 10+ years for us to decide to delete that warning due
> to the false positives. Hasn't happened yet, and the warning does
> find bugs/issues/misconfigurations/etc.
>

I've found memory leaks from the meminfo that is emitted as part of page
allocation failure warnings, that seems to be the only helpful part.
Unfortunately, they typically emit ~80 lines to the kernel log and become
quite verbose in succession. If you have a lot of nodes, it just becomes
longer.

I think we want to consider alternative values for the ratelimiter, in
this case nopage_rs that Dave added. Dave?

> > Would it be overkill to save the kernel default both with and without thp
> > and then doing a WARN_ON_ONCE() if a user-written value is ever less?
>
> Well, min_free_kbytes is a userspace thing, not a kernel thing - maybe
> THP shouldn't be dinking with it. What effect is THP trying to achieve
> and can we achieve it by other/better means?
>

It moved the preferred "hugeadm --set-recommended-min_free_kbytes"
behavior into the kernel that gave better results (due to lower occurrence
of fragmentation) for thp hosts. Previously, people were using hugeadm
in initscripts and then it became the default kernel logic when thp was
originally merged. I think it's primarily targeted to adjust the high
watermark so we could probably get the same behavior by special casing thp
with some scalar to the watermarks, but changing min_free_kbytes was
probably the easiest way to do it.

2014-01-15 15:22:24

by Mel Gorman

[permalink] [raw]
Subject: Re: [RFC] mm: show message when updating min_free_kbytes in thp

On Tue, Jan 14, 2014 at 04:35:33PM -0800, Andrew Morton wrote:
> > Would it be overkill to save the kernel default both with and without thp
> > and then doing a WARN_ON_ONCE() if a user-written value is ever less?
>
> Well, min_free_kbytes is a userspace thing, not a kernel thing - maybe
> THP shouldn't be dinking with it. What effect is THP trying to achieve
> and can we achieve it by other/better means?

It moved logic from hugeadm where few people knew about it to the
kernel. The value is related to anti-fragmentation. With the recommended
setting the probability of mixing pages of different mobility within a
single pageblock is reduced. Very very superficially, it reduces the
number of instances the mm_page_alloc_extfrag tracepoint is triggered
with parameters that are considered to be severely fragmenting.

--
Mel Gorman
SUSE Labs