2011-06-20 16:35:17

by Cong Wang

[permalink] [raw]
Subject: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

transparent_hugepage=never should mean to disable THP completely,
otherwise we don't have a way to disable THP completely.
The design is broken.

Signed-off-by: WANG Cong <[email protected]>
---
mm/huge_memory.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 81532f2..9c63c90 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -488,19 +488,26 @@ static struct attribute_group khugepaged_attr_group = {
};
#endif /* CONFIG_SYSFS */

+#define hugepage_enabled() khugepaged_enabled()
+
static int __init hugepage_init(void)
{
- int err;
+ int err = 0;
#ifdef CONFIG_SYSFS
static struct kobject *hugepage_kobj;
#endif

- err = -EINVAL;
if (!has_transparent_hugepage()) {
+ err = -EINVAL;
transparent_hugepage_flags = 0;
goto out;
}

+ if (!hugepage_enabled()) {
+ printk(KERN_INFO "hugepage: totally disabled\n");
+ goto out;
+ }
+
#ifdef CONFIG_SYSFS
err = -ENOMEM;
hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
--
1.7.4.4


2011-06-20 16:35:53

by Cong Wang

[permalink] [raw]
Subject: [PATCH 2/3] mm: make the threshold of enabling THP configurable

Don't hard-code 512M as the threshold in kernel, make it configruable,
and set 512M by default.

Signed-off-by: WANG Cong <[email protected]>
---
mm/Kconfig | 10 ++++++++++
mm/huge_memory.c | 2 +-
2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 8ca47a5..a826471 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -340,6 +340,16 @@ choice
benefit.
endchoice

+config TRANSPARENT_HUGEPAGE_THRESHOLD
+ depends on TRANSPARENT_HUGEPAGE
+ int "The minimal threshold of enabling Transparent Hugepage"
+ range 512 8192
+ default "512"
+ help
+ The threshold of enabling Transparent Huagepage automatically,
+ in Mbytes, below this value, Transparent Hugepage will be disabled
+ by default during boot.
+
#
# UP and nommu archs use km based percpu allocator
#
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c63c90..7fb44cc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -544,7 +544,7 @@ static int __init hugepage_init(void)
* where the extra memory used could hurt more than TLB overhead
* is likely to save. The admin can still enable it through /sys.
*/
- if (totalram_pages < (512 << (20 - PAGE_SHIFT)))
+ if (totalram_pages < (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD << (20 - PAGE_SHIFT)))
transparent_hugepage_flags = 0;

start_khugepaged();
--
1.7.4.4

2011-06-20 16:35:35

by Cong Wang

[permalink] [raw]
Subject: [PATCH 3/3] mm: print information when THP is disabled automatically

Print information when THP is disabled automatically so that
users can find this info in dmesg.

Signed-off-by: WANG Cong <[email protected]>
---
mm/huge_memory.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7fb44cc..07679da 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -544,8 +544,11 @@ static int __init hugepage_init(void)
* where the extra memory used could hurt more than TLB overhead
* is likely to save. The admin can still enable it through /sys.
*/
- if (totalram_pages < (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD << (20 - PAGE_SHIFT)))
+ if (totalram_pages < (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD
+ << (20 - PAGE_SHIFT))) {
+ printk(KERN_INFO "hugepage: disabled auotmatically\n");
transparent_hugepage_flags = 0;
+ }

start_khugepaged();

--
1.7.4.4

2011-06-20 16:51:14

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
> transparent_hugepage=never should mean to disable THP completely,
> otherwise we don't have a way to disable THP completely.
> The design is broken.

We want to allow people to boot with transparent_hugepage=never but to
still allow people to enable it later at runtime. Not sure why you
find it broken... Your patch is just crippling down the feature with
no gain. There is absolutely no gain to disallow root to enable THP
later at runtime with sysfs, root can enable it anyway by writing into
/dev/mem.

Unless you're root and you enable it, it's completely disabled, so I
don't see what you mean it's not completely disabled. Not even
khugepaged is started, try to grep of khugepaged... (that wouldn't be
the same with ksm where ksm daemon runs even when it's off for no
gain, but I explicitly solved the locking so khugepaged will go away
when enabled=never and return when enabled=always).

>
> Signed-off-by: WANG Cong <[email protected]>
> ---
> mm/huge_memory.c | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 81532f2..9c63c90 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -488,19 +488,26 @@ static struct attribute_group khugepaged_attr_group = {
> };
> #endif /* CONFIG_SYSFS */
>
> +#define hugepage_enabled() khugepaged_enabled()
> +
> static int __init hugepage_init(void)
> {
> - int err;
> + int err = 0;
> #ifdef CONFIG_SYSFS
> static struct kobject *hugepage_kobj;
> #endif
>
> - err = -EINVAL;
> if (!has_transparent_hugepage()) {
> + err = -EINVAL;
> transparent_hugepage_flags = 0;
> goto out;

Original error setting was better IMHO.

> }
>
> + if (!hugepage_enabled()) {
> + printk(KERN_INFO "hugepage: totally disabled\n");
> + goto out;
> + }
> +
> #ifdef CONFIG_SYSFS
> err = -ENOMEM;
> hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);

Changing the initialization to "never" at boot, doesn't mean we must
never allow it to be enabled again during the runtime of the kernel
(by root with sysfs, which is certainly less error prone than doing
that with /dev/mem), and there is no gain in preventing that.

2011-06-20 16:55:04

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm: print information when THP is disabled automatically

On Tue, Jun 21, 2011 at 12:34:30AM +0800, Amerigo Wang wrote:
> Print information when THP is disabled automatically so that
> users can find this info in dmesg.
>
> Signed-off-by: WANG Cong <[email protected]>
> ---
> mm/huge_memory.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7fb44cc..07679da 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -544,8 +544,11 @@ static int __init hugepage_init(void)
> * where the extra memory used could hurt more than TLB overhead
> * is likely to save. The admin can still enable it through /sys.
> */
> - if (totalram_pages < (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD << (20 - PAGE_SHIFT)))
> + if (totalram_pages < (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD
> + << (20 - PAGE_SHIFT))) {
> + printk(KERN_INFO "hugepage: disabled auotmatically\n");

typo automatically. I'd suggest to change the prefix from "hugepage:"
to "THP:" to avoid the risk of possible confusion with hugetlbfs
support. Maybe you could print the minimal threshold too ("disabled
automatically with less than %dMB of RAM").

2011-06-20 16:56:28

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On 06/20/2011 12:50 PM, Andrea Arcangeli wrote:
> On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
>> transparent_hugepage=never should mean to disable THP completely,
>> otherwise we don't have a way to disable THP completely.
>> The design is broken.
>
> We want to allow people to boot with transparent_hugepage=never but to
> still allow people to enable it later at runtime. Not sure why you
> find it broken... Your patch is just crippling down the feature with
> no gain. There is absolutely no gain to disallow root to enable THP
> later at runtime with sysfs, root can enable it anyway by writing into
> /dev/mem.
>
> Unless you're root and you enable it, it's completely disabled, so I
> don't see what you mean it's not completely disabled. Not even
> khugepaged is started, try to grep of khugepaged...

Agreed, I don't really see the reason for these patches.

Amerigo?

2011-06-20 16:58:55

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
> transparent_hugepage=never should mean to disable THP completely,
> otherwise we don't have a way to disable THP completely.
> The design is broken.
>

I don't get why it's broken. Why would the user be prevented from
enabling it at runtime?

--
Mel Gorman
SUSE Labs

2011-06-20 16:59:53

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

On Tue, 2011-06-21 at 00:34 +0800, Amerigo Wang wrote:
> +config TRANSPARENT_HUGEPAGE_THRESHOLD
> + depends on TRANSPARENT_HUGEPAGE
> + int "The minimal threshold of enabling Transparent Hugepage"
> + range 512 8192
> + default "512"
> + help
> + The threshold of enabling Transparent Huagepage automatically,
> + in Mbytes, below this value, Transparent Hugepage will be disabled
> + by default during boot.

It makes some sense to me that there would _be_ a threshold, simply
because you need some space to defragment things. But, I can't imagine
any kind of user having *ANY* kind of idea what to set this to. Could
we add some text to this? Maybe:

Transparent hugepages are created by moving other pages out of
the way to create large, contiguous swaths of free memory.
However, some memory on a system can not be easily moved. It is
likely on small systems that this unmovable memory will occupy a
large portion of total memory, which makes even attempting to
create transparent hugepages very expensive.

If you are unsure, set this to the smallest possible value.

To override this at boot, use the $FOO boot command-line option.

I'm also not sure putting a ceiling on this makes a lot of sense.
What's the logic behind that? I know it would be a mess to expose it to
users, but shouldn't this be a per-zone limit, logically? Seems like a
8GB system would have similar issues to a two-numa-node 16GB system.

-- Dave

2011-06-20 17:00:19

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

On Tue, Jun 21, 2011 at 12:34:29AM +0800, Amerigo Wang wrote:
> Don't hard-code 512M as the threshold in kernel, make it configruable,
> and set 512M by default.
>

I'm not seeing the gain here either. This is something that is going to
be set by distributions and probably never by users. If the default of
512 is incorrect, what should it be? Also, the Kconfig help message has
spelling errors.

--
Mel Gorman
SUSE Labs

2011-06-20 17:01:15

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm: print information when THP is disabled automatically

On Tue, Jun 21, 2011 at 12:34:30AM +0800, Amerigo Wang wrote:
> Print information when THP is disabled automatically so that
> users can find this info in dmesg.
>
> Signed-off-by: WANG Cong <[email protected]>
> ---
> mm/huge_memory.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7fb44cc..07679da 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -544,8 +544,11 @@ static int __init hugepage_init(void)
> * where the extra memory used could hurt more than TLB overhead
> * is likely to save. The admin can still enable it through /sys.
> */
> - if (totalram_pages < (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD << (20 - PAGE_SHIFT)))
> + if (totalram_pages < (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD
> + << (20 - PAGE_SHIFT))) {
> + printk(KERN_INFO "hugepage: disabled auotmatically\n");
> transparent_hugepage_flags = 0;
> + }
>
> start_khugepaged();
>

Guess this doesn't hurt. You misspelled automatically though and
mentioning "hugepage" could be confused with hugetlbfs.

--
Mel Gorman
SUSE Labs

2011-06-20 17:01:54

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 00:50, Andrea Arcangeli 写道:
> On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
>> transparent_hugepage=never should mean to disable THP completely,
>> otherwise we don't have a way to disable THP completely.
>> The design is broken.
>
> We want to allow people to boot with transparent_hugepage=never but to
> still allow people to enable it later at runtime. Not sure why you
> find it broken... Your patch is just crippling down the feature with
> no gain. There is absolutely no gain to disallow root to enable THP
> later at runtime with sysfs, root can enable it anyway by writing into
> /dev/mem.


What can I do if I don't want to see THP at all? I mean the same
behavior as when my CPU doesn't have PSE.

With this patch, there is no even /sys/kernel/vm/transparent_hugepage/
exists.

>
> Unless you're root and you enable it, it's completely disabled, so I
> don't see what you mean it's not completely disabled. Not even
> khugepaged is started, try to grep of khugepaged... (that wouldn't be
> the same with ksm where ksm daemon runs even when it's off for no
> gain, but I explicitly solved the locking so khugepaged will go away
> when enabled=never and return when enabled=always).

Without this patch, THP is still initialized (although khugepaged is not started),
that is what I don't want to see when I pass "transparent_hugepage=never",
because "never" for me means THP is totally unseen, even not initialized.

Thanks.

2011-06-20 17:07:22

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 00:58, Mel Gorman 写道:
> On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
>> transparent_hugepage=never should mean to disable THP completely,
>> otherwise we don't have a way to disable THP completely.
>> The design is broken.
>>
>
> I don't get why it's broken. Why would the user be prevented from
> enabling it at runtime?
>

We need to a way to totally disable it, right? Otherwise, when I configure
THP in .config, I always have THP initialized even when I pass "=never".

For me, if you don't provide such way to disable it, it is not flexible.

I meet this problem when I try to disable THP in kdump kernel, there is
no user of THP in kdump kernel, THP is a waste for kdump kernel. This is
why I need to find a way to totally disable it.

Thanks.

2011-06-20 17:10:45

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On 06/20/2011 01:07 PM, Cong Wang wrote:
> 于 2011年06月21日 00:58, Mel Gorman 写道:
>> On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
>>> transparent_hugepage=never should mean to disable THP completely,
>>> otherwise we don't have a way to disable THP completely.
>>> The design is broken.
>>>
>>
>> I don't get why it's broken. Why would the user be prevented from
>> enabling it at runtime?
>>
>
> We need to a way to totally disable it, right? Otherwise, when I configure
> THP in .config, I always have THP initialized even when I pass "=never".
>
> For me, if you don't provide such way to disable it, it is not flexible.
>
> I meet this problem when I try to disable THP in kdump kernel, there is
> no user of THP in kdump kernel, THP is a waste for kdump kernel. This is
> why I need to find a way to totally disable it.

What you have not explained yet is why having THP
halfway initialized (but not used, and without a
khugepaged thread) is a problem at all.

Why is it a problem for you?

2011-06-20 17:17:10

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

于 2011年06月21日 00:59, Mel Gorman 写道:
> On Tue, Jun 21, 2011 at 12:34:29AM +0800, Amerigo Wang wrote:
>> Don't hard-code 512M as the threshold in kernel, make it configruable,
>> and set 512M by default.
>>
>
> I'm not seeing the gain here either. This is something that is going to
> be set by distributions and probably never by users. If the default of
> 512 is incorrect, what should it be? Also, the Kconfig help message has
> spelling errors.
>

Sorry for spelling errors, I am not an English speaker.

Hard-coding is almost never a good thing in kernel, enforcing 512
is not good either. Since the default is still 512, I don't think this
will affect much users.

I do agree to improve the help message, like Dave mentioned in his reply,
but I don't like enforcing a hard-coded number in kernel.

BTW, why do you think 512 is suitable for *all* users?

Thanks.

2011-06-20 17:19:45

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 01:10, Rik van Riel 写道:
> On 06/20/2011 01:07 PM, Cong Wang wrote:
>> 于 2011年06月21日 00:58, Mel Gorman 写道:
>>> On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
>>>> transparent_hugepage=never should mean to disable THP completely,
>>>> otherwise we don't have a way to disable THP completely.
>>>> The design is broken.
>>>>
>>>
>>> I don't get why it's broken. Why would the user be prevented from
>>> enabling it at runtime?
>>>
>>
>> We need to a way to totally disable it, right? Otherwise, when I configure
>> THP in .config, I always have THP initialized even when I pass "=never".
>>
>> For me, if you don't provide such way to disable it, it is not flexible.
>>
>> I meet this problem when I try to disable THP in kdump kernel, there is
>> no user of THP in kdump kernel, THP is a waste for kdump kernel. This is
>> why I need to find a way to totally disable it.
>
> What you have not explained yet is why having THP
> halfway initialized (but not used, and without a
> khugepaged thread) is a problem at all.
>
> Why is it a problem for you?

It occupies some memory, memory is valuable in kdump kernel (usually
only 128M). :) Since I am sure no one will use it, why do I still need
to initialize it at all?

Thanks.

2011-06-20 17:23:37

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

于 2011年06月21日 00:59, Dave Hansen 写道:
> On Tue, 2011-06-21 at 00:34 +0800, Amerigo Wang wrote:
>> +config TRANSPARENT_HUGEPAGE_THRESHOLD
>> + depends on TRANSPARENT_HUGEPAGE
>> + int "The minimal threshold of enabling Transparent Hugepage"
>> + range 512 8192
>> + default "512"
>> + help
>> + The threshold of enabling Transparent Huagepage automatically,
>> + in Mbytes, below this value, Transparent Hugepage will be disabled
>> + by default during boot.
>
> It makes some sense to me that there would _be_ a threshold, simply
> because you need some space to defragment things. But, I can't imagine
> any kind of user having *ANY* kind of idea what to set this to. Could
> we add some text to this? Maybe:
>
> Transparent hugepages are created by moving other pages out of
> the way to create large, contiguous swaths of free memory.
> However, some memory on a system can not be easily moved. It is
> likely on small systems that this unmovable memory will occupy a
> large portion of total memory, which makes even attempting to
> create transparent hugepages very expensive.
>
> If you are unsure, set this to the smallest possible value.
>
> To override this at boot, use the $FOO boot command-line option.
>

Yeah, I totally agree to improve the help message as you said,
please forgive a non-English speaker. ;)

> I'm also not sure putting a ceiling on this makes a lot of sense.
> What's the logic behind that? I know it would be a mess to expose it to
> users, but shouldn't this be a per-zone limit, logically? Seems like a
> 8GB system would have similar issues to a two-numa-node 16GB system.
>

I am not sure about this, since I am new to THP, I just replaced
the hard-code 512 with a Kconfig var. But I am certainly open
to improve this as you said if Andrea agrees.

Thanks.

2011-06-20 17:25:16

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm: print information when THP is disabled automatically

于 2011年06月21日 00:54, Andrea Arcangeli 写道:
> On Tue, Jun 21, 2011 at 12:34:30AM +0800, Amerigo Wang wrote:
>> Print information when THP is disabled automatically so that
>> users can find this info in dmesg.
>>
>> Signed-off-by: WANG Cong<[email protected]>
>> ---
>> mm/huge_memory.c | 5 ++++-
>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 7fb44cc..07679da 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -544,8 +544,11 @@ static int __init hugepage_init(void)
>> * where the extra memory used could hurt more than TLB overhead
>> * is likely to save. The admin can still enable it through /sys.
>> */
>> - if (totalram_pages< (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD<< (20 - PAGE_SHIFT)))
>> + if (totalram_pages< (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD
>> + << (20 - PAGE_SHIFT))) {
>> + printk(KERN_INFO "hugepage: disabled auotmatically\n");
>
> typo automatically. I'd suggest to change the prefix from "hugepage:"
> to "THP:" to avoid the risk of possible confusion with hugetlbfs
> support. Maybe you could print the minimal threshold too ("disabled
> automatically with less than %dMB of RAM").

Well, the "hugepage:" prefix is copied from other printk messages
in the same function. ;-)

Yeah, it would be nice to print the threshold too.

Thanks for your reply.

2011-06-20 17:27:44

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm: print information when THP is disabled automatically

于 2011年06月21日 01:01, Mel Gorman 写道:
> On Tue, Jun 21, 2011 at 12:34:30AM +0800, Amerigo Wang wrote:
>> Print information when THP is disabled automatically so that
>> users can find this info in dmesg.
>>
>> Signed-off-by: WANG Cong<[email protected]>
>> ---
>> mm/huge_memory.c | 5 ++++-
>> 1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 7fb44cc..07679da 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -544,8 +544,11 @@ static int __init hugepage_init(void)
>> * where the extra memory used could hurt more than TLB overhead
>> * is likely to save. The admin can still enable it through /sys.
>> */
>> - if (totalram_pages< (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD<< (20 - PAGE_SHIFT)))
>> + if (totalram_pages< (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD
>> + << (20 - PAGE_SHIFT))) {
>> + printk(KERN_INFO "hugepage: disabled auotmatically\n");
>> transparent_hugepage_flags = 0;
>> + }
>>
>> start_khugepaged();
>>
>
> Guess this doesn't hurt. You misspelled automatically though and
> mentioning "hugepage" could be confused with hugetlbfs.
>

Yeah, sorry for the typo.

But, there are many printk messages in the same file start with "hugepage:".
:-) I can send a patch to replace all of them with "THP" if you want...

Thanks!

2011-06-20 17:28:45

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On 06/20/2011 01:19 PM, Cong Wang wrote:
> 于 2011年06月21日 01:10, Rik van Riel 写道:
>> On 06/20/2011 01:07 PM, Cong Wang wrote:
>>> 于 2011年06月21日 00:58, Mel Gorman 写道:
>>>> On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
>>>>> transparent_hugepage=never should mean to disable THP completely,
>>>>> otherwise we don't have a way to disable THP completely.
>>>>> The design is broken.
>>>>>
>>>>
>>>> I don't get why it's broken. Why would the user be prevented from
>>>> enabling it at runtime?
>>>>
>>>
>>> We need to a way to totally disable it, right? Otherwise, when I
>>> configure
>>> THP in .config, I always have THP initialized even when I pass "=never".
>>>
>>> For me, if you don't provide such way to disable it, it is not flexible.
>>>
>>> I meet this problem when I try to disable THP in kdump kernel, there is
>>> no user of THP in kdump kernel, THP is a waste for kdump kernel. This is
>>> why I need to find a way to totally disable it.
>>
>> What you have not explained yet is why having THP
>> halfway initialized (but not used, and without a
>> khugepaged thread) is a problem at all.
>>
>> Why is it a problem for you?
>
> It occupies some memory, memory is valuable in kdump kernel (usually
> only 128M). :) Since I am sure no one will use it, why do I still need
> to initialize it at all?

Lets take a look at how much memory your patches end
up saving.

By bailing out earlier in hugepage_init, you end up
saving 3 sysfs objects, one slab cache and a hash
table with 1024 pointers. That's a total of maybe
10kB of memory on a 64 bit system.

I'm not convinced that a 10kB memory reduction is
worth the price of never being able to enable
transparent hugepages when a system is booted with
THP disabled...

2011-06-20 17:35:29

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 01:28, Rik van Riel 写道:
> On 06/20/2011 01:19 PM, Cong Wang wrote:
>> 于 2011年06月21日 01:10, Rik van Riel 写道:
>>> On 06/20/2011 01:07 PM, Cong Wang wrote:
>>>> 于 2011年06月21日 00:58, Mel Gorman 写道:
>>>>> On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
>>>>>> transparent_hugepage=never should mean to disable THP completely,
>>>>>> otherwise we don't have a way to disable THP completely.
>>>>>> The design is broken.
>>>>>>
>>>>>
>>>>> I don't get why it's broken. Why would the user be prevented from
>>>>> enabling it at runtime?
>>>>>
>>>>
>>>> We need to a way to totally disable it, right? Otherwise, when I
>>>> configure
>>>> THP in .config, I always have THP initialized even when I pass "=never".
>>>>
>>>> For me, if you don't provide such way to disable it, it is not flexible.
>>>>
>>>> I meet this problem when I try to disable THP in kdump kernel, there is
>>>> no user of THP in kdump kernel, THP is a waste for kdump kernel. This is
>>>> why I need to find a way to totally disable it.
>>>
>>> What you have not explained yet is why having THP
>>> halfway initialized (but not used, and without a
>>> khugepaged thread) is a problem at all.
>>>
>>> Why is it a problem for you?
>>
>> It occupies some memory, memory is valuable in kdump kernel (usually
>> only 128M). :) Since I am sure no one will use it, why do I still need
>> to initialize it at all?
>
> Lets take a look at how much memory your patches end
> up saving.
>
> By bailing out earlier in hugepage_init, you end up
> saving 3 sysfs objects, one slab cache and a hash
> table with 1024 pointers. That's a total of maybe
> 10kB of memory on a 64 bit system.
>
> I'm not convinced that a 10kB memory reduction is
> worth the price of never being able to enable
> transparent hugepages when a system is booted with
> THP disabled...
>

Even if it is really 10K, why not save it since it doesn't
much effort to make this. ;) Not only memory, but also time,
this could also save a little time to initialize the kernel.

For me, the more serious thing is the logic, there is
no way to totally disable it as long as I have THP in .config
currently. This is why I said the design is broken.

Thanks.

2011-06-20 17:50:37

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On 06/20/2011 01:34 PM, Cong Wang wrote:

> Even if it is really 10K, why not save it since it doesn't
> much effort to make this. ;) Not only memory, but also time,
> this could also save a little time to initialize the kernel.
>
> For me, the more serious thing is the logic, there is
> no way to totally disable it as long as I have THP in .config
> currently. This is why I said the design is broken.

There are many things you cannot totally disable as long
as they are enabled in the .config. Think about things
like swap, or tmpfs - neither of which you are going to
use in the crashdump kernel.

I believe we need to keep the kernel optimized for common
use and convenience.

Crashdump is very much a corner case. Yes, using less
memory in crashdump is worthwhile, but lets face it -
the big memory user there is likely to be the struct page
array, with everything else down in the noise...

2011-06-20 17:59:07

by Eric B Munson

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Tue, 21 Jun 2011, Cong Wang wrote:

> 于 2011年06月21日 01:28, Rik van Riel 写道:
> >On 06/20/2011 01:19 PM, Cong Wang wrote:
> >>于 2011年06月21日 01:10, Rik van Riel 写道:
> >>>On 06/20/2011 01:07 PM, Cong Wang wrote:
> >>>>于 2011年06月21日 00:58, Mel Gorman 写道:
> >>>>>On Tue, Jun 21, 2011 at 12:34:28AM +0800, Amerigo Wang wrote:
> >>>>>>transparent_hugepage=never should mean to disable THP completely,
> >>>>>>otherwise we don't have a way to disable THP completely.
> >>>>>>The design is broken.
> >>>>>>
> >>>>>
> >>>>>I don't get why it's broken. Why would the user be prevented from
> >>>>>enabling it at runtime?
> >>>>>
> >>>>
> >>>>We need to a way to totally disable it, right? Otherwise, when I
> >>>>configure
> >>>>THP in .config, I always have THP initialized even when I pass "=never".
> >>>>
> >>>>For me, if you don't provide such way to disable it, it is not flexible.
> >>>>
> >>>>I meet this problem when I try to disable THP in kdump kernel, there is
> >>>>no user of THP in kdump kernel, THP is a waste for kdump kernel. This is
> >>>>why I need to find a way to totally disable it.
> >>>
> >>>What you have not explained yet is why having THP
> >>>halfway initialized (but not used, and without a
> >>>khugepaged thread) is a problem at all.
> >>>
> >>>Why is it a problem for you?
> >>
> >>It occupies some memory, memory is valuable in kdump kernel (usually
> >>only 128M). :) Since I am sure no one will use it, why do I still need
> >>to initialize it at all?
> >
> >Lets take a look at how much memory your patches end
> >up saving.
> >
> >By bailing out earlier in hugepage_init, you end up
> >saving 3 sysfs objects, one slab cache and a hash
> >table with 1024 pointers. That's a total of maybe
> >10kB of memory on a 64 bit system.
> >
> >I'm not convinced that a 10kB memory reduction is
> >worth the price of never being able to enable
> >transparent hugepages when a system is booted with
> >THP disabled...
> >
>
> Even if it is really 10K, why not save it since it doesn't
> much effort to make this. ;) Not only memory, but also time,
> this could also save a little time to initialize the kernel.
>
> For me, the more serious thing is the logic, there is
> no way to totally disable it as long as I have THP in .config
> currently. This is why I said the design is broken.
>
> Thanks.
>

If memory is this scarce, why not set CONFIG_TRANSPARENT_HUGEPAGE=n and be done
with it? If the config option is enabled, the admin should be able to turn the
functionality back on if desired. If you really don't _ever_ want THP then
disable the config.

Eric


Attachments:
(No filename) (2.56 kB)
signature.asc (490.00 B)
Digital signature
Download all attachments

2011-06-20 18:00:01

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Mon, Jun 20, 2011 at 01:28:07PM -0400, Rik van Riel wrote:

[..]
> I'm not convinced that a 10kB memory reduction is
> worth the price of never being able to enable
> transparent hugepages when a system is booted with
> THP disabled...

Agreed.

Thanks
Vivek

2011-06-20 18:26:10

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Mon, Jun 20, 2011 at 01:50:00PM -0400, Rik van Riel wrote:
> On 06/20/2011 01:34 PM, Cong Wang wrote:
>
> >Even if it is really 10K, why not save it since it doesn't
> >much effort to make this. ;) Not only memory, but also time,
> >this could also save a little time to initialize the kernel.
> >
> >For me, the more serious thing is the logic, there is
> >no way to totally disable it as long as I have THP in .config
> >currently. This is why I said the design is broken.
>
> There are many things you cannot totally disable as long
> as they are enabled in the .config. Think about things
> like swap, or tmpfs - neither of which you are going to
> use in the crashdump kernel.
>
> I believe we need to keep the kernel optimized for common
> use and convenience.
>
> Crashdump is very much a corner case. Yes, using less
> memory in crashdump is worthwhile, but lets face it -
> the big memory user there is likely to be the struct page
> array, with everything else down in the noise...

We are creating struct page array only for memory visible in
second kernel. So in this case struct page array for 128G.

One of big user is per cpu data on large cpu systems. (256 etc).
Even though we boot second kernel with maxcpus=1, it does not
change possible cpu maps and initalizes LAPIC etc and that seem
to be consuming significant amount of memory. (In the order of MBs).
So I see some opprotunity there to save memory. But this 10kB
definitely sounds trivial amount to me.

thanks
Vivek

2011-06-20 19:21:58

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Mon, Jun 20, 2011 at 02:25:58PM -0400, Vivek Goyal wrote:
> So I see some opprotunity there to save memory. But this 10kB
> definitely sounds trivial amount to me.

Agree with you and Rik. Also I already avoided the big memory waste
(that for example isn't avoided in the ksmd and could be optimized
away without decreasing flexibility of KSM, and ksmd surely runs on
the kdump kernel too...) that is to make khugepaged exit and release
kernel stack when enabled=never (either done by sysfs or at boot with
transparent_hugepage=never) and all other structs associated with a
(temporarily) useless kernel thread.

The khugepaged_slab_init and mm_slot_hash_init() maybe could be
deferred to when khugepaged starts, and be released when it shutdown
but it makes it more tricky/racey. If you really want to optimize
that, without preventing to ever enable THP again despite all .text
was compiled in and ready to run. You will likely save more if you
make ksmd exit when run=0 (which btw is a much more common config than
enabled=never with THP). And slots hashes are allocated by ksm too so
you could optimize those too if you want and allocate them only by the
time ksmd starts.

As long as it'd still possible to enable the feature again as it is
possible now without noticing an altered behavior from userland, I'm
not entirely against optimizing for saving ~8k of ram even if it
increases complexity a bit (more kernel code will increase .text a bit
though, hopefully not 8k more of .text ;).

2011-06-20 19:37:12

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm: print information when THP is disabled automatically

Hi Cong,

On Tue, Jun 21, 2011 at 01:26:58AM +0800, Cong Wang wrote:
> But, there are many printk messages in the same file start with "hugepage:".
> :-) I can send a patch to replace all of them with "THP" if you want...

Those are in failure paths practically impossible to trigger unless
the memory layout is misconfigured and the kernel won't succeed
booting so I guess I didn't care much what I wrote in those 3 sorry.
Replacing those with THP surely sounds valid cleanup to avoid
confusion if you add a "THP:" prefix elsewhere.

Thanks,
Andrea

2011-06-20 19:43:26

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Tue, Jun 21, 2011 at 01:01:17AM +0800, Cong Wang wrote:
> Without this patch, THP is still initialized (although khugepaged is not started),
> that is what I don't want to see when I pass "transparent_hugepage=never",
> because "never" for me means THP is totally unseen, even not initialized.

The ram saving by not registering in sysfs is not worth the loss of
generic functionality. You can try to make the hash and slab
khugepaged allocations more dynamic if you want to microoptimize for
RAM usage, that I wouldn't be against if you find a way to do it
simply and without much complexity (and .text) added. But likely there
are other places to optimize that may introduce less tricks and would
give you a bigger saving than ~8kbytes, it's up to you.

2011-06-21 03:15:19

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 03:43, Andrea Arcangeli 写道:
> On Tue, Jun 21, 2011 at 01:01:17AM +0800, Cong Wang wrote:
>> Without this patch, THP is still initialized (although khugepaged is not started),
>> that is what I don't want to see when I pass "transparent_hugepage=never",
>> because "never" for me means THP is totally unseen, even not initialized.
>
> The ram saving by not registering in sysfs is not worth the loss of
> generic functionality. You can try to make the hash and slab
> khugepaged allocations more dynamic if you want to microoptimize for
> RAM usage, that I wouldn't be against if you find a way to do it
> simply and without much complexity (and .text) added. But likely there
> are other places to optimize that may introduce less tricks and would
> give you a bigger saving than ~8kbytes, it's up to you.

But the THP functionality is not going to be used.

Yeah, sounds reasonable, I will try to check if I can make it.

Thanks for pointing this out!

2011-06-21 03:28:58

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 01:50, Rik van Riel 写道:
> On 06/20/2011 01:34 PM, Cong Wang wrote:
>
>> Even if it is really 10K, why not save it since it doesn't
>> much effort to make this. ;) Not only memory, but also time,
>> this could also save a little time to initialize the kernel.
>>
>> For me, the more serious thing is the logic, there is
>> no way to totally disable it as long as I have THP in .config
>> currently. This is why I said the design is broken.
>
> There are many things you cannot totally disable as long
> as they are enabled in the .config. Think about things
> like swap, or tmpfs - neither of which you are going to
> use in the crashdump kernel.

Sure, things like CONFIG_KEXEC can never be disabled
without changing .config too, they are designed like this.

Some features _do_ only mean to be disabled only
by Kconfig, some syscalls are indeed good examples here,
but some features don't. THP is one of them, because features
like this can be tuned dynamically.

>
> I believe we need to keep the kernel optimized for common
> use and convenience.
>
> Crashdump is very much a corner case. Yes, using less
> memory in crashdump is worthwhile, but lets face it -
> the big memory user there is likely to be the struct page
> array, with everything else down in the noise...

For the 128M case, only the struct page's of the 128M is
constructed in the second kernel, which unlikely to be a big user.

Thanks.

2011-06-21 03:36:14

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 01:58, Eric B Munson 写道:
> If memory is this scarce, why not set CONFIG_TRANSPARENT_HUGEPAGE=n and be done
> with it? If the config option is enabled, the admin should be able to turn the
> functionality back on if desired. If you really don't _ever_ want THP then
> disable the config.
>

Unfortunately, changing .config is not always as easy as you said,
for the kdump case, we use the same kernel binary with the normal kernel
which certainly has to have THP enabled in .config.

Thanks.

2011-06-21 04:08:59

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 03:21, Andrea Arcangeli 写道:
> On Mon, Jun 20, 2011 at 02:25:58PM -0400, Vivek Goyal wrote:
>> So I see some opprotunity there to save memory. But this 10kB
>> definitely sounds trivial amount to me.
>
> Agree with you and Rik. Also I already avoided the big memory waste
> (that for example isn't avoided in the ksmd and could be optimized
> away without decreasing flexibility of KSM, and ksmd surely runs on
> the kdump kernel too...) that is to make khugepaged exit and release
> kernel stack when enabled=never (either done by sysfs or at boot with
> transparent_hugepage=never) and all other structs associated with a
> (temporarily) useless kernel thread.

I agree to disable ksm in kdump kernel, thanks for pointing this out!
I will look into later, and probably send a patch for this too.

>
> The khugepaged_slab_init and mm_slot_hash_init() maybe could be
> deferred to when khugepaged starts, and be released when it shutdown
> but it makes it more tricky/racey. If you really want to optimize
> that, without preventing to ever enable THP again despite all .text
> was compiled in and ready to run. You will likely save more if you
> make ksmd exit when run=0 (which btw is a much more common config than
> enabled=never with THP). And slots hashes are allocated by ksm too so
> you could optimize those too if you want and allocate them only by the
> time ksmd starts.

The thing is that we can save ~10K by adding 3 lines of code as this
patch showed, where else in kernel can you save 10K by 3 lines of code?
(except some kfree() cases, of course) So, again, why not have it? ;)

>
> As long as it'd still possible to enable the feature again as it is
> possible now without noticing an altered behavior from userland, I'm
> not entirely against optimizing for saving ~8k of ram even if it
> increases complexity a bit (more kernel code will increase .text a bit
> though, hopefully not 8k more of .text ;).

Why do we _force_ the feature to be tunable even when user completely
don't want to disable it? Why not provide a way to let the user to decide
which is better for him?

When programming kernel, providing a mechanism rather than a policy is
what I always keep in mind, I don't know why you violate this rule here,
to be honest. :-/

Thanks.

2011-06-21 09:36:48

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

On Tue, Jun 21, 2011 at 01:16:00AM +0800, Cong Wang wrote:
> ??? 2011???06???21??? 00:59, Mel Gorman ??????:
> >On Tue, Jun 21, 2011 at 12:34:29AM +0800, Amerigo Wang wrote:
> >>Don't hard-code 512M as the threshold in kernel, make it configruable,
> >>and set 512M by default.
> >>
> >
> >I'm not seeing the gain here either. This is something that is going to
> >be set by distributions and probably never by users. If the default of
> >512 is incorrect, what should it be? Also, the Kconfig help message has
> >spelling errors.
> >
>
> Sorry for spelling errors, I am not an English speaker.
>
> Hard-coding is almost never a good thing in kernel, enforcing 512
> is not good either. Since the default is still 512, I don't think this
> will affect much users.
>
> I do agree to improve the help message, like Dave mentioned in his reply,
> but I don't like enforcing a hard-coded number in kernel.
>
> BTW, why do you think 512 is suitable for *all* users?
>

Fragmentation avoidance benefits from tuning min_free_kbytes to a higher
value and minimising fragmentation-related problems is crucial if THP is
to allocate its necessary pages.

THP tunes min_free_kbytes automatically and this value is in part
related to the number of zones. At 512M on a single node machine, the
recommended min_free_kbytes is close to 10% of memory which is barely
tolerable as it is. At 256M, it's 17%, at 128M, it's 34% so tuning the
value lower has diminishing returns as the performance impact of giving
up such a high percentage of free memory is not going to be offset by
reduced TLB misses. Tuning it to a higher value might make some sense
if the higher min_free_kbytes was a problem but it would be much more
rational to tune it as a sysctl than making it a compile-time decision.

--
Mel Gorman
SUSE Labs

2011-06-21 09:40:28

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 3/3] mm: print information when THP is disabled automatically

On Tue, Jun 21, 2011 at 01:26:58AM +0800, Cong Wang wrote:
> ??? 2011???06???21??? 01:01, Mel Gorman ??????:
> >On Tue, Jun 21, 2011 at 12:34:30AM +0800, Amerigo Wang wrote:
> >>Print information when THP is disabled automatically so that
> >>users can find this info in dmesg.
> >>
> >>Signed-off-by: WANG Cong<[email protected]>
> >>---
> >> mm/huge_memory.c | 5 ++++-
> >> 1 files changed, 4 insertions(+), 1 deletions(-)
> >>
> >>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>index 7fb44cc..07679da 100644
> >>--- a/mm/huge_memory.c
> >>+++ b/mm/huge_memory.c
> >>@@ -544,8 +544,11 @@ static int __init hugepage_init(void)
> >> * where the extra memory used could hurt more than TLB overhead
> >> * is likely to save. The admin can still enable it through /sys.
> >> */
> >>- if (totalram_pages< (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD<< (20 - PAGE_SHIFT)))
> >>+ if (totalram_pages< (CONFIG_TRANSPARENT_HUGEPAGE_THRESHOLD
> >>+ << (20 - PAGE_SHIFT))) {
> >>+ printk(KERN_INFO "hugepage: disabled auotmatically\n");
> >> transparent_hugepage_flags = 0;
> >>+ }
> >>
> >> start_khugepaged();
> >>
> >
> >Guess this doesn't hurt. You misspelled automatically though and
> >mentioning "hugepage" could be confused with hugetlbfs.
> >
>
> Yeah, sorry for the typo.
>
> But, there are many printk messages in the same file start with "hugepage:".
> :-) I can send a patch to replace all of them with "THP" if you want...
>

My bad. They are in a memory allocation failure path that should be
impossible to trigger but I should still have spotted the prefix at
the time and complained. Changing them to THP would be nice.

--
Mel Gorman
SUSE Labs

2011-06-21 14:43:59

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Tue, Jun 21, 2011 at 12:08:14PM +0800, Cong Wang wrote:
> The thing is that we can save ~10K by adding 3 lines of code as this
> patch showed, where else in kernel can you save 10K by 3 lines of code?
> (except some kfree() cases, of course) So, again, why not have it? ;)

Because you could save it with a more complicated patch that doesn't
cripple down functionality.

Sure you can save a ton more ram with one liner patches, just search
the callers of alloc_large_system_hash and reduce the number of
entries everywhere. Are you using dhash_entries=1 ihash_entries=1?
That alone would save a ton more than ~10k so you should add it to
command line if it isn't there but there are other hashes like these
that don't have dhash_entries parameters. You could add
khugepaged_hash_slots parameter too for example and set it == 1 with a
parameter to avoid crippling down functionality, that wouldn't even
increase complexity. Those kind of approaches that don't cripple down
features, are ok. Remvoing sysfs register is not ok and there's no
need of adding a =0 parameter when you can achieve the memory saving
without totally losing functionality.

I booted with 128m ram and I get 128KB (not ~8KB) allocated in the
dentry hash, 65KB allocated in the inode hash, 65KB in the TCP
established hash, 8KB in the route cache hash, 262KB in the bind hash,
10KB in the UDP hash, you can all reduce those to a few hundred bytes
and it'll still work just fine. So yeah with one liner patches you can
surely achieve more than this ~8KB gain, and with dhash_entries=1
ihash_entries=1 you'll already save hugely more than by booting with
transparent_hugepage=0 that avoids registering in sysfs and cripple
down functionality. If you make the khugepaged slots hash configurable
in size (keeping the current default) with a new param it will
_increase_ functionality as it will also allow to _increase_ its size
on huge systems or in special configurations that may benefit from a
larger hash.

Again if you want to optimize this ~8KB gain, I recommend to add a
param to make the hash size dynamic not to prevent the feature to ever
be enabled again, so by making the code more complex at least it will
also be useful if we want to increase the size hash at boot time (not
only to decrease it).

I guess however you may run into command line stringsize limit if you
add things like dhash_entries=1 for every single hash in the kernel...

2011-06-21 20:02:01

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On 06/21/2011 12:08 AM, Cong Wang wrote:

> The thing is that we can save ~10K by adding 3 lines of code as this
> patch showed, where else in kernel can you save 10K by 3 lines of code?
> (except some kfree() cases, of course) So, again, why not have it? ;)

Because we'll end up with hundreds of lines of code, just
to save under 1MB of memory. Which ends up not being saved
at all, because people will still give their kdump kernel
128MB :)

The only really big gain you are likely to get is making
sure all the per-cpu memory is not allocated in the kdump
kernel (which is booted with 1 cpu).

That is a big, multi-MB, optimization that can be implemented
in one place. Large savings for a localized change, so you
actually have a chance of having the changes accepted upstream.

--
All rights reversed

2011-06-22 02:42:37

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

于 2011年06月21日 17:36, Mel Gorman 写道:
>
> Fragmentation avoidance benefits from tuning min_free_kbytes to a higher
> value and minimising fragmentation-related problems is crucial if THP is
> to allocate its necessary pages.
>
> THP tunes min_free_kbytes automatically and this value is in part
> related to the number of zones. At 512M on a single node machine, the
> recommended min_free_kbytes is close to 10% of memory which is barely
> tolerable as it is. At 256M, it's 17%, at 128M, it's 34% so tuning the
> value lower has diminishing returns as the performance impact of giving
> up such a high percentage of free memory is not going to be offset by
> reduced TLB misses. Tuning it to a higher value might make some sense
> if the higher min_free_kbytes was a problem but it would be much more
> rational to tune it as a sysctl than making it a compile-time decision.
>

What this patch changed is the check of total memory pages in hugepage_init(),
which I don't think is suitable as a sysctl.

If you mean min_free_kbytes could be tuned as a sysctl, that should be done
in other patch, right? :)

Thanks.

2011-06-22 02:57:27

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

于 2011年06月21日 22:43, Andrea Arcangeli 写道:
> On Tue, Jun 21, 2011 at 12:08:14PM +0800, Cong Wang wrote:
>> The thing is that we can save ~10K by adding 3 lines of code as this
>> patch showed, where else in kernel can you save 10K by 3 lines of code?
>> (except some kfree() cases, of course) So, again, why not have it? ;)
>
> Because you could save it with a more complicated patch that doesn't
> cripple down functionality.


Why do you prefer "more complicated" things to simple ones? ;-)

I realized this patch changed the original behavior of "=never",
thus proposed a new "=0" parameter.

But to be honest, "=never" should be renamed to "=disable".

> Again if you want to optimize this ~8KB gain, I recommend to add a
> param to make the hash size dynamic not to prevent the feature to ever
> be enabled again, so by making the code more complex at least it will
> also be useful if we want to increase the size hash at boot time (not
> only to decrease it).
>

Not only such things, the more serious thing is that you are
enforcing a policy to users, as long as I enable THP in Kconfig,
I have no way to disable it.

Why are you so sure that every user who has no chance to change
.config likes THP?

And, what can I do if I want to prevent any process from having
a chance to enable THP? Because as long as THP exists in /sys,
any process has the right privilege can change it.

Thanks.

2011-06-22 09:16:21

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

On Wed, Jun 22, 2011 at 10:41:54AM +0800, Cong Wang wrote:
> ??? 2011???06???21??? 17:36, Mel Gorman ??????:
> >
> >Fragmentation avoidance benefits from tuning min_free_kbytes to a higher
> >value and minimising fragmentation-related problems is crucial if THP is
> >to allocate its necessary pages.
> >
> >THP tunes min_free_kbytes automatically and this value is in part
> >related to the number of zones. At 512M on a single node machine, the
> >recommended min_free_kbytes is close to 10% of memory which is barely
> >tolerable as it is. At 256M, it's 17%, at 128M, it's 34% so tuning the
> >value lower has diminishing returns as the performance impact of giving
> >up such a high percentage of free memory is not going to be offset by
> >reduced TLB misses. Tuning it to a higher value might make some sense
> >if the higher min_free_kbytes was a problem but it would be much more
> >rational to tune it as a sysctl than making it a compile-time decision.
> >
>
> What this patch changed is the check of total memory pages in hugepage_init(),
> which I don't think is suitable as a sysctl.
>
> If you mean min_free_kbytes could be tuned as a sysctl, that should be done
> in other patch, right? :)
>

min_free_kbytes is already automatically tuned when THP is enabled.

What I meant was that there is a rational reason why 512M is the
default for enabling THP by default. Tuning it lower than that by any
means makes very little sense. Tuning it higher might make some sense
but it is more likely that THP would simply be disabled via sysctl. I
see very little advantage to introducing this Kconfig option other
than as a source of confusion when running make oldconfig.

--
Mel Gorman
SUSE Labs

2011-06-22 10:47:56

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

于 2011年06月22日 17:16, Mel Gorman 写道:
>
> What I meant was that there is a rational reason why 512M is the
> default for enabling THP by default. Tuning it lower than that by any
> means makes very little sense. Tuning it higher might make some sense
> but it is more likely that THP would simply be disabled via sysctl. I
> see very little advantage to introducing this Kconfig option other
> than as a source of confusion when running make oldconfig.
>

The tunable range is (512, 8192), so 512M is the minimum.

Sure, I knew it can be disabled via /sys, actually we can do even
more in user-space, that is totally move the 512M check out of kernel,
why we didn't?

In short, I think we should either remove the 512M from kernel, or
make 512M to be tunable.

Thanks.

2011-06-22 11:15:37

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

On Wed, Jun 22, 2011 at 06:46:39PM +0800, Cong Wang wrote:
> ??? 2011???06???22??? 17:16, Mel Gorman ??????:
> >
> >What I meant was that there is a rational reason why 512M is the
> >default for enabling THP by default. Tuning it lower than that by any
> >means makes very little sense. Tuning it higher might make some sense
> >but it is more likely that THP would simply be disabled via sysctl. I
> >see very little advantage to introducing this Kconfig option other
> >than as a source of confusion when running make oldconfig.
> >
>
> The tunable range is (512, 8192), so 512M is the minimum.
>
> Sure, I knew it can be disabled via /sys, actually we can do even
> more in user-space, that is totally move the 512M check out of kernel,
> why we didn't?
>

Because the reason why 512M is the default is not obvious and there
was no guarantee all distros would chose a reasonable default for
an init script (or know that an init script was even necessary).
This is one of the few cases where there is a sensible default that
is the least surprising.

> In short, I think we should either remove the 512M from kernel, or
> make 512M to be tunable.
>

That just hands them a different sort of rope to hang themselves with
where THP gets enabled on small machines or botting with mem=128M
and getting surprised later by the high min_free_kbytes.

At this point, I don't really care if the Kconfig entry exists or
not. I think it gains nothing but additional confusion for people
who write .config files but it's not a topic I want to discuss for
days either.

--
Mel Gorman
SUSE Labs

2011-06-22 12:35:05

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH 2/3] mm: make the threshold of enabling THP configurable

于 2011年06月22日 19:15, Mel Gorman 写道:
> On Wed, Jun 22, 2011 at 06:46:39PM +0800, Cong Wang wrote:
>> ??? 2011???06???22??? 17:16, Mel Gorman ??????:
>>>
>>> What I meant was that there is a rational reason why 512M is the
>>> default for enabling THP by default. Tuning it lower than that by any
>>> means makes very little sense. Tuning it higher might make some sense
>>> but it is more likely that THP would simply be disabled via sysctl. I
>>> see very little advantage to introducing this Kconfig option other
>>> than as a source of confusion when running make oldconfig.
>>>
>>
>> The tunable range is (512, 8192), so 512M is the minimum.
>>
>> Sure, I knew it can be disabled via /sys, actually we can do even
>> more in user-space, that is totally move the 512M check out of kernel,
>> why we didn't?
>>
>
> Because the reason why 512M is the default is not obvious and there
> was no guarantee all distros would chose a reasonable default for
> an init script (or know that an init script was even necessary).
> This is one of the few cases where there is a sensible default that
> is the least surprising.
>

Putting a well-explained Kconfig help would solve this problem,
so I don't think this is a surprising thing.

2011-06-22 14:22:58

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm: completely disable THP by transparent_hugepage=never

On Wed, Jun 22, 2011 at 10:56:41AM +0800, Cong Wang wrote:
> 于 2011年06月21日 22:43, Andrea Arcangeli 写道:
> > On Tue, Jun 21, 2011 at 12:08:14PM +0800, Cong Wang wrote:
> >> The thing is that we can save ~10K by adding 3 lines of code as this
> >> patch showed, where else in kernel can you save 10K by 3 lines of code?
> >> (except some kfree() cases, of course) So, again, why not have it? ;)
> >
> > Because you could save it with a more complicated patch that doesn't
> > cripple down functionality.
>
>
> Why do you prefer "more complicated" things to simple ones? ;-)

If they offer more features yes. Allowing to tune the size of the has
will also allow to increase it, not only to decrease it. It's also not
significantly more complicated.

> I realized this patch changed the original behavior of "=never",
> thus proposed a new "=0" parameter.
>
> But to be honest, "=never" should be renamed to "=disable".

So in turn you're saying "=always" should be renamed to "=enabled". So
your preference would be enabled=enabled and enabled=disabled and
enabled=madvise? I think the current wording is nicer and breaking
kabi just for this sounds bad.

> Not only such things, the more serious thing is that you are
> enforcing a policy to users, as long as I enable THP in Kconfig,
> I have no way to disable it.
>
> Why are you so sure that every user who has no chance to change
> .config likes THP?
>
> And, what can I do if I want to prevent any process from having
> a chance to enable THP? Because as long as THP exists in /sys,
> any process has the right privilege can change it.

You must be root to have the privilege to enable it, root also has the
privilege to enable THP by writing in /dev/mem or by loading a kernel
module to do it.

I already told you how to save hundred kbytes of ram from you kernel
setting dhash_entries=1 and ihash_entries=1 and how to achieve the ~8k
ram saving in THP and KSM without crippling functionality with a patch
that is more complex than your three liner, but not much more complex,
and that it will _improve_ (not cripple down) functionality.

I'm also not interested into making the 512M param configurable. If
you want to add a "=force" parameter ok but I doubt you will ever gain
anything significant on a system with 512M of ram where each process
will likely be smaller than 512M and 100M would get used by the
anti-frag logic (reducing it to 400m).

I suggest just cleaning up the printk and if you want you can add a
__setup("thp_mm_slots_hash_heads=", set_thp_mm_slots_hash_heads) but
no other change needed.