2018-07-20 18:15:05

by Yang Shi

Subject: [PATCH] mm: thp: remove use_zero_page sysfs knob

By digging into the original review, it looks like the use_zero_page
sysfs knob was added to ease testing and to give users a way to
mitigate the refcounting overhead.

It has been a few years since the knob was first added, so I think we
can be confident that it is stable enough. And, since commit
6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
the refcounting overhead has been reduced significantly.

Beyond that, the knob's value is 1 (enabled) by default, and I suspect
very few people ever turn it off.

So it does not seem worth keeping this knob around.

Cc: Kirill A. Shutemov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Aaron Lu <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
 Documentation/admin-guide/mm/transhuge.rst |  7 -------
 include/linux/huge_mm.h                    |  4 ----
 mm/huge_memory.c                           | 22 ++--------------------
 3 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 7ab93a8..d471ce8 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -148,13 +148,6 @@ madvise
 never
 	should be self-explanatory.
 
-By default kernel tries to use huge zero page on read page fault to
-anonymous mapping. It's possible to disable huge zero page by writing 0
-or enable it back by writing 1::
-
-	echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
-	echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
-
 Some userspace (such as a test program, or an optimized memory allocation
 library) may want to know the size (in bytes) of a transparent hugepage::
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a8a1262..0ea7808 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -58,7 +58,6 @@ enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
 	TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
 	TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG,
-	TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG,
 #ifdef CONFIG_DEBUG_VM
 	TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG,
 #endif
@@ -116,9 +115,6 @@ static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 	return false;
 }
 
-#define transparent_hugepage_use_zero_page()			\
-	(transparent_hugepage_flags &				\
-	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
 #ifdef CONFIG_DEBUG_VM
 #define transparent_hugepage_debug_cow()			\
 	(transparent_hugepage_flags &				\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cd7c1a..0d4cf87 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -54,8 +54,7 @@
 	(1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)|
 #endif
 	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG)|
-	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
-	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
+	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG);
 
 static struct shrinker deferred_split_shrinker;
 
@@ -273,21 +272,6 @@ static ssize_t defrag_store(struct kobject *kobj,
 static struct kobj_attribute defrag_attr =
 	__ATTR(defrag, 0644, defrag_show, defrag_store);
 
-static ssize_t use_zero_page_show(struct kobject *kobj,
-		struct kobj_attribute *attr, char *buf)
-{
-	return single_hugepage_flag_show(kobj, attr, buf,
-				TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
-}
-static ssize_t use_zero_page_store(struct kobject *kobj,
-		struct kobj_attribute *attr, const char *buf, size_t count)
-{
-	return single_hugepage_flag_store(kobj, attr, buf, count,
-				TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
-}
-static struct kobj_attribute use_zero_page_attr =
-	__ATTR(use_zero_page, 0644, use_zero_page_show, use_zero_page_store);
-
 static ssize_t hpage_pmd_size_show(struct kobject *kobj,
 		struct kobj_attribute *attr, char *buf)
 {
@@ -317,7 +301,6 @@ static ssize_t debug_cow_store(struct kobject *kobj,
 static struct attribute *hugepage_attr[] = {
 	&enabled_attr.attr,
 	&defrag_attr.attr,
-	&use_zero_page_attr.attr,
 	&hpage_pmd_size_attr.attr,
 #if defined(CONFIG_SHMEM) && defined(CONFIG_TRANSPARENT_HUGE_PAGECACHE)
 	&shmem_enabled_attr.attr,
@@ -677,8 +660,7 @@ int do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
 		return VM_FAULT_OOM;
 	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
-			!mm_forbids_zeropage(vma->vm_mm) &&
-			transparent_hugepage_use_zero_page()) {
+			!mm_forbids_zeropage(vma->vm_mm)) {
 		pgtable_t pgtable;
 		struct page *zero_page;
 		bool set;
--
1.8.3.1



2018-07-20 19:33:49

by Andrew Morton

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Sat, 21 Jul 2018 02:13:50 +0800 Yang Shi <[email protected]> wrote:

> By digging into the original review, it looks like the use_zero_page
> sysfs knob was added to ease testing and to give users a way to
> mitigate the refcounting overhead.
>
> It has been a few years since the knob was first added, so I think we
> can be confident that it is stable enough. And, since commit
> 6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
> the refcounting overhead has been reduced significantly.
>
> Beyond that, the knob's value is 1 (enabled) by default, and I suspect
> very few people ever turn it off.
>
> So it does not seem worth keeping this knob around.

Probably OK. Might not be OK, nobody knows.

It's been there for seven years so another six months won't kill us.
How about as an intermediate step we add a printk("use_zero_page is
scheduled for removal. Please contact [email protected] if you need
it").

2018-07-20 20:03:33

by David Rientjes

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Fri, 20 Jul 2018, Andrew Morton wrote:

> > By digging into the original review, it looks like the use_zero_page
> > sysfs knob was added to ease testing and to give users a way to
> > mitigate the refcounting overhead.
> >
> > It has been a few years since the knob was first added, so I think we
> > can be confident that it is stable enough. And, since commit
> > 6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
> > the refcounting overhead has been reduced significantly.
> >
> > Beyond that, the knob's value is 1 (enabled) by default, and I suspect
> > very few people ever turn it off.
> >
> > So it does not seem worth keeping this knob around.
>
> Probably OK. Might not be OK, nobody knows.
>
> It's been there for seven years so another six months won't kill us.
> How about as an intermediate step we add a printk("use_zero_page is
> scheduled for removal. Please contact [email protected] if you need
> it").
>

We disable the huge zero page through this interface; there were issues
related to the huge zero page shrinker (probably best to never free a
per-node huge zero page once allocated) and CVE-2017-1000405 for the
huge dirty COW.

2018-07-20 20:38:54

by Yang Shi

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob



On 7/20/18 1:02 PM, David Rientjes wrote:
> On Fri, 20 Jul 2018, Andrew Morton wrote:
>
>>> By digging into the original review, it looks like the use_zero_page
>>> sysfs knob was added to ease testing and to give users a way to
>>> mitigate the refcounting overhead.
>>>
>>> It has been a few years since the knob was first added, so I think we
>>> can be confident that it is stable enough. And, since commit
>>> 6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
>>> the refcounting overhead has been reduced significantly.
>>>
>>> Beyond that, the knob's value is 1 (enabled) by default, and I suspect
>>> very few people ever turn it off.
>>>
>>> So it does not seem worth keeping this knob around.
>> Probably OK. Might not be OK, nobody knows.
>>
>> It's been there for seven years so another six months won't kill us.
>> How about as an intermediate step we add a printk("use_zero_page is
>> scheduled for removal. Please contact [email protected] if you need
>> it").
>>
> We disable the huge zero page through this interface; there were issues
> related to the huge zero page shrinker (probably best to never free a
> per-node huge zero page once allocated) and CVE-2017-1000405 for the
> huge dirty COW.

Thanks for the information. It looks like the CVE has been resolved by
commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 ("mm, thp: Do not make
page table dirty unconditionally in touch_p[mu]d()"), which is in 4.15
already.

What was the shrinker-related issue? I suppose it has been resolved,
right?



2018-07-20 21:06:44

by David Rientjes

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Fri, 20 Jul 2018, Yang Shi wrote:

> > We disable the huge zero page through this interface; there were issues
> > related to the huge zero page shrinker (probably best to never free a
> > per-node huge zero page once allocated) and CVE-2017-1000405 for the
> > huge dirty COW.
>
> Thanks for the information. It looks like the CVE has been resolved by
> commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 ("mm, thp: Do not make
> page table dirty unconditionally in touch_p[mu]d()"), which is in 4.15
> already.
>

Users who run kernels earlier than 4.15 may choose to mitigate the CVE
by using this tunable. It's not something we need to keep permanently,
but it is likely too early to remove it.

> What was the shrinker-related issue? I suppose it has been resolved,
> right?
>

The huge zero page can be reclaimed under memory pressure and, if it
is, the next user attempts to allocate it again with gfp flags that
trigger memory compaction, which can become expensive. If we are
constantly under memory pressure, it gets freed and reallocated
millions of times, always trying to compact memory both directly and
by kicking kcompactd in the background.

It likely should also be per node.
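
For reference, the allocation path in question looks roughly like this
(an abbreviated paraphrase of the 4.18-era get_huge_zero_page() in
mm/huge_memory.c; preemption handling and vm event counters omitted):

static struct page *get_huge_zero_page(void)
{
	struct page *zero_page;
retry:
	/* Fast path: the cached huge zero page is still alive. */
	if (likely(atomic_inc_not_zero(&huge_zero_refcount)))
		return READ_ONCE(huge_zero_page);

	/*
	 * Slow path: GFP_TRANSHUGE allows direct reclaim/compaction, so
	 * once the shrinker has freed the page under pressure, the next
	 * read fault pays the compaction cost all over again.
	 */
	zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE,
				HPAGE_PMD_ORDER);
	if (!zero_page)
		return NULL;
	if (cmpxchg(&huge_zero_page, NULL, zero_page)) {
		/* Lost the race with another thread; use its page. */
		__free_pages(zero_page, HPAGE_PMD_ORDER);
		goto retry;
	}
	/* Extra reference; the shrinker drops it under memory pressure. */
	atomic_set(&huge_zero_refcount, 2);
	return READ_ONCE(huge_zero_page);
}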

2018-07-20 21:07:21

by Kirill A. Shutemov

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Sat, Jul 21, 2018 at 02:13:50AM +0800, Yang Shi wrote:
> By digging into the original review, it looks like the use_zero_page
> sysfs knob was added to ease testing and to give users a way to
> mitigate the refcounting overhead.
>
> It has been a few years since the knob was first added, so I think we
> can be confident that it is stable enough. And, since commit
> 6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
> the refcounting overhead has been reduced significantly.
>
> Beyond that, the knob's value is 1 (enabled) by default, and I suspect
> very few people ever turn it off.
>
> So it does not seem worth keeping this knob around.

I don't think that keeping the knob around is a huge maintenance
burden. And since it helped to work around a security bug relatively
recently, I would rather keep it.

--
Kirill A. Shutemov

2018-07-20 23:51:00

by Yang Shi

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob



On 7/20/18 2:05 PM, David Rientjes wrote:
> On Fri, 20 Jul 2018, Yang Shi wrote:
>
>>> We disable the huge zero page through this interface; there were issues
>>> related to the huge zero page shrinker (probably best to never free a
>>> per-node huge zero page once allocated) and CVE-2017-1000405 for the
>>> huge dirty COW.
>> Thanks for the information. It looks like the CVE has been resolved by
>> commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 ("mm, thp: Do not make
>> page table dirty unconditionally in touch_p[mu]d()"), which is in 4.15
>> already.
>>
> Users who run kernels earlier than 4.15 may choose to mitigate the CVE
> by using this tunable. It's not something we need to keep permanently,
> but it is likely too early to remove it.

Yes, it might be good to keep it around for a while.

>
>> What was the shrinker-related issue? I suppose it has been resolved,
>> right?
>>
> The huge zero page can be reclaimed under memory pressure and, if it
> is, the next user attempts to allocate it again with gfp flags that
> trigger memory compaction, which can become expensive. If we are
> constantly under memory pressure, it gets freed and reallocated
> millions of times, always trying to compact memory both directly and
> by kicking kcompactd in the background.

Even if we don't use the huge zero page, we can run into a similar
issue under memory pressure. Disabling it just saves the cost of
invoking the huge zero page shrinker, and that cost does not sound
high anyway.

>
> It likely should also be per node.



2018-07-20 23:53:07

by Yang Shi

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob



On 7/20/18 2:06 PM, Kirill A. Shutemov wrote:
> On Sat, Jul 21, 2018 at 02:13:50AM +0800, Yang Shi wrote:
>> By digging into the original review, it looks like the use_zero_page
>> sysfs knob was added to ease testing and to give users a way to
>> mitigate the refcounting overhead.
>>
>> It has been a few years since the knob was first added, so I think we
>> can be confident that it is stable enough. And, since commit
>> 6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
>> the refcounting overhead has been reduced significantly.
>>
>> Beyond that, the knob's value is 1 (enabled) by default, and I suspect
>> very few people ever turn it off.
>>
>> So it does not seem worth keeping this knob around.
> I don't think that keeping the knob around is a huge maintenance
> burden. And since it helped to work around a security bug relatively
> recently, I would rather keep it.

I agree with keeping it for a while to let that security bug cool down;
however, if there are no users anymore, it seems pointless to keep a
dead knob.

>


2018-07-22 03:53:08

by Matthew Wilcox

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Fri, Jul 20, 2018 at 02:05:52PM -0700, David Rientjes wrote:
> The huge zero page can be reclaimed under memory pressure and, if it
> is, the next user attempts to allocate it again with gfp flags that
> trigger memory compaction, which can become expensive. If we are
> constantly under memory pressure, it gets freed and reallocated
> millions of times, always trying to compact memory both directly and
> by kicking kcompactd in the background.
>
> It likely should also be per node.

Have you benchmarked making the non-huge zero page per-node?

2018-07-23 20:29:53

by David Rientjes

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Sat, 21 Jul 2018, Matthew Wilcox wrote:

> > The huge zero page can be reclaimed under memory pressure and, if it
> > is, the next user attempts to allocate it again with gfp flags that
> > trigger memory compaction, which can become expensive. If we are
> > constantly under memory pressure, it gets freed and reallocated
> > millions of times, always trying to compact memory both directly and
> > by kicking kcompactd in the background.
> >
> > It likely should also be per node.
>
> Have you benchmarked making the non-huge zero page per-node?
>

Not since we disabled it :) I will, though. The more concerning issue
for us, modulo CVE-2017-1000405, is the cpu cost of constantly directly
compacting memory to allocate the hzp in real time after it has been
reclaimed. We've observed this happening tens or hundreds of thousands
of times on some systems. It will be 2MB per node on x86 if the data
suggests we should make it NUMA-aware; I don't think the cost is too
high to leave it persistently available even under memory pressure if
use_zero_page is enabled.

2018-07-23 20:32:15

by David Rientjes

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Fri, 20 Jul 2018, Yang Shi wrote:

> I agree with keeping it for a while to let that security bug cool down;
> however, if there are no users anymore, it seems pointless to keep a
> dead knob.
>

It's not a dead knob. We use it, and for reasons other than
CVE-2017-1000405. To mitigate the cost of constantly compacting memory
to allocate it after it has been freed due to memory pressure, we can
either continue to disable it, allow it to be persistently available,
or use a new value for use_zero_page to specify that it should be
persistently available.

2018-07-23 21:34:16

by David Rientjes

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Mon, 23 Jul 2018, David Rientjes wrote:

> > > The huge zero page can be reclaimed under memory pressure and, if it
> > > is, the next user attempts to allocate it again with gfp flags that
> > > trigger memory compaction, which can become expensive. If we are
> > > constantly under memory pressure, it gets freed and reallocated
> > > millions of times, always trying to compact memory both directly and
> > > by kicking kcompactd in the background.
> > >
> > > It likely should also be per node.
> >
> > Have you benchmarked making the non-huge zero page per-node?
> >
>
> Not since we disabled it :) I will, though. The more concerning issue
> for us, modulo CVE-2017-1000405, is the cpu cost of constantly directly
> compacting memory to allocate the hzp in real time after it has been
> reclaimed. We've observed this happening tens or hundreds of thousands
> of times on some systems. It will be 2MB per node on x86 if the data
> suggests we should make it NUMA-aware; I don't think the cost is too
> high to leave it persistently available even under memory pressure if
> use_zero_page is enabled.
>

Measuring access latency to 4GB of memory on Naples, I observe ~6.7%
slower access latency intrasocket and ~14% slower intersocket.

use_zero_page is currently a simple thp flag, meaning it rejects writes
where val != !!val, so perhaps it would be best to overload it with
additional options? I can imagine 0x2 defining persistent allocation,
so that the hzp is not freed when the refcount goes to 0, and 0x4
defining whether the hzp should be per node. Implementing persistent
allocation fixes our concern with it, so I'd like to start there.
Comments?
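
As a strawman, an untested sketch of the overloaded store; the
USE_ZERO_PAGE_* bit names and the use_zero_page_mode variable are made
up for illustration:

/* Hypothetical bit layout; only the enable bit exists today. */
#define USE_ZERO_PAGE_ENABLED		0x1
#define USE_ZERO_PAGE_PERSISTENT	0x2	/* never freed by the shrinker */
#define USE_ZERO_PAGE_PER_NODE		0x4	/* one hzp per NUMA node */

static unsigned long use_zero_page_mode = USE_ZERO_PAGE_ENABLED;

static ssize_t use_zero_page_store(struct kobject *kobj,
		struct kobj_attribute *attr, const char *buf, size_t count)
{
	unsigned long val;
	int err;

	err = kstrtoul(buf, 10, &val);
	if (err)
		return err;
	/* Accept any combination of the bits above instead of just 0/1. */
	if (val & ~(USE_ZERO_PAGE_ENABLED | USE_ZERO_PAGE_PERSISTENT |
		    USE_ZERO_PAGE_PER_NODE))
		return -EINVAL;

	use_zero_page_mode = val;
	return count;
}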

2018-07-23 21:51:15

by Yang Shi

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob



On 7/23/18 2:33 PM, David Rientjes wrote:
> On Mon, 23 Jul 2018, David Rientjes wrote:
>
>>>> The huge zero page can be reclaimed under memory pressure and, if it
>>>> is, the next user attempts to allocate it again with gfp flags that
>>>> trigger memory compaction, which can become expensive. If we are
>>>> constantly under memory pressure, it gets freed and reallocated
>>>> millions of times, always trying to compact memory both directly and
>>>> by kicking kcompactd in the background.
>>>>
>>>> It likely should also be per node.
>>> Have you benchmarked making the non-huge zero page per-node?
>>>
>> Not since we disabled it :) I will, though. The more concerning issue
>> for us, modulo CVE-2017-1000405, is the cpu cost of constantly directly
>> compacting memory to allocate the hzp in real time after it has been
>> reclaimed. We've observed this happening tens or hundreds of thousands
>> of times on some systems. It will be 2MB per node on x86 if the data
>> suggests we should make it NUMA-aware; I don't think the cost is too
>> high to leave it persistently available even under memory pressure if
>> use_zero_page is enabled.
>>
> Measuring access latency to 4GB of memory on Naples, I observe ~6.7%
> slower access latency intrasocket and ~14% slower intersocket.
>
> use_zero_page is currently a simple thp flag, meaning it rejects writes
> where val != !!val, so perhaps it would be best to overload it with
> additional options? I can imagine 0x2 defining persistent allocation,
> so that the hzp is not freed when the refcount goes to 0, and 0x4
> defining whether the hzp should be per node. Implementing persistent
> allocation fixes our concern with it, so I'd like to start there.
> Comments?

Sounds worth trying to me :-) It might be worth making it persistent
by default. Keeping 2MB of memory unreclaimable does not sound harmful
for use cases that prefer THP.



2018-07-23 21:54:02

by Yang Shi

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob



On 7/23/18 1:31 PM, David Rientjes wrote:
> On Fri, 20 Jul 2018, Yang Shi wrote:
>
>> I agree with keeping it for a while to let that security bug cool down;
>> however, if there are no users anymore, it seems pointless to keep a
>> dead knob.
>>
> It's not a dead knob. We use it, and for reasons other than
> CVE-2017-1000405. To mitigate the cost of constantly compacting memory
> to allocate it after it has been freed due to memory pressure, we can
> either continue to disable it, allow it to be persistently available,
> or use a new value for use_zero_page to specify that it should be
> persistently available.

My understanding is that the cost of memory compaction is *not* unique
to the huge zero page, right? It is expected whenever memory pressure
is encountered, even with the huge zero page disabled.


2018-07-23 23:15:11

by David Rientjes

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Mon, 23 Jul 2018, Yang Shi wrote:

> > > I agree with keeping it for a while to let that security bug cool
> > > down; however, if there are no users anymore, it seems pointless to
> > > keep a dead knob.
> > >
> > It's not a dead knob. We use it, and for reasons other than
> > CVE-2017-1000405. To mitigate the cost of constantly compacting memory
> > to allocate it after it has been freed due to memory pressure, we can
> > either continue to disable it, allow it to be persistently available,
> > or use a new value for use_zero_page to specify that it should be
> > persistently available.
>
> My understanding is that the cost of memory compaction is *not* unique
> to the huge zero page, right? It is expected whenever memory pressure
> is encountered, even with the huge zero page disabled.
>

It's caused by fragmentation, not necessarily memory pressure. We've
disabled it because compacting for tens of thousands of huge zero pages
in the background has a noticeable impact on cpu. Additionally, if the
hzp cannot be allocated at runtime, it increases the rss of
applications that map it, making it unpredictable. Making it
persistent, as I've been suggesting, fixes these issues.
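
A minimal sketch of the shrinker side, reusing the hypothetical
USE_ZERO_PAGE_PERSISTENT bit and use_zero_page_mode variable from the
earlier sketch (the non-persistent branch paraphrases the current count
callback in mm/huge_memory.c):

static unsigned long shrink_huge_zero_page_count(struct shrinker *shrink,
						 struct shrink_control *sc)
{
	/* Hypothetical persistent mode: never report the hzp reclaimable. */
	if (use_zero_page_mode & USE_ZERO_PAGE_PERSISTENT)
		return 0;
	/* Otherwise it is reclaimable once only the cached reference is left. */
	return atomic_read(&huge_zero_refcount) == 1 ? HPAGE_PMD_NR : 0;
}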

2018-07-24 09:09:13

by Kirill A. Shutemov

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Mon, Jul 23, 2018 at 02:33:08PM -0700, David Rientjes wrote:
> On Mon, 23 Jul 2018, David Rientjes wrote:
>
> > > > The huge zero page can be reclaimed under memory pressure and, if it
> > > > is, the next user attempts to allocate it again with gfp flags that
> > > > trigger memory compaction, which can become expensive. If we are
> > > > constantly under memory pressure, it gets freed and reallocated
> > > > millions of times, always trying to compact memory both directly and
> > > > by kicking kcompactd in the background.
> > > >
> > > > It likely should also be per node.
> > >
> > > Have you benchmarked making the non-huge zero page per-node?
> > >
> >
> > Not since we disabled it :) I will, though. The more concerning issue
> > for us, modulo CVE-2017-1000405, is the cpu cost of constantly directly
> > compacting memory to allocate the hzp in real time after it has been
> > reclaimed. We've observed this happening tens or hundreds of thousands
> > of times on some systems. It will be 2MB per node on x86 if the data
> > suggests we should make it NUMA-aware; I don't think the cost is too
> > high to leave it persistently available even under memory pressure if
> > use_zero_page is enabled.
> >
>
> Measuring access latency to 4GB of memory on Naples, I observe ~6.7%
> slower access latency intrasocket and ~14% slower intersocket.
>
> use_zero_page is currently a simple thp flag, meaning it rejects writes
> where val != !!val, so perhaps it would be best to overload it with
> additional options? I can imagine 0x2 defining persistent allocation,
> so that the hzp is not freed when the refcount goes to 0, and 0x4
> defining whether the hzp should be per node. Implementing persistent
> allocation fixes our concern with it, so I'd like to start there.
> Comments?

Why not separate files?

--
Kirill A. Shutemov

2018-07-24 20:34:11

by David Rientjes

Subject: Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

On Tue, 24 Jul 2018, Kirill A. Shutemov wrote:

> > use_zero_page is currently a simple thp flag, meaning it rejects
> > writes where val != !!val, so perhaps it would be best to overload it
> > with additional options? I can imagine 0x2 defining persistent
> > allocation, so that the hzp is not freed when the refcount goes to 0,
> > and 0x4 defining whether the hzp should be per node. Implementing
> > persistent allocation fixes our concern with it, so I'd like to start
> > there. Comments?
>
> Why not separate files?
>

That works as well. I'll write a patch for persistent allocation first to
address our most immediate need.
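
For comparison, the separate-file variant could look something like this
untested sketch, where the use_zero_page_persistent knob name and its
backing bool are hypothetical:

static bool huge_zero_page_persistent __read_mostly;

static ssize_t use_zero_page_persistent_show(struct kobject *kobj,
		struct kobj_attribute *attr, char *buf)
{
	return sprintf(buf, "%d\n", huge_zero_page_persistent);
}

static ssize_t use_zero_page_persistent_store(struct kobject *kobj,
		struct kobj_attribute *attr, const char *buf, size_t count)
{
	bool val;

	if (kstrtobool(buf, &val))
		return -EINVAL;
	huge_zero_page_persistent = val;
	return count;
}

/* Would be registered alongside the existing entries in hugepage_attr[]. */
static struct kobj_attribute use_zero_page_persistent_attr =
	__ATTR(use_zero_page_persistent, 0644,
	       use_zero_page_persistent_show, use_zero_page_persistent_store);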