From: xu xin <[email protected]>
Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
after enabling use_zero_pages, all empty pages that are merged with
kernel zero page are not counted in pages_sharing or pages_shared.
That is because the rmap_items of these ksm zero pages are not
appended to The Stable Tree of KSM.
We need to add the count of empty pages to let users know how many empty
pages are merged with kernel zero page(s).
Please see the subsequent patches for details.
*** BLURB HERE ***
xu xin (3):
ksm: abstract the function try_to_get_old_rmap_item
ksm: add the accounting of zero pages merged by use_zero_pages
ksm: add zero_pages_merged in Documentation
Documentation/admin-guide/mm/ksm.rst | 10 ++-
mm/ksm.c | 122 +++++++++++++++++++++------
2 files changed, 106 insertions(+), 26 deletions(-)
--
2.25.1
From: xu xin <[email protected]>
Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
use_zero_pages to 1, Using pages_sharing of KSM to indicate how
much pages saved by KSM is basically accurate. But when enabling
use_zero_pages, it becomes not accurate, and all empty(zeroed)
pages that are merged with kernel zero page are not counted in
pages_sharing or pages_shared. That is because the rmap_items
of these ksm zero pages are never appended to the Stable Tree
of KSM.
This leads to KSM not being fully correct and transparent with all
merged pages when enabling use_zero_pages.
There are two ways to fix it. One way is to count ksm zero pages into
pages_sharing, but it breaks the definition of pages_sharing (means
how many pages is sharing those KSM stable node). So we have to choose
Plan B, which is adding a new interface "zero_pages_sharing" under
/sys/kernel/mm/ksm/ to show it.
To implement that, we introduce a new flag SPECIAL_ZERO_FLAG to mark
those special zero pages (merged with kernel zero pages) for accounting
because these zero pages neither belongs to the existing STABLE_FLAG
nor UNSTABLE_FLAG.
Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
Co-developed-by: Xiaokai Ran <[email protected]>
Signed-off-by: Xiaokai Ran <[email protected]>
Co-developed-by: Yang Yang <[email protected]>
Signed-off-by: Yang Yang <[email protected]>
Co-developed-by: Jiang Xuexin <[email protected]>
Signed-off-by: Jiang Xuexin <[email protected]>
Signed-off-by: xu xin <[email protected]>
---
mm/ksm.c | 98 +++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 79 insertions(+), 19 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 5b68482d2b3b..88153d2b497f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -213,6 +213,7 @@ struct ksm_rmap_item {
#define SEQNR_MASK 0x0ff /* low bits of unstable tree seqnr */
#define UNSTABLE_FLAG 0x100 /* is a node of the unstable tree */
#define STABLE_FLAG 0x200 /* is listed from the stable tree */
+#define SPECIAL_ZERO_FLAG 0x400 /* specially treated zero page */
/* The stable and unstable tree heads */
static struct rb_root one_stable_tree[1] = { RB_ROOT };
@@ -274,6 +275,9 @@ static unsigned int zero_checksum __read_mostly;
/* Whether to merge empty (zeroed) pages with actual zero pages */
static bool ksm_use_zero_pages __read_mostly;
+/* The number of empty(zeroed) pages merged but not in the stable tree */
+static unsigned long ksm_zero_pages_sharing;
+
#ifdef CONFIG_NUMA
/* Zeroed when merging across nodes is not allowed */
static unsigned int ksm_merge_across_nodes = 1;
@@ -796,6 +800,10 @@ static void remove_trailing_rmap_items(struct ksm_rmap_item **rmap_list)
struct ksm_rmap_item *rmap_item = *rmap_list;
*rmap_list = rmap_item->rmap_list;
remove_rmap_item_from_tree(rmap_item);
+ if (rmap_item->address & SPECIAL_ZERO_FLAG) {
+ rmap_item->address &= PAGE_MASK;
+ ksm_zero_pages_sharing--;
+ }
free_rmap_item(rmap_item);
}
}
@@ -2017,6 +2025,39 @@ static void stable_tree_append(struct ksm_rmap_item *rmap_item,
rmap_item->mm->ksm_merging_pages++;
}
+static int try_to_merge_with_kernel_zero_page(struct mm_struct *mm,
+ struct ksm_rmap_item *rmap_item,
+ struct page *page)
+{
+ int err = 0;
+
+ if (!(rmap_item->address & SPECIAL_ZERO_FLAG)) {
+ struct vm_area_struct *vma;
+
+ mmap_read_lock(mm);
+ vma = find_mergeable_vma(mm, rmap_item->address);
+ if (vma) {
+ err = try_to_merge_one_page(vma, page,
+ ZERO_PAGE(rmap_item->address));
+ } else {
+ /* If the vma is out of date, we do not need to continue. */
+ err = 0;
+ }
+ mmap_read_unlock(mm);
+ /*
+ * In case of failure, the page was not really empty, so we
+ * need to continue. Otherwise we're done.
+ */
+ if (!err) {
+ rmap_item->address |= SPECIAL_ZERO_FLAG;
+ ksm_zero_pages_sharing++;
+ }
+
+ }
+
+ return err;
+}
+
/*
* cmp_and_merge_page - first see if page can be merged into the stable tree;
* if not, compare checksum to previous and if it's the same, see if page can
@@ -2101,29 +2142,22 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
* Same checksum as an empty page. We attempt to merge it with the
* appropriate zero page if the user enabled this via sysfs.
*/
- if (ksm_use_zero_pages && (checksum == zero_checksum)) {
- struct vm_area_struct *vma;
-
- mmap_read_lock(mm);
- vma = find_mergeable_vma(mm, rmap_item->address);
- if (vma) {
- err = try_to_merge_one_page(vma, page,
- ZERO_PAGE(rmap_item->address));
- } else {
+ if (ksm_use_zero_pages) {
+ if (checksum == zero_checksum) {
+ /* If success, just return. Otherwise, continue */
+ if (!try_to_merge_with_kernel_zero_page(mm, rmap_item, page))
+ return;
+ } else if (rmap_item->address & SPECIAL_ZERO_FLAG) {
/*
- * If the vma is out of date, we do not need to
- * continue.
+ * The page now is not kernel zero page(modified) but the flag
+ * of its rmap_item is still zero-page, so need to reset the
+ * flag and update the corresponding count.
*/
- err = 0;
+ rmap_item->address &= PAGE_MASK;
+ ksm_zero_pages_sharing--;
}
- mmap_read_unlock(mm);
- /*
- * In case of failure, the page was not really empty, so we
- * need to continue. Otherwise we're done.
- */
- if (!err)
- return;
}
+
tree_rmap_item =
unstable_tree_search_insert(rmap_item, page, &tree_page);
if (tree_rmap_item) {
@@ -2336,6 +2370,24 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
mmap_read_unlock(mm);
return rmap_item;
}
+ /*
+ * Because we want to count ksm zero pages which is
+ * non-anonymous, we must try to return the rmap_items
+ * of those kernel zero pages which replaces its
+ * original anonymous empty page due to use_zero_pages's
+ * feature.
+ */
+ if (is_zero_pfn(page_to_pfn(*page))) {
+ rmap_item = try_to_get_old_rmap_item(
+ ksm_scan.address,
+ ksm_scan.rmap_list);
+ if (rmap_item->address & SPECIAL_ZERO_FLAG) {
+ ksm_scan.rmap_list = &rmap_item->rmap_list;
+ ksm_scan.address += PAGE_SIZE;
+ mmap_read_unlock(mm);
+ return rmap_item;
+ }
+ }
next_page:
put_page(*page);
ksm_scan.address += PAGE_SIZE;
@@ -3115,6 +3167,13 @@ static ssize_t pages_volatile_show(struct kobject *kobj,
}
KSM_ATTR_RO(pages_volatile);
+static ssize_t zero_pages_sharing_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%ld\n", ksm_zero_pages_sharing);
+}
+KSM_ATTR_RO(zero_pages_sharing);
+
static ssize_t stable_node_dups_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
@@ -3175,6 +3234,7 @@ static struct attribute *ksm_attrs[] = {
&merge_across_nodes_attr.attr,
#endif
&max_page_sharing_attr.attr,
+ &zero_pages_sharing_attr.attr,
&stable_node_chains_attr.attr,
&stable_node_dups_attr.attr,
&stable_node_chains_prune_millisecs_attr.attr,
--
2.25.1
From: xu xin <[email protected]>
When enabling use_zero_pages, pages_sharing cannot represent how
much memory saved indeed. zero_pages_sharing + pages_sharing does.
add the description of zero_pages_sharing.
Signed-off-by: xu xin <[email protected]>
Cc: Xiaokai Ran <[email protected]>
Cc: Yang Yang <[email protected]>
Cc: Jiang Xuexin <[email protected]>
Signed-off-by: xu xin <[email protected]>
---
Documentation/admin-guide/mm/ksm.rst | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index fb6ba2002a4b..484665aa7418 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -162,7 +162,7 @@ The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
pages_shared
how many shared pages are being used
pages_sharing
- how many more sites are sharing them i.e. how much saved
+ how many more sites are sharing them
pages_unshared
how many pages unique but repeatedly checked for merging
pages_volatile
@@ -173,6 +173,14 @@ stable_node_chains
the number of KSM pages that hit the ``max_page_sharing`` limit
stable_node_dups
number of duplicated KSM pages
+zero_pages_sharing
+ how many empty pages are sharing kernel zero page(s) instead of
+ with each other as it would happen normally. only effective when
+ enabling ``use_zero_pages`` knob.
+
+If ``use_zero_pages`` is 0, only ``pages_sharing`` can represents how
+much saved. Otherwise, ``pages_sharing`` + ``zero_pages_sharing``
+represents how much saved actually.
A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
--
2.25.1
From: xu xin <[email protected]>
A new function try_to_get_old_rmap_item is abstracted from
get_next_rmap_item. This function will be reused by the subsequent
patches about counting ksm_zero_pages.
At the same time, the patch improves the readability and reusability
of KSM code.
Signed-off-by: xu xin <[email protected]>
Reviewed-by: Xiaokai Ran <[email protected]>
Reviewed-by: Yang Yang <[email protected]>
Cc: Jiang Xuexin <[email protected]>
Signed-off-by: xu xin <[email protected]>
---
mm/ksm.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index c19fcca9bc03..5b68482d2b3b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2187,14 +2187,11 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite
}
}
-static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
- struct ksm_rmap_item **rmap_list,
- unsigned long addr)
+static struct ksm_rmap_item *try_to_get_old_rmap_item(unsigned long addr,
+ struct ksm_rmap_item **rmap_list)
{
- struct ksm_rmap_item *rmap_item;
-
while (*rmap_list) {
- rmap_item = *rmap_list;
+ struct ksm_rmap_item *rmap_item = *rmap_list;
if ((rmap_item->address & PAGE_MASK) == addr)
return rmap_item;
if (rmap_item->address > addr)
@@ -2204,6 +2201,21 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
free_rmap_item(rmap_item);
}
+ return NULL;
+}
+
+static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
+ struct ksm_rmap_item **rmap_list,
+ unsigned long addr)
+{
+ struct ksm_rmap_item *rmap_item;
+
+ /* lookup if we have a old rmap_item matching the addr*/
+ rmap_item = try_to_get_old_rmap_item(addr, rmap_list);
+ if (rmap_item)
+ return rmap_item;
+
+ /* Need to allocate a new rmap_item */
rmap_item = alloc_rmap_item();
if (rmap_item) {
/* It has already been zeroed */
--
2.25.1
On 29.09.22 04:52, [email protected] wrote:
> From: xu xin <[email protected]>
>
> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
> after enabling use_zero_pages, all empty pages that are merged with
> kernel zero page are not counted in pages_sharing or pages_shared.
> That is because the rmap_items of these ksm zero pages are not
> appended to The Stable Tree of KSM.
>
> We need to add the count of empty pages to let users know how many empty
> pages are merged with kernel zero page(s).
>
> Please see the subsequent patches for details.
Just raising the topic here because it's related to the KSM usage of the
shared zero-page:
MADV_UNMERGEABLE and other ways to trigger unsharing will *not* unshare
the shared zeropage as placed by KSM (which is against the
MADV_UNMERGEABLE documentation at least). It will only unshare actual
KSM pages. We might not want want to blindly unshare all shared
zeropages in applicable VMAs ... using a dedicated shared zero (KSM)
page -- instead of the generic zero page -- might be one way to handle
this cleaner.
Would that also fix some of the issues you describe above?
--
Thanks,
David / dhildenb
On Thu, 29 Sep 2022 02:52:06 +0000
[email protected] wrote:
> From: xu xin <[email protected]>
>
> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
> after enabling use_zero_pages, all empty pages that are merged with
> kernel zero page are not counted in pages_sharing or pages_shared.
that's because those pages are not shared between different processes.
> That is because the rmap_items of these ksm zero pages are not
> appended to The Stable Tree of KSM.
>
> We need to add the count of empty pages to let users know how many empty
> pages are merged with kernel zero page(s).
why?
do you need to know how many untouched zero pages a process has?
does it make a difference if the zero page is really untouched or if it
was touched in the past but it is now zero?
>
> Please see the subsequent patches for details.
>
>
>
>
> *** BLURB HERE ***
>
> xu xin (3):
> ksm: abstract the function try_to_get_old_rmap_item
> ksm: add the accounting of zero pages merged by use_zero_pages
> ksm: add zero_pages_merged in Documentation
>
> Documentation/admin-guide/mm/ksm.rst | 10 ++-
> mm/ksm.c | 122 +++++++++++++++++++++------
> 2 files changed, 106 insertions(+), 26 deletions(-)
>
On Thu, 29 Sep 2022 11:21:44 +0200
David Hildenbrand <[email protected]> wrote:
> On 29.09.22 04:52, [email protected] wrote:
> > From: xu xin <[email protected]>
> >
> > Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
> > use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
> > after enabling use_zero_pages, all empty pages that are merged with
> > kernel zero page are not counted in pages_sharing or pages_shared.
> > That is because the rmap_items of these ksm zero pages are not
> > appended to The Stable Tree of KSM.
> >
> > We need to add the count of empty pages to let users know how many empty
> > pages are merged with kernel zero page(s).
> >
> > Please see the subsequent patches for details.
>
> Just raising the topic here because it's related to the KSM usage of the
> shared zero-page:
>
> MADV_UNMERGEABLE and other ways to trigger unsharing will *not* unshare
> the shared zeropage as placed by KSM (which is against the
> MADV_UNMERGEABLE documentation at least). It will only unshare actual
> KSM pages. We might not want want to blindly unshare all shared
> zeropages in applicable VMAs ... using a dedicated shared zero (KSM)
> page -- instead of the generic zero page -- might be one way to handle
> this cleaner.
I don't understand why do you need this.
first of all, one zero page would not be enough (depending on the
architecture, e.g. on s390x you need many). the whole point of zero
page merging is that one zero page is not enough.
second, once a page is merged with a zero page, it's not really handled
by KSM anymore. if you have a big allocation, of which you only touch a
few pages, would the rest be considered "merged"? no, it's just zero
pages, right?
this is the same, except that we take present pages with zeroes in it
and we discard them and map them to zero pages. it's kinda like if we
had never touched them.
>
> Would that also fix some of the issues you describe above?
>
On 29.09.22 12:36, Claudio Imbrenda wrote:
> On Thu, 29 Sep 2022 11:21:44 +0200
> David Hildenbrand <[email protected]> wrote:
>
>> On 29.09.22 04:52, [email protected] wrote:
>>> From: xu xin <[email protected]>
>>>
>>> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
>>> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
>>> after enabling use_zero_pages, all empty pages that are merged with
>>> kernel zero page are not counted in pages_sharing or pages_shared.
>>> That is because the rmap_items of these ksm zero pages are not
>>> appended to The Stable Tree of KSM.
>>>
>>> We need to add the count of empty pages to let users know how many empty
>>> pages are merged with kernel zero page(s).
>>>
>>> Please see the subsequent patches for details.
>>
>> Just raising the topic here because it's related to the KSM usage of the
>> shared zero-page:
>>
>> MADV_UNMERGEABLE and other ways to trigger unsharing will *not* unshare
>> the shared zeropage as placed by KSM (which is against the
>> MADV_UNMERGEABLE documentation at least). It will only unshare actual
>> KSM pages. We might not want want to blindly unshare all shared
>> zeropages in applicable VMAs ... using a dedicated shared zero (KSM)
>> page -- instead of the generic zero page -- might be one way to handle
>> this cleaner.
>
> I don't understand why do you need this.
>
> first of all, one zero page would not be enough (depending on the
> architecture, e.g. on s390x you need many). the whole point of zero
> page merging is that one zero page is not enough.
I don't follow. Having multiple ones is a pure optimization on s390x (I
recall something about cache coloring), no? So why should we blindly
care in the special KSM use case here?
>
> second, once a page is merged with a zero page, it's not really handled
> by KSM anymore. if you have a big allocation, of which you only touch a
> few pages, would the rest be considered "merged"? no, it's just zero
> pages, right?
If you haven't touched memory, there is nothing populated -- no shared
zeropage.
We only populate shared zeropages in private anonymous mappings on read
access without prior write.
> this is the same, except that we take present pages with zeroes in it
> and we discard them and map them to zero pages. it's kinda like if we
> had never touched them.
MADV_UNMERGEABLE
"Undo the effect of an earlier MADV_MERGEABLE operation on the
specified address range; KSM unmerges whatever pages it had merged in
the address range specified by addr and length."
Now please explain to me how not undoing a zeropage merging is correct
according to this documentation.
--
Thanks,
David / dhildenb
>> We need to add the count of empty pages to let users know how many empty
>> pages are merged with kernel zero page(s).
>>
>> Please see the subsequent patches for details.
> Just raising the topic here because it's related to the KSM usage of the
> shared zero-page:
> MADV_UNMERGEABLE and other ways to trigger unsharing will *not* unshare
> the shared zeropage as placed by KSM (which is against the
> MADV_UNMERGEABLE documentation at least). It will only unshare actual
> KSM pages. We might not want want to blindly unshare all shared
> zeropages in applicable VMAs ... using a dedicated shared zero (KSM)
> page -- instead of the generic zero page -- might be one way to handle
> this cleaner.
> Would that also fix some of the issues you describe above?
Glad to see your reply. I think it depends.
The way you said solves the issue you post, but maybe not help to solve the issue
I post.
The key lies in whether appending zeropage's rmap_items to stable tree. If
appending their rmap_items to the stable tree, the issue I pointed can be fixed but
that will degrade the performance of use_zero_pages. If not appending their rmap_items
to the stable tree, we have to choose this patches set (but I found some bugs now, later
I will send v2 to fix it).
On Thu, 29 Sep 2022 13:34:24 +0200
David Hildenbrand <[email protected]> wrote:
> On 29.09.22 12:42, Claudio Imbrenda wrote:
> > On Thu, 29 Sep 2022 02:52:06 +0000
> > [email protected] wrote:
> >
> >> From: xu xin <[email protected]>
> >>
> >> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
> >> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
> >> after enabling use_zero_pages, all empty pages that are merged with
> >> kernel zero page are not counted in pages_sharing or pages_shared.
> >
> > that's because those pages are not shared between different processes.
>
> They are probably the most shared pages between processes in the kernel.
shared from the kernel, though, not from other processes (that's what I
meant)
> They are simply not KSM pages, that's what makes accounting tricky here.
exactly. and those pages get shared all the time even without KSM, so
why care about those now?
does it make a difference why a page is a zero page?
>
> >
> >> That is because the rmap_items of these ksm zero pages are not
> >> appended to The Stable Tree of KSM.
> >>
> >> We need to add the count of empty pages to let users know how many empty
> >> pages are merged with kernel zero page(s).
> >
> > why?
> >
> > do you need to know how many untouched zero pages a process has?
> >
> > does it make a difference if the zero page is really untouched or if it
> > was touched in the past but it is now zero?
>
> I'd also like to understand the rationale. Is it about estimating memory
> demands when each and every shared page could get unshared?
>
On 29.09.22 12:42, Claudio Imbrenda wrote:
> On Thu, 29 Sep 2022 02:52:06 +0000
> [email protected] wrote:
>
>> From: xu xin <[email protected]>
>>
>> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
>> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
>> after enabling use_zero_pages, all empty pages that are merged with
>> kernel zero page are not counted in pages_sharing or pages_shared.
>
> that's because those pages are not shared between different processes.
They are probably the most shared pages between processes in the kernel.
They are simply not KSM pages, that's what makes accounting tricky here.
>
>> That is because the rmap_items of these ksm zero pages are not
>> appended to The Stable Tree of KSM.
>>
>> We need to add the count of empty pages to let users know how many empty
>> pages are merged with kernel zero page(s).
>
> why?
>
> do you need to know how many untouched zero pages a process has?
>
> does it make a difference if the zero page is really untouched or if it
> was touched in the past but it is now zero?
I'd also like to understand the rationale. Is it about estimating memory
demands when each and every shared page could get unshared?
--
Thanks,
David / dhildenb
On Thu, 29 Sep 2022 13:12:44 +0200
David Hildenbrand <[email protected]> wrote:
> On 29.09.22 12:36, Claudio Imbrenda wrote:
> > On Thu, 29 Sep 2022 11:21:44 +0200
> > David Hildenbrand <[email protected]> wrote:
> >
> >> On 29.09.22 04:52, [email protected] wrote:
> >>> From: xu xin <[email protected]>
> >>>
> >>> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
> >>> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
> >>> after enabling use_zero_pages, all empty pages that are merged with
> >>> kernel zero page are not counted in pages_sharing or pages_shared.
> >>> That is because the rmap_items of these ksm zero pages are not
> >>> appended to The Stable Tree of KSM.
> >>>
> >>> We need to add the count of empty pages to let users know how many empty
> >>> pages are merged with kernel zero page(s).
> >>>
> >>> Please see the subsequent patches for details.
> >>
> >> Just raising the topic here because it's related to the KSM usage of the
> >> shared zero-page:
> >>
> >> MADV_UNMERGEABLE and other ways to trigger unsharing will *not* unshare
> >> the shared zeropage as placed by KSM (which is against the
> >> MADV_UNMERGEABLE documentation at least). It will only unshare actual
> >> KSM pages. We might not want want to blindly unshare all shared
> >> zeropages in applicable VMAs ... using a dedicated shared zero (KSM)
> >> page -- instead of the generic zero page -- might be one way to handle
> >> this cleaner.
> >
> > I don't understand why do you need this.
> >
> > first of all, one zero page would not be enough (depending on the
> > architecture, e.g. on s390x you need many). the whole point of zero
> > page merging is that one zero page is not enough.
>
> I don't follow. Having multiple ones is a pure optimization on s390x (I
> recall something about cache coloring), no? So why should we blindly
> care in the special KSM use case here?
because merging pages full of zeroes with only one page will have
negative performance on those architectures that need cache colouring
(and s390 is not even the only architecture that needs it)
the whole point of merging pages full of zeroes with zero pages is to
not lose the cache colouring.
otherwise you could just let KSM merge all pages full of zeroes with
one page (which is what happens without use_zero_pages), and all the
numbers are correct.
if you are not on s390 or MIPS, you have no use for use_zero_pages
>
> >
> > second, once a page is merged with a zero page, it's not really handled
> > by KSM anymore. if you have a big allocation, of which you only touch a
> > few pages, would the rest be considered "merged"? no, it's just zero
> > pages, right?
>
> If you haven't touched memory, there is nothing populated -- no shared
> zeropage.
>
> We only populate shared zeropages in private anonymous mappings on read
> access without prior write.
that's what I meant. if you read without writing, you get zero pages.
you don't consider those to be "shared" from a KSM point of view
does it make a difference if some pages that have been written to but
now only contain zeroes are discarded and mapped back to the zero pages?
>
> > this is the same, except that we take present pages with zeroes in it
> > and we discard them and map them to zero pages. it's kinda like if we
> > had never touched them.
>
> MADV_UNMERGEABLE
>
> "Undo the effect of an earlier MADV_MERGEABLE operation on the
> specified address range; KSM unmerges whatever pages it had merged in
> the address range specified by addr and length."
>
> Now please explain to me how not undoing a zeropage merging is correct
> according to this documentation.
>
because once it's discarded and replaced with a zero page, the page is
not handled by KSM anymore.
I understand what you mean, that KSM did an action that now cannot be
undone, but how would you differentiate between zero pages that were
never written to and pages that had been written to and then discarded
and mapped back to a zero page because they only contained zeroes?
On 29.09.22 14:05, Claudio Imbrenda wrote:
> On Thu, 29 Sep 2022 13:12:44 +0200
> David Hildenbrand <[email protected]> wrote:
>
>> On 29.09.22 12:36, Claudio Imbrenda wrote:
>>> On Thu, 29 Sep 2022 11:21:44 +0200
>>> David Hildenbrand <[email protected]> wrote:
>>>
>>>> On 29.09.22 04:52, [email protected] wrote:
>>>>> From: xu xin <[email protected]>
>>>>>
>>>>> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
>>>>> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
>>>>> after enabling use_zero_pages, all empty pages that are merged with
>>>>> kernel zero page are not counted in pages_sharing or pages_shared.
>>>>> That is because the rmap_items of these ksm zero pages are not
>>>>> appended to The Stable Tree of KSM.
>>>>>
>>>>> We need to add the count of empty pages to let users know how many empty
>>>>> pages are merged with kernel zero page(s).
>>>>>
>>>>> Please see the subsequent patches for details.
>>>>
>>>> Just raising the topic here because it's related to the KSM usage of the
>>>> shared zero-page:
>>>>
>>>> MADV_UNMERGEABLE and other ways to trigger unsharing will *not* unshare
>>>> the shared zeropage as placed by KSM (which is against the
>>>> MADV_UNMERGEABLE documentation at least). It will only unshare actual
>>>> KSM pages. We might not want want to blindly unshare all shared
>>>> zeropages in applicable VMAs ... using a dedicated shared zero (KSM)
>>>> page -- instead of the generic zero page -- might be one way to handle
>>>> this cleaner.
>>>
>>> I don't understand why do you need this.
>>>
>>> first of all, one zero page would not be enough (depending on the
>>> architecture, e.g. on s390x you need many). the whole point of zero
>>> page merging is that one zero page is not enough.
>>
>> I don't follow. Having multiple ones is a pure optimization on s390x (I
>> recall something about cache coloring), no? So why should we blindly
>> care in the special KSM use case here?
>
> because merging pages full of zeroes with only one page will have
> negative performance on those architectures that need cache colouring
> (and s390 is not even the only architecture that needs it)
>
> the whole point of merging pages full of zeroes with zero pages is to
> not lose the cache colouring.
>
> otherwise you could just let KSM merge all pages full of zeroes with
> one page (which is what happens without use_zero_pages), and all the
> numbers are correct.
>
> if you are not on s390 or MIPS, you have no use for use_zero_pages
Ah, I see now that use_zero_pages is really only (mostly) s390x
specific. I already wondered why on earth we would really need that,
thanks for pointing that out.
One question I'd have is: why is the shared zero page treated special in
KSM then *at all*. Cache coloring problem should apply to *each and
every* deduplicated page.
Why is a page filled with 0xff any different from a page filled with 0x0?
Yes, I read e86c59b1b12d. It doesn't mention any actual performance
numbers and if the performance only applies to some microbenchmarks
nobody cares about.
Did you post some benchmarks results back then? That would be
interesting. I assume that the shared zeropage was simply the low
hanging fruit.
>
>>
>>>
>>> second, once a page is merged with a zero page, it's not really handled
>>> by KSM anymore. if you have a big allocation, of which you only touch a
>>> few pages, would the rest be considered "merged"? no, it's just zero
>>> pages, right?
>>
>> If you haven't touched memory, there is nothing populated -- no shared
>> zeropage.
>>
>> We only populate shared zeropages in private anonymous mappings on read
>> access without prior write.
>
> that's what I meant. if you read without writing, you get zero pages.
> you don't consider those to be "shared" from a KSM point of view
>
> does it make a difference if some pages that have been written to but
> now only contain zeroes are discarded and mapped back to the zero pages?
That's a good question. When it comes to unmerging, you'd might expect
that whatever was deduplicated will get duplicated again -- and your
memory consumption will adjust accordingly. The stats might give an
admin an idea regarding how much memory is actually overcommited. See
below on the important case where we essentially never see the shared
zeropage.
The motivation behind these patches would be great -- what is the KSM
user and what does it want to achieve with these numbers?
>
>>
>>> this is the same, except that we take present pages with zeroes in it
>>> and we discard them and map them to zero pages. it's kinda like if we
>>> had never touched them.
>>
>> MADV_UNMERGEABLE
>>
>> "Undo the effect of an earlier MADV_MERGEABLE operation on the
>> specified address range; KSM unmerges whatever pages it had merged in
>> the address range specified by addr and length."
>>
>> Now please explain to me how not undoing a zeropage merging is correct
>> according to this documentation.
>>
>
> because once it's discarded and replaced with a zero page, the page is
> not handled by KSM anymore.
>
> I understand what you mean, that KSM did an action that now cannot be
> undone, but how would you differentiate between zero pages that were
> never written to and pages that had been written to and then discarded
> and mapped back to a zero page because they only contained zeroes?
An application that always properly initializes (write at least some
part once) all its memory will never have the shared zeropage mapped. VM
guest memory comes to mind, probably still the most important KSM use case.
There are currently some remaining issues when taking a GUP R/O longterm
pin on such a page (e.g., vfio). In contrast to KSM pages, such pins are
not reliable for the shared zeropage, but I have fixes for them pending.
However, that is rather a corner case (it didn't work at all correctly a
while ago) and will be sorted out soon.
So the question is if MADV_UNMERGEABLE etc. (stats) should be adjusted
to document the behavior with use_zero_pages accordingly.
--
Thanks,
David / dhildenb
>> On 29.09.22 12:42, Claudio Imbrenda wrote:
>> > On Thu, 29 Sep 2022 02:52:06 +0000
>> > [email protected] wrote:
>> >
>> >> From: xu xin <[email protected]>
>> >>
>> >> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
>> >> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
>> >> after enabling use_zero_pages, all empty pages that are merged with
>> >> kernel zero page are not counted in pages_sharing or pages_shared.
>> >
>> > that's because those pages are not shared between different processes.
>>
>> They are probably the most shared pages between processes in the kernel.
>
>shared from the kernel, though, not from other processes (that's what I
>meant)
>
>> They are simply not KSM pages, that's what makes accounting tricky here.
>
>exactly. and those pages get shared all the time even without KSM, so
>why care about those now?
>
>does it make a difference why a page is a zero page?
WI's necessary to show these sharing zeros pages. Because:
1) Turning on/off use_zero_pages shouldn't make it so not transparent with the
sharing zero pages. When administrators enable KSM and turn on use_zero_pages,
if much memory increases due to zero pages sharing but they don't know the
reasons compared to turning off use_zero_pages, isn't it confusing?
2) If no need to let users know how many full-zero-filled pages are merged by KSM
due to use_zero_pages, then also no need to show pages_sharing and pages_shared
to users. Besides, the description of pages_sharing in Documentation is wrong and
MISLEADING when enabling use_zero_pages.
3) As David supposes, it also help for estimating memory demands when each and every
shared page could get unshared.
>
>>
>> >
>> >> That is because the rmap_items of these ksm zero pages are not
>> >> appended to The Stable Tree of KSM.
>> >>
>> >> We need to add the count of empty pages to let users know how many empty
>> >> pages are merged with kernel zero page(s).
>> >
>> > why?
>> >
>> > do you need to know how many untouched zero pages a process has?
>> >
>> > does it make a difference if the zero page is really untouched or if it
>> > was touched in the past but it is now zero?
>>
>> I'd also like to understand the rationale. Is it about estimating memory
>> demands when each and every shared page could get unshared?
>>
>
On Thu, 29 Sep 2022 19:53:34 +0200
David Hildenbrand <[email protected]> wrote:
> On 29.09.22 14:05, Claudio Imbrenda wrote:
> > On Thu, 29 Sep 2022 13:12:44 +0200
> > David Hildenbrand <[email protected]> wrote:
> >
> >> On 29.09.22 12:36, Claudio Imbrenda wrote:
> >>> On Thu, 29 Sep 2022 11:21:44 +0200
> >>> David Hildenbrand <[email protected]> wrote:
> >>>
> >>>> On 29.09.22 04:52, [email protected] wrote:
> >>>>> From: xu xin <[email protected]>
> >>>>>
> >>>>> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
> >>>>> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
> >>>>> after enabling use_zero_pages, all empty pages that are merged with
> >>>>> kernel zero page are not counted in pages_sharing or pages_shared.
> >>>>> That is because the rmap_items of these ksm zero pages are not
> >>>>> appended to The Stable Tree of KSM.
> >>>>>
> >>>>> We need to add the count of empty pages to let users know how many empty
> >>>>> pages are merged with kernel zero page(s).
> >>>>>
> >>>>> Please see the subsequent patches for details.
> >>>>
> >>>> Just raising the topic here because it's related to the KSM usage of the
> >>>> shared zero-page:
> >>>>
> >>>> MADV_UNMERGEABLE and other ways to trigger unsharing will *not* unshare
> >>>> the shared zeropage as placed by KSM (which is against the
> >>>> MADV_UNMERGEABLE documentation at least). It will only unshare actual
> >>>> KSM pages. We might not want want to blindly unshare all shared
> >>>> zeropages in applicable VMAs ... using a dedicated shared zero (KSM)
> >>>> page -- instead of the generic zero page -- might be one way to handle
> >>>> this cleaner.
> >>>
> >>> I don't understand why do you need this.
> >>>
> >>> first of all, one zero page would not be enough (depending on the
> >>> architecture, e.g. on s390x you need many). the whole point of zero
> >>> page merging is that one zero page is not enough.
> >>
> >> I don't follow. Having multiple ones is a pure optimization on s390x (I
> >> recall something about cache coloring), no? So why should we blindly
> >> care in the special KSM use case here?
> >
> > because merging pages full of zeroes with only one page will have
> > negative performance on those architectures that need cache colouring
> > (and s390 is not even the only architecture that needs it)
> >
> > the whole point of merging pages full of zeroes with zero pages is to
> > not lose the cache colouring.
> >
> > otherwise you could just let KSM merge all pages full of zeroes with
> > one page (which is what happens without use_zero_pages), and all the
> > numbers are correct.
> >
> > if you are not on s390 or MIPS, you have no use for use_zero_pages
>
> Ah, I see now that use_zero_pages is really only (mostly) s390x
> specific. I already wondered why on earth we would really need that,
> thanks for pointing that out.
>
> One question I'd have is: why is the shared zero page treated special in
> KSM then *at all*. Cache coloring problem should apply to *each and
> every* deduplicated page.
true, but unsurprisingly the zero page is the most common one. e.g. if
you have a very big and very sparse matrix, you will read lots of
consecutive pages of zeroes. there is also a more important issue with
VMs, which is actually the reason of the feature (see below)
in general it's unlikely that you will read lots of consecutive pages
with the exact same non-zero content.
>
> Why is a page filled with 0xff any different from a page filled with 0x0?
without use_zero_pages, the multiple zero pages in a KVM guest will
be merged into one single page in the host, so the guest will lose the
benefits of coloured zero pages. unsurprisingly this has a big impact
on performance.
>
> Yes, I read e86c59b1b12d. It doesn't mention any actual performance
> numbers and if the performance only applies to some microbenchmarks
> nobody cares about.
that feature was implemented because of customer feedback. aka some
users hit the problem IRL.
>
> Did you post some benchmarks results back then? That would be
no, and I don't have the numbers at hand right now, but I remember it
was a very significant difference in the benchmark.
> interesting. I assume that the shared zeropage was simply the low
> hanging fruit.
of course a very complex system could be implemented to merge pages in
different buckets based on the "colour"; the result would probably be
that nothing is shared.
KSM is a tradeoff between memory consumption and CPU time. fixing zero
pages brings speed advantages (on architectures with coloured zero
pages) without sacrificing memory savings (on any architecture)
>
> >
> >>
> >>>
> >>> second, once a page is merged with a zero page, it's not really handled
> >>> by KSM anymore. if you have a big allocation, of which you only touch a
> >>> few pages, would the rest be considered "merged"? no, it's just zero
> >>> pages, right?
> >>
> >> If you haven't touched memory, there is nothing populated -- no shared
> >> zeropage.
> >>
> >> We only populate shared zeropages in private anonymous mappings on read
> >> access without prior write.
> >
> > that's what I meant. if you read without writing, you get zero pages.
> > you don't consider those to be "shared" from a KSM point of view
> >
> > does it make a difference if some pages that have been written to but
> > now only contain zeroes are discarded and mapped back to the zero pages?
>
> That's a good question. When it comes to unmerging, you'd might expect
> that whatever was deduplicated will get duplicated again -- and your
> memory consumption will adjust accordingly. The stats might give an
> admin an idea regarding how much memory is actually overcommited. See
> below on the important case where we essentially never see the shared
> zeropage.
>
> The motivation behind these patches would be great -- what is the KSM
> user and what does it want to achieve with these numbers?
anyone who works on big amounts of very sparse data, especially in a VM
(as I explained above, with KSM without use_zero_pages KVM guests lose
the zero page colouring)
>
> >
> >>
> >>> this is the same, except that we take present pages with zeroes in it
> >>> and we discard them and map them to zero pages. it's kinda like if we
> >>> had never touched them.
> >>
> >> MADV_UNMERGEABLE
> >>
> >> "Undo the effect of an earlier MADV_MERGEABLE operation on the
> >> specified address range; KSM unmerges whatever pages it had merged in
> >> the address range specified by addr and length."
> >>
> >> Now please explain to me how not undoing a zeropage merging is correct
> >> according to this documentation.
> >>
> >
> > because once it's discarded and replaced with a zero page, the page is
> > not handled by KSM anymore.
> >
> > I understand what you mean, that KSM did an action that now cannot be
> > undone, but how would you differentiate between zero pages that were
> > never written to and pages that had been written to and then discarded
> > and mapped back to a zero page because they only contained zeroes?
>
> An application that always properly initializes (write at least some
> part once) all its memory will never have the shared zeropage mapped. VM
> guest memory comes to mind, probably still the most important KSM use case.
>
> There are currently some remaining issues when taking a GUP R/O longterm
> pin on such a page (e.g., vfio). In contrast to KSM pages, such pins are
> not reliable for the shared zeropage, but I have fixes for them pending.
> However, that is rather a corner case (it didn't work at all correctly a
> while ago) and will be sorted out soon.
>
> So the question is if MADV_UNMERGEABLE etc. (stats) should be adjusted
> to document the behavior with use_zero_pages accordingly.
we can count how many times a page full of zeroes was merged with a
zero-page, but we can't count how many time one of those pages was then
unmerged. once it's merged it becomes a zero-page, like the others.
the documentation probably can be fixed to explain what's going on
On Fri, 30 Sep 2022 02:00:32 +0000
xu xin <[email protected]> wrote:
> >> On 29.09.22 12:42, Claudio Imbrenda wrote:
> >> > On Thu, 29 Sep 2022 02:52:06 +0000
> >> > [email protected] wrote:
> >> >
> >> >> From: xu xin <[email protected]>
> >> >>
> >> >> Before enabling use_zero_pages by setting /sys/kernel/mm/ksm/
> >> >> use_zero_pages to 1, pages_sharing of KSM is basically accurate. But
> >> >> after enabling use_zero_pages, all empty pages that are merged with
> >> >> kernel zero page are not counted in pages_sharing or pages_shared.
> >> >
> >> > that's because those pages are not shared between different processes.
> >>
> >> They are probably the most shared pages between processes in the kernel.
> >
> >shared from the kernel, though, not from other processes (that's what I
> >meant)
> >
> >> They are simply not KSM pages, that's what makes accounting tricky here.
> >
> >exactly. and those pages get shared all the time even without KSM, so
> >why care about those now?
> >
> >does it make a difference why a page is a zero page?
>
> WI's necessary to show these sharing zeros pages. Because:
>
> 1) Turning on/off use_zero_pages shouldn't make it so not transparent with the
> sharing zero pages. When administrators enable KSM and turn on use_zero_pages,
> if much memory increases due to zero pages sharing but they don't know the
> reasons compared to turning off use_zero_pages, isn't it confusing?
I'm not sure I understand what you mean here
>
> 2) If no need to let users know how many full-zero-filled pages are merged by KSM
> due to use_zero_pages, then also no need to show pages_sharing and pages_shared
that's not true.
showing pages_sharing and pages_shared helps understand how much memory
would be needed for unsharing everything.
when pages filled with zeroes are replaced with zero-pages, those will
not get unshared. there is no way to know if those were actual
zero-pages from the beginning, or if they were pages full of zeroes
that have been replaced by KSM.
> to users. Besides, the description of pages_sharing in Documentation is wrong and
> MISLEADING when enabling use_zero_pages.
I guess I can fix the documentation
>
> 3) As David supposes, it also help for estimating memory demands when each and every
> shared page could get unshared.
the current statistics show exactly how much memory would be needed
if everything would get unshared. zero-pages would not get unshared.
> >
> >>
> >> >
> >> >> That is because the rmap_items of these ksm zero pages are not
> >> >> appended to The Stable Tree of KSM.
> >> >>
> >> >> We need to add the count of empty pages to let users know how many empty
> >> >> pages are merged with kernel zero page(s).
> >> >
> >> > why?
> >> >
> >> > do you need to know how many untouched zero pages a process has?
> >> >
> >> > does it make a difference if the zero page is really untouched or if it
> >> > was touched in the past but it is now zero?
> >>
> >> I'd also like to understand the rationale. Is it about estimating memory
> >> demands when each and every shared page could get unshared?
> >>
> >
>>>
>>>>
>>>>>
>>>>> second, once a page is merged with a zero page, it's not really handled
>>>>> by KSM anymore. if you have a big allocation, of which you only touch a
>>>>> few pages, would the rest be considered "merged"? no, it's just zero
>>>>> pages, right?
>>>>
>>>> If you haven't touched memory, there is nothing populated -- no shared
>>>> zeropage.
>>>>
>>>> We only populate shared zeropages in private anonymous mappings on read
>>>> access without prior write.
>>>
>>> that's what I meant. if you read without writing, you get zero pages.
>>> you don't consider those to be "shared" from a KSM point of view
>>>
>>> does it make a difference if some pages that have been written to but
>>> now only contain zeroes are discarded and mapped back to the zero pages?
>>
>> That's a good question. When it comes to unmerging, you'd might expect
>> that whatever was deduplicated will get duplicated again -- and your
>> memory consumption will adjust accordingly. The stats might give an
>> admin an idea regarding how much memory is actually overcommited. See
>> below on the important case where we essentially never see the shared
>> zeropage.
>>
>> The motivation behind these patches would be great -- what is the KSM
>> user and what does it want to achieve with these numbers?
>
> anyone who works on big amounts of very sparse data, especially in a VM
> (as I explained above, with KSM without use_zero_pages KVM guests lose
> the zero page colouring)
I meant the patches in question here :)
>
>>
>>>
>>>>
>>>>> this is the same, except that we take present pages with zeroes in it
>>>>> and we discard them and map them to zero pages. it's kinda like if we
>>>>> had never touched them.
>>>>
>>>> MADV_UNMERGEABLE
>>>>
>>>> "Undo the effect of an earlier MADV_MERGEABLE operation on the
>>>> specified address range; KSM unmerges whatever pages it had merged in
>>>> the address range specified by addr and length."
>>>>
>>>> Now please explain to me how not undoing a zeropage merging is correct
>>>> according to this documentation.
>>>>
>>>
>>> because once it's discarded and replaced with a zero page, the page is
>>> not handled by KSM anymore.
>>>
>>> I understand what you mean, that KSM did an action that now cannot be
>>> undone, but how would you differentiate between zero pages that were
>>> never written to and pages that had been written to and then discarded
>>> and mapped back to a zero page because they only contained zeroes?
>>
>> An application that always properly initializes (write at least some
>> part once) all its memory will never have the shared zeropage mapped. VM
>> guest memory comes to mind, probably still the most important KSM use case.
>>
>> There are currently some remaining issues when taking a GUP R/O longterm
>> pin on such a page (e.g., vfio). In contrast to KSM pages, such pins are
>> not reliable for the shared zeropage, but I have fixes for them pending.
>> However, that is rather a corner case (it didn't work at all correctly a
>> while ago) and will be sorted out soon.
>>
>> So the question is if MADV_UNMERGEABLE etc. (stats) should be adjusted
>> to document the behavior with use_zero_pages accordingly.
>
> we can count how many times a page full of zeroes was merged with a
> zero-page, but we can't count how many time one of those pages was then
> unmerged. once it's merged it becomes a zero-page, like the others.
Right. We could special case on MADV_MERGEABLE ("how many zero pages do
we have mapped into MADV_MERGEABLE VMAs"), but it gets tricky once we
enable MADV_MERGEABLE on a region where the shared zeropage was already
populated. Probably not worth the effort.
--
Thanks,
David / dhildenb