2019-06-16 08:58:58

by Oleksandr Natalenko

[permalink] [raw]
Subject: [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints

Hi, Minchan.

This is a set of commits based on our discussion on your submission [1].

First 2 implement minor suggestions just for you to not forget to take
them into account.

uio.h inclusion was needed for me to be able to compile your series
successfully. Also please note I had to enable "Transparent Hugepage
Support" as well as "Enable idle page tracking" options, otherwise the
build failed. I guess this can be addressed by you better since the
errors are introduced with MADV_COLD introduction.

Last 2 commits are the actual KSM hints enablement. The first one
implements additional check for the case where the mmap_sem is taken for
write, and the second one just allows KSM hints to be used by the remote
interface.

I'm not Cc'ing else anyone except two mailing lists to not distract
people unnecessarily. If you are fine with this addition, please use it
for your next iteration of process_madvise(), and then you'll Cc all the
people needed.

Thanks.

[1] https://lore.kernel.org/lkml/[email protected]/

Oleksandr Natalenko (5):
mm: rename madvise_core to madvise_common
mm: revert madvise_inject_error line split
mm: include uio.h to madvise.c
mm/madvise: employ mmget_still_valid for write lock
mm/madvise: allow KSM hints for remote API

mm/madvise.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

--
2.22.0


2019-06-16 08:59:06

by Oleksandr Natalenko

[permalink] [raw]
Subject: [PATCH NOTFORMERGE 3/5] mm: include uio.h to madvise.c

I couldn't compile it w/o this header.

Signed-off-by: Oleksandr Natalenko <[email protected]>
---
mm/madvise.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 70aeb54f3e1c..9755340da157 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -25,6 +25,7 @@
#include <linux/swapops.h>
#include <linux/shmem_fs.h>
#include <linux/mmu_notifier.h>
+#include <linux/uio.h>

#include <asm/tlb.h>

--
2.22.0

2019-06-16 08:59:16

by Oleksandr Natalenko

[permalink] [raw]
Subject: [PATCH NOTFORMERGE 4/5] mm/madvise: employ mmget_still_valid for write lock

Do the very same trick as we already do since 04f5866e41fb. KSM hints
will require locking mmap_sem for write since they modify vm_flags, so
for remote KSM hinting this additional check is needed.

Signed-off-by: Oleksandr Natalenko <[email protected]>
---
mm/madvise.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 9755340da157..84f899b1b6da 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1049,6 +1049,8 @@ static int madvise_common(struct task_struct *task, struct mm_struct *mm,
if (write) {
if (down_write_killable(&mm->mmap_sem))
return -EINTR;
+ if (current->mm != mm && !mmget_still_valid(mm))
+ goto skip_mm;
} else {
down_read(&mm->mmap_sem);
}
@@ -1099,6 +1101,7 @@ static int madvise_common(struct task_struct *task, struct mm_struct *mm,
}
out:
blk_finish_plug(&plug);
+skip_mm:
if (write)
up_write(&mm->mmap_sem);
else
--
2.22.0

2019-06-16 08:59:39

by Oleksandr Natalenko

[permalink] [raw]
Subject: [PATCH NOTFORMERGE 5/5] mm/madvise: allow KSM hints for remote API

It all began with the fact that KSM works only on memory that is marked
by madvise(). And the only way to get around that is to either:

* use LD_PRELOAD; or
* patch the kernel with something like UKSM or PKSM.

(i skip ptrace can of worms here intentionally)

To overcome this restriction, lets employ a new remote madvise API. This
can be used by some small userspace helper daemon that will do auto-KSM
job for us.

I think of two major consumers of remote KSM hints:

* hosts, that run containers, especially similar ones and especially in
a trusted environment, sharing the same runtime like Node.js;

* heavy applications, that can be run in multiple instances, not
limited to opensource ones like Firefox, but also those that cannot be
modified since they are binary-only and, maybe, statically linked.

Speaking of statistics, more numbers can be found in the very first
submission, that is related to this one [1]. For my current setup with
two Firefox instances I get 100 to 200 MiB saved for the second instance
depending on the amount of tabs.

1 FF instance with 15 tabs:

$ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
410

2 FF instances, second one has 12 tabs (all the tabs are different):

$ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
592

At the very moment I do not have specific numbers for containerised
workload, but those should be comparable in case the containers share
similar/same runtime.

[1] https://lore.kernel.org/patchwork/patch/1012142/

Signed-off-by: Oleksandr Natalenko <[email protected]>
---
mm/madvise.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 84f899b1b6da..e8f9c49794a3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -991,6 +991,8 @@ process_madvise_behavior_valid(int behavior)
switch (behavior) {
case MADV_COLD:
case MADV_PAGEOUT:
+ case MADV_MERGEABLE:
+ case MADV_UNMERGEABLE:
return true;

default:
--
2.22.0

2019-06-16 09:00:30

by Oleksandr Natalenko

[permalink] [raw]
Subject: [PATCH NOTFORMERGE 1/5] mm: rename madvise_core to madvise_common

"core" usually means something very different within the kernel land,
thus lets just follow the way it is handled in mutexes, rw_semaphores
etc and name common things as "_common".

Signed-off-by: Oleksandr Natalenko <[email protected]>
---
mm/madvise.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 94d782097afd..edb7184f665c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -998,7 +998,7 @@ process_madvise_behavior_valid(int behavior)
}

/*
- * madvise_core - request behavior hint to address range of the target process
+ * madvise_common - request behavior hint to address range of the target process
*
* @task: task_struct got behavior hint, not giving the hint
* @mm: mm_struct got behavior hint, not giving the hint
@@ -1009,7 +1009,7 @@ process_madvise_behavior_valid(int behavior)
* @task could be a zombie leader if it calls sys_exit so accessing mm_struct
* via task->mm is prohibited. Please use @mm insetad of task->mm.
*/
-static int madvise_core(struct task_struct *task, struct mm_struct *mm,
+static int madvise_common(struct task_struct *task, struct mm_struct *mm,
unsigned long start, size_t len_in, int behavior)
{
unsigned long end, tmp;
@@ -1132,7 +1132,7 @@ static int pr_madvise_copy_param(struct pr_madvise_param __user *u_param,
return ret;
}

-static int process_madvise_core(struct task_struct *tsk, struct mm_struct *mm,
+static int process_madvise_common(struct task_struct *tsk, struct mm_struct *mm,
int *behaviors,
struct iov_iter *iter,
const struct iovec *range_vec,
@@ -1144,7 +1144,7 @@ static int process_madvise_core(struct task_struct *tsk, struct mm_struct *mm,
for (i = 0; i < riovcnt && iov_iter_count(iter); i++) {
err = -EINVAL;
if (process_madvise_behavior_valid(behaviors[i]))
- err = madvise_core(tsk, mm,
+ err = madvise_common(tsk, mm,
(unsigned long)range_vec[i].iov_base,
range_vec[i].iov_len, behaviors[i]);

@@ -1220,7 +1220,7 @@ static int process_madvise_core(struct task_struct *tsk, struct mm_struct *mm,
*/
SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
{
- return madvise_core(current, current->mm, start, len_in, behavior);
+ return madvise_common(current, current->mm, start, len_in, behavior);
}


@@ -1252,7 +1252,7 @@ SYSCALL_DEFINE3(process_madvise, int, pidfd,

/*
* We don't support cookie to gaurantee address space atomicity yet.
- * Once we implment cookie, process_madvise_core need to hold mmap_sme
+ * Once we implment cookie, process_madvise_common need to hold mmap_sme
* during entire operation to guarantee atomicity.
*/
if (params.cookie != 0)
@@ -1316,7 +1316,7 @@ SYSCALL_DEFINE3(process_madvise, int, pidfd,
goto release_task;
}

- ret = process_madvise_core(task, mm, behaviors, &iter, iov_r, nr_elem);
+ ret = process_madvise_common(task, mm, behaviors, &iter, iov_r, nr_elem);
mmput(mm);
release_task:
put_task_struct(task);
--
2.22.0

2019-06-16 09:01:02

by Oleksandr Natalenko

[permalink] [raw]
Subject: [PATCH NOTFORMERGE 2/5] mm: revert madvise_inject_error line split

Just to highlight it after our conversation.

Signed-off-by: Oleksandr Natalenko <[email protected]>
---
mm/madvise.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index edb7184f665c..70aeb54f3e1c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1041,8 +1041,7 @@ static int madvise_common(struct task_struct *task, struct mm_struct *mm,

#ifdef CONFIG_MEMORY_FAILURE
if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
- return madvise_inject_error(behavior,
- start, start + len_in);
+ return madvise_inject_error(behavior, start, start + len_in);
#endif

write = madvise_need_mmap_write(behavior);
--
2.22.0

2019-12-10 10:50:47

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints

Hello, Minchan.

On Sun, Jun 16, 2019 at 10:58:30AM +0200, Oleksandr Natalenko wrote:
> This is a set of commits based on our discussion on your submission [1].
>
> First 2 implement minor suggestions just for you to not forget to take
> them into account.
>
> uio.h inclusion was needed for me to be able to compile your series
> successfully. Also please note I had to enable "Transparent Hugepage
> Support" as well as "Enable idle page tracking" options, otherwise the
> build failed. I guess this can be addressed by you better since the
> errors are introduced with MADV_COLD introduction.
>
> Last 2 commits are the actual KSM hints enablement. The first one
> implements additional check for the case where the mmap_sem is taken for
> write, and the second one just allows KSM hints to be used by the remote
> interface.
>
> I'm not Cc'ing else anyone except two mailing lists to not distract
> people unnecessarily. If you are fine with this addition, please use it
> for your next iteration of process_madvise(), and then you'll Cc all the
> people needed.
>
> Thanks.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Oleksandr Natalenko (5):
> mm: rename madvise_core to madvise_common
> mm: revert madvise_inject_error line split
> mm: include uio.h to madvise.c
> mm/madvise: employ mmget_still_valid for write lock
> mm/madvise: allow KSM hints for remote API
>
> mm/madvise.c | 23 ++++++++++++++---------
> 1 file changed, 14 insertions(+), 9 deletions(-)
>
> --
> 2.22.0
>

This is a gentle ping. Are you still planning to submit process_madvise() solution?

--
Best regards,
Oleksandr Natalenko (post-factum)
Senior Software Maintenance Engineer

2019-12-11 01:35:50

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints

Hi Oleksandr,

On Tue, Dec 10, 2019 at 11:49:39AM +0100, Oleksandr Natalenko wrote:
> Hello, Minchan.
>
> On Sun, Jun 16, 2019 at 10:58:30AM +0200, Oleksandr Natalenko wrote:
> > This is a set of commits based on our discussion on your submission [1].
> >
> > First 2 implement minor suggestions just for you to not forget to take
> > them into account.
> >
> > uio.h inclusion was needed for me to be able to compile your series
> > successfully. Also please note I had to enable "Transparent Hugepage
> > Support" as well as "Enable idle page tracking" options, otherwise the
> > build failed. I guess this can be addressed by you better since the
> > errors are introduced with MADV_COLD introduction.
> >
> > Last 2 commits are the actual KSM hints enablement. The first one
> > implements additional check for the case where the mmap_sem is taken for
> > write, and the second one just allows KSM hints to be used by the remote
> > interface.
> >
> > I'm not Cc'ing else anyone except two mailing lists to not distract
> > people unnecessarily. If you are fine with this addition, please use it
> > for your next iteration of process_madvise(), and then you'll Cc all the
> > people needed.
> >
> > Thanks.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
> >
> > Oleksandr Natalenko (5):
> > mm: rename madvise_core to madvise_common
> > mm: revert madvise_inject_error line split
> > mm: include uio.h to madvise.c
> > mm/madvise: employ mmget_still_valid for write lock
> > mm/madvise: allow KSM hints for remote API
> >
> > mm/madvise.c | 23 ++++++++++++++---------
> > 1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > --
> > 2.22.0
> >
>
> This is a gentle ping. Are you still planning to submit process_madvise() solution?

I'm really sorry for being slow progress.
I am stuck with internal stuff of company.
I will do best effort to send it within one or two weeks.

Thanks.