2018-08-02 11:02:22

by Kirill Tkhai

[permalink] [raw]
Subject: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

In case of shrink_slab_memcg() we do not zero nid, when shrinker
is not numa-aware. This is not a real problem, since currently
all memcg-aware shrinkers are numa-aware too (we have two:
super_block shrinker and workingset shrinker), but something may
change in the future.

(Andrew, this may be merged to mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab)

Signed-off-by: Kirill Tkhai <[email protected]>
---
mm/vmscan.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ea0a46166e8e..0d980e801b8a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -455,6 +455,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
: SHRINK_BATCH;
long scanned = 0, next_deferred;

+ if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
+ nid = 0;
+
freeable = shrinker->count_objects(shrinker, shrinkctl);
if (freeable == 0 || freeable == SHRINK_EMPTY)
return freeable;
@@ -680,9 +683,6 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
.memcg = memcg,
};

- if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
- sc.nid = 0;
-
ret = do_shrink_slab(&sc, shrinker, priority);
if (ret == SHRINK_EMPTY)
ret = 0;



2018-08-02 16:56:25

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

On Thu, Aug 2, 2018 at 9:47 AM Yang Shi <[email protected]> wrote:
>
> On Thu, Aug 2, 2018 at 4:00 AM, Kirill Tkhai <[email protected]> wrote:
> > In case of shrink_slab_memcg() we do not zero nid, when shrinker
> > is not numa-aware. This is not a real problem, since currently
> > all memcg-aware shrinkers are numa-aware too (we have two:
>
> Actually, this is not true. huge_zero_page_shrinker is NOT numa-aware.
> deferred_split_shrinker is numa-aware.
>

But both huge_zero_page_shrinker and huge_zero_page_shrinker are not
memcg-aware shrinkers. I think Kirill is saying all memcg-aware
shrinkers are also numa-aware shrinkers.

Shakeel

2018-08-02 17:15:05

by Yang Shi

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

On Thu, Aug 2, 2018 at 4:00 AM, Kirill Tkhai <[email protected]> wrote:
> In case of shrink_slab_memcg() we do not zero nid, when shrinker
> is not numa-aware. This is not a real problem, since currently
> all memcg-aware shrinkers are numa-aware too (we have two:

Actually, this is not true. huge_zero_page_shrinker is NOT numa-aware.
deferred_split_shrinker is numa-aware.

Thanks,
Yang


> super_block shrinker and workingset shrinker), but something may
> change in the future.
>
> (Andrew, this may be merged to mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab)
>
> Signed-off-by: Kirill Tkhai <[email protected]>
> ---
> mm/vmscan.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ea0a46166e8e..0d980e801b8a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -455,6 +455,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> : SHRINK_BATCH;
> long scanned = 0, next_deferred;
>
> + if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
> + nid = 0;
> +
> freeable = shrinker->count_objects(shrinker, shrinkctl);
> if (freeable == 0 || freeable == SHRINK_EMPTY)
> return freeable;
> @@ -680,9 +683,6 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> .memcg = memcg,
> };
>
> - if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
> - sc.nid = 0;
> -
> ret = do_shrink_slab(&sc, shrinker, priority);
> if (ret == SHRINK_EMPTY)
> ret = 0;
>

2018-08-02 17:28:23

by Yang Shi

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

On Thu, Aug 2, 2018 at 9:54 AM, Shakeel Butt <[email protected]> wrote:
> On Thu, Aug 2, 2018 at 9:47 AM Yang Shi <[email protected]> wrote:
>>
>> On Thu, Aug 2, 2018 at 4:00 AM, Kirill Tkhai <[email protected]> wrote:
>> > In case of shrink_slab_memcg() we do not zero nid, when shrinker
>> > is not numa-aware. This is not a real problem, since currently
>> > all memcg-aware shrinkers are numa-aware too (we have two:
>>
>> Actually, this is not true. huge_zero_page_shrinker is NOT numa-aware.
>> deferred_split_shrinker is numa-aware.
>>
>
> But both huge_zero_page_shrinker and huge_zero_page_shrinker are not
> memcg-aware shrinkers. I think Kirill is saying all memcg-aware
> shrinkers are also numa-aware shrinkers.

Aha, thanks for reminding. Yes, I missed that memcg-aware part.

>
> Shakeel

2018-08-02 20:48:30

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

On Thu, 02 Aug 2018 14:00:52 +0300 Kirill Tkhai <[email protected]> wrote:

> In case of shrink_slab_memcg() we do not zero nid, when shrinker
> is not numa-aware. This is not a real problem, since currently
> all memcg-aware shrinkers are numa-aware too (we have two:
> super_block shrinker and workingset shrinker), but something may
> change in the future.

Fair enough.

> (Andrew, this may be merged to mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab)

It got a bit messy so I got lazy and queued it as a separate patch.

btw, I have a note that https://lkml.org/lkml/2018/7/7/32 was caused by
this patch series. Is that the case and do you know if this was
addressed?


2018-08-03 07:15:28

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

On 02.08.2018 20:26, Yang Shi wrote:
> On Thu, Aug 2, 2018 at 9:54 AM, Shakeel Butt <[email protected]> wrote:
>> On Thu, Aug 2, 2018 at 9:47 AM Yang Shi <[email protected]> wrote:
>>>
>>> On Thu, Aug 2, 2018 at 4:00 AM, Kirill Tkhai <[email protected]> wrote:
>>>> In case of shrink_slab_memcg() we do not zero nid, when shrinker
>>>> is not numa-aware. This is not a real problem, since currently
>>>> all memcg-aware shrinkers are numa-aware too (we have two:
>>>
>>> Actually, this is not true. huge_zero_page_shrinker is NOT numa-aware.
>>> deferred_split_shrinker is numa-aware.
>>>
>>
>> But both huge_zero_page_shrinker and huge_zero_page_shrinker are not
>> memcg-aware shrinkers. I think Kirill is saying all memcg-aware
>> shrinkers are also numa-aware shrinkers.
>
> Aha, thanks for reminding. Yes, I missed that memcg-aware part.

Yes, I mean workingset_shadow_shrinker.

2018-08-03 09:03:45

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

On 02.08.2018 23:47, Andrew Morton wrote:
> On Thu, 02 Aug 2018 14:00:52 +0300 Kirill Tkhai <[email protected]> wrote:
>
>> In case of shrink_slab_memcg() we do not zero nid, when shrinker
>> is not numa-aware. This is not a real problem, since currently
>> all memcg-aware shrinkers are numa-aware too (we have two:
>> super_block shrinker and workingset shrinker), but something may
>> change in the future.
>
> Fair enough.
>
>> (Andrew, this may be merged to mm-iterate-only-over-charged-shrinkers-during-memcg-shrink_slab)
>
> It got a bit messy so I got lazy and queued it as a separate patch.
>
> btw, I have a note that https://lkml.org/lkml/2018/7/7/32 was caused by
> this patch series. Is that the case and do you know if this was
> addressed?

It's not related to the patchset. Bisect leads to:

commit c6aeb9d4c351 (HEAD, refs/bisect/bad)
Author: David Howells <[email protected]>
Date: Sun Jun 24 00:20:10 2018 +0100

kernfs, sysfs, cgroup, intel_rdt: Support fs_context

CC David.

David, please see reproducer at https://lkml.org/lkml/2018/7/7/32

Kirill

2018-08-03 10:32:41

by David Howells

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

The reproducer can be reduced to:

#define _GNU_SOURCE
#include <endian.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <unistd.h>
#include <fcntl.h>

const char path[] = "./file0";

int main()
{
mkdir(path, 0);
mount(path, path, "cgroup2", 0, 0);
chroot(path);
umount2(path, 0);
return 0;
}

and I've found two bugs (see attached patch). The issue is that
do_remount_sb() is called with fc == NULL from umount(), but both
cgroup_reconfigure() and do_remount_sb() dereference fc unconditionally.

But! I'm not sure why the reproducer works at all because the umount2() call
is *after* the chroot, so should fail on ENOENT before it even gets that far.
In fact, umount2() can be called multiple times, apparently successfully, and
doesn't actually unmount anything.

David
---
diff --git a/fs/super.c b/fs/super.c
index 3fe5d12b7697..321fbc244570 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -978,7 +978,10 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data,
sb->s_op->remount_fs) {
if (sb->s_op->reconfigure) {
retval = sb->s_op->reconfigure(sb, fc);
- sb_flags = fc->sb_flags;
+ if (fc)
+ sb_flags = fc->sb_flags;
+ else
+ sb_flags = sb->s_flags;
if (retval == 0)
security_sb_reconfigure(fc);
} else {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f3238f38d152..48275fdce053 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1796,9 +1796,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)

static int cgroup_reconfigure(struct kernfs_root *kf_root, struct fs_context *fc)
{
- struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
+ if (fc) {
+ struct cgroup_fs_context *ctx = cgroup_fc2context(fc);

- apply_cgroup_root_flags(ctx->flags);
+ apply_cgroup_root_flags(ctx->flags);
+ }
return 0;
}


2018-08-03 11:02:04

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

On 03.08.2018 13:31, David Howells wrote:
> The reproducer can be reduced to:
>
> #define _GNU_SOURCE
> #include <endian.h>
> #include <stdint.h>
> #include <string.h>
> #include <stdio.h>
> #include <sys/syscall.h>
> #include <sys/stat.h>
> #include <sys/mount.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> const char path[] = "./file0";
>
> int main()
> {
> mkdir(path, 0);
> mount(path, path, "cgroup2", 0, 0);
> chroot(path);
> umount2(path, 0);
> return 0;
> }
>
> and I've found two bugs (see attached patch). The issue is that
> do_remount_sb() is called with fc == NULL from umount(), but both
> cgroup_reconfigure() and do_remount_sb() dereference fc unconditionally.
>
> But! I'm not sure why the reproducer works at all because the umount2() call
> is *after* the chroot, so should fail on ENOENT before it even gets that far.
> In fact, umount2() can be called multiple times, apparently successfully, and
> doesn't actually unmount anything.

Before I also try to check why it works; just reporting you that the patch
works the problem in my environment. Thanks, David.

> ---
> diff --git a/fs/super.c b/fs/super.c
> index 3fe5d12b7697..321fbc244570 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -978,7 +978,10 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data,
> sb->s_op->remount_fs) {
> if (sb->s_op->reconfigure) {
> retval = sb->s_op->reconfigure(sb, fc);
> - sb_flags = fc->sb_flags;
> + if (fc)
> + sb_flags = fc->sb_flags;
> + else
> + sb_flags = sb->s_flags;
> if (retval == 0)
> security_sb_reconfigure(fc);
> } else {
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index f3238f38d152..48275fdce053 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -1796,9 +1796,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
>
> static int cgroup_reconfigure(struct kernfs_root *kf_root, struct fs_context *fc)
> {
> - struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
> + if (fc) {
> + struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
>
> - apply_cgroup_root_flags(ctx->flags);
> + apply_cgroup_root_flags(ctx->flags);
> + }
> return 0;
> }
>
>

2018-08-03 11:06:44

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

On 03.08.2018 13:59, Kirill Tkhai wrote:
> On 03.08.2018 13:31, David Howells wrote:
>> The reproducer can be reduced to:
>>
>> #define _GNU_SOURCE
>> #include <endian.h>
>> #include <stdint.h>
>> #include <string.h>
>> #include <stdio.h>
>> #include <sys/syscall.h>
>> #include <sys/stat.h>
>> #include <sys/mount.h>
>> #include <unistd.h>
>> #include <fcntl.h>
>>
>> const char path[] = "./file0";
>>
>> int main()
>> {
>> mkdir(path, 0);
>> mount(path, path, "cgroup2", 0, 0);
>> chroot(path);
>> umount2(path, 0);
>> return 0;
>> }
>>
>> and I've found two bugs (see attached patch). The issue is that
>> do_remount_sb() is called with fc == NULL from umount(), but both
>> cgroup_reconfigure() and do_remount_sb() dereference fc unconditionally.
>>
>> But! I'm not sure why the reproducer works at all because the umount2() call
>> is *after* the chroot, so should fail on ENOENT before it even gets that far.
>> In fact, umount2() can be called multiple times, apparently successfully, and
>> doesn't actually unmount anything.
>
> Before I also try to check why it works; just reporting you that the patch
> works the problem in my environment. Thanks, David.

patch *fixes* the problem

>
>> ---
>> diff --git a/fs/super.c b/fs/super.c
>> index 3fe5d12b7697..321fbc244570 100644
>> --- a/fs/super.c
>> +++ b/fs/super.c
>> @@ -978,7 +978,10 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data,
>> sb->s_op->remount_fs) {
>> if (sb->s_op->reconfigure) {
>> retval = sb->s_op->reconfigure(sb, fc);
>> - sb_flags = fc->sb_flags;
>> + if (fc)
>> + sb_flags = fc->sb_flags;
>> + else
>> + sb_flags = sb->s_flags;
>> if (retval == 0)
>> security_sb_reconfigure(fc);
>> } else {
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index f3238f38d152..48275fdce053 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -1796,9 +1796,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
>>
>> static int cgroup_reconfigure(struct kernfs_root *kf_root, struct fs_context *fc)
>> {
>> - struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
>> + if (fc) {
>> + struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
>>
>> - apply_cgroup_root_flags(ctx->flags);
>> + apply_cgroup_root_flags(ctx->flags);
>> + }
>> return 0;
>> }
>>
>>

2018-08-03 11:19:58

by David Howells

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

David Howells <[email protected]> wrote:

> But! I'm not sure why the reproducer works at all because the umount2() call
> is *after* the chroot, so should fail on ENOENT before it even gets that
> far.

No, it shouldn't. It did chroot() not chdir().

> In fact, umount2() can be called multiple times, apparently successfully, and
> doesn't actually unmount anything.

Okay, because it chroot'd into the directory. Should it return EBUSY though?

David

2018-08-03 12:03:05

by David Howells

[permalink] [raw]
Subject: Re: [PATCH] mm: Move check for SHRINKER_NUMA_AWARE to do_shrink_slab()

Kirill Tkhai <[email protected]> wrote:

> > Before I also try to check why it works; just reporting you that the patch
> > works the problem in my environment. Thanks, David.
>
> patch *fixes* the problem

Thanks. I've folded the patch in.

David