2022-12-01 09:03:12

by kernel test robot

Subject: [linus:master] [memcg] 1813e51eec: kernel-selftests.cgroup.test_kmem.test_kmem_memcg_deletion.fail

Greeting,

FYI, we noticed kernel-selftests.cgroup.test_kmem.test_kmem_memcg_deletion.fail due to commit (built with gcc-11):

commit: 1813e51eece0ad6f4aacaeb738e7cced46feb470 ("memcg: increase MEMCG_CHARGE_BATCH to 64")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linux-next/master 700e0cd3a5ce6a2cb90d9a2aab729b52f092a7d6]

in testcase: kernel-selftests
version: kernel-selftests-x86_64-2ed09c3b-1_20221128
with following parameters:

group: cgroup

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt

on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


# memory.current = 40161280
# slab + anon + file + kernel_stack = 14478624
# slab = 13453184
# anon = 0
# file = 0
# kernel_stack = 0
# pagetables = 0
# percpu = 1025440
# sock = 0
# not ok 2 test_kmem_memcg_deletion <--
# ok 3 test_kmem_proc_kpagecgroup
# ok 4 test_kmem_kernel_stacks
# ok 5 test_kmem_dead_cgroups
# ok 6 test_percpu_basic
not ok 2 selftests: cgroup: test_kmem # exit=1


Please note that there are other failed cases in the log, which
should be unrelated to this commit. The only change we caught is that
"test_kmem_memcg_deletion" is "not ok" on this commit,
while it is "ok" on its parent commit. Thanks.


If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.


--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (2.30 kB)
config-6.0.0-rc3-00089-g1813e51eece0 (172.04 kB)
job-script (6.51 kB)
dmesg.xz (65.14 kB)
kernel-selftests (28.41 kB)
job.yaml (5.63 kB)
reproduce (247.00 B)

2022-12-01 10:39:01

by Michal Hocko

Subject: Re: [linus:master] [memcg] 1813e51eec: kernel-selftests.cgroup.test_kmem.test_kmem_memcg_deletion.fail

On Thu 01-12-22 16:05:44, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed kernel-selftests.cgroup.test_kmem.test_kmem_memcg_deletion.fail due to commit (built with gcc-11):
>
> commit: 1813e51eece0ad6f4aacaeb738e7cced46feb470 ("memcg: increase MEMCG_CHARGE_BATCH to 64")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linux-next/master 700e0cd3a5ce6a2cb90d9a2aab729b52f092a7d6]
>
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-2ed09c3b-1_20221128
> with following parameters:
>
> group: cgroup
>
> test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>
> on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> # memory.current = 40161280
> # slab + anon + file + kernel_stack = 14478624
> # slab = 13453184
> # anon = 0
> # file = 0
> # kernel_stack = 0
> # pagetables = 0
> # percpu = 1025440
> # sock = 0
> # not ok 2 test_kmem_memcg_deletion <--
> # ok 3 test_kmem_proc_kpagecgroup
> # ok 4 test_kmem_kernel_stacks
> # ok 5 test_kmem_dead_cgroups
> # ok 6 test_percpu_basic
> not ok 2 selftests: cgroup: test_kmem # exit=1

IIUC we need this
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 22b31ebb3513..1d073e28254b 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -24,7 +24,7 @@
* the maximum discrepancy between charge and vmstat entries is number
* of cpus multiplied by 32 pages.
*/
-#define MAX_VMSTAT_ERROR (4096 * 32 * get_nprocs())
+#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())


static int alloc_dcache(const char *cgroup, void *arg)

But honestly, I am rather dubious of tests like this one. Does it really
give us any useful testing coverage?
--
Michal Hocko
SUSE Labs
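
For reference, the reported numbers can be checked against the old and new tolerance. This is only a sketch: it assumes the test passes when the absolute difference between memory.current and the summed memory.stat entries is below MAX_VMSTAT_ERROR, and that get_nprocs() returned 128 on the test machine (the report says "128 threads"). The helper names vmstat_tolerance() and within_tolerance() are hypothetical and do not appear in test_kmem.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Page size as hard-coded in the selftest's MAX_VMSTAT_ERROR macro. */
#define PAGE_BYTES 4096L

/*
 * Tolerance in bytes: up to one charge batch per CPU may be accounted in
 * memory.current without being reflected in the memory.stat entries.
 * Hypothetical helper, mirroring (4096 * batch * get_nprocs()).
 */
long vmstat_tolerance(long batch_pages, long nr_cpus)
{
	assert(batch_pages > 0 && nr_cpus > 0);
	return PAGE_BYTES * batch_pages * nr_cpus;
}

/*
 * Assumed form of the check: pass (non-zero) when |sum - current| is
 * strictly below the tolerance. Hypothetical helper.
 */
int within_tolerance(long current, long sum, long batch_pages, long nr_cpus)
{
	return labs(sum - current) < vmstat_tolerance(batch_pages, nr_cpus);
}
```

With the values from the report (memory.current = 40161280, sum = 14478624) the difference is 25682656 bytes. Under the assumptions above, that exceeds the 32-page tolerance of 4096 * 32 * 128 = 16777216 bytes but is within the 64-page tolerance of 4096 * 64 * 128 = 33554432 bytes, which would be consistent with the test failing before the change and passing after it.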

2022-12-01 20:01:11

by Roman Gushchin

Subject: Re: [linus:master] [memcg] 1813e51eec: kernel-selftests.cgroup.test_kmem.test_kmem_memcg_deletion.fail

On Thu, Dec 01, 2022 at 11:16:34AM +0100, Michal Hocko wrote:
> On Thu 01-12-22 16:05:44, kernel test robot wrote:
> > Greeting,
> >
> > FYI, we noticed kernel-selftests.cgroup.test_kmem.test_kmem_memcg_deletion.fail due to commit (built with gcc-11):
> >
> > commit: 1813e51eece0ad6f4aacaeb738e7cced46feb470 ("memcg: increase MEMCG_CHARGE_BATCH to 64")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [test failed on linux-next/master 700e0cd3a5ce6a2cb90d9a2aab729b52f092a7d6]
> >
> > in testcase: kernel-selftests
> > version: kernel-selftests-x86_64-2ed09c3b-1_20221128
> > with following parameters:
> >
> > group: cgroup
> >
> > test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> > test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> >
> > on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> > # memory.current = 40161280
> > # slab + anon + file + kernel_stack = 14478624
> > # slab = 13453184
> > # anon = 0
> > # file = 0
> > # kernel_stack = 0
> > # pagetables = 0
> > # percpu = 1025440
> > # sock = 0
> > # not ok 2 test_kmem_memcg_deletion <--
> > # ok 3 test_kmem_proc_kpagecgroup
> > # ok 4 test_kmem_kernel_stacks
> > # ok 5 test_kmem_dead_cgroups
> > # ok 6 test_percpu_basic
> > not ok 2 selftests: cgroup: test_kmem # exit=1
>
> IIUC we need this
> diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
> index 22b31ebb3513..1d073e28254b 100644
> --- a/tools/testing/selftests/cgroup/test_kmem.c
> +++ b/tools/testing/selftests/cgroup/test_kmem.c
> @@ -24,7 +24,7 @@
> * the maximum discrepancy between charge and vmstat entries is number
> * of cpus multiplied by 32 pages.
> */
> -#define MAX_VMSTAT_ERROR (4096 * 32 * get_nprocs())
> +#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())

Yep.

>
>
> static int alloc_dcache(const char *cgroup, void *arg)
>
> But honestly, I am rather dubious of tests like this one. Does it really
> give us any useful testing coverage?

As I remember, we've had some issues in the past when some memcg stats leftovers
were not properly propagated on cgroup deletion, so that over time the
numbers on the parent level became completely crazy.

Thanks!

2022-12-02 09:18:19

by Michal Hocko

Subject: [PATCH] kselftests: cgroup: update kmem test precision tolerance

OK, so this is a full patch to fix this
---
From 7f338ed952ba4a100822004bc8399bf720b42899 Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Fri, 2 Dec 2022 09:45:29 +0100
Subject: [PATCH] kselftests: cgroup: update kmem test precision tolerance

1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
the batch size while this test case has been left behind. This has led
to a test failure reported by test bot:
not ok 2 selftests: cgroup: test_kmem # exit=1

Update the tolerance for the pcp charges to reflect the
MEMCG_CHARGE_BATCH change to fix this.

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
tools/testing/selftests/cgroup/test_kmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 22b31ebb3513..1d073e28254b 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -24,7 +24,7 @@
* the maximum discrepancy between charge and vmstat entries is number
* of cpus multiplied by 32 pages.
*/
-#define MAX_VMSTAT_ERROR (4096 * 32 * get_nprocs())
+#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())


static int alloc_dcache(const char *cgroup, void *arg)
--
2.30.2

--
Michal Hocko
SUSE Labs

2022-12-02 17:06:10

by Shakeel Butt

Subject: Re: [PATCH] kselftests: cgroup: update kmem test precision tolerance

On Fri, Dec 02, 2022 at 09:50:26AM +0100, Michal Hocko wrote:
> OK, so this is a full patch to fix this
> ---
> From 7f338ed952ba4a100822004bc8399bf720b42899 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Fri, 2 Dec 2022 09:45:29 +0100
> Subject: [PATCH] kselftests: cgroup: update kmem test precision tolerance
>
> 1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
> the batch size while this test case has been left behind. This has led
> to a test failure reported by test bot:
> not ok 2 selftests: cgroup: test_kmem # exit=1
>
> Update the tolerance for the pcp charges to reflect the
> MEMCG_CHARGE_BATCH change to fix this.
>
> Reported-by: kernel test robot <[email protected]>
> Link: https://lore.kernel.org/oe-lkp/[email protected]
> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: Shakeel Butt <[email protected]>

2022-12-02 17:37:40

by Roman Gushchin

Subject: Re: [PATCH] kselftests: cgroup: update kmem test precision tolerance

On Fri, Dec 02, 2022 at 09:50:26AM +0100, Michal Hocko wrote:
> OK, so this is a full patch to fix this
> ---
> From 7f338ed952ba4a100822004bc8399bf720b42899 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Fri, 2 Dec 2022 09:45:29 +0100
> Subject: [PATCH] kselftests: cgroup: update kmem test precision tolerance
>
> 1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
> the batch size while this test case has been left behind. This has led
> to a test failure reported by test bot:
> not ok 2 selftests: cgroup: test_kmem # exit=1
>
> Update the tolerance for the pcp charges to reflect the
> MEMCG_CHARGE_BATCH change to fix this.
>
> Reported-by: kernel test robot <[email protected]>
> Link: https://lore.kernel.org/oe-lkp/[email protected]
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> tools/testing/selftests/cgroup/test_kmem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
> index 22b31ebb3513..1d073e28254b 100644
> --- a/tools/testing/selftests/cgroup/test_kmem.c
> +++ b/tools/testing/selftests/cgroup/test_kmem.c
> @@ -24,7 +24,7 @@
> * the maximum discrepancy between charge and vmstat entries is number
> * of cpus multiplied by 32 pages.
> */
> -#define MAX_VMSTAT_ERROR (4096 * 32 * get_nprocs())
> +#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())

Hi Michal!

You need to update comments above too (it says 32 pages in a couple of places).
I actually sent the similar patch to Andrew yesterday, but hit reply and missed
adding people to cc.

Please, feel free to send your v2 with comments fixed and my acked-by,
or we can go with my version.

Thanks!

--

From 354850a59bb8e000490a23bc768f4d3183faf8e4 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <[email protected]>
Date: Thu, 1 Dec 2022 18:05:07 -0800
Subject: [PATCH] kselftests/cgroup: adjust memcg charge batch size

Commit 1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64")
doubled the memcg charge batch size, which broke the kmem_memcg_deletion
test. Bump the corresponding error margin on the test side to fix the
problem.

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-lkp/[email protected]
Fixes: 1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64")
Signed-off-by: Roman Gushchin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
---
tools/testing/selftests/cgroup/test_kmem.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 22b31ebb3513..258ddc565deb 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -19,12 +19,12 @@


/*
- * Memory cgroup charging is performed using percpu batches 32 pages
+ * Memory cgroup charging is performed using percpu batches 64 pages
* big (look at MEMCG_CHARGE_BATCH), whereas memory.stat is exact. So
* the maximum discrepancy between charge and vmstat entries is number
- * of cpus multiplied by 32 pages.
+ * of cpus multiplied by 64 pages.
*/
-#define MAX_VMSTAT_ERROR (4096 * 32 * get_nprocs())
+#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())


static int alloc_dcache(const char *cgroup, void *arg)
--
2.38.1

2022-12-02 18:03:50

by kernel test robot

Subject: Re: [PATCH] kselftests: cgroup: update kmem test precision tolerance

On Fri, Dec 02, 2022 at 09:50:26AM +0100, Michal Hocko wrote:
> OK, so this is a full patch to fix this
> ---
> From 7f338ed952ba4a100822004bc8399bf720b42899 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Fri, 2 Dec 2022 09:45:29 +0100
> Subject: [PATCH] kselftests: cgroup: update kmem test precision tolerance
>
> 1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
> the batch size while this test case has been left behind. This has led
> to a test failure reported by test bot:
> not ok 2 selftests: cgroup: test_kmem # exit=1
>
> Update the tolerance for the pcp charges to reflect the
> MEMCG_CHARGE_BATCH change to fix this.
>
> Reported-by: kernel test robot <[email protected]>
> Link: https://lore.kernel.org/oe-lkp/[email protected]
> Signed-off-by: Michal Hocko <[email protected]>

The failure is gone after applying this patch. Thanks.

Tested-by: Yujie Liu <[email protected]>

=========================================================================================
compiler/group/kconfig/rootfs/tbox_group/testcase:
gcc-11/cgroup/x86_64-rhel-8.3-kselftests/debian-12-x86_64-20220629.cgz/lkp-icl-2sp5/kernel-selftests

commit:
1813e51eece0a ("memcg: increase MEMCG_CHARGE_BATCH to 64")
8046f9500f4b7 ("kselftests: cgroup: update kmem test precision tolerance")

   1813e51eece0a                 8046f9500f4b7
----------------  -------------  -------------
       fail:runs  %reproduction      fail:runs
               |              |              |
             3:3          -100%             :5  kernel-selftests.cgroup.test_kmem.test_kmem_memcg_deletion.fail

> ---
> tools/testing/selftests/cgroup/test_kmem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
> index 22b31ebb3513..1d073e28254b 100644
> --- a/tools/testing/selftests/cgroup/test_kmem.c
> +++ b/tools/testing/selftests/cgroup/test_kmem.c
> @@ -24,7 +24,7 @@
> * the maximum discrepancy between charge and vmstat entries is number
> * of cpus multiplied by 32 pages.
> */
> -#define MAX_VMSTAT_ERROR (4096 * 32 * get_nprocs())
> +#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
>
>
> static int alloc_dcache(const char *cgroup, void *arg)
> --
> 2.30.2
>
> --
> Michal Hocko
> SUSE Labs

2022-12-05 08:02:39

by Michal Hocko

Subject: Re: [PATCH] kselftests: cgroup: update kmem test precision tolerance

On Fri 02-12-22 09:16:17, Roman Gushchin wrote:
> On Fri, Dec 02, 2022 at 09:50:26AM +0100, Michal Hocko wrote:
> > OK, so this is a full patch to fix this
> > ---
> > From 7f338ed952ba4a100822004bc8399bf720b42899 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <[email protected]>
> > Date: Fri, 2 Dec 2022 09:45:29 +0100
> > Subject: [PATCH] kselftests: cgroup: update kmem test precision tolerance
> >
> > 1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
> > the batch size while this test case has been left behind. This has led
> > to a test failure reported by test bot:
> > not ok 2 selftests: cgroup: test_kmem # exit=1
> >
> > Update the tolerance for the pcp charges to reflect the
> > MEMCG_CHARGE_BATCH change to fix this.
> >
> > Reported-by: kernel test robot <[email protected]>
> > Link: https://lore.kernel.org/oe-lkp/[email protected]
> > Signed-off-by: Michal Hocko <[email protected]>
> > ---
> > tools/testing/selftests/cgroup/test_kmem.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
> > index 22b31ebb3513..1d073e28254b 100644
> > --- a/tools/testing/selftests/cgroup/test_kmem.c
> > +++ b/tools/testing/selftests/cgroup/test_kmem.c
> > @@ -24,7 +24,7 @@
> > * the maximum discrepancy between charge and vmstat entries is number
> > * of cpus multiplied by 32 pages.
> > */
> > -#define MAX_VMSTAT_ERROR (4096 * 32 * get_nprocs())
> > +#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
>
> Hi Michal!
>
> You need to update comments above too (it says 32 pages in a couple of places).
> I actually sent the similar patch to Andrew yesterday, but hit reply and missed
> adding people to cc.
>
> Please, feel free to send your v2 with comments fixed and my acked-by,
> or we can go with my version.

It seems Andrew has already done all the fixups. Thanks both to you and
Andrew!
--
Michal Hocko
SUSE Labs