2023-11-16 04:40:02

by Oliver Sang

[permalink] [raw]
Subject: [linus:master] [mm/ksm] 5e924ff54d: ltp.ksm01.fail


hi, Stefan Roesch,

we reported
"[linux-next:master] [mm/ksm] 5e924ff54d: ltp.ksm01_1.fail"
in
https://lore.kernel.org/all/[email protected]/
when this commit is in linux-next/master.

now we noticed this commit is merged in mainline, and we still observed
same issue. just FYI.


Hello,

kernel test robot noticed "ltp.ksm01.fail" on:

commit: 5e924ff54d088828794d9f1a4d5bf17808f7270e ("mm/ksm: add "smart" page scanning mode")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 3ca112b71f35dd5d99fc4571a56b5fc6f0c15814]
[test failed on linux-next/master 8728c14129df7a6e29188a2e737b4774fb200953]

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20230715
with following parameters:

disk: 1HDD
test: mm-00/ksm01



compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]



Running tests.......
<<<test_start>>>
tag=ksm01 stime=1699563923
cmdline="ksm01"
contacts=""
analysis=exit
<<<test_output>>>
tst_kconfig.c:87: TINFO: Parsing kernel config '/proc/config.gz'
tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
mem.c:422: TINFO: wait for all children to stop.
mem.c:388: TINFO: child 0 stops.
mem.c:388: TINFO: child 1 stops.
mem.c:388: TINFO: child 2 stops.
mem.c:495: TINFO: KSM merging...
mem.c:434: TINFO: resume all children.
mem.c:422: TINFO: wait for all children to stop.
mem.c:344: TINFO: child 1 continues...
mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
mem.c:344: TINFO: child 2 continues...
mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
mem.c:344: TINFO: child 0 continues...
mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
mem.c:400: TINFO: child 1 stops.
mem.c:400: TINFO: child 0 stops.
mem.c:400: TINFO: child 2 stops.
ksm_helper.c:36: TINFO: ksm daemon takes 2s to run two full scans
mem.c:264: TINFO: check!
mem.c:255: TPASS: run is 1.
mem.c:255: TPASS: pages_shared is 2.

....

mem.c:255: TPASS: pages_shared is 1.
mem.c:255: TPASS: pages_sharing is 98302.
mem.c:252: TFAIL: pages_volatile is not 0 but 1. <-----
mem.c:252: TFAIL: pages_unshared is not 1 but 0. <-----
mem.c:255: TPASS: sleep_millisecs is 0.
mem.c:255: TPASS: pages_to_scan is 98304.

....

mem.c:415: TINFO: child 1 finished.
mem.c:415: TINFO: child 2 finished.
mem.c:415: TINFO: child 0 finished.

Summary:
passed 40
failed 2
broken 0
skipped 0
warnings 0
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=9 termination_type=exited termination_id=1 corefile=no
cutime=342 cstime=293
<<<test_end>>>
INFO: ltp-pan reported some tests FAIL
LTP Version: 20230516-75-g2e582e743

###############################################################

Done executing testcases.
LTP Version: 20230516-75-g2e582e743
###############################################################




The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231116/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2023-11-16 13:25:06

by David Hildenbrand

[permalink] [raw]
Subject: Re: [linus:master] [mm/ksm] 5e924ff54d: ltp.ksm01.fail

On 16.11.23 05:39, kernel test robot wrote:
>
> hi, Stefan Roesch,
>
> we reported
> "[linux-next:master] [mm/ksm] 5e924ff54d: ltp.ksm01_1.fail"
> in
> https://lore.kernel.org/all/[email protected]/
> when this commit is in linux-next/master.
>
> now we noticed this commit is merged in mainline, and we still observed
> same issue. just FYI.
>
>
> Hello,
>
> kernel test robot noticed "ltp.ksm01.fail" on:
>
> commit: 5e924ff54d088828794d9f1a4d5bf17808f7270e ("mm/ksm: add "smart" page scanning mode")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master 3ca112b71f35dd5d99fc4571a56b5fc6f0c15814]
> [test failed on linux-next/master 8728c14129df7a6e29188a2e737b4774fb200953]
>
> in testcase: ltp
> version: ltp-x86_64-14c1f76-1_20230715
> with following parameters:
>
> disk: 1HDD
> test: mm-00/ksm01
>
>
>
> compiler: gcc-12
> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-lkp/[email protected]
>
>
>
> Running tests.......
> <<<test_start>>>
> tag=ksm01 stime=1699563923
> cmdline="ksm01"
> contacts=""
> analysis=exit
> <<<test_output>>>
> tst_kconfig.c:87: TINFO: Parsing kernel config '/proc/config.gz'
> tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
> mem.c:422: TINFO: wait for all children to stop.
> mem.c:388: TINFO: child 0 stops.
> mem.c:388: TINFO: child 1 stops.
> mem.c:388: TINFO: child 2 stops.
> mem.c:495: TINFO: KSM merging...
> mem.c:434: TINFO: resume all children.
> mem.c:422: TINFO: wait for all children to stop.
> mem.c:344: TINFO: child 1 continues...
> mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
> mem.c:344: TINFO: child 2 continues...
> mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
> mem.c:344: TINFO: child 0 continues...
> mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
> mem.c:400: TINFO: child 1 stops.
> mem.c:400: TINFO: child 0 stops.
> mem.c:400: TINFO: child 2 stops.
> ksm_helper.c:36: TINFO: ksm daemon takes 2s to run two full scans
> mem.c:264: TINFO: check!
> mem.c:255: TPASS: run is 1.
> mem.c:255: TPASS: pages_shared is 2.
>
> ....
>
> mem.c:255: TPASS: pages_shared is 1.
> mem.c:255: TPASS: pages_sharing is 98302.
> mem.c:252: TFAIL: pages_volatile is not 0 but 1. <-----
> mem.c:252: TFAIL: pages_unshared is not 1 but 0. <-----

@Stefan, is this simply related to the new scanning optimization (skip
and eventually not merge a pages within the "2 scans" windows, whereby
previously, they would have gotten merged)?

If so, we might just want to disable that optimization for that test case?

Alternatively, maybe we have to wait for "more" scan cycles instead of
only 2?

--
Cheers,

David / dhildenb

2023-11-17 16:53:14

by Stefan Roesch

[permalink] [raw]
Subject: Re: [linus:master] [mm/ksm] 5e924ff54d: ltp.ksm01.fail


David Hildenbrand <[email protected]> writes:

> On 16.11.23 05:39, kernel test robot wrote:
>> hi, Stefan Roesch,
>> we reported
>> "[linux-next:master] [mm/ksm] 5e924ff54d: ltp.ksm01_1.fail"
>> in
>> https://lore.kernel.org/all/[email protected]/
>> when this commit is in linux-next/master.
>> now we noticed this commit is merged in mainline, and we still observed
>> same issue. just FYI.
>> Hello,
>> kernel test robot noticed "ltp.ksm01.fail" on:
>> commit: 5e924ff54d088828794d9f1a4d5bf17808f7270e ("mm/ksm: add "smart" page
>> scanning mode")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>> [test failed on linus/master 3ca112b71f35dd5d99fc4571a56b5fc6f0c15814]
>> [test failed on linux-next/master 8728c14129df7a6e29188a2e737b4774fb200953]
>> in testcase: ltp
>> version: ltp-x86_64-14c1f76-1_20230715
>> with following parameters:
>> disk: 1HDD
>> test: mm-00/ksm01
>> compiler: gcc-12
>> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>> If you fix the issue in a separate patch/commit (i.e. not just a new version
>> of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <[email protected]>
>> | Closes: https://lore.kernel.org/oe-lkp/[email protected]
>> Running tests.......
>> <<<test_start>>>
>> tag=ksm01 stime=1699563923
>> cmdline="ksm01"
>> contacts=""
>> analysis=exit
>> <<<test_output>>>
>> tst_kconfig.c:87: TINFO: Parsing kernel config '/proc/config.gz'
>> tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
>> mem.c:422: TINFO: wait for all children to stop.
>> mem.c:388: TINFO: child 0 stops.
>> mem.c:388: TINFO: child 1 stops.
>> mem.c:388: TINFO: child 2 stops.
>> mem.c:495: TINFO: KSM merging...
>> mem.c:434: TINFO: resume all children.
>> mem.c:422: TINFO: wait for all children to stop.
>> mem.c:344: TINFO: child 1 continues...
>> mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
>> mem.c:344: TINFO: child 2 continues...
>> mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
>> mem.c:344: TINFO: child 0 continues...
>> mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
>> mem.c:400: TINFO: child 1 stops.
>> mem.c:400: TINFO: child 0 stops.
>> mem.c:400: TINFO: child 2 stops.
>> ksm_helper.c:36: TINFO: ksm daemon takes 2s to run two full scans
>> mem.c:264: TINFO: check!
>> mem.c:255: TPASS: run is 1.
>> mem.c:255: TPASS: pages_shared is 2.
>> ....
>> mem.c:255: TPASS: pages_shared is 1.
>> mem.c:255: TPASS: pages_sharing is 98302.
>> mem.c:252: TFAIL: pages_volatile is not 0 but 1. <-----
>> mem.c:252: TFAIL: pages_unshared is not 1 but 0. <-----
>
> @Stefan, is this simply related to the new scanning optimization (skip and
> eventually not merge a pages within the "2 scans" windows, whereby previously,
> they would have gotten merged)?
>
> If so, we might just want to disable that optimization for that test case?
>
> Alternatively, maybe we have to wait for "more" scan cycles instead of only 2?

I'd expect this is caused by "smart scan", where we can skip pages.
The best is probably to disable the smart scan feature for this test.
The smart scan feature can be disabled by:

echo 0 > /sys/kernel/mm/ksm/smart_scan

I'll have a look at it today.

2023-11-22 17:22:08

by Stefan Roesch

[permalink] [raw]
Subject: Re: [linus:master] [mm/ksm] 5e924ff54d: ltp.ksm01.fail


Stefan Roesch <[email protected]> writes:

> David Hildenbrand <[email protected]> writes:
>
>> On 16.11.23 05:39, kernel test robot wrote:
>>> hi, Stefan Roesch,
>>> we reported
>>> "[linux-next:master] [mm/ksm] 5e924ff54d: ltp.ksm01_1.fail"
>>> in
>>> https://lore.kernel.org/all/[email protected]/
>>> when this commit is in linux-next/master.
>>> now we noticed this commit is merged in mainline, and we still observed
>>> same issue. just FYI.
>>> Hello,
>>> kernel test robot noticed "ltp.ksm01.fail" on:
>>> commit: 5e924ff54d088828794d9f1a4d5bf17808f7270e ("mm/ksm: add "smart" page
>>> scanning mode")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>> [test failed on linus/master 3ca112b71f35dd5d99fc4571a56b5fc6f0c15814]
>>> [test failed on linux-next/master 8728c14129df7a6e29188a2e737b4774fb200953]
>>> in testcase: ltp
>>> version: ltp-x86_64-14c1f76-1_20230715
>>> with following parameters:
>>> disk: 1HDD
>>> test: mm-00/ksm01
>>> compiler: gcc-12
>>> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>> If you fix the issue in a separate patch/commit (i.e. not just a new version
>>> of
>>> the same patch/commit), kindly add following tags
>>> | Reported-by: kernel test robot <[email protected]>
>>> | Closes: https://lore.kernel.org/oe-lkp/[email protected]
>>> Running tests.......
>>> <<<test_start>>>
>>> tag=ksm01 stime=1699563923
>>> cmdline="ksm01"
>>> contacts=""
>>> analysis=exit
>>> <<<test_output>>>
>>> tst_kconfig.c:87: TINFO: Parsing kernel config '/proc/config.gz'
>>> tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
>>> mem.c:422: TINFO: wait for all children to stop.
>>> mem.c:388: TINFO: child 0 stops.
>>> mem.c:388: TINFO: child 1 stops.
>>> mem.c:388: TINFO: child 2 stops.
>>> mem.c:495: TINFO: KSM merging...
>>> mem.c:434: TINFO: resume all children.
>>> mem.c:422: TINFO: wait for all children to stop.
>>> mem.c:344: TINFO: child 1 continues...
>>> mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
>>> mem.c:344: TINFO: child 2 continues...
>>> mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
>>> mem.c:344: TINFO: child 0 continues...
>>> mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
>>> mem.c:400: TINFO: child 1 stops.
>>> mem.c:400: TINFO: child 0 stops.
>>> mem.c:400: TINFO: child 2 stops.
>>> ksm_helper.c:36: TINFO: ksm daemon takes 2s to run two full scans
>>> mem.c:264: TINFO: check!
>>> mem.c:255: TPASS: run is 1.
>>> mem.c:255: TPASS: pages_shared is 2.
>>> ....
>>> mem.c:255: TPASS: pages_shared is 1.
>>> mem.c:255: TPASS: pages_sharing is 98302.
>>> mem.c:252: TFAIL: pages_volatile is not 0 but 1. <-----
>>> mem.c:252: TFAIL: pages_unshared is not 1 but 0. <-----
>>
>> @Stefan, is this simply related to the new scanning optimization (skip and
>> eventually not merge a pages within the "2 scans" windows, whereby previously,
>> they would have gotten merged)?
>>
>> If so, we might just want to disable that optimization for that test case?
>>
>> Alternatively, maybe we have to wait for "more" scan cycles instead of only 2?
>
> I'd expect this is caused by "smart scan", where we can skip pages.
> The best is probably to disable the smart scan feature for this test.
> The smart scan feature can be disabled by:
>
> echo 0 > /sys/kernel/mm/ksm/smart_scan
>
> I'll have a look at it today.
>

If I disable "smart scan", the testcase completes successully. This is
simply the case that for the testcase, it can "skip" a page (as the
"smart scan" feature is enabled).

The easiest fix is to disable smart scan for the ksm cases. I'll send an
ltp patch a bit later to address this issue.