2022-03-25 19:29:09

by Mike Galbraith

[permalink] [raw]
Subject: new ltp memcg gripe bisects to b67bf49ce7aa ("post mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE")

Greetings,

$subject bisected in a kvm ala:

leap153:/usr/local/ltp # cat testme
export PATH=$PATH:`pwd`/testcases/bin
memcg_stat_test.sh
leap153:/usr/local/ltp # . ./testme

Usually leads to...
memcg_stat_test 3 TINFO: Test unevictable with MAP_LOCKED
memcg_stat_test 3 TINFO: Running memcg_process --mmap-lock1 -s 135168
memcg_stat_test 3 TINFO: Warming up pid: 3460
memcg_stat_test 3 TINFO: Process is still here after warm up: 3460
memcg_stat_test 3 TFAIL: unevictable is 122880, 135168 expected
...but may lead to...
memcg_stat_test 4 TINFO: Test unevictable with mlock
memcg_stat_test 4 TINFO: Running memcg_process --mmap-lock2 -s 135168
memcg_stat_test 4 TINFO: Warming up pid: 4271
memcg_stat_test 4 TINFO: Process is still here after warm up: 4271
memcg_stat_test 4 TFAIL: unevictable is 122880, 135168 expected
...or both. A wee bit flaky.

I wanted to verify with a revert on top of 85c7000fda00, but while the
revert patch applied, the result didn't boot. Config is full distro.

-Mike


2022-03-25 20:37:15

by Hugh Dickins

[permalink] [raw]
Subject: Re: new ltp memcg gripe bisects to b67bf49ce7aa ("post mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE")

On Fri, 25 Mar 2022, Mike Galbraith wrote:

> Greetings,
>
> $subject bisected in a kvm ala:
>
> leap153:/usr/local/ltp # cat testme
> export PATH=$PATH:`pwd`/testcases/bin
> memcg_stat_test.sh
> leap153:/usr/local/ltp # . ./testme
>
> Usually leads to...
> memcg_stat_test 3 TINFO: Test unevictable with MAP_LOCKED
> memcg_stat_test 3 TINFO: Running memcg_process --mmap-lock1 -s 135168
> memcg_stat_test 3 TINFO: Warming up pid: 3460
> memcg_stat_test 3 TINFO: Process is still here after warm up: 3460
> memcg_stat_test 3 TFAIL: unevictable is 122880, 135168 expected
> ...but may lead to...
> memcg_stat_test 4 TINFO: Test unevictable with mlock
> memcg_stat_test 4 TINFO: Running memcg_process --mmap-lock2 -s 135168
> memcg_stat_test 4 TINFO: Warming up pid: 4271
> memcg_stat_test 4 TINFO: Process is still here after warm up: 4271
> memcg_stat_test 4 TFAIL: unevictable is 122880, 135168 expected
> ...or both. A wee bit flaky.
>
> I wanted to verify with a revert on top of 85c7000fda00, but while the
> revert patch applied, the result didn't boot. Config is full distro.

Thanks a lot for spotting that. I'll have no trouble reproducing it here,
looking through my old LTP test results. I never noticed because I'm used
to memcg_stat failing - but looking closer, that's been because I'm usually
running with THP shmem_enabled "force", which causes memcg_stat_test 1 to
fail with a bigger number than expected (understandably): memcg_stat_test
3 and 4 failures are new to mlock/munlock changes.

It will (almost certainly) be a pagevec draining issue, to be fixed by a
strategically placed lru_add_drain() or mlock_page_drain(). I did have
more of those in for a while, before understanding and arriving at
b74355078b65 ("mm/munlock: page migration needs mlock pagevec drained");
and with that fix, hadn't noticed the need for more, so left them out
until proven desirable.

If it's as I expect, then it's worth doing: not just to pass an LTP test,
but more generally a good thing. I'll play around in the next few days
and post a patch once I'm satisfied.

Regarding your bisection and revert of b67bf49ce7aa ("mm/munlock: delete
FOLL_MLOCK and FOLL_POPULATE"). I'm glad to hear that you got a build
error trying to revert that one commit: not a supported combination!
Maybe not too far wrong, but I wouldn't trust it.

But yes, I can see that the revert will bring in an lru_add_drain()
per page, so that fits with my guess above.

Thanks!
Hugh

2022-03-28 23:05:59

by Hugh Dickins

[permalink] [raw]
Subject: Re: new ltp memcg gripe bisects to b67bf49ce7aa ("post mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE")

On Fri, 25 Mar 2022, Hugh Dickins wrote:
> On Fri, 25 Mar 2022, Mike Galbraith wrote:
>
> > Greetings,
> >
> > $subject bisected in a kvm ala:
> >
> > leap153:/usr/local/ltp # cat testme
> > export PATH=$PATH:`pwd`/testcases/bin
> > memcg_stat_test.sh
> > leap153:/usr/local/ltp # . ./testme
> >
> > Usually leads to...
> > memcg_stat_test 3 TINFO: Test unevictable with MAP_LOCKED
> > memcg_stat_test 3 TINFO: Running memcg_process --mmap-lock1 -s 135168
> > memcg_stat_test 3 TINFO: Warming up pid: 3460
> > memcg_stat_test 3 TINFO: Process is still here after warm up: 3460
> > memcg_stat_test 3 TFAIL: unevictable is 122880, 135168 expected
> > ...but may lead to...
> > memcg_stat_test 4 TINFO: Test unevictable with mlock
> > memcg_stat_test 4 TINFO: Running memcg_process --mmap-lock2 -s 135168
> > memcg_stat_test 4 TINFO: Warming up pid: 4271
> > memcg_stat_test 4 TINFO: Process is still here after warm up: 4271
> > memcg_stat_test 4 TFAIL: unevictable is 122880, 135168 expected
> > ...or both. A wee bit flaky.
> >
> > I wanted to verify with a revert on top of 85c7000fda00, but while the
> > revert patch applied, the result didn't boot. Config is full distro.
>
> Thanks a lot for spotting that. I'll have no trouble reproducing it here,
> looking through my old LTP test results. I never noticed because I'm used
> to memcg_stat failing - but looking closer, that's been because I'm usually
> running with THP shmem_enabled "force", which causes memcg_stat_test 1 to
> fail with a bigger number than expected (understandably): memcg_stat_test
> 3 and 4 failures are new to mlock/munlock changes.
>
> It will (almost certainly) be a pagevec draining issue, to be fixed by a
> strategically placed lru_add_drain() or mlock_page_drain(). I did have
> more of those in for a while, before understanding and arriving at
> b74355078b65 ("mm/munlock: page migration needs mlock pagevec drained");
> and with that fix, hadn't noticed the need for more, so left them out
> until proven desirable.
>
> If it's as I expect, then it's worth doing: not just to pass an LTP test,
> but more generally a good thing. I'll play around in the next few days
> and post a patch once I'm satisfied.
>
> Regarding your bisection and revert of b67bf49ce7aa ("mm/munlock: delete
> FOLL_MLOCK and FOLL_POPULATE"). I'm glad to hear that you got a build
> error trying to revert that one commit: not a supported combination!
> Maybe not too far wrong, but I wouldn't trust it.
>
> But yes, I can see that the revert will bring in an lru_add_drain()
> per page, so that fits with my guess above.

Right, I was easily able to reproduce those failures; and happily the
patch I had earlier, but left out, indeed fixes them as expected:
follows now.

Thanks,
Hugh