2021-11-21 10:57:27

by Mike Galbraith

Subject: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

Greetings,

FYI, something in this series causes LTP controllers::memcg_regression
testcase to hang forever. Verified via brute force revert of the lot.

After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
didn't exit/recover gracefully, and ps hung, so I nuked the box. All
memcg_test_1 instances were stuck in reclaim_throttle().
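
For context, the throttle those tasks are parked in looks roughly like
this (a simplified sketch of the mm/vmscan.c logic the series adds,
reconstructed from the backtraces below rather than quoted verbatim):

/*
 * Sketch: a direct reclaimer that is making no progress is parked on a
 * per-node waitqueue in TASK_UNINTERRUPTIBLE (the "UN" state below),
 * sleeping up to HZ/2 jiffies per trip through reclaim.
 */
void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
{
	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
	long timeout = HZ/2;		/* VMSCAN_THROTTLE_NOPROGRESS */
	DEFINE_WAIT(wait);

	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
	schedule_timeout(timeout);	/* the frame in every backtrace */
	finish_wait(wqh, &wait);
}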

crash> ps | grep UN
31458 1 5 ffff88e9c8ded100 UN 0.0 7620 1156 memcg_test_1
31459 1 6 ffff88e9c8de8000 UN 0.0 7620 1156 memcg_test_1
31460 1 4 ffff88e9cbdd0000 UN 0.0 7620 1156 memcg_test_1
31461 1 2 ffff88ea41502880 UN 0.0 7620 1156 memcg_test_1
31462 1 4 ffff88ea12750000 UN 0.0 7620 1156 memcg_test_1
31463 1 7 ffff88ea12752880 UN 0.0 7620 1156 memcg_test_1
31464 1 3 ffff88ea01eb8000 UN 0.0 7620 1156 memcg_test_1
31465 1 6 ffff88e9c2bca880 UN 0.0 7620 1156 memcg_test_1
31466 1 7 ffff88e9c2bc8000 UN 0.0 7620 1156 memcg_test_1
31467 1 2 ffff88ea41e82880 UN 0.0 7620 1156 memcg_test_1
31476 2834 2 ffff88ea1174d100 UN 0.0 38868 3744 ps
crash> bt -sx ffff88e9c8ded100
PID: 31458 TASK: ffff88e9c8ded100 CPU: 5 COMMAND: "memcg_test_1"
#0 [ffffb36648bc79a0] __schedule+0x2c3 at ffffffffb72636e3
#1 [ffffb36648bc7a60] schedule+0x3a at ffffffffb72644ba
#2 [ffffb36648bc7a70] schedule_timeout+0x1f3 at ffffffffb7268d03
#3 [ffffb36648bc7ae8] reclaim_throttle+0xd0 at ffffffffb6a69ab0
#4 [ffffb36648bc7b50] do_try_to_free_pages+0x160 at ffffffffb6a6bab0
#5 [ffffb36648bc7ba8] try_to_free_mem_cgroup_pages+0xf4 at ffffffffb6a6d264
#6 [ffffb36648bc7c40] try_charge_memcg+0x19f at ffffffffb6b03b6f
#7 [ffffb36648bc7ce0] obj_cgroup_charge_pages+0x27 at ffffffffb6b043a7
#8 [ffffb36648bc7d08] obj_cgroup_charge+0x6f at ffffffffb6b0605f
#9 [ffffb36648bc7d28] kmem_cache_alloc+0x86 at ffffffffb6ae7686
#10 [ffffb36648bc7d80] vm_area_alloc+0x1a at ffffffffb689aa5a
#11 [ffffb36648bc7d90] mmap_region+0x32f at ffffffffb6a9ee1f
#12 [ffffb36648bc7df8] do_mmap+0x392 at ffffffffb6a9f492
#13 [ffffb36648bc7e50] vm_mmap_pgoff+0xd5 at ffffffffb6a75125
#14 [ffffb36648bc7ed8] do_syscall_64+0x58 at ffffffffb7256138
#15 [ffffb36648bc7f28] exc_page_fault+0x67 at ffffffffb7259c97
#16 [ffffb36648bc7f50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffffb740007c
RIP: 00007f7b95ad9743 RSP: 00007ffe12089b08 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b95ad9743
RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000000000000
RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000001000 R14: 0000000000002022 R15: 0000000000000000
ORIG_RAX: 0000000000000009 CS: 0033 SS: 002b
crash> bt -sx ffff88ea1174d100
PID: 31476 TASK: ffff88ea1174d100 CPU: 2 COMMAND: "ps"
#0 [ffffb3664883fba8] __schedule+0x2c3 at ffffffffb72636e3
#1 [ffffb3664883fc68] schedule+0x3a at ffffffffb72644ba
#2 [ffffb3664883fc78] rwsem_down_read_slowpath+0x1bf at ffffffffb7266c8f
#3 [ffffb3664883fd08] down_read_killable+0x5c at ffffffffb7266f3c
#4 [ffffb3664883fd18] down_read_killable+0x5c at ffffffffb7266f3c
#5 [ffffb3664883fd28] __access_remote_vm+0x4a at ffffffffb6a9795a
#6 [ffffb3664883fd98] proc_pid_cmdline_read+0x16c at ffffffffb6bc3acc
#7 [ffffb3664883fe00] vfs_read+0x9a at ffffffffb6b1d49a
#8 [ffffb3664883fe30] ksys_read+0xa1 at ffffffffb6b1d891
#9 [ffffb3664883fe70] do_syscall_64+0x58 at ffffffffb7256138
#10 [ffffb3664883fec0] do_sys_openat2+0x1cd at ffffffffb6b1913d
#11 [ffffb3664883ff00] do_sys_open+0x57 at ffffffffb6b1a7e7
#12 [ffffb3664883ff28] do_syscall_64+0x67 at ffffffffb7256147
#13 [ffffb3664883ff50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffffb740007c
RIP: 00007f192fb5cb5e RSP: 00007ffc611e0ee8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f19303de010 RCX: 00007f192fb5cb5e
RDX: 0000000000020000 RSI: 00007f19303de010 RDI: 0000000000000006
RBP: 0000000000020000 R8: 0000000000000007 R9: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f19303de010
R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
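
Note what ps is blocked on: down_read_killable() of the victim's
mmap_lock in __access_remote_vm() for /proc/<pid>/cmdline, while
memcg_test_1 sits in reclaim_throttle() with that same mmap_lock held
for write by mmap_region(), which is how innocent bystanders end up
hanging right along with the throttled tasks.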



2021-11-21 13:52:04

by Thorsten Leemhuis

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series


Hi, this is your Linux kernel regression tracker speaking.

CCing the regression mailing list, which should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 21.11.21 11:57, Mike Galbraith wrote:
> Greetings,
>
> FYI, something in this series causes LTP controllers::memcg_regression
> testcase to hang forever. Verified via brute force revert of the lot.
>
> After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
> didn't exit/recover gracefully, and ps hung, so I nuked the box. All
> memcg_test_1 instances were stuck in reclaim_throttle().
> [...]

TWIMC: To be sure this issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced 8cd7c588decf..66ce520bb7c2
#regzbot ignore-activity

Ciao, Thorsten, your Linux kernel regression tracker.

P.S.: If you want to know more about regzbot, check out its
web-interface, the getting started guide, and/or the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for the reporter: when reporting a regression it's in your interest
to tell #regzbot about it in the report, as that will ensure the
regression gets on the radar of regzbot and the regression tracker, who
will make sure the report won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot; just
fix the issue as you otherwise would. Just remember to include a 'Link:'
tag to the report in the commit message, as explained in
Documentation/process/submitting-patches.rst. That aspect was recently
made more explicit in commit 1f57bd42b77c:
https://git.kernel.org/linus/1f57bd42b77c


P.P.S.: As a Linux kernel regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them.
Unfortunately I will therefore sometimes get things wrong or miss
something important. I hope that's not the case here; if you think it
is, don't hesitate to tell me about it in a public reply. That's in
everyone's interest, as what I wrote above might be misleading to
everyone reading this; any suggestion I gave might thus send someone
down the wrong rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed
on all further activity wrt this regression.

2021-11-22 06:47:10

by Mike Galbraith

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Sun, 2021-11-21 at 14:51 +0100, Thorsten Leemhuis wrote:
>
> Hi, this is your Linux kernel regression tracker speaking.

This isn't necessarily a kernel regression, though the fact that
bystander tasks like ps can get snagged for ages as the testcase runs
at an amazingly glacial pace doesn't look particularly wonderful.

homer:/root # time ps lax
...
real 11m33.258s
user 0m0.004s
sys 0m0.008s
homer:/root #

The other memcg tests complete fine, though they take more time to do
so. Running all tests except memcg_regression (and memcg_stress,
because it's a 30 minute swap storm from hell as delivered), runtime
increased from 0m37s to 4m43s.

The problematic testcase is root intentionally shooting its own foot,
then checking for battle damage that occurred in days of yore, so it
now hurting in a different way isn't necessarily grounds to gripe at
the kernel.. even with the rather ugly side effects.. maybe <shrug>.

#---------------------------------------------------------------------------
# Bug:    The bug was, while forking mass processes, trigger memcgroup OOM,
#         then NULL pointer dereference may be hit.
# Kernel: 2.6.25-rcX
# Links:  http://lkml.org/lkml/2008/4/14/38
# Fix:    commit e115f2d89253490fb2dbf304b627f8d908df26f1
#---------------------------------------------------------------------------
test_1()
{
	mkdir memcg/0/
	echo 0 > memcg/0/memory.limit_in_bytes

	./memcg_test_1

	rmdir memcg/0/

	check_kernel_bug
	if [ $? -eq 1 ]; then
		tst_resm TPASS "no kernel bug was found"
	fi
}
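
The memcg_test_1 binary itself is the mass-forker the comment above
describes; a hypothetical sketch of the idea (names, counts and sizes
are made up for illustration, the real source ships with LTP):

#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch: mass-fork children that each charge a little anonymous
 * memory to the zero-limit cgroup, so every allocation immediately
 * drives the group into memcg OOM. */
int main(void)
{
	int i;

	for (i = 0; i < 10; i++) {	/* made-up child count */
		if (fork() == 0) {
			char *mem = malloc(4 * 4096);

			if (mem)
				memset(mem, 1, 4 * 4096);	/* touch so it is charged */
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}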


I'm gonna just comment the little bugger out as obsolete and walk away.

-Mike

2021-11-22 07:41:39

by Mike Galbraith

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Mon, 2021-11-22 at 07:46 +0100, Mike Galbraith wrote:
>
> #---------------------------------------------------------------------------
> # Bug:    The bug was, while forking mass processes, trigger memcgroup OOM,
> #         then NULL pointer dereference may be hit.
> # Kernel: 2.6.25-rcX
> # Links:  http://lkml.org/lkml/2008/4/14/38
> # Fix:    commit e115f2d89253490fb2dbf304b627f8d908df26f1
> #---------------------------------------------------------------------------
> test_1()
> {
>         mkdir memcg/0/
>         echo 0 > memcg/0/memory.limit_in_bytes
>
>         ./memcg_test_1
>
>         rmdir memcg/0/
>
>         check_kernel_bug
>         if [ $? -eq 1 ]; then
>                 tst_resm TPASS "no kernel bug was found"
>         fi
> }
>
>
> I'm gonna just comment the little bugger out as obsolete and walk
> away.

test_2() and test_3() then pass, but test_4() hangs, and looks like it
should not, so...

-Mike

2021-11-23 09:13:11

by Mel Gorman

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Sun, Nov 21, 2021 at 11:57:20AM +0100, Mike Galbraith wrote:
> Greetings,
>
> FYI, something in this series causes LTP controllers::memcg_regression
> testcase to hang forever. Verified via brute force revert of the lot.
>
> After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
> didn't exit/recover gracefully, and ps hung, so I nuked the box. All
> memcg_test_1 instances were stuck in reclaim_throttle().
>

I'll see can I reproduce this but do you know offhand what the test is
doing and what the expected outcome is? A possibility is that this is a
test that is driving the machine near OOM (or at least memcg OOM) and
getting throttled instead of getting killed.

--
Mel Gorman
SUSE Labs

2021-11-23 11:06:26

by Mike Galbraith

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> On Sun, Nov 21, 2021 at 11:57:20AM +0100, Mike Galbraith wrote:
> > Greetings,
> >
> > FYI, something in this series causes LTP controllers::memcg_regression
> > testcase to hang forever.  Verified via brute force revert of the lot.
> >
> > After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
> > didn't exit/recover gracefully, and ps hung, so I nuked the box.  All
> > memcg_test_1 instances were stuck in reclaim_throttle().
> >
>
> I'll see can I reproduce this but do you know offhand what the test is
> doing and what the expected outcome is? A possibility is that this is a
> test that is driving the machine near OOM (or at least memcg OOM) and
> getting throttled instead of getting killed.

Here's the hanging test 4.

testcases/bin/memcg_regression_test.sh:
test_4()
{
	./memcg_test_4.sh

	check_kernel_bug
	if [ $? -eq 1 ]; then
		tst_resm TPASS "no kernel bug was found"
	fi

	# test_4.sh might be killed by oom, so do clean up here
	killall -9 memcg_test_4 2> /dev/null
	killall -9 memcg_test_4.sh 2> /dev/null

	# if test_4.sh gets killed, it won't clean cgroup it created
	rmdir memcg/0 2> /dev/null

	swapon -a
}

testcases/bin/memcg_test_4.sh:
# attach current task to memcg/0/
mkdir memcg/0
echo $$ > memcg/0/tasks

./memcg_test_4 &
pid=$!
sleep 1

# let $pid allocate 100M memory
/bin/kill -SIGUSR1 $pid
sleep 1

# shrink memory, and then 80M will be swapped
echo 40M > memcg/0/memory.limit_in_bytes

# turn off swap, and swapoff will be killed
swapoff -a
sleep 1
echo $pid > memcg/tasks 2> /dev/null
echo $$ > memcg/tasks 2> /dev/null

# now remove the cgroup
rmdir memcg/0
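
For reference, the memcg_test_4 binary the script signals boils down to
"allocate and sit on ~100M when poked"; a hypothetical sketch (details
invented for illustration, the real source ships with LTP):

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALLOC_SIZE	(100 << 20)	/* the ~100M the script mentions */

static volatile sig_atomic_t go;

static void sigusr1(int sig)
{
	go = 1;
}

/* Sketch: wait for the script's SIGUSR1, then allocate and touch ~100M
 * inside the cgroup so the later 40M limit shrink pushes most of it to
 * swap, and hold the memory until killed. */
int main(void)
{
	char *mem;

	signal(SIGUSR1, sigusr1);
	while (!go)
		pause();

	mem = malloc(ALLOC_SIZE);
	if (mem)
		memset(mem, 1, ALLOC_SIZE);
	for (;;)
		pause();	/* hold the memory until killed */
}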

2021-11-23 11:18:10

by Mike Galbraith

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
>
> I'll see can I reproduce this...

You likely already know this, but just in case, just plunk the below
into $LTPROOT/runtest/foo, and $LTPROOT/runltp -f foo.

#DESCRIPTION:Resource Management testing
memcg_regression memcg_regression_test.sh

-Mike

2021-11-23 13:13:04

by Mike Galbraith

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Tue, 2021-11-23 at 12:18 +0100, Mike Galbraith wrote:
> On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> >
> > I'll see can I reproduce this...
>
> You likely already know this, but just in case, just plunk the below
> into $LTPROOT/runtest/foo, and $LTPROOT/runltp -f foo.
>
> #DESCRIPTION:Resource Management testing
> memcg_regression        memcg_regression_test.sh

Editing memcg_regression_test.sh to only run test_4() works fine, and
an earlier kernel does oom-kill swapoff as it expects.

[ 319.265016] swapoff invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

The new kernel also says it oom-killed, but the victim is left in the
penalty box.

[ 272.325727] swapoff invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

crash> ps | grep memcg
4105 4104 4 ffff953bd1a88000 IN 0.0 15176 3596 memcg_regressio
4115 4105 0 ffff953beffe2880 IN 0.0 15172 3580 memcg_test_4.sh
4117 4115 7 ffff953c0b818000 IN 0.2 110020 38404 memcg_test_4
crash> ps | grep swapoff
4138 4115 1 ffff953bf2f70000 UN 0.0 19032 0 swapoff
crash> bt -sx 4138
PID: 4138 TASK: ffff953bf2f70000 CPU: 5 COMMAND: "swapoff"
#0 [ffffad1a02957770] __schedule+0x2c3 at ffffffff932636e3
#1 [ffffad1a02957830] schedule+0x3a at ffffffff932644ba
#2 [ffffad1a02957840] schedule_timeout+0x1f3 at ffffffff93268d03
#3 [ffffad1a029578b8] reclaim_throttle+0xd0 at ffffffff92a69ab0
#4 [ffffad1a02957920] do_try_to_free_pages+0x160 at ffffffff92a6bab0
#5 [ffffad1a02957978] try_to_free_mem_cgroup_pages+0xf4 at ffffffff92a6d264
#6 [ffffad1a02957a10] try_charge_memcg+0x19f at ffffffff92b03b6f
#7 [ffffad1a02957ab0] charge_memcg+0x33 at ffffffff92b04313
#8 [ffffad1a02957ad0] mem_cgroup_swapin_charge_page+0x58 at ffffffff92b06a08
#9 [ffffad1a02957b00] __read_swap_cache_async+0x206 at ffffffff92ac2006
#10 [ffffad1a02957b68] swap_cluster_readahead+0x165 at ffffffff92ac2295
#11 [ffffad1a02957bf0] swapin_readahead+0x84 at ffffffff92ac24d4
#12 [ffffad1a02957c78] unuse_pte_range+0x53c at ffffffff92ac5cfc
#13 [ffffad1a02957d68] try_to_unuse+0x50e at ffffffff92ac664e
#14 [ffffad1a02957e38] __do_sys_swapoff+0x1e4 at ffffffff92ac6ba4
#15 [ffffad1a02957e90] do_syscall_64+0x58 at ffffffff93256138
#16 [ffffad1a02957ea8] exit_to_user_mode_prepare+0xf8 at ffffffff9293f138
#17 [ffffad1a02957ec0] syscall_exit_to_user_mode+0x18 at ffffffff9325a358
#18 [ffffad1a02957ed0] do_syscall_64+0x67 at ffffffff93256147
#19 [ffffad1a02957ef0] syscall_exit_to_user_mode+0x18 at ffffffff9325a358
#20 [ffffad1a02957f28] exc_page_fault+0x67 at ffffffff93259c97
#21 [ffffad1a02957f50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffff9340007c
RIP: 00007f6da7cd9997 RSP: 00007fffbca10388 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f6da7cd9997
RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bc98cb6d10
RBP: 000055bc98cb6d10 R8: 00007fffbca103f0 R9: 0000000000000003
R10: 000055bc98cb15af R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fffbca103f0 R15: 0000000000000000
ORIG_RAX: 00000000000000a8 CS: 0033 SS: 002b
crash>
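
So swapoff is charging every page it unuses back into the
still-over-limit group via mem_cgroup_swapin_charge_page(), reclaim
can't make progress against the 40M limit, and each failed attempt
earns another VMSCAN_THROTTLE_NOPROGRESS nap.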

Enabling sched events for it...

homer:..debug/tracing # tail -25 trace
swapoff-4138 [001] d.... 1641.929650: sched_switch: prev_comm=swapoff prev_pid=4138 prev_prio=120 prev_state=D ==> next_comm=swapper/1 next_pid=0 next_prio=120
<idle>-0 [001] d.s.. 1642.441625: sched_waking: comm=swapoff pid=4138 prio=120 target_cpu=001
<idle>-0 [001] dNs.. 1642.441631: sched_wakeup: comm=swapoff pid=4138 prio=120 target_cpu=001
<idle>-0 [001] d.... 1642.441638: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=swapoff next_pid=4138 next_prio=120
swapoff-4138 [001] d.... 1642.441642: sched_waking: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
swapoff-4138 [001] d.... 1642.441643: sched_stat_runtime: comm=swapoff pid=4138 runtime=11453 [ns] vruntime=45946584931 [ns]
swapoff-4138 [001] d.... 1642.441645: sched_wakeup: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
swapoff-4138 [001] d.... 1642.441652: sched_stat_runtime: comm=swapoff pid=4138 runtime=8758 [ns] vruntime=45946593689 [ns]
swapoff-4138 [001] d.... 1642.441653: sched_switch: prev_comm=swapoff prev_pid=4138 prev_prio=120 prev_state=D ==> next_comm=kworker/1:0 next_pid=5005 next_prio=120
<idle>-0 [001] d.s.. 1642.953622: sched_waking: comm=swapoff pid=4138 prio=120 target_cpu=001
<idle>-0 [001] dNs.. 1642.953627: sched_wakeup: comm=swapoff pid=4138 prio=120 target_cpu=001
<idle>-0 [001] d.... 1642.953634: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=swapoff next_pid=4138 next_prio=120
swapoff-4138 [001] d.... 1642.953638: sched_waking: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
swapoff-4138 [001] d.... 1642.953639: sched_stat_runtime: comm=swapoff pid=4138 runtime=10995 [ns] vruntime=45946604684 [ns]
swapoff-4138 [001] d.... 1642.953641: sched_wakeup: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
swapoff-4138 [001] d.... 1642.953648: sched_stat_runtime: comm=swapoff pid=4138 runtime=8633 [ns] vruntime=45946613317 [ns]
swapoff-4138 [001] d.... 1642.953649: sched_switch: prev_comm=swapoff prev_pid=4138 prev_prio=120 prev_state=D ==> next_comm=kworker/1:0 next_pid=5005 next_prio=120
<idle>-0 [001] d.s.. 1643.465642: sched_waking: comm=swapoff pid=4138 prio=120 target_cpu=001
<idle>-0 [001] dNs.. 1643.465654: sched_wakeup: comm=swapoff pid=4138 prio=120 target_cpu=001
<idle>-0 [001] d.... 1643.465674: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=swapoff next_pid=4138 next_prio=120
swapoff-4138 [001] d.... 1643.465683: sched_waking: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
swapoff-4138 [001] d.... 1643.465686: sched_stat_runtime: comm=swapoff pid=4138 runtime=28731 [ns] vruntime=45946642048 [ns]
swapoff-4138 [001] d.... 1643.465690: sched_wakeup: comm=kworker/1:0 pid=5005 prio=120 target_cpu=001
swapoff-4138 [001] d.... 1643.465708: sched_stat_runtime: comm=swapoff pid=4138 runtime=21966 [ns] vruntime=45946664014 [ns]
swapoff-4138 [001] d.... 1643.465711: sched_switch: prev_comm=swapoff prev_pid=4138 prev_prio=120 prev_state=D ==> next_comm=kworker/1:0 next_pid=5005 next_prio=120
homer:..debug/tracing #
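
Note the cadence: 1642.953622 - 1642.441625 = 0.512s between wakeups,
i.e. the HZ/2 NOPROGRESS timeout (500ms plus timer slack), with swapoff
getting roughly 20us of runtime per nap.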

...I bet if I were reeeeeally patient, I might see the test pass.

LOL, and as I write this, it just did :)

-Mike

2021-11-23 14:28:45

by Mike Galbraith

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Tue, 2021-11-23 at 14:12 +0100, Mike Galbraith wrote:
>
> ...I bet if I were reeeeeally patient, I might see the test pass.
>
> LOL, and as I write this, it just did :)

real 30m27.524s
user 0m0.206s
sys 0m0.162s


Hohum, this one looks like there is a genuine buglet in there
somewhere, as oom-kill _and_ penalty box surely wasn't intended.

-Mike

2021-11-24 14:56:34

by Mel Gorman

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Tue, Nov 23, 2021 at 12:18:01PM +0100, Mike Galbraith wrote:
> On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> >
> > I'll see can I reproduce this...
>
> You likely already know this, but just in case, just plunk the below
> into $LTPROOT/runtest/foo, and $LTPROOT/runltp -f foo.
>
> #DESCRIPTION:Resource Management testing
> memcg_regression memcg_regression_test.sh
>

Thanks. Can you try the following patch please?

The test will still take longer to reach OOM and complete as it's stalling
but not as severely.
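
In short: VMSCAN_THROTTLE_NOPROGRESS now sleeps a single jiffy instead
of HZ/2 and is skipped entirely (bar a cond_resched()) once kswapd has
failed MAX_RECLAIM_RETRIES times and the system is likely near OOM;
throttling only triggers when the scan reclaimed nothing at all; and
shrink_zones() throttles once per pass on the first node instead of
once per zone.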

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 07db03883062..d9166e94eb95 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1057,7 +1057,17 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 
 		break;
 	case VMSCAN_THROTTLE_NOPROGRESS:
-		timeout = HZ/2;
+		timeout = 1;
+
+		/*
+		 * If kswapd is disabled, reschedule if necessary but do not
+		 * throttle as the system is likely near OOM.
+		 */
+		if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) {
+			cond_resched();
+			return;
+		}
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
@@ -3395,7 +3405,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
@@ -3415,6 +3425,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3489,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
 
+	consider_reclaim_throttle(first_pgdat, sc);
+
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we
 	 * promoted it to __GFP_HIGHMEM.

2021-11-24 16:22:30

by Mike Galbraith

Subject: Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

On Wed, 2021-11-24 at 14:56 +0000, Mel Gorman wrote:
> On Tue, Nov 23, 2021 at 12:18:01PM +0100, Mike Galbraith wrote:
> > On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> > >
> > > I'll see can I reproduce this...
> >
> > You likely already know this, but just in case, just plunk the below
> > into $LTPROOT/runtest/foo, and $LTPROOT/runltp -f foo.
> >
> > #DESCRIPTION:Resource Management testing
> > memcg_regression        memcg_regression_test.sh
> >
>
> Thanks. Can you try the following patch please?
>
> The test will still take longer to reach OOM and complete as it's stalling
> but not as severely.

Yeah, way better. test_1() is still to be avoided, but everything else
seems fine, including stress (twiddled to not be a swap storm from hell).

>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 07db03883062..d9166e94eb95 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1057,7 +1057,17 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
>  
>                 break;
>         case VMSCAN_THROTTLE_NOPROGRESS:
> -               timeout = HZ/2;
> +               timeout = 1;
> +
> +               /*
> +                * If kswapd is disabled, reschedule if necessary but do not
> +                * throttle as the system is likely near OOM.
> +                */
> +               if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) {
> +                       cond_resched();
> +                       return;
> +               }
> +
>                 break;
>         case VMSCAN_THROTTLE_ISOLATED:
>                 timeout = HZ/50;
> @@ -3395,7 +3405,7 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
>                 return;
>  
>         /* Throttle if making no progress at high prioities. */
> -       if (sc->priority < DEF_PRIORITY - 2)
> +       if (sc->priority < DEF_PRIORITY - 2 && !sc->nr_reclaimed)
>                 reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
>  }
>  
> @@ -3415,6 +3425,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>         unsigned long nr_soft_scanned;
>         gfp_t orig_mask;
>         pg_data_t *last_pgdat = NULL;
> +       pg_data_t *first_pgdat = NULL;
>  
>         /*
>          * If the number of buffer_heads in the machine exceeds the maximum
> @@ -3478,14 +3489,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>                         /* need some check for avoid more shrink_zone() */
>                 }
>  
> +               if (!first_pgdat)
> +                       first_pgdat = zone->zone_pgdat;
> +
>                 /* See comment about same check for global reclaim above */
>                 if (zone->zone_pgdat == last_pgdat)
>                         continue;
>                 last_pgdat = zone->zone_pgdat;
>                 shrink_node(zone->zone_pgdat, sc);
> -               consider_reclaim_throttle(zone->zone_pgdat, sc);
>         }
>  
> +       consider_reclaim_throttle(first_pgdat, sc);
> +
>         /*
>          * Restore to original mask to avoid the impact on the caller if we
>          * promoted it to __GFP_HIGHMEM.