Subject: Re: [PATCH RESEND] mm/oom_kill: don't kill exiting tasks in oom_kill_memcg_member
From: Haifeng Xu
To: Michal Hocko
Cc: shakeelb@google.com, hannes@cmpxchg.org, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Tue, 14 Mar 2023 21:27:56 +0800
Message-ID: <43717dc3-b8c8-651a-3d61-019c9752a110@shopee.com>
References: <20230314091136.264878-1-haifeng.xu@shopee.com> <3654a73e-6817-4247-73b8-4604efe4a309@shopee.com>
On 2023/3/14 20:00, Michal Hocko wrote:
> On Tue 14-03-23 19:07:27, Haifeng Xu wrote:
>>
>> On 2023/3/14 18:16, Michal Hocko wrote:
>>> On Tue 14-03-23 18:07:42, Haifeng Xu wrote:
>>>>
>>>> On 2023/3/14 17:19, Michal Hocko wrote:
>>>>> On Tue 14-03-23 09:11:36, Haifeng Xu wrote:
>>>>>> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
>>>>>> to kill all processes in the memcg. When scanning tasks in the memcg,
>>>>>> the provided task may already be marked as an oom victim. Also, some
>>>>>> tasks are likely to release their address space on their own. There is
>>>>>> no need to kill the exiting tasks.
>>>>>
>>>>> This doesn't state any actual problem. Could you be more specific? Is
>>>>> this a bug fix, a behavior change or an optimization?
>>>>
>>>> 1) oom_kill_process() has already invoked __oom_kill_process() to kill
>>>> the selected victim, but the victim will be scanned again in
>>>> mem_cgroup_scan_tasks(). It's pointless to kill the victim twice.
>>>
>>> Why does that matter though? The purpose of task_will_free_mem in
>>> oom_kill_process is different. It would bail out from a potentially
>>> noisy OOM report when the selected oom victim is expected to terminate
>>> soon. __oom_kill_process called for the whole memcg doesn't aim at
>>> avoiding any oom victims. It merely sends a kill signal to all of them.
>>
>> Besides sending kill signals, __oom_kill_process() does some other work,
>> such as printing messages and traversing all user processes sharing the
>> mm, which holds an RCU read-side section, and so on. So if we skip the
>> victim, we don't need to do that work again, and it doesn't affect the
>> original mechanism. All oom victims still get killed.
>
> mm sharing among processes is a very rare thing but do not forget that
> task_will_free_mem needs to do the same thing for the same reason.

For the victim, __oom_kill_process() traverses all processes in the system
whether or not any other tasks actually share its mm. If we skip the
victim, this work can be dropped.

>
>>>> 2) For those exiting processes, reaping them directly is also a faster
>>>> way to free memory compared with invoking __oom_kill_process().
>>>
>>> Is it? What if the terminating task is blocked on a lock? Async oom
>>> reaping might release those resources in that case.
>>
>> Yes, the reaping process is asynchronous. I mean we don't need the work
>> mentioned above any more. "Reaping them directly" here means queueing
>> the task on the oom reaper queue.
>
> I do not follow.
>
> In any case I still do not see any actual justification for the change
> other than "we can do it and it might turn out less expensive". This
> alone is not sufficient, just be explicit, because oom is hardly a fast
> path to optimize every single cpu cycle for. So unless you see an actual
> real life problem that would be behaving much better or even fixed then
> I am not convinced this is a worthwhile change to have.
>

We can also see two identical messages ("Memory cgroup out of memory:
Killed process ***") about the victim. This seems a little confusing. If
we skip the victim, only one message is printed.
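
To make the double kill concrete, the call path when oom_group is set
looks roughly like this (simplified from mm/oom_kill.c; exact details may
differ across kernel versions):

    oom_kill_process(oc, "Memory cgroup out of memory")
        __oom_kill_process(victim, message)
            /* prints "...: Killed process ..." for the victim and walks
             * for_each_process() under rcu_read_lock() looking for other
             * processes sharing victim->mm */
        mem_cgroup_scan_tasks(oom_group, oom_kill_memcg_member, message)
            oom_kill_memcg_member(task, message)
                /* invoked for every task in the memcg, including the
                 * already-killed victim */
                __oom_kill_process(task, message)
                    /* victim is killed, and logged, a second time */

The second __oom_kill_process() call on the victim is what produces the
duplicate "Killed process" line and repeats the process-list walk.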
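
A minimal sketch of the skip I have in mind (illustration only, not
necessarily the exact diff in the posted patch; tsk_is_oom_victim() is an
existing helper that simply tests task->signal->oom_mm, so no extra
locking is needed here):

    static int oom_kill_memcg_member(struct task_struct *task, void *message)
    {
            if (task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN &&
                !is_global_init(task)) {
                    /*
                     * Sketch: the selected victim has already been killed
                     * by oom_kill_process(), so don't kill (and log) it
                     * again.
                     */
                    if (tsk_is_oom_victim(task))
                            return 0;
                    get_task_struct(task);
                    __oom_kill_process(task, message);
            }
            return 0;
    }

Skipping exiting tasks that are not yet oom victims would need a further
check on top of this (e.g. something like task_will_free_mem(), which
oom_kill_process() already uses for the selected victim), and that is the
part worth discussing.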