From: Haifeng Xu
To: Michal Hocko
Cc: hannes@cmpxchg.org, shakeelb@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] memcg, oom: clean up mem_cgroup_oom_synchronize
Date: Tue, 21 Mar 2023 10:41:44 +0800
References: <20230315070302.268316-1-haifeng.xu@shopee.com>

On 2023/3/17 19:47, Michal Hocko wrote:
> On Wed 15-03-23 07:03:02, Haifeng Xu wrote:
>> Since commit 29ef680ae7c2 ("memcg, oom: move out_of_memory back to
>> the charge path"), the OOM killer is delayed to the page fault path
>> only when oom_kill_disable is set. In the charge path, the OOM
>> handling can be invoked even if the oom_lock in the memcg can't be
>> acquired. To keep the behavior consistent with that, remove the lock
>> check and just leave the oom_kill_disable check behind in the page
>> fault path.
>
> I do not understand the actual problem you are trying to deal with here.
>
>> Furthermore, the lock contender won't be scheduled out, which doesn't
>> match the sixth point in the description of commit fb2a6fc56be66
>> ("mm: memcg: rework and document OOM waiting and wakeup"). So remove
>> the explicit wakeup for the lock holder.
>>
>> Fixes: fb2a6fc56be6 ("mm: memcg: rework and document OOM waiting and wakeup")
>
> The subject mentions a clean up but the fixes tag would indicate an
> actual fix.
>
>> Signed-off-by: Haifeng Xu
>> ---
>>  mm/memcontrol.c | 11 ++---------
>>  1 file changed, 2 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 5abffe6f8389..360fa7cf7879 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1999,7 +1999,7 @@ bool mem_cgroup_oom_synchronize(bool handle)
>>  	if (locked)
>>  		mem_cgroup_oom_notify(memcg);
>>
>> -	if (locked && !memcg->oom_kill_disable) {
>> +	if (!memcg->oom_kill_disable) {
>>  		mem_cgroup_unmark_under_oom(memcg);
>>  		finish_wait(&memcg_oom_waitq, &owait.wait);
>>  		mem_cgroup_out_of_memory(memcg, current->memcg_oom_gfp_mask,
>
> Now looking at the actual code I suspect you in fact want to simplify
> the logic here, as mem_cgroup_oom_synchronize is only ever triggered
> when oom_kill_disable == true, because current->memcg_in_oom is never
> non-NULL otherwise. So the check is indeed unnecessary. Your patch,
> however, doesn't really simplify the code much.
>
> Did you want this instead?
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 12559c08d976..a77dc88cfa12 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1999,16 +1999,9 @@ bool mem_cgroup_oom_synchronize(bool handle)
>  	if (locked)
>  		mem_cgroup_oom_notify(memcg);
>
> -	if (locked && !READ_ONCE(memcg->oom_kill_disable)) {
> -		mem_cgroup_unmark_under_oom(memcg);
> -		finish_wait(&memcg_oom_waitq, &owait.wait);
> -		mem_cgroup_out_of_memory(memcg, current->memcg_oom_gfp_mask,
> -					 current->memcg_oom_order);
> -	} else {
> -		schedule();
> -		mem_cgroup_unmark_under_oom(memcg);
> -		finish_wait(&memcg_oom_waitq, &owait.wait);
> -	}
> +	schedule();
> +	mem_cgroup_unmark_under_oom(memcg);
> +	finish_wait(&memcg_oom_waitq, &owait.wait);
>
>  	if (locked) {
>  		mem_cgroup_oom_unlock(memcg);

Yes, the chance that someone else disables oom_kill_disable again
before the page fault path runs is quite low.
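
To make the "memcg_in_oom is never non-NULL otherwise" point concrete,
here is a condensed sketch of the charge-path side, paraphrased from
mem_cgroup_oom() in kernels of this era rather than quoted verbatim
(mask and order are the gfp mask and allocation order passed in from
try_charge):

	/* charge path: mem_cgroup_oom(), simplified sketch */
	if (memcg->oom_kill_disable) {
		/*
		 * Userspace handles the OOM; remember the context and
		 * let mem_cgroup_oom_synchronize() deal with it at the
		 * end of the page fault, once all locks are released.
		 */
		if (current->in_user_fault) {
			css_get(&memcg->css);
			current->memcg_in_oom = memcg;
			current->memcg_oom_gfp_mask = mask;
			current->memcg_oom_order = order;
		}
		return false;
	}
	/*
	 * Otherwise mem_cgroup_out_of_memory() is invoked right here in
	 * the charge path and current->memcg_in_oom stays NULL, so
	 * mem_cgroup_oom_synchronize() returns early.
	 */

So whenever mem_cgroup_oom_synchronize() sees a non-NULL memcg_in_oom,
oom_kill_disable was set at charge time; the old "locked &&
!oom_kill_disable" branch could only be reached if the flag were
flipped in between, which is the unlikely race mentioned above.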

>> @@ -2010,15 +2010,8 @@ bool mem_cgroup_oom_synchronize(bool handle)
>>  		finish_wait(&memcg_oom_waitq, &owait.wait);
>>  	}
>>
>> -	if (locked) {
>> +	if (locked)
>>  		mem_cgroup_oom_unlock(memcg);
>> -		/*
>> -		 * There is no guarantee that an OOM-lock contender
>> -		 * sees the wakeups triggered by the OOM kill
>> -		 * uncharges. Wake any sleepers explicitly.
>> -		 */
>> -		memcg_oom_recover(memcg);
>> -	}
>
> Hmm, so this seems unneeded for the oom_kill_disable case as well.
> Rather than referring to fb2a6fc56be66 it would be better to explain
> why the explicit recovery is not really needed anymore.
>
>> cleanup:
>>  	current->memcg_in_oom = NULL;
>>  	css_put(&memcg->css);

Thank you for your suggestion. I'll post an official patch later.
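
For reference, a sketch of how the tail of mem_cgroup_oom_synchronize()
would read with both simplifications applied, assembled from the hunks
above rather than taken from the final posted patch:

	schedule();
	mem_cgroup_unmark_under_oom(memcg);
	finish_wait(&memcg_oom_waitq, &owait.wait);

	if (locked)
		mem_cgroup_oom_unlock(memcg);
	/* explicit memcg_oom_recover() dropped, as discussed above */
cleanup:
	current->memcg_in_oom = NULL;
	css_put(&memcg->css);
	return true;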