Subject: Re: [RFC PATCH 2/2] memcg: do not report racy no-eligible OOM tasks
To: Michal Hocko, Johannes Weiner
Cc: linux-mm@kvack.org, David Rientjes, Andrew Morton, LKML
References: <20181022120308.GB18839@dhcp22.suse.cz> <201810230101.w9N118i3042448@www262.sakura.ne.jp> <20181023114246.GR18839@dhcp22.suse.cz>
 <20181023121055.GS18839@dhcp22.suse.cz>
From: Tetsuo Handa
Message-ID:
Date: Tue, 23 Oct 2018 21:33:43 +0900
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <20181023121055.GS18839@dhcp22.suse.cz>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018/10/23 21:10, Michal Hocko wrote:
> On Tue 23-10-18 13:42:46, Michal Hocko wrote:
>> On Tue 23-10-18 10:01:08, Tetsuo Handa wrote:
>>> Michal Hocko wrote:
>>>> On Mon 22-10-18 20:45:17, Tetsuo Handa wrote:
>>>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>>> index e79cb59552d9..a9dfed29967b 100644
>>>>>> --- a/mm/memcontrol.c
>>>>>> +++ b/mm/memcontrol.c
>>>>>> @@ -1380,10 +1380,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>>>>>>  		.gfp_mask = gfp_mask,
>>>>>>  		.order = order,
>>>>>>  	};
>>>>>> -	bool ret;
>>>>>> +	bool ret = true;
>>>>>>  
>>>>>>  	mutex_lock(&oom_lock);
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * multi-threaded tasks might race with oom_reaper and gain
>>>>>> +	 * MMF_OOM_SKIP before reaching out_of_memory which can lead
>>>>>> +	 * to out_of_memory failure if the task is the last one in
>>>>>> +	 * memcg which would be a false possitive failure reported
>>>>>> +	 */
>>>>>> +	if (tsk_is_oom_victim(current))
>>>>>> +		goto unlock;
>>>>>> +
>>>>>
>>>>> This is not wrong but is strange. We can use mutex_lock_killable(&oom_lock)
>>>>> so that any killed threads no longer wait for oom_lock.
>>>>
>>>> tsk_is_oom_victim is stronger because it doesn't depend on
>>>> fatal_signal_pending which might be cleared throughout the exit process.
>>>>
>>>
>>> I still want to propose this. No need to be memcg OOM specific.
>>
>> Well, I maintain what I've said [1] about simplicity and specific fix
>> for a specific issue. Especially in the tricky code like this where all
>> the consequences are far more subtle than they seem to be.
>>
>> This is obviously a matter of taste but I don't see much point discussing
>> this back and forth for ever. Unless there is a general agreement that
>> the above is less appropriate then I am willing to consider a different
>> change but I simply do not have energy to nit pick for ever.
>>
>> [1] http://lkml.kernel.org/r/20181022134315.GF18839@dhcp22.suse.cz
>
> In other words. Having a memcg specific fix means, well, a memcg
> maintenance burden. Like any other memcg specific oom decisions we
> already have. So are you OK with that Johannes or you would like to see
> a more generic fix which might turn out to be more complex?
>

I don't know what "that Johannes" refers to.

If you don't want to affect SysRq-OOM and pagefault-OOM cases,
are you OK with having a global-OOM specific fix?

 mm/page_alloc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2ef1c1..f59f029 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3518,6 +3518,17 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	if (gfp_mask & __GFP_THISNODE)
 		goto out;
 
+	/*
+	 * It is possible that multi-threaded OOM victims get
+	 * task_will_free_mem(current) == false when the OOM reaper quickly
+	 * set MMF_OOM_SKIP. But since we know that tsk_is_oom_victim() == true
+	 * tasks won't loop forever (unless it is a __GFP_NOFAIL allocation
+	 * request), we don't need to select next OOM victim.
+	 */
+	if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL)) {
+		*did_some_progress = 1;
+		goto out;
+	}
 	/* Exhausted what can be done so it's blame time */
 	if (out_of_memory(&oc) || WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
 		*did_some_progress = 1;
-- 
1.8.3.1
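
For readers following the thread, the intended effect of the mm/page_alloc.c
check can be modelled in user space. This is only a sketch under stated
assumptions, not kernel code: model_may_oom(), model_out_of_memory(),
model_selftest() and MODEL_GFP_NOFAIL are hypothetical names that do not
exist in the kernel, the flag value is arbitrary, and the locking and
WARN_ON_ONCE() of the real function are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_NOFAIL; the kernel's value differs. */
#define MODEL_GFP_NOFAIL 0x1u

/* Counts how often the modelled OOM killer is invoked. */
static int model_oom_calls;

/* Models out_of_memory(): returns true when a victim could be selected. */
static bool model_out_of_memory(bool eligible_task_exists)
{
	model_oom_calls++;
	return eligible_task_exists;
}

/*
 * Models the tail of the patched function and returns the value that
 * would end up in *did_some_progress.
 */
int model_may_oom(bool current_is_oom_victim, unsigned int gfp_mask,
		  bool eligible_task_exists)
{
	/*
	 * Proposed check: an OOM victim doing an ordinary allocation
	 * reports progress without selecting another victim.
	 */
	if (current_is_oom_victim && !(gfp_mask & MODEL_GFP_NOFAIL))
		return 1;

	/* Existing path: invoke the OOM killer; NOFAIL always "progresses". */
	if (model_out_of_memory(eligible_task_exists) ||
	    (gfp_mask & MODEL_GFP_NOFAIL))
		return 1;

	return 0;
}

/* Exercises the three cases discussed in the thread. */
void model_selftest(void)
{
	model_oom_calls = 0;

	/* Victim, ordinary allocation: progress, OOM killer not invoked. */
	assert(model_may_oom(true, 0, false) == 1);
	assert(model_oom_calls == 0);

	/* Victim, NOFAIL allocation: falls through to the OOM killer. */
	assert(model_may_oom(true, MODEL_GFP_NOFAIL, false) == 1);
	assert(model_oom_calls == 1);

	/* Non-victim with no eligible task: no progress is reported. */
	assert(model_may_oom(false, 0, false) == 0);
	assert(model_oom_calls == 2);
}
```

Calling model_selftest() from a main() exercises the three cases. In this
model the first case is the one that would otherwise reach the OOM killer
with no eligible task left.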