Subject: Re: WARNING in try_charge
To: mhocko@kernel.org, Vladimir Davydov, Oleg Nesterov, David Rientjes
References: <0000000000005e979605729c1564@google.com>
Cc: syzbot, cgroups@vger.kernel.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzkaller-bugs@googlegroups.com, Andrew Morton
From: Tetsuo Handa
Message-ID:
Date: Thu, 9 Aug 2018 22:57:43 +0900
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <0000000000005e979605729c1564@google.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
From: Tetsuo Handa
Date: Thu, 9 Aug 2018 22:49:40 +0900
Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until memory reserve fails

Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
oom_reaped tasks") changed the OOM killer to select the next OOM victim
as soon as MMF_OOM_SKIP is set. But we don't need to select the next OOM
victim as long as an ALLOC_OOM allocation can succeed. And syzbot is
hitting WARN(1) caused by this race window [1].

Since the memcg OOM case uses a forced charge if the current thread is
killed, out_of_memory() can return true without selecting the next OOM
victim. Therefore, this patch changes task_will_free_mem(current) to
ignore MMF_OOM_SKIP unless an ALLOC_OOM allocation has failed.

[1] https://syzkaller.appspot.com/bug?id=ea8c7912757d253537375e981b61749b2da69258

Signed-off-by: Tetsuo Handa
Reported-by: syzbot
Cc: Michal Hocko
Cc: Oleg Nesterov
Cc: Vladimir Davydov
Cc: David Rientjes
---
 include/linux/oom.h | 3 +++
 mm/oom_kill.c       | 8 ++++----
 mm/page_alloc.c     | 7 +++++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 69864a5..b5abacd 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -38,6 +38,9 @@ struct oom_control {
 	 */
 	const int order;
 
+	/* Did we already try ALLOC_OOM allocation? */
+	const bool reserve_tried;
+
 	/* Used by oom implementation, do not set */
 	unsigned long totalpages;
 	struct task_struct *chosen;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0e10b86..95453e8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -782,7 +782,7 @@ static inline bool __task_will_free_mem(struct task_struct *task)
  * Caller has to make sure that task->mm is stable (hold task_lock or
  * it operates on the current).
  */
-static bool task_will_free_mem(struct task_struct *task)
+static bool task_will_free_mem(struct task_struct *task, bool select_new)
 {
 	struct mm_struct *mm = task->mm;
 	struct task_struct *p;
@@ -803,7 +803,7 @@ static bool task_will_free_mem(struct task_struct *task)
 	 * This task has already been drained by the oom reaper so there are
 	 * only small chances it will free some more
 	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags))
+	if (test_bit(MMF_OOM_SKIP, &mm->flags) && select_new)
 		return false;
 
 	if (atomic_read(&mm->mm_users) <= 1)
@@ -939,7 +939,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	 * so it can die quickly
 	 */
 	task_lock(p);
-	if (task_will_free_mem(p)) {
+	if (task_will_free_mem(p, true)) {
 		mark_oom_victim(p);
 		wake_oom_reaper(p);
 		task_unlock(p);
@@ -1069,7 +1069,7 @@ bool out_of_memory(struct oom_control *oc)
 	 * select it. The goal is to allow it to allocate so that it may
 	 * quickly exit and free its memory.
 	 */
-	if (task_will_free_mem(current)) {
+	if (task_will_free_mem(current, oc->reserve_tried)) {
 		mark_oom_victim(current);
 		wake_oom_reaper(current);
 		return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 879b861..03ca29a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3455,7 +3455,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 }
 
 static inline struct page *
-__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
+__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, bool reserve_tried,
 	const struct alloc_context *ac, unsigned long *did_some_progress)
 {
 	struct oom_control oc = {
@@ -3464,6 +3464,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 		.memcg = NULL,
 		.gfp_mask = gfp_mask,
 		.order = order,
+		.reserve_tried = reserve_tried,
 	};
 	struct page *page;
 
@@ -4239,7 +4240,9 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 		goto retry_cpuset;
 
 	/* Reclaim has failed us, start killing things */
-	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
+	page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM
+				     || (gfp_mask & __GFP_NOMEMALLOC), ac,
+				     &did_some_progress);
 	if (page)
 		goto got_pg;
 
-- 
1.8.3.1
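
For readers who want to check the intended behaviour without building a kernel, the decision made by the patch can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct model_task, model_task_will_free_mem() and model_reserve_tried() are hypothetical names that exist only in this illustration, the mm_users and process-sharing checks of the real task_will_free_mem() are left out, and the real code tests mm->flags rather than plain booleans.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the state the kernel function
 * inspects; none of these names exist in the kernel.
 */
struct model_task {
	bool will_exit;	/* models __task_will_free_mem(task) returning true */
	bool oom_skip;	/* models MMF_OOM_SKIP being set in mm->flags */
};

/*
 * Models the patched check: MMF_OOM_SKIP only disqualifies the task
 * when the caller passes select_new == true.
 */
static bool model_task_will_free_mem(const struct model_task *t,
				     bool select_new)
{
	if (!t->will_exit)
		return false;
	if (t->oom_skip && select_new)
		return false;
	return true;
}

/*
 * Models the value passed as reserve_tried from the page allocator:
 * true when the allocation already ran with ALLOC_OOM, or when
 * __GFP_NOMEMALLOC is set in gfp_mask.
 */
static bool model_reserve_tried(bool alloc_flags_is_alloc_oom,
				bool gfp_nomemalloc)
{
	return alloc_flags_is_alloc_oom || gfp_nomemalloc;
}
```

In this model, an exiting task with oom_skip set is still reported as "will free memory" while model_reserve_tried() is false, and stops being reported once it is true, which corresponds to out_of_memory() going on to select a new victim. The oom_kill_process() call site always passes true, so its behaviour is unchanged.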