Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp787741ybh; Thu, 12 Mar 2020 11:07:50 -0700 (PDT) X-Google-Smtp-Source: ADFU+vuUF2IN0QhY6qDH/qRvDUL6EjixQMeUiogdhz2/2b4yTuedEE5rlqjkqk53LWd89yLnbaLU X-Received: by 2002:aca:5210:: with SMTP id g16mr3742814oib.174.1584036470587; Thu, 12 Mar 2020 11:07:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584036470; cv=none; d=google.com; s=arc-20160816; b=AzeXP2BkioWY5HVxrVoNgmJbjLQ4qx+1b/+GUW3QeaM3cJdO8btWGTnkNgTcNSEDT1 jV6dMaYH530q9t9GXHmHJGdX30OJpbrn6pqCyJzMWJ37C0l5J/Ct+QSdoSS8z9CPPw54 Icr1szqY3VO4H8IzahjF4plOHX1mfqB6eQiuFBh/uRB79VBHcmtPGL6lRCS7Ma4xi4xW cFWCtEparJcZZx21tZUrTVS09Z5bd/GimMWDWAc3gV+k6I2Ya4KMq141jt1jkI6HMidr TuU764w/NqgwqIUUZZKHYlYmQ+egEFa3dpYiuBTuwIdRr2NkOmDBUabwfs9vdz5VMtXZ 1taQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature; bh=o5Rs9S0RYunQW4vpCbkDqGUyyhPOaydkpJow5/jb/es=; b=wIvSNR+DJPcauVNzqR1ihtbWXExVEqmVQXxaDU01CInVOvxtCnyKdFtS1hQpjIFUoV 1u3mgOLBLKD5+LtGnRqog3OSUvz1h+KSSFMe5DSC5UcfdHmAKniq0+slrV+jzl6sNBJF DP8ZQIdkEWknNkWRH6Y6GyAVjVDQoRjH0aoQErfi0pi1wKVGkLdoUC1vuZyAAsGdQHdB HWKGjSzHMVg9R1ns69FMl/AWKCCPmSG2wErvbOMEZjayJBTowC10FJIwS/pwvu2wnEZ8 L0NFeGrfVGotCANOA88IVgScaXvqjS9A4cet3sg7ULI1N5wo/HZEh7Q55L6uMjGFnj+D u0QQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=AncVZ1gc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p13si3178104otk.75.2020.03.12.11.07.36; Thu, 12 Mar 2020 11:07:50 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=AncVZ1gc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726710AbgCLSHT (ORCPT + 99 others); Thu, 12 Mar 2020 14:07:19 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:32891 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726485AbgCLSHS (ORCPT ); Thu, 12 Mar 2020 14:07:18 -0400 Received: by mail-pl1-f194.google.com with SMTP id ay11so2981626plb.0 for ; Thu, 12 Mar 2020 11:07:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=o5Rs9S0RYunQW4vpCbkDqGUyyhPOaydkpJow5/jb/es=; b=AncVZ1gcQLTl82u2dTBzHwOLmeA0NJKMbQrX2PCLBsijaiX2atkSN9hoqI5Qb9Npx3 8GaIyWKe2T+f3rJk9RzzXQvZCz65ujA0Onk1Krr2sLslYH2+Ej0b5HDR/Z7+nA7cKvv9 r3niug17WALbTsxFhepXZT41hlg5nsQurA6/uCQTafgAFIH/wehKB6T2dlHwgkWPGauz zo75aSP9c7eqrU9821PiMNY5XtBF9RQlmaqAvWKIi1b+8qcXzEDWbArHW/eL6vqrpqpu wDzxTcsPCVCzbYiCPYsfZEZD6stNOsNK1RcP4yLw5qLo75IdiEoIsLtKQ1KDcsDH10qc 9pCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=o5Rs9S0RYunQW4vpCbkDqGUyyhPOaydkpJow5/jb/es=; b=lAonWZSkV6bAm69PiMVgioQWhKs0/B88VP8F1urLOWN1q8TQ+tdmwHlYFAtaI2uHEl WxdPNZ0EhCJz3d2EZF0OXKCIl2ChsjOjmHueqCx8gF76UQVKeMGIwcH6InuwH9eclTJv ifVO495G568aX6esen9gl5m40MofnBugTYSIFZuxpyn1UjuOuHSu5nMd/vLCOA17k7R2 SQpsx0WjKcafew2E8ThEo0ym+QowRgzhXr+rdfs01X+gMd9By7gqUIXje3LgDCiaWKSc btTqqbjxdL3BGe6QYd8BJ5qVZGfOy57fgCpaxHjFWR4Uo1URDxYynAf3RaLiQTzIbkUJ ydEA== X-Gm-Message-State: ANhLgQ3HaTq/8KIHJBRwxLFkqU6DE7W5AH9oRWsFxd4x5C92ap/Ek133 WPjrIJ9v1kciMGhvThQs/i8k3g== X-Received: by 2002:a17:902:db83:: with SMTP id m3mr9039744pld.133.1584036437569; Thu, 12 Mar 2020 11:07:17 -0700 (PDT) Received: from [2620:15c:17:3:3a5:23a7:5e32:4598] ([2620:15c:17:3:3a5:23a7:5e32:4598]) by smtp.gmail.com with ESMTPSA id e18sm9603469pjt.41.2020.03.12.11.07.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 12 Mar 2020 11:07:16 -0700 (PDT) Date: Thu, 12 Mar 2020 11:07:15 -0700 (PDT) From: David Rientjes X-X-Sender: rientjes@chino.kir.corp.google.com To: Tetsuo Handa cc: Andrew Morton , Vlastimil Babka , Michal Hocko , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems In-Reply-To: <202003120012.02C0CEUB043533@www262.sakura.ne.jp> Message-ID: References: <993e7783-60e9-ba03-b512-c829b9e833fd@i-love.sakura.ne.jp> <202003120012.02C0CEUB043533@www262.sakura.ne.jp> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 12 Mar 2020, Tetsuo Handa wrote: > > On Thu, 12 Mar 2020, Tetsuo Handa wrote: > > > > If you have an alternate patch to try, we can test it. But since this > > > > cond_resched() is needed anyway, I'm not sure it will change the result. > > > > > > schedule_timeout_killable(1) is an alternate patch to try; I don't think > > > that this cond_resched() is needed anyway. > > > > > > > You are suggesting schedule_timeout_killable(1) in shrink_node_memcgs()? > > > > Andrew Morton also mentioned whether cond_resched() in shrink_node_memcgs() > is enough. But like you mentioned, > It passes our testing because this is where the allocator is looping while the victim is trying to exit if only it could be scheduled. > you can try re-adding sleep outside of oom_lock: > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index d09776cd6e10..3aee7e0eca4e 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1576,6 +1576,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, > */ > ret = should_force_charge() || out_of_memory(&oc); > mutex_unlock(&oom_lock); > + schedule_timeout_killable(1); > return ret; > } > If current was process chosen for oom kill, this would actually induce the problem, not fix it. > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3c4eb750a199..e80158049651 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3797,7 +3797,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > */ > if (!mutex_trylock(&oom_lock)) { > *did_some_progress = 1; > - schedule_timeout_uninterruptible(1); > return NULL; > } > > @@ -4590,6 +4589,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > > /* Retry as long as the OOM killer is making progress */ > if (did_some_progress) { > + schedule_timeout_uninterruptible(1); > no_progress_loops = 0; > goto retry; > } > > By the way, will you share the reproducer (and how to use the reproducer) ? > On an UP kernel with swap disabled, you limit a memcg to 100MB and start three processes that each fault 40MB attached to it. Same reproducer as the "mm, oom: make a last minute check to prevent unnecessary memcg oom kills" patch except in that case there are two cores.