Message-Id: <201805250019.w4P0J3Dl018566@www262.sakura.ne.jp>
Subject: Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes
From: Tetsuo Handa
To: David Rientjes
Cc: Michal Hocko, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Date: Fri, 25 May 2018 09:19:03 +0900
X-Mailing-List: linux-kernel@vger.kernel.org

David Rientjes wrote:
> The oom reaper ensures forward progress by setting MMF_OOM_SKIP itself if
> it cannot reap an mm. This can happen for a variety of reasons,
> including:
>
> - the inability to grab mm->mmap_sem in a sufficient amount of time,
>
> - when the mm has blockable mmu notifiers that could cause the oom reaper
>   to stall indefinitely,
>
> but we can also add a third when the oom reaper can "reap" an mm but doing
> so is unlikely to free any amount of memory:
>
> - when the mm's memory is fully mlocked.

 - when the mm's memory is fully mlocked (needs privilege) or
   fully shared (does not need privilege)

>
> When all memory is mlocked, the oom reaper will not be able to free any
> substantial amount of memory. It sets MMF_OOM_SKIP before the victim can
> unmap and free its memory in exit_mmap() and subsequent oom victims are
> chosen unnecessarily. This is trivial to reproduce if all eligible
> processes on the system have mlocked their memory: the oom killer calls
> panic() even though forward progress can be made.

s/mlocked/mlocked or shared/g

>
> This is the same issue where the exit path sets MMF_OOM_SKIP before
> unmapping memory and additional processes can be chosen unnecessarily
> because the oom killer is racing with exit_mmap().
>
> We can't simply defer setting MMF_OOM_SKIP, however, because if there is
> a true oom livelock in progress, it never gets set and no additional
> killing is possible.
>
> To fix this, this patch introduces a per-mm reaping timeout, initially set
> at 10s. It requires that the oom reaper's list becomes a properly linked
> list so that other mm's may be reaped while waiting for an mm's timeout to
> expire.

I already proposed a simpler one at https://patchwork.kernel.org/patch/9877991/ .

>
> The exit path will now set MMF_OOM_SKIP only after all memory has been
> freed, so additional oom killing is justified, and rely on MMF_UNSTABLE to
> determine when it can race with the oom reaper.
>
> The oom reaper will now set MMF_OOM_SKIP only after the reap timeout has
> lapsed because it can no longer guarantee forward progress.
>
> The reaping timeout is intentionally set for a substantial amount of time
> since oom livelock is a very rare occurrence and it's better to optimize
> for preventing additional (unnecessary) oom killing than a scenario that
> is much more unlikely.

But before thinking about your proposal, please think about how to guarantee
that the OOM reaper and the exit path can run, as discussed at
http://lkml.kernel.org/r/201805122318.HJG81246.MFVFLFJOOQtSHO@I-love.SAKURA.ne.jp .
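
For readers following the thread, the per-mm reaping timeout described in the quoted text can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the patch itself: the struct, the `toy_` names, the flag value and the millisecond clock are all hypothetical, and MMF_UNSTABLE handling, locking and the reaper's linked list are deliberately not modelled. It shows just the rule from the quoted description: the exit path sets the skip flag once memory is freed, and the reaper sets it only after the 10s deadline has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MMF_OOM_SKIP; the value is illustrative only. */
#define TOY_MMF_OOM_SKIP (1u << 0)

/* "initially set at 10s" in the quoted text, expressed here in ms. */
#define TOY_REAP_TIMEOUT_MS 10000UL

/* Hypothetical, heavily reduced model of an oom victim's mm. */
struct toy_mm {
    unsigned int flags;
    unsigned long reap_deadline_ms;
};

/* Called when the mm is chosen as an oom victim: start its timeout. */
void toy_mark_victim(struct toy_mm *mm, unsigned long now_ms)
{
    assert(mm != NULL);
    mm->flags &= ~TOY_MMF_OOM_SKIP;
    mm->reap_deadline_ms = now_ms + TOY_REAP_TIMEOUT_MS;
}

/* Exit path: the flag is set only after all memory has been freed. */
void toy_exit_mmap_done(struct toy_mm *mm)
{
    assert(mm != NULL);
    mm->flags |= TOY_MMF_OOM_SKIP;
}

/*
 * One reaper visit. Returns true when the reaper is finished with this mm,
 * false when the mm should stay queued so other mms can be visited while
 * this one's timeout is still running.
 */
bool toy_reaper_visit(struct toy_mm *mm, unsigned long now_ms)
{
    assert(mm != NULL);
    if (mm->flags & TOY_MMF_OOM_SKIP)
        return true;            /* exit path already completed */
    if (now_ms >= mm->reap_deadline_ms) {
        /* Timeout lapsed: forward progress can no longer be assumed. */
        mm->flags |= TOY_MMF_OOM_SKIP;
        return true;
    }
    return false;               /* keep waiting, do not pick a new victim */
}

/* The oom killer may select another victim only once the flag is set. */
bool toy_may_select_next_victim(const struct toy_mm *mm)
{
    assert(mm != NULL);
    return (mm->flags & TOY_MMF_OOM_SKIP) != 0;
}
```

In this model a fully mlocked or fully shared victim simply stays queued until either `toy_exit_mmap_done()` runs or the deadline passes, which is the behaviour the quoted description argues for. It does not address the concern raised in the reply about whether the OOM reaper and the exit path are guaranteed to run at all.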