Date: Fri, 25 May 2018 12:44:27 -0700 (PDT)
From: David Rientjes
To: Tetsuo Handa
cc: Michal Hocko, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes
In-Reply-To: <201805250019.w4P0J3Dl018566@www262.sakura.ne.jp>

On Fri, 25 May 2018, Tetsuo Handa wrote:

> > The oom reaper ensures forward progress by setting MMF_OOM_SKIP itself if
> > it cannot reap an mm. This can happen for a variety of reasons,
> > including:
> >
> >  - the inability to grab mm->mmap_sem in a sufficient amount of time,
> >
> >  - when the mm has blockable mmu notifiers that could cause the oom reaper
> >    to stall indefinitely,
> >
> > but we can also add a third when the oom reaper can "reap" an mm but doing
> > so is unlikely to free any amount of memory:
> >
> >  - when the mm's memory is fully mlocked.
>
>    - when the mm's memory is fully mlocked (needs privilege) or
>      fully shared (does not need privilege)
>

Good point, that is another way that unnecessary oom killing can occur
because the oom reaper sets MMF_OOM_SKIP far too early. I can make the
change to the commit message.

Also, I noticed in my patch that oom_reap_task() should be doing
list_add_tail() rather than list_add() to enqueue the mm for reaping again.

> > This is the same issue where the exit path sets MMF_OOM_SKIP before
> > unmapping memory and additional processes can be chosen unnecessarily
> > because the oom killer is racing with exit_mmap().
> >
> > We can't simply defer setting MMF_OOM_SKIP, however, because if there is
> > a true oom livelock in progress, it never gets set and no additional
> > killing is possible.
> >
> > To fix this, this patch introduces a per-mm reaping timeout, initially set
> > at 10s. It requires that the oom reaper's list becomes a properly linked
> > list so that other mm's may be reaped while waiting for an mm's timeout to
> > expire.
>
> I already proposed more simpler one at https://patchwork.kernel.org/patch/9877991/ .
>

It's a similar idea, and I'm glad that we agree that some kind of per-mm
delay is required to avoid this problem.
I think yours is simpler, but consider the other two changes in my
patch:

 - in the normal exit path, absent any timeout for the mm, we only set
   MMF_OOM_SKIP after free_pgtables() when it is known we will not free
   any additional memory, which can also cause unnecessary oom killing
   because the oom killer races with free_pgtables(), and

 - the oom reaper now operates over all concurrent victims instead of
   repeatedly trying to take mm->mmap_sem of the first victim, sleeping
   many times, retrying, giving up, and moving on to the next victim.
   Allowing the oom reaper to iterate through all victims can allow
   memory freeing such that an allocator may be able to drop
   mm->mmap_sem.

In fact, with my patch, I don't know of any condition where we kill
additional processes unnecessarily *unless* the victim cannot be oom
reaped or complete memory freeing in the exit path within 10 seconds.
Given how rare oom livelock appears in practice, I think the 10 seconds
is justified because right now it is _trivial_ to oom kill many victims
completely unnecessarily.
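
To make the queueing behaviour concrete, here is a rough userspace model
of the idea in Python. It is only a sketch and not code from the patch:
Victim, reap_pass(), simulate() and the reapable_at field are hypothetical
names made up for the illustration, and a real reap attempt (taking
mm->mmap_sem, unmapping memory) is reduced to a simple time comparison.
What it is meant to show is that a victim that cannot be reaped yet goes
back to the tail of the queue (the list_add_tail() behaviour mentioned
earlier in this reply), so the other victims are still tried, and that the
flag modelling MMF_OOM_SKIP is set either after a successful reap or once
the per-victim timeout has expired.

```python
from collections import deque
from dataclasses import dataclass

REAP_TIMEOUT = 10.0  # seconds; the "initially set at 10s" figure from the thread


@dataclass
class Victim:
    """Hypothetical stand-in for an mm queued for the oom reaper."""
    name: str
    deadline: float         # time at which the reaper gives up on this victim
    reapable_at: float      # assumption: earliest time a reap attempt succeeds
    oom_skip: bool = False  # models MMF_OOM_SKIP
    reaped: bool = False


def reap_pass(queue, now):
    """Try every queued victim once and return those that are finished."""
    finished = []
    for _ in range(len(queue)):
        victim = queue.popleft()
        if now >= victim.reapable_at:
            # Reap succeeded: memory is freed, so the flag can be set.
            victim.reaped = True
            victim.oom_skip = True
            finished.append(victim)
        elif now >= victim.deadline:
            # Timeout expired: give up so the oom killer can move on.
            victim.oom_skip = True
            finished.append(victim)
        else:
            # Not reapable yet: requeue at the tail, behind the others.
            queue.append(victim)
    return finished


def simulate(victims, step=1.0, horizon=30.0):
    """Run reap passes at fixed time steps; return (time, name, outcome)."""
    queue = deque(victims)
    events = []
    now = 0.0
    while queue and now <= horizon:
        for victim in reap_pass(queue, now):
            outcome = "reaped" if victim.reaped else "timed out"
            events.append((now, victim.name, outcome))
        now += step
    return events
```

With one victim that can never be reaped queued ahead of one that becomes
reapable after two seconds, the second victim is handled long before the
first one's timeout expires. The first victim only gets its flag set once
the 10 second deadline passes, which is the case where selecting another
process is still possible.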