Date: Wed, 30 May 2018 14:06:51 -0700 (PDT)
From: David Rientjes
To: Michal Hocko
Cc: Tetsuo Handa, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes
In-Reply-To: <20180528081345.GD1517@dhcp22.suse.cz>
References: <20180525072636.GE11881@dhcp22.suse.cz> <20180528081345.GD1517@dhcp22.suse.cz>
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)

On Mon, 28 May 2018, Michal Hocko wrote:

> > That's not sufficient since the oom reaper is also not able to oom reap if
> > the mm has blockable mmu notifiers or all memory is shared filebacked
> > memory, so it immediately sets MMF_OOM_SKIP and additional processes are
> > oom killed.
>
> Could you be more specific with a real world example where that is the
> case? I mean the full address space of non-reclaimable file backed
> memory where waiting some more would help?
> Blockable mmu notifiers are a PITA for sure. I wish we could have a
> better way to deal with them. Maybe we can tell them we are in the
> non-blockable context and have them release as much as possible. Still
> something that a random timeout wouldn't help I am afraid.

It's not a random timeout; it's a period long enough that we don't oom kill
several processes needlessly in the very rare case where oom livelock would
actually prevent the original victim from exiting. The oom reaper processing
an mm, finding everything mlocked, and immediately setting MMF_OOM_SKIP is
inappropriate. This is trivial to reproduce with a large memory-hogging
process that mlocks all of its memory: we consistently see spurious and
unnecessary oom kills simply because the oom reaper has set MMF_OOM_SKIP
very early.

This patch introduces a "give up" period so that the oom reaper is still
allowed to do its good work, but it only gives up in the hope that the
victim can make forward progress at some substantial point in the future. I
would understand the objection if oom livelock, where the victim cannot make
forward progress, were commonplace; but in the interest of not needlessly
killing several processes every time a large mlocked process is targeted, I
think a waiting period is warranted.

> Trying to reap a different oom victim when the current one is not making
> progress during the lock contention is certainly something that make
> sense. It has been proposed in the past and we just gave it up because
> it was more complex. Do you have any specific example when this would
> help to justify the additional complexity?

I'm not sure how you're defining complexity: the patch adds ~30 lines of
code, prevents processes from being needlessly oom killed when oom reaping
is largely unsuccessful and before the victim finishes free_pgtables(), and
also allows the oom reaper to operate on multiple mm's instead of processing
one at a time.
Obviously, if there is a delay before MMF_OOM_SKIP is set, the oom reaper
must be able to process other mm's in the meantime; otherwise we stall
needlessly for 10s. Operating on multiple mm's in a linked list while
waiting for victims to exit during the timeout period is thus very much
needed; the delay wouldn't make sense without it.

> > But also note that even if oom reaping is possible, in the presence of an
> > antagonist that continues to allocate memory, that it is possible to oom
> > kill additional victims unnecessarily if we aren't able to complete
> > free_pgtables() in exit_mmap() of the original victim.
>
> If there is unbound source of allocations then we are screwed no matter
> what. We just hope that the allocator will get noticed by the oom killer
> and it will be stopped.

It's not unbounded; it's an allocator that merely acts as an antagonist. At
the risk of being overly verbose, for both system and memcg oom conditions:
a large mlocked process is oom killed, other processes continue to
allocate/charge, the oom reaper almost immediately sets MMF_OOM_SKIP without
being able to free any memory, and other important processes are needlessly
oom killed before the original victim can reach exit_mmap(). This happens a
_lot_. I'm open to hearing any suggestions other than waiting some period of
time before MMF_OOM_SKIP is set to solve this problem.