Received: by 10.192.165.148 with SMTP id m20csp2393238imm; Sun, 22 Apr 2018 06:20:30 -0700 (PDT) X-Google-Smtp-Source: AIpwx49O6VdX+JPM5BNDp/WXcFT19/PWjTDZOW+tADbc1WbbVC/TZvDfQ8JcHc5gNLCCWCimHFbw X-Received: by 2002:a17:902:32a2:: with SMTP id z31-v6mr17298085plb.41.1524403230512; Sun, 22 Apr 2018 06:20:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1524403230; cv=none; d=google.com; s=arc-20160816; b=EsNISEQHUp27B8KXlXey+Peu/pf25aXztyHLD362hDCVRrrM7qFo391Xo0AP6d286P T7Qp92i1GIuIpBX1A5PG2Vs6CMJF89rZNbGcTK35x9aluhiHTPIVPnUaSznMTjZRhb89 9i4G4+NHkcjhEpeyUh0z3wvKH6l67WINvOXV9RvlgrTSu38/Y19Pn8h3KHghiBPap4DP p+5PkAOptCzcBvboafg7/IgW4lyxicUXngv7pHeeJXdgA72hKjPXv4Op4ctgJ6dByV9f b/PcrQty4kmtW+6pz+KFdh0vWgpCij3MZ9YeZtljVwU2PCCE/cHCB+MQ8rJCPYOdCoQ/ UNmQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=hW9nH1uZmWyomqvN7HvkmUTyMMmqEmbT6GPYPZnCX9Q=; b=XhXl06DPnRpQyrHrAkaHJqH6K1cD0Nb59YYUfmojHGjKvk9CTLRP+YkqtRTteBYJX3 YNdLQQAi0AWB+NcewXcxjlaTPvi7ZeWIJzS29xtXhF/2q/PyjQPmbijZq86VMT2NH5oH IQ9E3QJW8Q7Wbs35iB+MUe1Db9msBPFbbO7S/8jVvOQWvHVH+tkrWxTaxeL/HYqzTzdf CvGIuDg/Anu3ye5qSVfTYQe/dIlE6M8bEUgV3g0oZWYFSH194AGhswYumW3rwd9Fv2kA +mXEVsP0joY1P+oh795hTpeFiKOIgW1XI2qLdm5v5ywxwZpNFInMwWox3MW8ETY5CXme wdOQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w3-v6si9331919plb.17.2018.04.22.06.20.14; Sun, 22 Apr 2018 06:20:30 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751654AbeDVNTC (ORCPT + 99 others); Sun, 22 Apr 2018 09:19:02 -0400 Received: from mx2.suse.de ([195.135.220.15]:53745 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751053AbeDVNTB (ORCPT ); Sun, 22 Apr 2018 09:19:01 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 56614AE95; Sun, 22 Apr 2018 13:19:00 +0000 (UTC) Date: Sun, 22 Apr 2018 07:18:57 -0600 From: Michal Hocko To: David Rientjes Cc: Andrew Morton , Tetsuo Handa , Andrea Arcangeli , Roman Gushchin , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap Message-ID: <20180422131857.GI17484@dhcp22.suse.cz> References: <201804180057.w3I0vieV034949@www262.sakura.ne.jp> <20180418075051.GO17484@dhcp22.suse.cz> <20180419063556.GK17484@dhcp22.suse.cz> <20180420082349.GW17484@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat 21-04-18 20:45:11, David Rientjes wrote: > On Fri, 20 Apr 2018, Michal Hocko wrote: > > > > The solution is certainly not to hold > > > down_write(&mm->mmap_sem) during munlock_vma_pages_all() instead. > > > > Why not? This is what we do for normal paths. exit path just tries to be > > clever because it knows that it doesn't have to lock because there is no > > concurent user. At least it wasn't until the oom reaper came. So I > > really fail to see why we shouldn't do the most obvious thing and use > > the locking. > > > > Because the oom reaper uses the ability to acquire mm->mmap_sem as a > heuristic to determine when to give up and allow another process to be > chosen. > > If the heuristics of the oom reaper are to be improved, that's great, but > this patch fixes the oops on powerpc as 4.17 material and as a stable > backport. It's also well tested. > > > > If > > > exit_mmap() is not making forward progress then that's a separate issue; > > > > Please read what I wrote. There is a page lock and there is no way to > > guarantee it will make a forward progress. Or do you consider that not > > true? > > > > I don't have any evidence of it, and since this is called before > exit_mmap() sets MMF_OOM_SKIP then the oom reaper would need to set the > bit itself and I would be able to find the artifact it leaves behind in > the kernel log. I cannot find a single instance of a thread stuck in > munlock by way of exit_mmap() that causes the oom reaper to have to set > the bit itself, and I should be able to if this were a problem. Look. The fact that you do not _see any evidence_ is completely irrelevant. The OOM reaper is about _guarantee_. And the guarantee is gone with the page_lock because that is used in contexts which do allocate memory and it can depend on other locks. So _no way_ we can make MMF_OOM_SKIP to depend on it. I will not repeat it anymore. I will not allow to ruin the whole oom reaper endeavor by adding "this should not happen" stuff that the oom killer was full of. > > > Holding down_write on > > > mm->mmap_sem otherwise needlessly over a large amount of code is riskier > > > (hasn't been done or tested here), more error prone (any code change over > > > this large area of code or in functions it calls are unnecessarily > > > burdened by unnecessary locking), makes exit_mmap() less extensible for > > > the same reason, > > > > I do not see any of the calls in that path could suffer from holding > > mmap_sem. Do you? > > > > > and causes the oom reaper to give up and go set > > > MMF_OOM_SKIP itself because it depends on taking down_read while the > > > thread is still exiting. > > > > Which is the standard backoff mechanism. > > > > The reason today's locking methodology is preferred is because of the > backoff mechanism. Your patch kills many processes unnecessarily if the > oom killer selects a large process to kill, which it specifically tries to > do, because unmap_vmas() and free_pgtables() takes a very long time, > sometimes tens of seconds. and I absolutely agree that the feedback mechanism should be improved. The patch I propose _might_ to lead to killing another task. I do not pretend otherwise. But it will keep the lockup free guarantee which is oom repeer giving us. Btw. the past oom implementation would simply kill more in that case as well because exiting tasks with task->mm == NULL would be ignored completely. So this is not a big regression even if that happens occasionally. Maybe invoking the reaper as suggested by Tetsuo will help here. Maybe we will come up with something more smart. But I would like to have a stop gap solution for stable that is easy enough. And your patch is not doing that because it adds a very subtle dependency on the page lock. So please stop repeating your arguments all over and either come with an argument which proves me wrong and the lock_page dependency is not real or come with an alternative solution which doesn't make MMF_OOM_SKIP depend on the page lock. -- Michal Hocko SUSE Labs