Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9155C433FE for ; Wed, 24 Nov 2021 13:11:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347666AbhKXNNN (ORCPT ); Wed, 24 Nov 2021 08:13:13 -0500 Received: from mail.kernel.org ([198.145.29.99]:52052 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347595AbhKXNKO (ORCPT ); Wed, 24 Nov 2021 08:10:14 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 8713261A79; Wed, 24 Nov 2021 12:41:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1637757685; bh=0ZdyZPsaU8WtkCDFRr4Z27JM/ySAtAZk51ZYzEA3Pzw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=sGaY2sOrAVyEbFRP5FLisGWO8imyz5uBzKGzmakw0vKC3Lpike/xhAKWXR5IJu17X 8iRs3i2iq0ocvMFv9jaO+awD3y9mFSJ1ePLYzmlrWbdANTQ6DyIFUrfiIyIOqnUrxt jh0f1FHKArctB1C0vVb35x5/hEtONJIG+G5g/HH8= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Michal Hocko , Vasily Averin , Johannes Weiner , Mel Gorman , Roman Gushchin , Shakeel Butt , Tetsuo Handa , Uladzislau Rezki , Vladimir Davydov , Vlastimil Babka , Andrew Morton , Linus Torvalds Subject: [PATCH 4.19 243/323] mm, oom: do not trigger out_of_memory from the #PF Date: Wed, 24 Nov 2021 12:57:13 +0100 Message-Id: <20211124115727.115967139@linuxfoundation.org> X-Mailer: git-send-email 2.34.0 In-Reply-To: <20211124115718.822024889@linuxfoundation.org> References: <20211124115718.822024889@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Michal Hocko commit 60e2793d440a3ec95abb5d6d4fc034a4b480472d upstream. Any allocation failure during the #PF path will return with VM_FAULT_OOM which in turn results in pagefault_out_of_memory. This can happen for 2 different reasons. a) Memcg is out of memory and we rely on mem_cgroup_oom_synchronize to perform the memcg OOM handling or b) normal allocation fails. The latter is quite problematic because allocation paths already trigger out_of_memory and the page allocator tries really hard to not fail allocations. Anyway, if the OOM killer has been already invoked there is no reason to invoke it again from the #PF path. Especially when the OOM condition might be gone by that time and we have no way to find out other than allocate. Moreover if the allocation failed and the OOM killer hasn't been invoked then we are unlikely to do the right thing from the #PF context because we have already lost the allocation context and restictions and therefore might oom kill a task from a different NUMA domain. This all suggests that there is no legitimate reason to trigger out_of_memory from pagefault_out_of_memory so drop it. Just to be sure that no #PF path returns with VM_FAULT_OOM without allocation print a warning that this is happening before we restart the #PF. [VvS: #PF allocation can hit into limit of cgroup v1 kmem controller. This is a local problem related to memcg, however, it causes unnecessary global OOM kills that are repeated over and over again and escalate into a real disaster. This has been broken since kmem accounting has been introduced for cgroup v1 (3.8). There was no kmem specific reclaim for the separate limit so the only way to handle kmem hard limit was to return with ENOMEM. In upstream the problem will be fixed by removing the outdated kmem limit, however stable and LTS kernels cannot do it and are still affected. This patch fixes the problem and should be backported into stable/LTS.] Link: https://lkml.kernel.org/r/f5fd8dd8-0ad4-c524-5f65-920b01972a42@virtuozzo.com Signed-off-by: Michal Hocko Signed-off-by: Vasily Averin Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Mel Gorman Cc: Roman Gushchin Cc: Shakeel Butt Cc: Tetsuo Handa Cc: Uladzislau Rezki Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/oom_kill.c | 22 ++++++++-------------- 1 file changed, 8 insertions(+), 14 deletions(-) --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1135,19 +1135,15 @@ bool out_of_memory(struct oom_control *o } /* - * The pagefault handler calls here because it is out of memory, so kill a - * memory-hogging task. If oom_lock is held by somebody else, a parallel oom - * killing is already in progress so do nothing. + * The pagefault handler calls here because some allocation has failed. We have + * to take care of the memcg OOM here because this is the only safe context without + * any locks held but let the oom killer triggered from the allocation context care + * about the global OOM. */ void pagefault_out_of_memory(void) { - struct oom_control oc = { - .zonelist = NULL, - .nodemask = NULL, - .memcg = NULL, - .gfp_mask = 0, - .order = 0, - }; + static DEFINE_RATELIMIT_STATE(pfoom_rs, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); if (mem_cgroup_oom_synchronize(true)) return; @@ -1155,8 +1151,6 @@ void pagefault_out_of_memory(void) if (fatal_signal_pending(current)) return; - if (!mutex_trylock(&oom_lock)) - return; - out_of_memory(&oc); - mutex_unlock(&oom_lock); + if (__ratelimit(&pfoom_rs)) + pr_warn("Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF\n"); }