Date: Sun, 19 Sep 2021 16:31:26 -0700
From: Andrew Morton
To: Vasily Averin
Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Tetsuo Handa,
    cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel@openvz.org,
    "Uladzislau Rezki (Sony)"
Subject: Re: [PATCH mm] vmalloc: back off when the current task is OOM-killed
Message-Id: <20210919163126.431674722b8db218453dc18c@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 17 Sep 2021 11:06:49 +0300 Vasily Averin wrote:

> A huge vmalloc allocation on a heavily loaded node can lead to a global
> memory shortage. A task that called vmalloc can have the worst badness
> and be chosen by the OOM killer; however, the received fatal signal and
> the OOM victim mark do not interrupt the allocation cycle. Vmalloc will
> continue allocating pages over and over again, exacerbating the crisis
> and consuming the memory freed up by other killed tasks.
>
> This patch allows the OOM killer to break the vmalloc cycle, makes OOM
> handling more effective and avoids a host panic.
>
> Unfortunately it is not 100% safe. A previous attempt to break the vmalloc
> cycle was reverted by commit b8c8a338f75e ("Revert "vmalloc: back off when
> the current task is killed"") because some vmalloc callers did not handle
> failures properly. The issues found were resolved; however, there may
> be other similar places.

Well that was lame of us.

I believe that at least one of the kernel testbots can utilize fault
injection. If we were to wire up vmalloc (as we have done with slab
and pagealloc) then this would help to locate such buggy vmalloc callers.

> Such failures may be acceptable for emergencies, such as OOM. On the other
> hand, we would like to detect them earlier. However, they are quite rare
> and will be hidden by OOM messages, so I'm afraid they will have quite
> a small chance of being noticed and reported.
>
> To improve the detection of such places this patch also interrupts the
> vmalloc allocation cycle for all fatal signals. The checks are hidden under
> the DEBUG_VM config option so as not to break unaware production kernels.

This sounds like a pretty sad half-measure?
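
For illustration only, the two ideas discussed above (a page-by-page
allocation loop that backs off once the task has been killed, and a fault
injection knob that forces an allocation failure so error paths get
exercised) can be sketched in user space. This is not the patch and not
kernel code: every sim_* name below is hypothetical, malloc() stands in for
the page allocator, and a plain flag stands in for the OOM victim / fatal
signal state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SIM_PAGE_SIZE 4096

/* Hypothetical stand-in for "current task was OOM-killed". */
static bool sim_task_killed;
/* Hypothetical knob: mark the task killed during the Nth page allocation (0 = never). */
static size_t sim_kill_at;
/* Hypothetical fault injection knob: fail the Nth page allocation (0 = never). */
static size_t sim_fail_at;
/* Bookkeeping so a caller can check that nothing leaked. */
static size_t sim_alloc_calls;
static size_t sim_pages_live;

/* Allocate one simulated page, honouring the two knobs above. */
static void *sim_alloc_page(void)
{
	void *page;

	sim_alloc_calls++;
	if (sim_fail_at != 0 && sim_alloc_calls == sim_fail_at)
		return NULL;			/* injected failure */
	if (sim_kill_at != 0 && sim_alloc_calls == sim_kill_at)
		sim_task_killed = true;		/* the "kill" arrives mid-loop */

	page = malloc(SIM_PAGE_SIZE);
	if (page)
		sim_pages_live++;
	return page;
}

/* Free every page that was allocated, then the page array itself. */
static void sim_free_area(void **pages, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++) {
		if (pages[i]) {
			free(pages[i]);
			sim_pages_live--;
		}
	}
	free(pages);
}

/*
 * Allocate nr_pages one at a time. Before each page, check whether the
 * task has been killed; if so, stop, release what was taken so far and
 * return NULL instead of consuming more memory.
 */
static void **sim_alloc_area(size_t nr_pages)
{
	void **pages;

	assert(nr_pages > 0);
	pages = calloc(nr_pages, sizeof(*pages));
	if (!pages)
		return NULL;

	for (size_t i = 0; i < nr_pages; i++) {
		if (sim_task_killed)
			goto fail;		/* back off */
		pages[i] = sim_alloc_page();
		if (!pages[i])
			goto fail;		/* real or injected failure */
	}
	return pages;

fail:
	sim_free_area(pages, nr_pages);
	return NULL;
}
```

A caller that sets sim_fail_at or sim_kill_at and then calls
sim_alloc_area() must be prepared for a NULL return; in this toy model,
sim_pages_live dropping back to zero afterwards is the sign that the
failure path released everything it had taken. Callers that never check
for NULL are the kind of bug fault injection is meant to surface.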