Date: Tue, 16 Feb 2021 13:53:12 -0800 (PST)
From: David Rientjes
To: Michal Hocko
Cc: Eiichi Tsukata, corbet@lwn.net, mike.kravetz@oracle.com,
    mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
    akpm@linux-foundation.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, felipe.franciosi@nutanix.com
Subject: Re: [RFC PATCH] mm, oom: introduce vm.sacrifice_hugepage_on_oom
References: <20210216030713.79101-1-eiichi.tsukata@nutanix.com>

On Tue, 16 Feb 2021, Michal Hocko wrote:

> > Hugepages can be preallocated to avoid unpredictable allocation latency.
> > If we run into a 4k page shortage, the kernel can trigger OOM even though
> > there were free hugepages. When OOM is triggered by the user address page
> > fault handler, we can use an oom notifier to free hugepages in user space,
> > but if it's triggered by a memory allocation for the kernel, there is no
> > way to synchronously handle it in user space.
>
> Can you expand some more on what kind of problem you see?
> Hugetlb pages are, by definition, a preallocated, unreclaimable and
> admin-controlled pool of pages.

Small nit: true of non-surplus hugetlb pages.

> Under those conditions it is expected
> and required that the sizing be done very carefully. Why is that a
> problem in your particular setup/scenario?
>
> If the sizing is really done properly and then a random process can
> trigger OOM, this can lead to malfunctioning of those workloads
> which do depend on the hugetlb pool, right? So isn't this kind of a DoS
> scenario?
>
> > This patch introduces a new sysctl, vm.sacrifice_hugepage_on_oom. If
> > enabled, it first tries to free a hugepage, if available, before invoking
> > the oom-killer. The default value is disabled so as not to change the
> > current behavior.
>
> Why is this interface not hugepage size aware? It is quite different to
> release a GB huge page or a 2MB one. Or is it expected to release the
> smallest one? To the implementation...
>
> [...]
> > +static int sacrifice_hugepage(void)
> > +{
> > +	int ret;
> > +
> > +	spin_lock(&hugetlb_lock);
> > +	ret = free_pool_huge_page(&default_hstate, &node_states[N_MEMORY], 0);
>
> ... no, it is going to release the default huge page. This will be 2MB in
> most cases, but this is not given.
>
> Unless I am mistaken, this will also free up reserved hugetlb pages. This
> would mean that a page fault would SIGBUS, which is very likely not
> something we want to do, right? You also want to use the oom nodemask
> rather than a full one.
>
> Overall, I am not really happy about this feature even when the above is
> fixed, but let's hear more about the actual problem first.

Shouldn't this behavior be possible as an oomd plugin instead, perhaps
triggered by psi?  I'm not sure if oomd is intended only to kill something
(oomkilld? lol) or if it can be made to do sysadmin-level behavior, such as
shrinking the hugetlb pool, to solve the oom condition.

If so, it seems like we want to do this at the absolute last minute.  In
other words, reclaim has failed to free memory by other means, so we would
like to shrink the hugetlb pool.  (That's the reason why it's implemented
as a predecessor to oom, as opposed to as part of reclaim in general.)

Do we have the ability to suppress the oom killer until oomd has had a
chance to react in this scenario?
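[For illustration only: below is a minimal userspace sketch of the kind of
psi-triggered, oomd-style agent alluded to above. It arms a PSI trigger on
/proc/pressure/memory and, when the trigger fires, shrinks the hugetlb pool
by writing a smaller value to /proc/sys/vm/nr_hugepages. The 150ms/1s
threshold and the shrink-by-one-page policy are arbitrary choices for the
sketch, not anything oomd actually implements, and it needs root to run.]

/*
 * Illustrative sketch: react to memory pressure (psi) by returning one
 * persistent huge page to the buddy allocator. Error handling is minimal.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static long read_long(const char *path)
{
	long val = -1;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	/* Arm a PSI trigger: "some" memory stall of 150ms within a 1s window. */
	const char trig[] = "some 150000 1000000";
	int fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	struct pollfd pfd;

	if (fd < 0 || write(fd, trig, strlen(trig) + 1) < 0) {
		perror("psi trigger");
		return 1;
	}

	pfd.fd = fd;
	pfd.events = POLLPRI;

	for (;;) {
		/* POLLPRI is raised when the stall threshold is crossed. */
		if (poll(&pfd, 1, -1) < 0 || (pfd.revents & POLLERR))
			break;
		if (!(pfd.revents & POLLPRI))
			continue;

		long nr = read_long("/proc/sys/vm/nr_hugepages");

		if (nr > 0) {
			/* Shrink the pool by one default-sized huge page. */
			FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");

			if (f) {
				fprintf(f, "%ld\n", nr - 1);
				fclose(f);
			}
		}
	}
	close(fd);
	return 0;
}

[As far as I understand, shrinking nr_hugepages from userspace only releases
free pages and will not touch pages backing existing reservations, unlike the
free_pool_huge_page() call quoted above, so it sidesteps the SIGBUS concern.
What it cannot do by itself is guarantee it runs before the oom killer, which
is exactly the ordering question raised at the end of the mail.]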