From: Shakeel Butt
Date: Wed, 21 Apr 2021 07:13:45 -0700
Subject: Re: [RFC] memory reserve for userspace oom-killer
To: Michal Hocko
Cc: Roman Gushchin, Johannes Weiner, Linux MM, Andrew Morton, Cgroups, David Rientjes, LKML, Suren Baghdasaryan, Greg Thelen, Dragos Sbirlea, Priya Duraisamy
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 21, 2021 at 12:23 AM Michal Hocko wrote:
> [...]
> > In our observation the global reclaim is very non-deterministic at the
> > tail and dramatically impacts the reliability of the system. We are
> > looking for a solution which is independent of the global reclaim.
>
> I believe it is worth pursuing a solution that would make the memory
> reclaim more predictable. I have seen direct reclaim memory throttling
> in the past. For some reason which I haven't tried to examine this has
> become less of a problem with newer kernels. Maybe the memory access
> patterns have changed or those problems got replaced by other issues but
> an excessive throttling is definitely something that we want to address
> rather than work around by some user visible APIs.
>

I agree that we want to address the excessive throttling, but that is a
problem for everyone on the machine and, most importantly, it is a moving
target. The reclaim code continues to evolve, and it also has callbacks
into a diverse set of subsystems.

The user-visible API is for one specific use-case, i.e. the oom-killer,
which in turn will indirectly help reduce the excessive throttling.

[...]
> > So, the suggestion is to have a per-task flag to (1) indicate to not
> > throttle and (2) fail allocations easily on significant memory
> > pressure.
> >
> > For (1), the challenge I see is that there are a lot of places in the
> > reclaim code paths where a task can get throttled. There are
> > filesystems that block/throttle in slab shrinking. Any process can get
> > blocked on an unrelated page or inode writeback within reclaim.
> >
> > For (2), I am not sure how to deterministically define "significant
> > memory pressure". One idea is to follow the __GFP_NORETRY semantics
> > and along with (1) the userspace oom-killer will more reliably see
> > ENOMEM rather than getting stuck in the reclaim.
>
> Some of the interfaces (e.g. seq_file uses GFP_KERNEL reclaim strength)
> could be more relaxed and rather fail than OOM kill but wouldn't your
> OOM handler be effectively dysfunctional when not able to collect data
> to make a decision?
>

Yes, it would be. Roman is suggesting a precomputed kill-list (pidfds
ready to be sent SIGKILL): whenever the oom-killer gets ENOMEM, it would
fall back to the kill-list. We are still weighing the approaches to, and
side-effects of, preferentially returning ENOMEM in the allocation
slowpath for the oom-killer, as well as the complexity of maintaining the
kill-list and keeping it up to date.

thanks,
Shakeel