From: Shakeel Butt
Date: Wed, 21 Apr 2021 06:26:37 -0700
Subject: Re: [RFC] memory reserve for userspace oom-killer
To: Roman Gushchin
Cc: Johannes Weiner, Michal Hocko, Linux MM, Andrew Morton, Cgroups, David Rientjes, LKML, Suren Baghdasaryan, Greg Thelen, Dragos Sbirlea, Priya Duraisamy
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 20, 2021 at 7:58 PM Roman Gushchin wrote:
> [...]
> >
> > Michal has suggested ALLOC_OOM, which is less risky.
>
> The problem is that even if you serve the oom daemon task with pages
> from a reserve/custom pool, it doesn't guarantee anything, because the task
> can still wait for a long time on some mutex taken by another process
> that is throttled somewhere in reclaim.

I assume that by mutex you mean the locks the oom-killer might have to take
to read metrics, or any other lock the oom-killer needs that some other
process can also take. Have you observed this situation with oomd in
production?

> You're basically trying to introduce a
> "higher memory priority" and as always in such cases there will be priority
> inversion problems.
>
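A minimal Python sketch of the inversion described above, with threads standing in for tasks and a pre-filled list standing in for a reserved memory pool. `metrics_lock`, `throttled_task`, `oom_daemon` and the 0.2 s stall are hypothetical, not kernel or oomd code; the point is only that a guaranteed allocation does not help once the daemon blocks on a lock held by a stalled task.

```python
import threading
import time

# Hypothetical shared lock that both a throttled task and the OOM daemon need
# (e.g. a lock protecting some metrics the daemon reads).
metrics_lock = threading.Lock()
RECLAIM_THROTTLE_SECONDS = 0.2  # assumed time the lock holder stays stalled


def throttled_task(holding: threading.Event) -> None:
    """Takes the shared lock, then stalls as if throttled in reclaim."""
    with metrics_lock:
        holding.set()
        time.sleep(RECLAIM_THROTTLE_SECONDS)


def oom_daemon(reserve_pages: list) -> float:
    """Allocates from a private reserve (never fails), then needs the lock.

    Returns how long the daemon waited for the lock, in seconds.
    """
    reserve_pages.pop()  # "allocation" from the reserve succeeds immediately
    start = time.monotonic()
    with metrics_lock:  # ...but the daemon still blocks here
        waited = time.monotonic() - start
    return waited


def run_demo() -> float:
    holding = threading.Event()
    holder = threading.Thread(target=throttled_task, args=(holding,))
    holder.start()
    holding.wait()  # ensure the throttled task owns the lock first
    waited = oom_daemon(reserve_pages=[object()] * 4)
    holder.join()
    return waited


if __name__ == "__main__":
    print(f"OOM daemon waited {run_demo():.2f}s despite having reserved memory")
```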
> So I doubt that you can simply create a common mechanism which will work
> flawlessly for all kinds of allocations; I anticipate many special cases
> requiring an individual approach.
> [...]
>
> First, I need to admit that I haven't followed bpf development too closely
> for the last couple of years, so my knowledge can be a bit outdated.
>
> But in general bpf is great when there is a fixed amount of data as input
> (e.g. skb) and a fixed output (e.g. drop/pass the packet). There are different
> maps which are handy to store some persistent data between calls.
>
> However, traversing complex data structures is way more complicated. It's
> especially tricky if the data structure is not of a fixed size: bpf programs
> have to be deterministic, so there are significant constraints on loops.
>
> Just for example: it's easy to call a bpf program for each task in the system,
> provide some stats/access to some fields of struct task_struct and expect it
> to return an oom score, which the kernel then uses to select the victim.
> Something like this can be done with cgroups too.
>
> Writing a kthread which can sleep, poll some data all over the system and
> decide what to do (what oomd/... does) will be really challenging.
> And going back, it will not provide any guarantees unless we avoid taking
> any locks, which is already quite challenging.
>

Thanks for the info. I agree this direction needs much more thought and time
before it can materialize.

thanks,
Shakeel
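The loop constraint quoted above can be illustrated with a Python sketch of a traversal whose iteration count is capped by a constant rather than by the size of the data, roughly the shape a program needs for a verifier to prove it terminates. `MAX_STEPS`, `Node` and `bounded_sum` are hypothetical; a real bpf program would be written in restricted C and checked by the in-kernel verifier.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_STEPS = 64  # hypothetical constant bound a verifier could check


@dataclass
class Node:
    """Hypothetical node of a variable-length list (e.g. a chain of cgroups)."""
    value: int
    next: Optional["Node"] = None


def bounded_sum(head: Optional[Node], max_steps: int = MAX_STEPS) -> Tuple[int, bool]:
    """Sums node values, visiting at most max_steps nodes.

    Returns (partial_sum, complete); complete is False when the cap was hit
    before the end of the list, i.e. the answer covers only part of the data.
    """
    total = 0
    node = head
    for _ in range(max_steps):  # bound is a constant, not the data size
        if node is None:
            return total, True
        total += node.value
        node = node.next
    return total, node is None


def make_list(n: int) -> Optional[Node]:
    """Builds a list holding the values 0..n-1 in order."""
    head = None
    for v in reversed(range(n)):
        head = Node(v, head)
    return head
```

When the structure is larger than the cap, the caller only gets a partial answer, which is the difficulty with variable-size data described above.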
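The per-task scoring contract quoted above (fixed input per task, a single score out, the kernel picks the victim) can be sketched in Python as follows. `TaskStats`, `score_task` and `select_victim` are hypothetical names, this is not a bpf program, and the heuristic is made up rather than the kernel's own badness calculation; only the convention that an `oom_score_adj` of -1000 disables OOM killing for a task is taken from Linux.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class TaskStats:
    """Hypothetical fixed-size snapshot of a few per-task fields."""
    pid: int
    rss_pages: int
    swap_pages: int
    oom_score_adj: int  # -1000..1000; -1000 means "never OOM-kill"


def score_task(t: TaskStats) -> int:
    """Hypothetical scoring 'program': fixed input in, one score out.

    Returns -1 for unkillable tasks and a non-negative score otherwise.
    """
    if t.oom_score_adj == -1000:
        return -1
    return max(0, t.rss_pages + t.swap_pages + t.oom_score_adj)


def select_victim(tasks: Iterable[TaskStats],
                  scorer: Callable[[TaskStats], int]) -> Optional[int]:
    """Caller side: run the scorer once per task, pick the highest killable."""
    best_pid: Optional[int] = None
    best_score = -1
    for t in tasks:
        s = scorer(t)
        if s > best_score:
            best_pid, best_score = t.pid, s
    return best_pid
```

Keeping the scorer a pure function of one fixed-size record is what makes this case easy; the hard case is a daemon that must gather that data itself across the whole system.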