Date: Thu, 8 Aug 2019 20:59:25 +0200
From: Michal Hocko
To: ndrw.xf@redhazel.co.uk
Cc: Johannes Weiner, Suren Baghdasaryan, Vlastimil Babka,
 "Artem S. Tashkinov", Andrew Morton, LKML, linux-mm
Subject: Re: Let's talk about the elephant in the room - the Linux
 kernel's inability to gracefully handle low memory pressure
Message-ID: <20190808185925.GH18351@dhcp22.suse.cz>
In-Reply-To: <5FBB0A26-0CFE-4B88-A4F2-6A42E3377EDB@redhazel.co.uk>
References: <20190806142728.GA12107@cmpxchg.org>
 <20190806143608.GE11812@dhcp22.suse.cz>
 <20190806220150.GA22516@cmpxchg.org>
 <20190807075927.GO11812@dhcp22.suse.cz>
 <20190807205138.GA24222@cmpxchg.org>
 <20190808114826.GC18351@dhcp22.suse.cz>
 <806F5696-A8D6-481D-A82F-49DEC1F2B035@redhazel.co.uk>
 <20190808163228.GE18351@dhcp22.suse.cz>
 <5FBB0A26-0CFE-4B88-A4F2-6A42E3377EDB@redhazel.co.uk>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 08-08-19 18:57:02, ndrw.xf@redhazel.co.uk wrote:
> On 8 August 2019 17:32:28 BST, Michal Hocko wrote:
> >> Would it be possible to reserve a fixed (configurable) amount of RAM
> >> for caches,
> >
> > I am afraid there is nothing like that available and I would even
> > argue it doesn't make much sense either. What would you consider to
> > be a cache? Kernel/userspace reclaimable memory? What about any other
> > in-kernel memory users? How would you set up such a limit and make it
> > reasonably maintainable over different kernel releases when the
> > memory footprint changes over time?
>
> Frankly, I don't know. The earlyoom userspace tool works well enough
> for me, so I assumed this functionality could be implemented in the
> kernel. Default thresholds would have to be tested, but it is unlikely
> zero is the optimum value.

Well, I am afraid that implementing anything like that in the kernel
will lead to many regressions and bug reports.
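[For context: earlyoom's actual implementation differs in many details,
but the userspace approach it takes can be sketched roughly as below --
poll MemTotal/MemAvailable from /proc/meminfo and act when available
memory falls under a configurable threshold, instead of waiting for the
kernel OOM killer. The threshold value and field parsing here are
illustrative assumptions, not earlyoom's code.]

```python
# Sketch of an earlyoom-style userspace low-memory check (illustrative,
# not earlyoom's actual implementation).

def mem_available_pct(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal, given the
    text contents of /proc/meminfo (values are reported in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "MemAvailable"):
            fields[key] = int(rest.split()[0])
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]

# Sample /proc/meminfo excerpt (values are made up for illustration):
sample = """MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:    1638400 kB"""

THRESHOLD_PCT = 10  # configurable; the "default would have to be tested"

pct = mem_available_pct(sample)
if pct <= THRESHOLD_PCT:
    print("low memory: %.1f%% available - would pick a victim now" % pct)
```

In a real daemon the text would come from reading /proc/meminfo in a
polling loop rather than a string literal.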
People tend to have very different opinions on when it is suitable to
kill a potentially important part of a workload just because memory gets
low.

> > Besides that, how does that differ from the existing reclaim
> > mechanism? Once your cache hits the limit, there would have to be
> > some sort of reclaim to happen and then we are back to square one
> > when the reclaim is making progress but you are effectively thrashing
> > over the hot working set (e.g. code pages).
>
> By forcing the OOM killer. Reclaiming memory when the system becomes
> unresponsive is precisely what I want to avoid.
>
> >> and trigger OOM killer earlier, before most UI code is evicted from
> >> memory?
> >
> > How does the kernel know that important memory is evicted?
>
> I assume the current memory management policy (LRU?) is sufficient to
> keep the most frequently used pages in memory.

The LRU aspect doesn't help much, really. If we are reclaiming the same
set of pages because they are needed for the workload to operate, then
we are effectively thrashing no matter what kind of replacement policy
you use.

[...]

> > PSI is giving you a metric that tells you how much time you spend on
> > the memory reclaim. So you can start watching the system from lower
> > utilization already.
>
> This is fantastic news. Really. I didn't know this is how it works.
> Two potential issues, though:
> 1. PSI (if possible) should be normalised wrt the memory reclaiming
> cost (SSDs have a lower cost than HDDs). If not automatically, then
> perhaps via a user-configurable option. That's somewhat similar to
> having configurable PSI thresholds.

The cost of the reclaim is inherently reflected in those numbers
already, because they give you the amount of time that is spent getting
memory for you. If you are under memory pressure, then the memory
reclaim is a part of the allocation path.

> 2. It seems PSI measures the _rate_ pages are evicted from memory.
> While this may correlate with the _absolute_ amount of memory left, it
> is not the same. Perhaps weighting PSI with the absolute amount of
> memory used for caches would improve this metric.

Please refer to Documentation/accounting/psi.rst for more information
about how PSI works.

-- 
Michal Hocko
SUSE Labs
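[For context: the PSI interface referred to above reports, per resource,
the share of wall-clock time tasks were stalled ("some": at least one
task stalled; "full": all non-idle tasks stalled) -- which is why reclaim
cost shows up in the numbers directly, as Michal says. A minimal parser
for the documented /proc/pressure/memory line format might look like the
following sketch; the sample values are made up, not from a real system.]

```python
# Minimal parser for the /proc/pressure/memory format documented in
# Documentation/accounting/psi.rst. Each line looks like:
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
# where avg* are percentages of time stalled and total is in microseconds.

def parse_psi(psi_text):
    """Parse PSI lines into {"some": {...}, "full": {...}}."""
    out = {}
    for line in psi_text.splitlines():
        kind, *pairs = line.split()
        out[kind] = {k: float(v) for k, v in (p.split("=") for p in pairs)}
    return out

# Sample contents (illustrative values only):
sample = """some avg10=2.04 avg60=0.75 avg300=0.40 total=157622151
full avg10=0.20 avg60=0.16 avg300=0.02 total=54402889"""

psi = parse_psi(sample)
# e.g. a monitor could watch the short-window "some" average and react
# well before the system becomes unresponsive:
print(psi["some"]["avg10"])
```

On a kernel built with CONFIG_PSI, the same text would be read from
/proc/pressure/memory instead of a string literal.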