Date: Thu, 08 Aug 2019 18:57:02 +0100
From: ndrw.xf@redhazel.co.uk
To: Michal Hocko
CC: Johannes Weiner, Suren Baghdasaryan, Vlastimil Babka,
    "Artem S. Tashkinov", Andrew Morton, LKML, linux-mm
Subject: Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
Message-ID: <5FBB0A26-0CFE-4B88-A4F2-6A42E3377EDB@redhazel.co.uk>
In-Reply-To: <20190808163228.GE18351@dhcp22.suse.cz>
References: <398f31f3-0353-da0c-fc54-643687bb4774@suse.cz>
    <20190806142728.GA12107@cmpxchg.org>
    <20190806143608.GE11812@dhcp22.suse.cz>
    <20190806220150.GA22516@cmpxchg.org>
    <20190807075927.GO11812@dhcp22.suse.cz>
    <20190807205138.GA24222@cmpxchg.org>
    <20190808114826.GC18351@dhcp22.suse.cz>
    <806F5696-A8D6-481D-A82F-49DEC1F2B035@redhazel.co.uk>
    <20190808163228.GE18351@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 8 August 2019 17:32:28 BST, Michal Hocko wrote:
>
>> Would it be possible to reserve a fixed (configurable) amount of RAM
>> for caches,
>
> I am afraid there is nothing like that available and I would even argue
> it doesn't make much sense either. What would you consider to be a
> cache? A kernel/userspace reclaimable memory? What about any other in
> kernel memory users? How would you setup such a limit and make it
> reasonably maintainable over different kernel releases when the memory
> footprint changes over time?

Frankly, I don't know. The earlyoom userspace tool works well enough for
me, so I assumed this functionality could be implemented in the kernel.
Default thresholds would have to be tested, but it is unlikely that zero
is the optimum value.

> Besides that how does that differ from the existing reclaim mechanism?
> Once your cache hits the limit, there would have to be some sort of
> reclaim and then we are back to square one, where the reclaim is
> making progress but you are effectively thrashing over the hot working
> set (e.g. code pages)

By forcing the OOM killer. Reclaiming memory when the system becomes
unresponsive is precisely what I want to avoid.

>> and trigger OOM killer earlier, before most UI code is evicted from
>> memory?
>
> How does the kernel know that important memory is evicted?

I assume the current memory management policy (LRU?) is sufficient to
keep the most frequently used pages in memory.

> If you know which task is that then you can put it into a memory cgroup
> with a stricter memory limit and have it killed before the overall
> system starts suffering.

This is what I intended to use. But I don't know how to bypass systemd
or how to configure such policies via systemd.

> PSI is giving you a metric that tells you how much time you
> spend on the memory reclaim. So you can start watching the system from
> lower utilization already.

This is fantastic news. Really. I didn't know this is how it works. Two
potential issues, though:

1. PSI should, if possible, be normalised with respect to the cost of
   memory reclaim (SSDs have a lower cost than HDDs). If not
   automatically, then perhaps via a user-configurable option. That is
   somewhat similar to having configurable PSI thresholds.

2. It seems PSI measures the _rate_ at which pages are evicted from
   memory. While this may correlate with the _absolute_ amount of memory
   left, it is not the same. Perhaps weighting PSI by the absolute
   amount of memory used for caches would improve this metric.

Best regards,
ndrw
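
For illustration only, here is a minimal userspace sketch of the
"configurable PSI threshold" idea from point 1. It assumes a kernel with
PSI enabled, which exposes /proc/pressure/memory as "some" and "full"
lines carrying avg10/avg60/avg300 percentages and a cumulative total.
The function names, the sample values and the 10% limit are
hypothetical, chosen for the example rather than taken from any existing
tool.

```python
from typing import Dict

PsiData = Dict[str, Dict[str, float]]


def parse_psi(text: str) -> PsiData:
    """Parse the contents of a PSI file such as /proc/pressure/memory.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result: PsiData = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Turn "avg10=0.00" style fields into {"avg10": 0.0, ...}.
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def over_threshold(psi: PsiData, kind: str = "some",
                   window: str = "avg10", limit: float = 10.0) -> bool:
    """Return True if the chosen pressure average reaches the limit.

    The limit is a percentage; 10.0 is an arbitrary example value that a
    user would tune, e.g. lower for HDDs than for SSDs.
    """
    return psi[kind][window] >= limit


def read_memory_pressure(path: str = "/proc/pressure/memory") -> PsiData:
    """Read and parse the live PSI memory file (requires PSI support)."""
    with open(path) as psi_file:
        return parse_psi(psi_file.read())


# Hypothetical sample data in the PSI file format, not a real measurement.
SAMPLE = (
    "some avg10=12.50 avg60=3.10 avg300=0.80 total=123456\n"
    "full avg10=4.00 avg60=1.00 avg300=0.20 total=65432\n"
)

if __name__ == "__main__":
    psi = parse_psi(SAMPLE)
    print(over_threshold(psi, kind="some", limit=10.0))
    print(over_threshold(psi, kind="full", limit=10.0))
```

A monitoring daemon could call read_memory_pressure() periodically and
act once over_threshold() returns True. What action to take, and which
limit suits a given storage device, is exactly the policy question
raised above and is left open here.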