Received: by 2002:a25:b794:0:0:0:0:0 with SMTP id n20csp7387229ybh; Thu, 8 Aug 2019 15:02:03 -0700 (PDT) X-Google-Smtp-Source: APXvYqxY+aAU9p5vlco4gqCtULBVtua+Myb5NnvhGwHsKTXpJDn9KqsR/pCQ03P5tOZfGZggbUWt X-Received: by 2002:a17:902:788e:: with SMTP id q14mr16017480pll.234.1565301723365; Thu, 08 Aug 2019 15:02:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565301723; cv=none; d=google.com; s=arc-20160816; b=zE5t2fS9bfwj+/1xP+GjgxkDsel5XdjFDCsK+wkp4pEL9y3duVXa3CD2QMIRrNiMtL 8aI802onpUyoeI2Kpb3v0PEnYMtYWPcHnOq1oNSge07rxqGMWC3koWsj+8wls7NiPDDs k3/ub5QADQo069ZHNvZAIvB5iBGkHtj8hQlMKWX3gqG2NuhDk8pMarNQ/VgQ+KGDUbMt V/H6HPCiYn3eZPcfuiN56XTv9e3QJ1/sAftTRF8qyPwlM3A0zedfurZM1GTceDSSoP3g pBOE2z1ZDEafbJHAGck2Pj7y8YuCYuOslQtjlZlDB3KPqE+0ZoXKBEaUN8QhLd3dWeUH avqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:references:cc:to:subject:from; bh=l1IxYBfv33MNiS25HgXGEbDjrrNAu9O12K2pnMeMX4Y=; b=Tw3EOzOn8FrLs7Imes7Ss/FahKwO4/XYckd6XQnWi8xRuQDEAijVEM2h7uXaB560xw 1WS3WWL/ugF9frM5oBZxtBBqFq+UDjEpRILl9DWWivsFiOVEdxNQcknNgtCzLFVuvL38 aUF/p6dVt7PNFWsT/CEaFslOUdr/5lJ5noAYKnbrM9mJcC6njstmyRCFZyNOyBUzEExI NKsZwxm9hdrFL2kvQf+/UZ+qkqMRG5CWK6jevF00qRMi3jv6bXYaiYHIlWGCLyTGh35c +d/uxBJnbv1dcOGV5mJ+dptFLCgaZ6lYiRxc8KAVuz+Ca/kwU+M6aVgeNTWnYEGVX/Us CBAQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v23si50828259pgn.557.2019.08.08.15.01.44; Thu, 08 Aug 2019 15:02:03 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404477AbfHHV7f (ORCPT + 99 others); Thu, 8 Aug 2019 17:59:35 -0400 Received: from 68.66.241.172.static.a2webhosting.com ([68.66.241.172]:37936 "EHLO vps.redhazel.co.uk" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1728020AbfHHV7e (ORCPT ); Thu, 8 Aug 2019 17:59:34 -0400 Received: from [192.168.1.66] (unknown [212.159.68.143]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by vps.redhazel.co.uk (Postfix) with ESMTPSA id A2A331C0219A; Thu, 8 Aug 2019 22:59:32 +0100 (BST) From: ndrw Subject: Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure To: Michal Hocko Cc: Johannes Weiner , Suren Baghdasaryan , Vlastimil Babka , "Artem S. Tashkinov" , Andrew Morton , LKML , linux-mm References: <20190806142728.GA12107@cmpxchg.org> <20190806143608.GE11812@dhcp22.suse.cz> <20190806220150.GA22516@cmpxchg.org> <20190807075927.GO11812@dhcp22.suse.cz> <20190807205138.GA24222@cmpxchg.org> <20190808114826.GC18351@dhcp22.suse.cz> <806F5696-A8D6-481D-A82F-49DEC1F2B035@redhazel.co.uk> <20190808163228.GE18351@dhcp22.suse.cz> <5FBB0A26-0CFE-4B88-A4F2-6A42E3377EDB@redhazel.co.uk> <20190808185925.GH18351@dhcp22.suse.cz> Message-ID: <08e5d007-a41a-e322-5631-b89978b9cc20@redhazel.co.uk> Date: Thu, 8 Aug 2019 22:59:32 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <20190808185925.GH18351@dhcp22.suse.cz> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08/08/2019 19:59, Michal Hocko wrote: > Well, I am afraid that implementing anything like that in the kernel > will lead to many regressions and bug reports. People tend to have very > different opinions on when it is suitable to kill a potentially > important part of a workload just because memory gets low. Are you proposing having a zero memory reserve or not having such option at all? I'm fine with the current default (zero reserve/margin). I strongly prefer forcing OOM killer when the system is still running normally. Not just for preventing stalls: in my limited testing I found the OOM killer on a stalled system rather inaccurate, occasionally killing system services etc. I had much better experience with earlyoom. > LRU aspect doesn't help much, really. If we are reclaiming the same set > of pages becuase they are needed for the workload to operate then we are > effectivelly treshing no matter what kind of replacement policy you are > going to use. In my case it would work fine (my system already works well with earlyoom, and without it it remains responsive until last couple hundred MB of RAM). >>> PSI is giving you a matric that tells you how much time you >>> spend on the memory reclaim. So you can start watching the system from >>> lower utilization already. I've tested it on a system with 45GB of RAM, SSD, swap disabled (my intention was to approximate a worst-case scenario) and it didn't really detect stall before it happened. I can see some activity after reaching ~42GB, the system remains fully responsive until it suddenly freezes and requires sysrq-f. PSI appears to increase a bit when the system is about to run out of memory but the change is so small it would be difficult to set a reliable threshold. I expect the PSI numbers to increase significantly after the stall (I wasn't able to capture them) but, as mentioned above, I was hoping for a solution that would work before the stall. $ while true; do sleep 1; cat /proc/pressure/memory ; done [starting a test script and waiting for several minutes to fill up memory] some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 some avg10=0.00 avg60=0.00 avg300=0.00 total=10389 full avg10=0.00 avg60=0.00 avg300=0.00 total=6442 some avg10=0.00 avg60=0.00 avg300=0.00 total=18950 full avg10=0.00 avg60=0.00 avg300=0.00 total=11576 some avg10=0.00 avg60=0.00 avg300=0.00 total=25655 full avg10=0.00 avg60=0.00 avg300=0.00 total=16159 some avg10=0.00 avg60=0.00 avg300=0.00 total=31438 full avg10=0.00 avg60=0.00 avg300=0.00 total=19552 some avg10=0.00 avg60=0.00 avg300=0.00 total=44549 full avg10=0.00 avg60=0.00 avg300=0.00 total=27772 some avg10=0.00 avg60=0.00 avg300=0.00 total=52520 full avg10=0.00 avg60=0.00 avg300=0.00 total=32580 some avg10=0.00 avg60=0.00 avg300=0.00 total=60451 full avg10=0.00 avg60=0.00 avg300=0.00 total=37704 some avg10=0.00 avg60=0.00 avg300=0.00 total=68986 full avg10=0.00 avg60=0.00 avg300=0.00 total=42859 some avg10=0.00 avg60=0.00 avg300=0.00 total=76598 full avg10=0.00 avg60=0.00 avg300=0.00 total=48370 some avg10=0.00 avg60=0.00 avg300=0.00 total=83080 full avg10=0.00 avg60=0.00 avg300=0.00 total=52930 some avg10=0.00 avg60=0.00 avg300=0.00 total=89384 full avg10=0.00 avg60=0.00 avg300=0.00 total=56350 some avg10=0.00 avg60=0.00 avg300=0.00 total=95293 full avg10=0.00 avg60=0.00 avg300=0.00 total=60260 some avg10=0.00 avg60=0.00 avg300=0.00 total=101566 full avg10=0.00 avg60=0.00 avg300=0.00 total=64408 some avg10=0.00 avg60=0.00 avg300=0.00 total=108131 full avg10=0.00 avg60=0.00 avg300=0.00 total=68412 some avg10=0.00 avg60=0.00 avg300=0.00 total=121932 full avg10=0.00 avg60=0.00 avg300=0.00 total=77413 some avg10=0.00 avg60=0.00 avg300=0.00 total=140807 full avg10=0.00 avg60=0.00 avg300=0.00 total=91269 some avg10=0.00 avg60=0.00 avg300=0.00 total=170494 full avg10=0.00 avg60=0.00 avg300=0.00 total=110611 [stall, sysrq-f] Best regards, ndrw