Received: by 2002:a25:b794:0:0:0:0:0 with SMTP id n20csp5965427ybh; Wed, 7 Aug 2019 14:40:59 -0700 (PDT) X-Google-Smtp-Source: APXvYqzC5acB4edTOn+bsEGhnertBJtPBR7EOicP7NKZqemoB0vFm5mmcViDW2T1/u7Rsc8YvJeG X-Received: by 2002:a62:5883:: with SMTP id m125mr11495889pfb.248.1565214059216; Wed, 07 Aug 2019 14:40:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565214059; cv=none; d=google.com; s=arc-20160816; b=ptEWOnV2yzw5N/8yJPUbjvZ7RII8rjU/50Zwe4vf9xvRsG44ZZj4QxHxqF+BsV2cHs +dF/gpNqPK+YaTKD8sl6wMF008eUkPiPwZMH2nJbjEazQCEnJQP64kVUHS460wo7fEXA pDuBbhrfO7VlLhxnetpsF2U3BnGJHqxE6ICmnuKnaBq43+n3EkNsYmhPJVp2WpgY/SpI Lewnl2lgW7R5LcWmtHAh78jLjtocCQKa58yKE6UMENX4DyKcNKaAkTVQshBmIKjIrBkR 4zthAVgO6zNlz6ginnDLWZqXatc6V9L/iQX5tzWaXF+MTrBCJezAbrsdZjHUXbcO9Be+ tOag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=L+M4+2pGsDRUyWSE4KWHyF4gGN/CTdrXfy8qftEaVU8=; b=s67Kngz9tiIx5roN0iK36mKkLTFXag++xS3h0eGpVmB+oUXU3kmfDv7cFX2Zm4jMB5 6BU8OQPRYZCovbd8DA6fjgwKnXDkcKYCRh1QJgxWAtFyE6N/UYBYh2C8xTvHo12CkydM 3KNtD2S+ZiARTNyIIge46VaplRLGEh4Zukex0IvHE6TRPFSNUuJ3P7T5CXrGFiV8Lan2 oGZGSSNk5YpjZH0VaK12oGOCoLDkmtl+KeI6NjmJq5AA/C/p/pcO/qxLGSdmXoR3yFWa fsuJcvZ3u/JOHVgiywgKzWLxsdzXwIJ0fWyQfnRDXr3cGPjZwfhF7N7RDGYdYPOPj5J/ x33A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=fgAle9Op; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l15si226265pjq.104.2019.08.07.14.40.43; Wed, 07 Aug 2019 14:40:59 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=fgAle9Op; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389347AbfHGVer (ORCPT + 99 others); Wed, 7 Aug 2019 17:34:47 -0400 Received: from mail-pf1-f193.google.com ([209.85.210.193]:43224 "EHLO mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728083AbfHGVeq (ORCPT ); Wed, 7 Aug 2019 17:34:46 -0400 Received: by mail-pf1-f193.google.com with SMTP id i189so42823380pfg.10 for ; Wed, 07 Aug 2019 14:34:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=L+M4+2pGsDRUyWSE4KWHyF4gGN/CTdrXfy8qftEaVU8=; b=fgAle9OpDpnJ7GKQv0WtHbnZoKzcB8hPE4HOnQKm0VzYK81k8Q7cif033smKCs6zXw UxFxvbTClbbOkt2DzoYqMPsWd/JtUR6WO/58/L6gGzELqNPyroeUg1AQD8SOr0c1Tt80 ondZOC7Xw7YyiyOFPPNOS67ELbBfwxbTaqzUZ/2NpGl9wW+yd5G/1UTq5bmKSivBOJNk Myiix8N6eWA5kgUgyOtBLebt4+a81tQ5C+Foveh+lL7Te+X1eQ5mi29r6dV5y5lm+CMQ inJiYzjr2BoXWtPGe/7WyAz1JuhnIy2buR4K9S+62kEZ8/xkaLWQome9OUbJHLbDCf5y 3Emw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=L+M4+2pGsDRUyWSE4KWHyF4gGN/CTdrXfy8qftEaVU8=; b=fZpByRyfy5IFyNGD8Cz9+2IDYwGrzMKPZM7R+SGbe/ntkDZC2SotKF4oOIqGIqbTt/ xJigfKSM3lleUwYaNqVdahH2pvzSBRIpf4Yzfp8Hn9vSE8Iggg/1HwBej89KCRND3EaX iPxBRU/gWxqMfIBTcC1EOZ2fSM9hpB9DVbQ9aO1/podUXTLBK9YXfKfzhYLhCkZtujHG R7XSjksbufcPoYWNgglP3hzJL31MXDbE5ngqsjjnQQjCC5l/4x16f+8Ne7YPPZS4T7nm WfvSfGF/XXIRDnBDD1LLAmTeXdVhRSAKqiiKMwjvfVRYjVGPbfaRPiwjVjy+p1oTsezw qcgQ== X-Gm-Message-State: APjAAAVXVwiqsQZIcNOdGgDeznbcWWj37mf/JfQMtmlk0VkTIDeP5rZA bXlYZunX1cvpHG8L9W5oMCneyQ== X-Received: by 2002:a63:1f1b:: with SMTP id f27mr9431471pgf.233.1565213686124; Wed, 07 Aug 2019 14:34:46 -0700 (PDT) Received: from localhost ([2620:10d:c091:500::1:f7c1]) by smtp.gmail.com with ESMTPSA id y14sm45924523pge.7.2019.08.07.14.34.45 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Wed, 07 Aug 2019 14:34:45 -0700 (PDT) Date: Wed, 7 Aug 2019 17:34:43 -0400 From: Johannes Weiner To: Andrew Morton Cc: Michal Hocko , Suren Baghdasaryan , Vlastimil Babka , "Artem S. Tashkinov" , LKML , linux-mm Subject: Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure Message-ID: <20190807213443.GA11227@cmpxchg.org> References: <20190805193148.GB4128@cmpxchg.org> <398f31f3-0353-da0c-fc54-643687bb4774@suse.cz> <20190806142728.GA12107@cmpxchg.org> <20190806143608.GE11812@dhcp22.suse.cz> <20190806220150.GA22516@cmpxchg.org> <20190807075927.GO11812@dhcp22.suse.cz> <20190807205138.GA24222@cmpxchg.org> <20190807140130.7418e783654a9c53e6b6cd1b@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190807140130.7418e783654a9c53e6b6cd1b@linux-foundation.org> User-Agent: Mutt/1.12.0 (2019-05-25) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 07, 2019 at 02:01:30PM -0700, Andrew Morton wrote: > On Wed, 7 Aug 2019 16:51:38 -0400 Johannes Weiner wrote: > > > However, eb414681d5a0 ("psi: pressure stall information for CPU, > > memory, and IO") introduced a memory pressure metric that quantifies > > the share of wallclock time in which userspace waits on reclaim, > > refaults, swapins. By using absolute time, it encodes all the above > > mentioned variables of hardware capacity and workload behavior. When > > memory pressure is 40%, it means that 40% of the time the workload is > > stalled on memory, period. This is the actual measure for the lack of > > forward progress that users can experience. It's also something they > > expect the kernel to manage and remedy if it becomes non-existent. > > > > To accomplish this, this patch implements a thrashing cutoff for the > > OOM killer. If the kernel determines a sustained high level of memory > > pressure, and thus a lack of forward progress in userspace, it will > > trigger the OOM killer to reduce memory contention. > > > > Per default, the OOM killer will engage after 15 seconds of at least > > 80% memory pressure. These values are tunable via sysctls > > vm.thrashing_oom_period and vm.thrashing_oom_level. > > Could be implemented in userspace? > We do in fact do this with oomd. But it requires a comprehensive cgroup setup, with complete memory and IO isolation, to protect that daemon from the memory pressure and excessive paging of the rest of the system (mlock doesn't really cut it because you need to potentially allocate quite a few proc dentries and inodes just to walk the process tree and determine a kill target). In a fleet that works fine, since we need to maintain that cgroup infra anyway. But for other users, that's a lot of stack for basic "don't hang forever if I allocate too much memory" functionality.