In-Reply-To: <20180717122327.GG7193@dhcp22.suse.cz>
References: <20180712172942.10094-1-hannes@cmpxchg.org>
 <20180716155745.10368-1-drake@endlessm.com>
 <20180717112515.GE7193@dhcp22.suse.cz>
 <20180717122327.GG7193@dhcp22.suse.cz>
From: Daniel Drake
Date: Wed, 25 Jul 2018 17:57:26 -0500
Subject: Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2
To: Michal Hocko
Cc: hannes@cmpxchg.org, Linux Kernel, linux-mm@kvack.org,
 cgroups@vger.kernel.org, Linux Upstreaming Team,
 linux-block@vger.kernel.org, Ingo Molnar, Peter Zijlstra,
 Andrew Morton, Tejun Heo, Balbir Singh, Mike Galbraith, Oliver Yang,
 Shakeel Butt, xxx xxx, Taras Kondratiuk, Daniel Walker, Vinayak Menon,
 Ruslan Ruslichenko, kernel-team@fb.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 17, 2018 at 7:23 AM, Michal Hocko wrote:
> On Tue 17-07-18 07:13:52, Daniel Drake wrote:
>> On Tue, Jul 17, 2018 at 6:25 AM, Michal Hocko wrote:
>> > Yes this is really unfortunate. One thing that could help would be to
>> > consider a thrashing level during the reclaim (get_scan_count) to simply
>> > forget about LRUs which are constantly refaulting pages back. We already
>> > have the infrastructure for that. We just need to plumb it in.
>>
>> Can you go into a bit more detail about that infrastructure and how we
>> might detect which pages are being constantly refaulted? I'm
>> interested in spending a few hours on this topic to see if I can come
>> up with anything.
>
> mm/workingset.c allows for tracking when an actual page got evicted.
> workingset_refault tells us whether a given filemap fault is a recent
> refault and activates the page if that is the case. So what you need is
> to note how many refaulted pages we have on the active LRU list. If that
> is a large part of the list and if the inactive list is really small
> then we know we are thrashing.

Thanks for the guidance. So this sounds like something that should be
done on a timer (or on some other condition?): check the state of the
active LRU list as described, and if things are bad, invoke the OOM
killer? I'm having trouble linking that idea to your original
suggestion:

> One thing that could help would be to consider a thrashing level during the reclaim
> (get_scan_count) to simply forget about LRUs which are constantly refaulting
> pages back.

which I interpret to mean that the for_each_evictable_lru loop in
get_scan_count should skip over constantly refaulting LRUs rather than
add them to nr[] and lru_pages, which I assume would then cause direct
reclaim to fail when we are thrashing, leading to an OOM kill?

Are these two different ideas, or am I just misunderstanding something basic?

That confusion aside, studying the code to understand how I can
determine whether a page is being constantly refaulted, I see that
the well-documented condition for this (in workingset_refault) is:

    ((refault - eviction) & EVICTION_MASK) <= active_file

refault and active_file are just values from the lruvec, which seem
easily accessible.
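
For reference, the refault-distance check quoted above can be sketched in
user-space C roughly as follows. This is a simplified model, not the kernel
code: the function name is_recent_refault and the 32-bit width chosen for
EVICTION_MASK are assumptions made for illustration. Note that in C the
`<=` operator binds tighter than `&`, so the masked distance needs its own
parentheses (or its own variable) before the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: the eviction counter is stored in 32 bits. */
#define EVICTION_MASK ((uint64_t)0xffffffffu)

/*
 * refault:     current value of the per-lruvec counter
 * eviction:    counter snapshot saved in the shadow entry at eviction time
 * active_file: number of pages currently on the active file LRU
 *
 * Hypothetical helper name; returns true when the page came back soon
 * enough that it would have stayed resident had it been on the active list.
 */
static bool is_recent_refault(uint64_t refault, uint64_t eviction,
                              uint64_t active_file)
{
    /*
     * Unsigned subtraction followed by the mask keeps the distance
     * correct even after the truncated counter has wrapped around.
     */
    uint64_t refault_distance = (refault - eviction) & EVICTION_MASK;

    return refault_distance <= active_file;
}
```

With these assumptions, a page evicted at counter value 900 and refaulted
at 1000 has a distance of 100, so it counts as a recent refault only if the
active file list holds at least 100 pages.
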
However, the eviction value is taken at the point of page eviction and
is then stored in the shadow entry left in the page cache for each
evicted page, but the shadow entry is lost when the page is
reactivated.

The suggestion(s) seem to revolve around checking whether currently
active pages are refaulting a lot, and I am still not clear on how to
determine that, given that the shadow/eviction information was lost at
the point when those active pages were refaulted.

BTW feel free to drop this thread if you are busy, or delay your
response to a convenient time. I'm new to this area and probably
making silly mistakes, and not yet convinced that I'll be able to see
it through.

Daniel
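
The shadow-entry lifecycle described above can be illustrated with a toy
user-space model. Everything here is hypothetical (struct slot, struct
lru_model, evict_page, refault_page); it is only a sketch of the ordering
problem, not the kernel's data structures, and it leaves out the counter
mask and list accounting. The point it shows is that the eviction snapshot
exists only while the page is out of memory, so nothing on the page itself
records that it was refaulted.

```c
#include <stdbool.h>

/* Toy page cache slot: holds either a resident page or a shadow entry. */
struct slot {
    bool resident;          /* true: the page is in memory */
    bool has_shadow;        /* true: the eviction snapshot below is valid */
    unsigned long eviction; /* counter snapshot taken at eviction time */
};

/* Toy per-LRU state (hypothetical names). */
struct lru_model {
    unsigned long inactive_age; /* advances on each eviction */
    unsigned long active_file;  /* size of the active file list */
};

/* Evict: replace the page with a shadow entry carrying the snapshot. */
static void evict_page(struct lru_model *lru, struct slot *s)
{
    s->resident = false;
    s->has_shadow = true;
    s->eviction = lru->inactive_age++;
}

/*
 * Refault: use the shadow entry to decide on activation, then discard it.
 * After this returns, the eviction snapshot is no longer available.
 */
static bool refault_page(struct lru_model *lru, struct slot *s)
{
    bool activate = false;

    if (s->has_shadow) {
        unsigned long distance = lru->inactive_age - s->eviction;

        activate = distance <= lru->active_file;
    }

    s->resident = true;
    s->has_shadow = false;
    s->eviction = 0;
    return activate;
}
```

In this model, once refault_page has run, an activated page looks the same
as any other active page, which is the gap the message above is asking
about.
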