Date: Wed, 14 Aug 2019 20:36:57 +0200
From: Michal Hocko
To: Joel Fernandes
Cc: khlebnikov@yandex-team.ru, linux-kernel@vger.kernel.org, Minchan Kim,
    Alexey Dobriyan, Andrew Morton, Borislav Petkov, Brendan Gregg,
    Catalin Marinas, Christian Hansen, dancol@google.com, fmayer@google.com, "H.
    Peter Anvin", Ingo Molnar, Jonathan Corbet, Kees Cook, kernel-team@android.com,
    linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Mike Rapoport, namhyung@google.com, paulmck@linux.ibm.com,
    Robin Murphy, Roman Gushchin, Stephen Rothwell, surenb@google.com,
    Thomas Gleixner, tkjos@google.com, Vladimir Davydov, Vlastimil Babka, Will Deacon
Subject: Re: [PATCH v5 2/6] mm/page_idle: Add support for handling swapped PG_Idle pages
Message-ID: <20190814183657.GK17933@dhcp22.suse.cz>
References: <20190807171559.182301-1-joel@joelfernandes.org>
    <20190807171559.182301-2-joel@joelfernandes.org>
    <20190813150450.GN17933@dhcp22.suse.cz>
    <20190813153659.GD14622@google.com>
    <20190814080531.GP17933@dhcp22.suse.cz>
    <20190814163203.GB59398@google.com>
In-Reply-To: <20190814163203.GB59398@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 14-08-19 12:32:03, Joel Fernandes wrote:
> On Wed, Aug 14, 2019 at 10:05:31AM +0200, Michal Hocko wrote:
> > On Tue 13-08-19 11:36:59, Joel Fernandes wrote:
> > > On Tue, Aug 13, 2019 at 05:04:50PM +0200, Michal Hocko wrote:
> > > > On Wed 07-08-19 13:15:55, Joel Fernandes (Google) wrote:
> > > > > Idle page tracking currently does not work well in the following
> > > > > scenario:
> > > > >  1. mark page-A idle which was present at that time.
> > > > >  2. run workload
> > > > >  3. page-A is not touched by workload
> > > > >  4. *sudden* memory pressure happens, so page-A is finally swapped out
> > > > >  5. now see the page-A - it appears as if it was accessed (pte unmapped
> > > > >     so idle bit not set in output) - but it's incorrect.
> > > > >
> > > > > To fix this, we store the idle information into a new idle bit of the
> > > > > swap PTE during swapping of anonymous pages.
> > > > >
> > > > > Also in the future, madvise extensions will allow a system process
> > > > > manager (like Android's ActivityManager) to swap pages out of a process
> > > > > that it knows will be cold. To an external process like a heap profiler
> > > > > that is doing idle tracking on another process, this procedure will
> > > > > interfere with the idle page tracking similar to the above steps.
> > > >
> > > > This could be solved by checking the !present/swapped out pages
> > > > right? Whoever decided to put the page out to the swap just made it
> > > > idle effectively. So the monitor can make some educated guess for
> > > > tracking. If that is fundamentally not possible then please describe
> > > > why.
> > >
> > > But the monitoring process (profiler) does not have control over the 'whoever
> > > made it effectively idle' process.
> >
> > Why does that matter? Whether it is a global/memcg reclaim or somebody
> > calling MADV_PAGEOUT or whatever it is a decision to make the page not
> > hot. Sure you could argue that a missing idle bit on swap entries might
> > mean that the swap out decision was pre-mature/sub-optimal/wrong but is
> > this the aim of the interface?
> >
> > > As you said it will be a guess, it will not be accurate.
> >
> > Yes and the point I am trying to make is that having some space and not
> > giving a guarantee sounds like a safer option for this interface because
>
> I do see your point of view, but just because a future (and possibly not
> going to happen) usecase which you mentioned as pte reclaim, makes you feel
> that userspace may be subject to inaccuracies anyway, doesn't mean we should
> make everything inaccurate. We already know idle page tracking is not
> completely accurate. But that doesn't mean we miss out on the opportunity to
> make the "non pte-reclaim" usecase accurate as well.

Just keep in mind that you will add more burden to future features
because they would have to somehow overcome this user visible behavior
and we will get to the usual question - Is this going to break
something that relies on the idle bit being stable?

> IMO, we should do our best for today, and not hypothesize. How likely is pte
> reclaim and is there a thread to describe that direction?

Not that I am aware of now but with large NVDIMM mapped files I can see
that this will get more and more interesting.
-- 
Michal Hocko
SUSE Labs
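
For illustration only, the "educated guess" suggested earlier in this thread
(checking the !present/swapped out pages) could look roughly like the
userspace sketch below. It relies only on the documented /proc/<pid>/pagemap
entry layout (bit 63 = present, bit 62 = swapped). The helper names
decode_pagemap_entry, guess_idle and read_pagemap_entry are hypothetical, and
the policy "swapped out implies idle" is the heuristic under discussion, not
something the kernel guarantees. The idle bit of a present page is assumed to
have been obtained separately from the existing idle page tracking interface
and is passed in as a plain boolean.

```python
import os
import struct

# Documented /proc/<pid>/pagemap flag bits (one 64-bit entry per virtual page).
PM_PRESENT = 1 << 63  # page is present in RAM
PM_SWAPPED = 1 << 62  # page is in swap


def decode_pagemap_entry(entry):
    """Extract the present/swapped flags from one 64-bit pagemap entry."""
    return {
        "present": bool(entry & PM_PRESENT),
        "swapped": bool(entry & PM_SWAPPED),
    }


def guess_idle(entry, idle_bit_set):
    """Monitor-side heuristic (an assumption, not kernel behaviour).

    - swapped out: whoever reclaimed the page decided it was not hot,
      so treat it as idle
    - present: trust the idle bit that the caller read separately
    - neither: no information, return None
    """
    state = decode_pagemap_entry(entry)
    if state["swapped"]:
        return True
    if state["present"]:
        return idle_bit_set
    return None


def read_pagemap_entry(pid, vaddr):
    """Read the pagemap entry covering virtual address vaddr of process pid."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * 8  # each entry is 8 bytes
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        data = f.read(8)
    if len(data) != 8:
        raise ValueError("address is outside the readable pagemap range")
    return struct.unpack("=Q", data)[0]  # native-endian unsigned 64-bit
```

A monitor would call guess_idle(read_pagemap_entry(pid, addr), idle_bit) for
each address of interest. Whether such a guess is good enough, or whether the
swap PTE should carry an explicit idle bit, is exactly the open question in
this thread.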