Date: Wed, 14 Aug 2019 10:05:31 +0200
From: Michal Hocko
To: Joel Fernandes
Cc: khlebnikov@yandex-team.ru, linux-kernel@vger.kernel.org, Minchan Kim,
	Alexey Dobriyan, Andrew Morton, Borislav Petkov, Brendan Gregg,
	Catalin Marinas, Christian Hansen, dancol@google.com,
	fmayer@google.com, "H. Peter Anvin", Ingo Molnar, Jonathan Corbet,
	Kees Cook, kernel-team@android.com, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Mike Rapoport, namhyung@google.com,
	paulmck@linux.ibm.com, Robin Murphy, Roman Gushchin,
	Stephen Rothwell, surenb@google.com, Thomas Gleixner,
	tkjos@google.com, Vladimir Davydov, Vlastimil Babka, Will Deacon
Subject: Re: [PATCH v5 2/6] mm/page_idle: Add support for handling swapped PG_Idle pages
Message-ID: <20190814080531.GP17933@dhcp22.suse.cz>
References: <20190807171559.182301-1-joel@joelfernandes.org>
	<20190807171559.182301-2-joel@joelfernandes.org>
	<20190813150450.GN17933@dhcp22.suse.cz>
	<20190813153659.GD14622@google.com>
In-Reply-To: <20190813153659.GD14622@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 13-08-19 11:36:59, Joel Fernandes wrote:
> On Tue, Aug 13, 2019 at 05:04:50PM +0200, Michal Hocko wrote:
> > On Wed 07-08-19 13:15:55, Joel Fernandes (Google) wrote:
> > > Idle page tracking currently does not work well in the following
> > > scenario:
> > >  1. mark page-A idle which was present at that time.
> > >  2. run workload
> > >  3. page-A is not touched by workload
> > >  4. *sudden* memory pressure happen so finally page A is finally swapped out
> > >  5. now see the page A - it appears as if it was accessed (pte unmapped
> > >     so idle bit not set in output) - but it's incorrect.
> > >
> > > To fix this, we store the idle information into a new idle bit of the
> > > swap PTE during swapping of anonymous pages.
> > >
> > > Also in the future, madvise extensions will allow a system process
> > > manager (like Android's ActivityManager) to swap pages out of a process
> > > that it knows will be cold.
> > > To an external process like a heap profiler
> > > that is doing idle tracking on another process, this procedure will
> > > interfere with the idle page tracking similar to the above steps.
> >
> > This could be solved by checking the !present/swapped out pages
> > right? Whoever decided to put the page out to the swap just made it
> > idle effectively. So the monitor can make some educated guess for
> > tracking. If that is fundamentally not possible then please describe
> > why.
>
> But the monitoring process (profiler) does not have control over the 'whoever
> made it effectively idle' process.

Why does that matter? Whether it is a global/memcg reclaim or somebody
calling MADV_PAGEOUT or whatever, it is a decision to make the page not
hot. Sure, you could argue that a missing idle bit on swap entries might
mean that the swap out decision was premature/sub-optimal/wrong, but is
this the aim of the interface?

> As you said it will be a guess, it will not be accurate.

Yes, and the point I am trying to make is that having some space and not
giving a guarantee sounds like a safer option for this interface because
...

>
> I am curious what is your concern with using a bit in the swap PTE?

... it is a promise of semantics that I find limiting for the future.
The bit in the pte might turn out to be insufficient (e.g. pte reclaim),
so teaching userspace to consider this a hard guarantee is a ticket to
problems later on. Maybe I am overly paranoid because I have seen so many
"nice to have" features turning into a maintenance burden in the past.

If this is really considered mostly a debugging-purpose interface then a
certain level of imprecision should be tolerable. If there is a really
strong real-world use case that simply has no other way to go then this
might be added later. Adding information is always safer than taking it
away.

That being said, if I am a minority voice here then I will not really
stand in the way and won't nack the patch. I will not ack it either,
though.
-- 
Michal Hocko
SUSE Labs
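
The "educated guess" approach discussed in this thread can be sketched
from the monitor's side roughly as follows. This is a minimal
illustration, not code from the patch series. It relies on the documented
/proc/PID/pagemap entry layout (bit 63 = present, bit 62 = swapped,
bits 0-54 = PFN when present). The names classify_entry and idle_pfns are
hypothetical: idle_pfns stands in for whatever the monitor has already
read back from the page idle bitmap, and treating a swapped entry as idle
is exactly the guess under debate, not a guarantee.

```python
import os
import struct
from typing import AbstractSet

# Documented /proc/PID/pagemap bit layout.
PM_PRESENT = 1 << 63            # page is present in RAM
PM_SWAPPED = 1 << 62            # page is in swap
PM_PFN_MASK = (1 << 55) - 1     # bits 0-54: PFN (only if present)


def read_pagemap_entry(pid: int, vaddr: int) -> int:
    """Read the 64-bit pagemap entry covering vaddr in process pid.

    Note: the PFN field reads as zero without CAP_SYS_ADMIN.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * 8
    fd = os.open(f"/proc/{pid}/pagemap", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, offset)
    finally:
        os.close(fd)
    return struct.unpack("=Q", data)[0]


def classify_entry(entry: int, idle_pfns: AbstractSet[int]) -> str:
    """Classify one pagemap entry for idle tracking.

    idle_pfns (hypothetical input): PFNs whose idle bit is still set.
    """
    if entry & PM_PRESENT:
        pfn = entry & PM_PFN_MASK
        return "idle" if pfn in idle_pfns else "accessed"
    if entry & PM_SWAPPED:
        # Assumption: whoever swapped the page out considered it cold,
        # so the monitor guesses "idle" instead of reporting an access.
        return "assumed-idle"
    return "unmapped"
```

A monitor built this way would report swapped-out pages in a separate
"assumed-idle" bucket, so a consumer can tell a guess from an idle bit
that was actually observed.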