Date: Tue, 3 Mar 2020 13:02:41 +0000
From: Mel Gorman
To: "Huang, Ying"
Cc: David Hildenbrand, Michal Hocko, Johannes Weiner, Matthew Wilcox,
 Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Vlastimil Babka, Zi Yan, Peter Zijlstra, Dave Hansen, Minchan Kim,
 Hugh Dickins, Alexander Duyck
Subject: Re: [RFC 0/3] mm: Discard lazily freed pages when migrating
Message-ID: <20200303130241.GE3772@suse.de>
References: <20200228033819.3857058-1-ying.huang@intel.com>
 <20200228034248.GE29971@bombadil.infradead.org>
 <87a7538977.fsf@yhuang-dev.intel.com>
 <871rqf850z.fsf@yhuang-dev.intel.com>
 <20200228094954.GB3772@suse.de>
 <87h7z76lwf.fsf@yhuang-dev.intel.com>
 <20200302151607.GC3772@suse.de>
 <87zhcy5hoj.fsf@yhuang-dev.intel.com>
In-Reply-To: <87zhcy5hoj.fsf@yhuang-dev.intel.com>

On Tue, Mar 03, 2020 at 09:51:56AM +0800, Huang, Ying wrote:
> Mel Gorman writes:
> > On Mon, Mar 02, 2020 at 07:23:12PM +0800, Huang, Ying wrote:
> >> If some applications cannot tolerate the latency incurred by the
> >> memory allocation and zeroing, then we cannot always discard instead
> >> of migrating. But in some situations, less memory pressure can help.
> >> So it's better to let the administrator and the application choose
> >> the right behavior for the specific situation?
> >>
> >
> > Is there an application you have in mind that benefits from discarding
> > MADV_FREE pages instead of migrating them?
> >
> > Allowing the administrator or application to tune this would be very
> > problematic. An application would require an update to the system call
> > to take advantage of it and would then have to detect whether the
> > running kernel supports it. An administrator would have to notice that
> > MADV_FREE pages are being prematurely discarded, leading to a slowdown,
> > and that is hard to detect. It could be inferred from monitoring
> > compaction stats and checking whether compaction activity is correlated
> > with higher minor faults in the target application. Proving the
> > correlation would require using the perf software event
> > PERF_COUNT_SW_PAGE_FAULTS_MIN and matching the faulting addresses to
> > MADV_FREE regions that were freed prematurely. That is not an obvious
> > debugging step to take when an application detects latency spikes.
> >
> > Now, you could add a counter specifically for MADV_FREE pages freed for
> > reasons other than memory pressure and hope the administrator knows
> > about the counter and what it means. That type of knowledge could take
> > a long time to spread, so it's really very important that there is
> > evidence of an application that suffers due to the current MADV_FREE
> > and migration behaviour.
>
> OK. I understand that this patchset isn't a universal win, so we need
> some way to justify it. I will try to find some application for that.
>
> Another thought: as proposed by David Hildenbrand, it may be a universal
> win to discard clean MADV_FREE pages when migrating if there is already
> memory pressure on the target node. For example, if the free memory on
> the target node is lower than the high watermark?
>

That is an extremely specific corner case that is not likely to occur.
NUMA balancing is not going to migrate a MADV_FREE page under these
circumstances: a write cancels MADV_FREE, and a read attempt will
probably fail to allocate a destination page in alloc_misplaced_dst_page,
so the data gets lost instead of remaining remote. move_pages() is a
possibility, but the odds of an application deliberately trying to
migrate to a loaded node are low. Compaction never migrates cross-node,
so the state of a remote node under pressure does not matter.

Once again, there needs to be a reasonable use case to be able to
meaningfully weigh the benefits and risks of changing the MADV_FREE
semantics.

-- 
Mel Gorman
SUSE Labs
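
For concreteness, the MADV_FREE semantics being debated look like this
from userspace. This is a minimal illustrative sketch, not code from the
patchset: the kernel may reclaim the advised range lazily under memory
pressure, and a later write to a page cancels the lazy free for that
page.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64 * 1024 * 1024;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(buf, 0xaa, len);	/* populate the pages */

	/* Tell the kernel the contents are disposable. */
	if (madvise(buf, len, MADV_FREE))
		perror("madvise(MADV_FREE)");

	/*
	 * Writing again cancels MADV_FREE for the touched page. A read
	 * may observe either the old data or a zero page, depending on
	 * whether reclaim got there first.
	 */
	buf[0] = 1;

	munmap(buf, len);
	return 0;
}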
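The PERF_COUNT_SW_PAGE_FAULTS_MIN measurement described above can be
taken with perf_event_open(2), which has no glibc wrapper and is invoked
via syscall(2). A sketch for counting minor faults in the current
process, assuming the workload under investigation runs between the
enable and disable ioctls:

#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	long long count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_SOFTWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_SW_PAGE_FAULTS_MIN;
	attr.disabled = 1;

	/* pid = 0, cpu = -1: this process, any CPU. */
	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* ... run the workload being investigated here ... */

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("minor faults: %lld\n", count);

	close(fd);
	return 0;
}

The same counter is exposed on the command line as
"perf stat -e minor-faults -p <pid>", which is the easier first step for
an administrator, although it still does not attribute the faults to
prematurely discarded MADV_FREE regions.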
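As for the watermark condition floated in the quoted text, a hypothetical
kernel-side sketch of what such a check might look like, not code from
the patchset: it treats the target node as "under pressure" when any
populated zone on it has fewer free pages than its high watermark. The
helper name is invented for illustration; the zone and watermark
accessors are the existing kernel ones.

#include <linux/mmzone.h>	/* kernel-internal fragment, not standalone */

/* Hypothetical helper: is any populated zone on @nid below high wmark? */
static bool target_node_under_pressure(int nid)
{
	pg_data_t *pgdat = NODE_DATA(nid);
	struct zone *zone;
	int i;

	for (i = 0; i < MAX_NR_ZONES; i++) {
		zone = &pgdat->node_zones[i];
		if (!populated_zone(zone))
			continue;
		if (zone_page_state(zone, NR_FREE_PAGES) <
		    high_wmark_pages(zone))
			return true;
	}
	return false;
}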