Date: Fri, 28 Feb 2020 10:50:48 +0100
From: Michal Hocko
To: "Huang, Ying"
Cc: David Hildenbrand, Matthew Wilcox, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Mel Gorman, Vlastimil Babka, Zi Yan,
    Peter Zijlstra, Dave Hansen, Minchan Kim, Johannes Weiner, Hugh Dickins,
    Alexander Duyck
Subject: Re: [RFC 0/3] mm: Discard lazily freed pages when migrating
Message-ID: <20200228095048.GK3771@dhcp22.suse.cz>
References: <20200228033819.3857058-1-ying.huang@intel.com>
 <20200228034248.GE29971@bombadil.infradead.org>
 <87a7538977.fsf@yhuang-dev.intel.com>
 <871rqf850z.fsf@yhuang-dev.intel.com>
In-Reply-To: <871rqf850z.fsf@yhuang-dev.intel.com>

On Fri 28-02-20 16:55:40, Huang, Ying wrote:
> David Hildenbrand writes:
[...]
> > E.g., free page reporting in QEMU wants to use MADV_FREE. The guest will
> > report currently free pages to the hypervisor, which will MADV_FREE the
> > reported memory. As long as there is no memory pressure, there is no
> > need to actually free the pages. Once the guest reuses such a page, it
> > could happen that the old page is still there, so pulling in a fresh
> > (zeroed) page can be avoided.
> >
> > AFAIK, after your change, we would get more pages discarded from our
> > guest, resulting in more fresh (zeroed) pages having to be pulled in
> > when a guest touches a reported free page again. But OTOH, page
> > migration is sped up (these pages no longer need to be migrated).
>
> Let's look at this problem from another perspective. To migrate the
> MADV_FREE pages of the QEMU process from node A to node B, we need to
> free the original pages on node A, and (maybe) allocate the same number
> of pages on node B. So the question becomes:
>
> - we may need to allocate some pages on node B
> - these pages may or may not be accessed by the application
> - should we allocate all these pages in advance, or allocate them
>   lazily when they are accessed?
>
> We thought the common philosophy in the Linux kernel is to allocate
> lazily.

The common philosophy is to cache as much as possible, and MADV_FREE
pages are a kind of cache as well. If the target node is short on memory
then they will be reclaimed like any other cache, so proactively freeing
them sounds counterproductive when you have no idea whether that cache is
going to be used in the future. In other words, you would not free clean
page cache just because you want to use that memory as a migration
target, right? So you should make a clear case for why the MADV_FREE
cache is less important than the clean page cache, ideally with a
justification backed by real workloads.

-- 
Michal Hocko
SUSE Labs
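
[For context, the MADV_FREE semantics discussed above can be seen in a
minimal userspace sketch. This is an illustration only, not QEMU's actual
free page reporting code; it assumes Linux >= 4.5, where MADV_FREE is
available.]

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 16 * 4096;

        /* Anonymous mapping standing in for (part of) guest memory. */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memset(buf, 0xaa, len);     /* pages are now populated and dirty */

        /*
         * Hypervisor side of free page reporting: mark the range as
         * lazily freeable.  The kernel may reclaim these pages at any
         * time under memory pressure; until then the old contents
         * remain in place.
         */
        if (madvise(buf, len, MADV_FREE) != 0) {
            perror("madvise(MADV_FREE)");
            return 1;
        }

        /*
         * Guest reuses a reported page: the write either finds the old
         * page still present (no new allocation needed) or faults in a
         * fresh zeroed page if the kernel reclaimed it in the meantime.
         */
        buf[0] = 1;

        munmap(buf, len);
        return 0;
    }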