From: Huang, Ying
To: Michal Hocko
Cc: David Hildenbrand, Matthew Wilcox, Andrew Morton, Mel Gorman,
    Vlastimil Babka, Zi Yan, Peter Zijlstra, Dave Hansen, Minchan Kim,
    Johannes Weiner, Hugh Dickins, Alexander Duyck
Subject: Re: [RFC 0/3] mm: Discard lazily freed pages when migrating
Date: Mon, 02 Mar 2020 22:12:53 +0800
Message-ID: <87d09u7sm2.fsf@yhuang-dev.intel.com>
In-Reply-To: <20200228095048.GK3771@dhcp22.suse.cz> (Michal Hocko's
    message of "Fri, 28 Feb 2020 10:50:48 +0100")
References: <20200228033819.3857058-1-ying.huang@intel.com>
    <20200228034248.GE29971@bombadil.infradead.org>
    <87a7538977.fsf@yhuang-dev.intel.com>
    <871rqf850z.fsf@yhuang-dev.intel.com>
    <20200228095048.GK3771@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

Michal Hocko writes:

> On Fri 28-02-20 16:55:40, Huang, Ying wrote:
>> David Hildenbrand writes:
> [...]
>> > E.g., free page reporting in QEMU wants to use MADV_FREE. The guest will
>> > report currently free pages to the hypervisor, which will MADV_FREE the
>> > reported memory. As long as there is no memory pressure, there is no
>> > need to actually free the pages. Once the guest reuses such a page, it
>> > could happen that there is still the old page and pulling in a fresh
>> > (zeroed) page can be avoided.
>> >
>> > AFAIK, after your change, we would get more pages discarded from our
>> > guest, resulting in more fresh (zeroed) pages having to be pulled in
>> > when a guest touches a reported free page again. But OTOH, page
>> > migration is sped up (by avoiding migrating these pages).
>>
>> Let's look at this problem from another perspective. To migrate the
>> MADV_FREE pages of the QEMU process from node A to node B, we
>> need to free the original pages in node A, and (maybe) allocate the
>> same number of pages in node B. So the question becomes
>>
>> - we may need to allocate some pages in node B
>> - these pages may be accessed by the application or not
>> - we should allocate all these pages in advance or allocate them lazily
>>   when they are accessed.
>>
>> We thought the common philosophy in the Linux kernel is to allocate lazily.
>
> The common philosophy is to cache as much as possible.

Yes. This is another common philosophy. But MADV_FREE pages are
different from caches such as the page cache because they have no valid
contents. And this patchset doesn't disable the MADV_FREE mechanism.
It just changes the migration behavior. So MADV_FREE pages will still be
kept until reclaim most of the time.

> And MADV_FREE pages are a kind of cache as well. If the target node is
> short on memory then those will be reclaimed as a cache so a
> pro-active freeing sounds counter productive as you do not have any
> idea whether that cache is going to be used in future. In other words
> you are not going to free a clean page cache if you want to use that
> memory as a migration target right? So you should make a clear case
> about why MADV_FREE cache is less important than the clean page cache
> and ideally have a good justification backed by real workloads.

Clean page cache still has valid contents, while clean MADV_FREE pages
have no valid contents. So the penalty of discarding clean page cache
is reading from disk, while the penalty of discarding clean MADV_FREE
pages is just page allocation and zeroing.

I understand that MADV_FREE is another kind of cache and has its value.
But in the original implementation, during migration, we already free
the original "cache", then reallocate the cache elsewhere and copy the
contents. This looks more like always populating all pages in mmap().
I know there is value in populating all pages in mmap(), but does that
always need to be done by default?

Best Regards,
Huang, Ying
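
To make the trade-off above concrete, here is a toy cost model, written
in Python only for brevity. It is a sketch under loud assumptions, not
kernel code and not a measurement: the names MigrationCost,
eager_migration and lazy_discard are hypothetical, and the model ignores
reclaim under memory pressure on the target node. It only counts what
each policy does for N clean MADV_FREE pages, of which some number are
touched again after migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationCost:
    """Work done for a set of clean MADV_FREE pages (toy model, hypothetical)."""
    target_allocations: int  # pages allocated on the target node
    page_copies: int         # pages copied from the source node to the target node
    zero_fills: int          # fresh pages zeroed on first touch after migration


def _check(madv_free_pages: int, touched_later: int) -> None:
    # The number of pages touched again cannot exceed the number of pages.
    if not 0 <= touched_later <= madv_free_pages:
        raise ValueError("need 0 <= touched_later <= madv_free_pages")


def eager_migration(madv_free_pages: int, touched_later: int) -> MigrationCost:
    """Assumed model of the existing behavior: every clean MADV_FREE page is
    reallocated on the target node and copied, whether or not it is used again."""
    _check(madv_free_pages, touched_later)
    return MigrationCost(
        target_allocations=madv_free_pages,
        page_copies=madv_free_pages,
        zero_fills=0,
    )


def lazy_discard(madv_free_pages: int, touched_later: int) -> MigrationCost:
    """Assumed model of the proposed behavior: clean MADV_FREE pages are
    discarded at migration time; a page is allocated and zeroed only when
    the application touches it again."""
    _check(madv_free_pages, touched_later)
    return MigrationCost(
        target_allocations=touched_later,
        page_copies=0,
        zero_fills=touched_later,
    )


if __name__ == "__main__":
    pages = 1024
    for touched in (0, 256, 1024):
        print(touched, eager_migration(pages, touched), lazy_discard(pages, touched))
```

In this model the lazy policy never allocates more pages on the target
node than the eager one, and it trades page copies for zero-fills on
the pages that are reused. Whether that trade is a win in practice
depends on how many pages real workloads touch again, which is the
justification Michal asks for.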