Date: Tue, 20 Jun 2017 21:56:25 +0300
From: "Michael S. Tsirkin"
To: David Hildenbrand
Cc: Rik van Riel, Dave Hansen, Wei Wang, linux-kernel@vger.kernel.org,
    qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
    kvm@vger.kernel.org, linux-mm@kvack.org, cornelia.huck@de.ibm.com,
    akpm@linux-foundation.org, mgorman@techsingularity.net,
    aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com,
    liliang.opensource@gmail.com, Nitesh Narayan Lal
Subject: Re: [PATCH v11 4/6] mm: function to offer a page block on the free list
Message-ID: <20170620215552-mutt-send-email-mst@kernel.org>
References: <1497004901-30593-1-git-send-email-wei.w.wang@intel.com>
    <1497004901-30593-5-git-send-email-wei.w.wang@intel.com>
    <1497977049.20270.100.camel@redhat.com>
    <7b626551-6d1b-c8d5-4ef7-e357399e78dc@redhat.com>
    <20170620211445-mutt-send-email-mst@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 20, 2017 at 08:54:29PM +0200, David Hildenbrand wrote:
> On 20.06.2017 20:17, Michael S. Tsirkin wrote:
> > On Tue, Jun 20, 2017 at 06:49:33PM +0200, David Hildenbrand wrote:
> >> On 20.06.2017 18:44, Rik van Riel wrote:
> >>> On Mon, 2017-06-12 at 07:10 -0700, Dave Hansen wrote:
> >>>
> >>>> The hypervisor is going to throw away the contents of these pages,
> >>>> right? As soon as the spinlock is released, someone can allocate a
> >>>> page, and put good data in it. What keeps the hypervisor from
> >>>> throwing away good data?
> >>>
> >>> That looks like it may be the wrong API, then?
> >>>
> >>> We already have hooks called arch_free_page and
> >>> arch_alloc_page in the VM, which are called when
> >>> pages are freed, and allocated, respectively.
> >>>
> >>> Nitesh Lal (on the CC list) is working on a way
> >>> to efficiently batch recently freed pages for
> >>> free page hinting to the hypervisor.
> >>>
> >>> If that is done efficiently enough (eg. with
> >>> MADV_FREE on the hypervisor side for lazy freeing,
> >>> and lazy later re-use of the pages), do we still
> >>> need the harder to use batch interface from this
> >>> patch?
> >>>
> >> David's opinion incoming:
> >>
> >> No, I think proper free page hinting would be the optimum solution, if
> >> done right. This would avoid the batch interface and even turn
> >> virtio-balloon in some sense useless.
> >
> > I agree generally. But we have to balance that against the fact that
> > this was discussed since at least 2011 and no one built this solution
> > yet.
>
> I totally agree, and I still think it will be hard to get a decent
> performance for free page hinting (let's call it challenging). But I
> heard of some interesting ideas. Surprise me.
>
> Still, I would favor such an interface over a mm interface where people
> start asking the same question over and over again ("how can this even
> work"). Not only because it wasn't explained sufficiently enough, but
> also because this interface is so special for one use case and one
> scenario (concurrent dirty tracking in the host during migration).
>
> IMHO even simply writing all-zeros to all free pages before starting
> migration (or even when freeing a page) would be a cleaner interface
> than this (because it atomically works with the entity the host cares
> about for migration). But yes, performance is horrible that's why I am
> not even suggesting it. Just saying that this mm interface is very very
> special and if we could find something better, I'd favor it.

As long as there's a single user, changing to a better interface
once it's found won't be hard at all :)

> --
>
> Thanks,
>
> David
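
For readers following the thread, the question Dave Hansen raises can be made concrete with a toy model. The sketch below is not kernel code and does not reflect how the patch or any hinting implementation works; the class, the function names and the `notify_on_alloc` flag are all hypothetical. It contrasts a one-shot report of the free list with a scheme in which allocation tells the host that a page is in use again, which is roughly the role Rik van Riel suggests for the arch_alloc_page hook. It leaves out the host-side dirty tracking that David mentions as the scenario this interface targets, and it says nothing about performance, which is the concern the thread keeps returning to.

```python
"""Toy model of the free-page reporting race. Not kernel code.

Every name here is hypothetical and exists only for illustration.
"""


class Guest:
    def __init__(self, num_pages):
        self.free = set(range(num_pages))  # page numbers on the free list
        self.data = {}                     # page number -> contents
        self.hinted = set()                # pages the host believes are free

    def snapshot_free_pages(self):
        # Batch-style report: the free list as seen while "holding the lock".
        self.hinted = set(self.free)

    def alloc(self, contents, notify_host):
        # Take the lowest free page and fill it with data.
        page = min(self.free)
        self.free.remove(page)
        if notify_host:
            # Stand-in for an allocation-time hook telling the host
            # that the page is in use again.
            self.hinted.discard(page)
        self.data[page] = contents
        return page


def host_discard(guest):
    """Drop the contents of every page the host still believes is free.

    Returns the list of pages whose live data was thrown away.
    """
    lost = [p for p in sorted(guest.hinted) if p in guest.data]
    for p in lost:
        del guest.data[p]
    guest.hinted.clear()
    return lost


def run(notify_on_alloc):
    guest = Guest(4)
    guest.snapshot_free_pages()                       # report taken under the lock
    page = guest.alloc("good data", notify_on_alloc)  # lock dropped, page reused
    lost = host_discard(guest)                        # host acts on the report
    return page, lost
```

Calling `run(False)` models acting on a stale snapshot and reports the reused page as lost; `run(True)` models the hook-based variant, where the host no longer treats that page as free.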