From: "Li, Liang Z"
To: "Dr. David Alan Gilbert"
CC: "Michael S. Tsirkin", Amit Shah, "quintela@redhat.com", "qemu-devel@nongnu.org", "linux-kernel@vger.kernel.org", "akpm@linux-foundation.org", "pbonzini@redhat.com", "rth@twiddle.net", "ehabkost@redhat.com", "linux-mm@kvack.org", "virtualization@lists.linux-foundation.org", "kvm@vger.kernel.org", "mohan_parthasarathy@hpe.com", "jitendra.kolhe@hpe.com", "simhan@hpe.com"
Subject: RE: [RFC qemu 0/4] A PV solution for live migration optimization
Date: Wed, 16 Mar 2016 01:20:39 +0000
References: <1457001868-15949-1-git-send-email-liang.z.li@intel.com> <20160308111343.GM15443@grmbl.mre> <20160310075728.GB4678@grmbl.mre> <20160310111844.GB2276@work-vm> <20160314170334.GK2234@work-vm> <20160315121613-mutt-send-email-mst@redhat.com> <20160315195515.GL11728@work-vm>
In-Reply-To: <20160315195515.GL11728@work-vm>
> > > > > > I'm just catching back up on this thread; so without
> > > > > > reference to any particular previous mail in the thread.
> > > > > >
> > > > > > 1) How many of the free pages do we tell the host about?
> > > > > >    Your main change is telling the host about all the
> > > > > >    free pages.
> > > > >
> > > > > Yes, all the guest's free pages.
> > > > >
> > > > > > If we tell the host about all the free pages, then we might
> > > > > > end up needing to allocate more pages and update the host
> > > > > > with pages we now want to use; that would have to wait for the
> > > > > > host to acknowledge that use of these pages, since if we don't
> > > > > > wait for it then it might have skipped migrating a page we
> > > > > > just started using (I don't understand how your series solves that).
> > > > > > So the guest probably needs to keep some free pages - how many?
> > > > >
> > > > > Actually, there is no need to care about whether the free pages
> > > > > will be used by the host.
> > > > > We only care about the free pages that get reused by the guest,
> > > > > right?
> > > > >
> > > > > Dirty page logging can be used to solve this: start the dirty
> > > > > page logging before getting the free pages information from the
> > > > > guest.
> > > > > Even if some of the free pages are modified by the guest during
> > > > > the process of getting the free pages information, these modified
> > > > > pages will be traced by the dirty page logging mechanism. So in
> > > > > the following migration_bitmap_sync() call, pages that are in the
> > > > > free pages bitmap but were modified later will be reset to dirty.
> > > > > We won't omit any dirtied pages.
> > > > >
> > > > > So, the guest doesn't need to keep any free pages.
> > > >
> > > > OK, yes, that works; so we do:
> > > >   * enable dirty logging
> > > >   * ask the guest for free pages
> > > >   * initialise the migration bitmap as everything-free
> > > >   * then later do the normal sync-dirty-bitmap stuff and it all
> > > >     just works.
> > > >
> > > > That's nice and simple.
> > >
> > > This works once, sure. But there's an issue: you have to defer
> > > migration until you get the free page list, and this only works
> > > once. So you end up with heuristics about how long to wait.
> > >
> > > Instead I propose:
> > >
> > > - mark all pages dirty as we do now.
> > >
> > > - at start of migration, start tracking dirty
> > >   pages in kvm, and tell the guest to start tracking free pages
> > >
> > >   we can now introduce any kind of delay, for example wait for ack
> > >   from guest, or do whatever else, or even just start migrating pages
> > >
> > > - repeatedly:
> > >   - get the list of free pages from the guest
> > >   - clear them in the migration bitmap
> > >   - get the dirty list from kvm
> > >
> > > - at end of migration, stop tracking writes in kvm,
> > >   and tell the guest to stop tracking free pages
> >
> > I had thought of filtering out the free pages in each migration bitmap
> > synchronization.
> > The advantage is that we can skip processing as many free pages as
> > possible, not just once.
> > The disadvantage is that we would have to change the current memory
> > management code to track the free pages, instead of traversing the
> > free page list to construct the free pages bitmap, in order to reduce
> > the overhead of getting the free pages bitmap.
> > I am not sure if the kernel people would like it.
> >
> > If we keep the traversing mechanism, then because of the overhead,
> > maybe it's not worth filtering out the free pages repeatedly.
>
> Well, Michael's idea of not waiting for the dirty bitmap to be filled
> does make that idea of constantly using the free-bitmap better.
>

Not waiting is a good idea. Actually, we could shorten the waiting time
by pre-allocating the free pages bitmap and updating it when the guest
allocates/frees pages. It requires modifying the mm-related code. I
don't know whether the kernel people would like this.

> In that case, is it easier if something (guest/host?) allocates some
> memory in the guest's physical RAM space and just points the host to
> it, rather than having an explicit 'send'?
>

Good idea too.

Liang

> Dave