Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753327AbcCNRDp (ORCPT ); Mon, 14 Mar 2016 13:03:45 -0400 Received: from mx1.redhat.com ([209.132.183.28]:51227 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751941AbcCNRDl (ORCPT ); Mon, 14 Mar 2016 13:03:41 -0400 Date: Mon, 14 Mar 2016 17:03:34 +0000 From: "Dr. David Alan Gilbert" To: "Li, Liang Z" Cc: Amit Shah , "quintela@redhat.com" , "qemu-devel@nongnu.org" , "linux-kernel@vger.kernel.org" , "mst@redhat.com" , "akpm@linux-foundation.org" , "pbonzini@redhat.com" , "rth@twiddle.net" , "ehabkost@redhat.com" , "linux-mm@kvack.org" , "virtualization@lists.linux-foundation.org" , "kvm@vger.kernel.org" , "mohan_parthasarathy@hpe.com" , "jitendra.kolhe@hpe.com" , "simhan@hpe.com" Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization Message-ID: <20160314170334.GK2234@work-vm> References: <1457001868-15949-1-git-send-email-liang.z.li@intel.com> <20160308111343.GM15443@grmbl.mre> <20160310075728.GB4678@grmbl.mre> <20160310111844.GB2276@work-vm> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Mon, 14 Mar 2016 17:03:40 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5744 Lines: 125 * Li, Liang Z (liang.z.li@intel.com) wrote: > > > > Hi, > > I'm just catching back up on this thread; so without reference to any > > particular previous mail in the thread. > > > > 1) How many of the free pages do we tell the host about? > > Your main change is telling the host about all the > > free pages. > > Yes, all the guest's free pages. > > > If we tell the host about all the free pages, then we might > > end up needing to allocate more pages and update the host > > with pages we now want to use; that would have to wait for the > > host to acknowledge that use of these pages, since if we don't > > wait for it then it might have skipped migrating a page we > > just started using (I don't understand how your series solves that). > > So the guest probably needs to keep some free pages - how many? > > Actually, there is no need to care about whether the free pages will be used by the host. > We only care about some of the free pages we get reused by the guest, right? > > The dirty page logging can be used to solve this, starting the dirty page logging before getting > the free pages informant from guest. Even some of the free pages are modified by the guest > during the process of getting the free pages information, these modified pages will be traced > by the dirty page logging mechanism. So in the following migration_bitmap_sync() function. > The pages in the free pages bitmap, but latter was modified, will be reset to dirty. We won't > omit any dirtied pages. > > So, guest doesn't need to keep any free pages. OK, yes, that works; so we do: * enable dirty logging * ask guest for free pages * initialise the migration bitmap as everything-free * then later we do the normal sync-dirty bitmap stuff and it all just works. That's nice and simple. > > 2) Clearing out caches > > Does it make sense to clean caches? They're apparently useful data > > so if we clean them it's likely to slow the guest down; I guess > > they're also likely to be fairly static data - so at least fairly > > easy to migrate. > > The answer here partially depends on what you want from your migration; > > if you're after the fastest possible migration time it might make > > sense to clean the caches and avoid migrating them; but that might > > be at the cost of more disruption to the guest - there's a trade off > > somewhere and it's not clear to me how you set that depending on your > > guest/network/reqirements. > > > > Yes, clean the caches is an option. Let the users decide using it or not. > > > 3) Why is ballooning slow? > > You've got a figure of 5s to balloon on an 8GB VM - but an > > 8GB VM isn't huge; so I worry about how long it would take > > on a big VM. We need to understand why it's slow > > * is it due to the guest shuffling pages around? > > * is it due to the virtio-balloon protocol sending one page > > at a time? > > + Do balloon pages normally clump in physical memory > > - i.e. would a 'large balloon' message help > > - or do we need a bitmap because it tends not to clump? > > > > I didn't do a comprehensive test. But I found most of the time spending > on allocating the pages and sending the PFNs to guest, I don't know that's > the most time consuming operation, allocating the pages or sending the PFNs. It might be a good idea to analyse it a bit more to convince people where the problem is. > > * is it due to the madvise on the host? > > If we were using the normal balloon messages, then we > > could, during migration, just route those to the migration > > code rather than bothering with the madvise. > > If they're clumping together we could just turn that into > > one big madvise; if they're not then would we benefit from > > a call that lets us madvise lots of areas? > > > > My test showed madvise() is not the main reason for the long time, only taken > 10% of the total inflating balloon operation time. > Big madvise can more or less improve the performance. OK; 10% of the total is still pretty big even for your 8GB VM. > > 4) Speeding up the migration of those free pages > > You're using the bitmap to avoid migrating those free pages; HPe's > > patchset is reconstructing a bitmap from the balloon data; OK, so > > this all makes sense to avoid migrating them - I'd also been thinking > > of using pagemap to spot zero pages that would help find other zero'd > > pages, but perhaps ballooned is enough? > > > Could you describe your ideal with more details? At the moment the migration code spends a fair amount of time checking if a page is zero; I was thinking perhaps the qemu could just open /proc/self/pagemap and check if the page was mapped; that would seem cheap if we're checking big ranges; and that would find all the balloon pages. > > 5) Second-migrate > > Given a VM where you've done all those tricks on, what happens when > > you migrate it a second time? I guess you're aiming for the guest > > to update it's bitmap; HPe's solution is to migrate it's balloon > > bitmap along with the migration data. > > Nothing is special in the second migration, QEMU will request the guest for free pages > Information, and the guest will traverse it's current free page list to construct a > new free page bitmap and send it to QEMU. Just like in the first migration. Right. Dave > Liang > > > > Dave > > > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK