Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933871AbcCODbo (ORCPT ); Mon, 14 Mar 2016 23:31:44 -0400 Received: from mga04.intel.com ([192.55.52.120]:60068 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932131AbcCODbl convert rfc822-to-8bit (ORCPT ); Mon, 14 Mar 2016 23:31:41 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,337,1455004800"; d="scan'208";a="669658317" From: "Li, Liang Z" To: "Dr. David Alan Gilbert" CC: Amit Shah , "quintela@redhat.com" , "qemu-devel@nongnu.org" , "linux-kernel@vger.kernel.org" , "mst@redhat.com" , "akpm@linux-foundation.org" , "pbonzini@redhat.com" , "rth@twiddle.net" , "ehabkost@redhat.com" , "linux-mm@kvack.org" , "virtualization@lists.linux-foundation.org" , "kvm@vger.kernel.org" , "mohan_parthasarathy@hpe.com" , "jitendra.kolhe@hpe.com" , "simhan@hpe.com" Subject: RE: [RFC qemu 0/4] A PV solution for live migration optimization Thread-Topic: [RFC qemu 0/4] A PV solution for live migration optimization Thread-Index: AQHRdTqPjTxTnYWKZEWM4lf/HjT6rZ9O5pSAgANvNuD//36fAIAAkLUw//+niICAAXgSIIAFMZkAgAEdbFA= Date: Tue, 15 Mar 2016 03:31:36 +0000 Message-ID: References: <1457001868-15949-1-git-send-email-liang.z.li@intel.com> <20160308111343.GM15443@grmbl.mre> <20160310075728.GB4678@grmbl.mre> <20160310111844.GB2276@work-vm> <20160314170334.GK2234@work-vm> In-Reply-To: <20160314170334.GK2234@work-vm> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNzEwNjY5YzAtN2QxMi00Y2MzLTljMDYtOTMxYmMxZTU0ZDc1IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6IkJWYVRlVExWOVExYTBOb0pLd0VPNHRmSm1mWWhmaTNrXC9HVk9pZlc4aGpZPSJ9 x-ctpclassification: CTP_IC x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6107 Lines: 138 > > > Hi, > > > I'm just catching back up on this thread; so without reference to > > > any particular previous mail in the thread. > > > > > > 1) How many of the free pages do we tell the host about? > > > Your main change is telling the host about all the > > > free pages. > > > > Yes, all the guest's free pages. > > > > > If we tell the host about all the free pages, then we might > > > end up needing to allocate more pages and update the host > > > with pages we now want to use; that would have to wait for the > > > host to acknowledge that use of these pages, since if we don't > > > wait for it then it might have skipped migrating a page we > > > just started using (I don't understand how your series solves that). > > > So the guest probably needs to keep some free pages - how many? > > > > Actually, there is no need to care about whether the free pages will be > used by the host. > > We only care about some of the free pages we get reused by the guest, > right? > > > > The dirty page logging can be used to solve this, starting the dirty > > page logging before getting the free pages informant from guest. Even > > some of the free pages are modified by the guest during the process of > > getting the free pages information, these modified pages will be traced by > the dirty page logging mechanism. So in the following > migration_bitmap_sync() function. > > The pages in the free pages bitmap, but latter was modified, will be > > reset to dirty. We won't omit any dirtied pages. > > > > So, guest doesn't need to keep any free pages. > > OK, yes, that works; so we do: > * enable dirty logging > * ask guest for free pages > * initialise the migration bitmap as everything-free > * then later we do the normal sync-dirty bitmap stuff and it all just works. > > That's nice and simple. > > > > 2) Clearing out caches > > > Does it make sense to clean caches? They're apparently useful data > > > so if we clean them it's likely to slow the guest down; I guess > > > they're also likely to be fairly static data - so at least fairly > > > easy to migrate. > > > The answer here partially depends on what you want from your > migration; > > > if you're after the fastest possible migration time it might make > > > sense to clean the caches and avoid migrating them; but that might > > > be at the cost of more disruption to the guest - there's a trade off > > > somewhere and it's not clear to me how you set that depending on > your > > > guest/network/reqirements. > > > > > > > Yes, clean the caches is an option. Let the users decide using it or not. > > > > > 3) Why is ballooning slow? > > > You've got a figure of 5s to balloon on an 8GB VM - but an > > > 8GB VM isn't huge; so I worry about how long it would take > > > on a big VM. We need to understand why it's slow > > > * is it due to the guest shuffling pages around? > > > * is it due to the virtio-balloon protocol sending one page > > > at a time? > > > + Do balloon pages normally clump in physical memory > > > - i.e. would a 'large balloon' message help > > > - or do we need a bitmap because it tends not to clump? > > > > > > > I didn't do a comprehensive test. But I found most of the time > > spending on allocating the pages and sending the PFNs to guest, I > > don't know that's the most time consuming operation, allocating the pages > or sending the PFNs. > > It might be a good idea to analyse it a bit more to convince people where the > problem is. > Yes, I will try to measure the time spending on different parts. > > > * is it due to the madvise on the host? > > > If we were using the normal balloon messages, then we > > > could, during migration, just route those to the migration > > > code rather than bothering with the madvise. > > > If they're clumping together we could just turn that into > > > one big madvise; if they're not then would we benefit from > > > a call that lets us madvise lots of areas? > > > > > > > My test showed madvise() is not the main reason for the long time, > > only taken 10% of the total inflating balloon operation time. > > Big madvise can more or less improve the performance. > > OK; 10% of the total is still pretty big even for your 8GB VM. > > > > 4) Speeding up the migration of those free pages > > > You're using the bitmap to avoid migrating those free pages; HPe's > > > patchset is reconstructing a bitmap from the balloon data; OK, so > > > this all makes sense to avoid migrating them - I'd also been thinking > > > of using pagemap to spot zero pages that would help find other zero'd > > > pages, but perhaps ballooned is enough? > > > > > Could you describe your ideal with more details? > > At the moment the migration code spends a fair amount of time checking if a > page is zero; I was thinking perhaps the qemu could just open > /proc/self/pagemap and check if the page was mapped; that would seem > cheap if we're checking big ranges; and that would find all the balloon pages. > Even if virtio-balloon is not enabled, it can be used to find the pages that never used by guest. > > > 5) Second-migrate > > > Given a VM where you've done all those tricks on, what happens when > > > you migrate it a second time? I guess you're aiming for the guest > > > to update it's bitmap; HPe's solution is to migrate it's balloon > > > bitmap along with the migration data. > > > > Nothing is special in the second migration, QEMU will request the > > guest for free pages Information, and the guest will traverse it's > > current free page list to construct a new free page bitmap and send it to > QEMU. Just like in the first migration. > > Right. > > Dave > > > Liang > > > > > > Dave > > > > > > -- > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK