Date: Tue, 8 Mar 2016 16:03:31 +0200
From: "Michael S. Tsirkin"
To: "Li, Liang Z"
Cc: "Dr. David Alan Gilbert", Roman Kagan, "ehabkost@redhat.com",
    "kvm@vger.kernel.org", "quintela@redhat.com",
    "linux-kernel@vger.kernel.org", "qemu-devel@nongnu.org",
    "linux-mm@kvack.org", "amit.shah@redhat.com", "pbonzini@redhat.com",
    "akpm@linux-foundation.org", "virtualization@lists.linux-foundation.org",
    "rth@twiddle.net"
Subject: Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

On Fri, Mar 04, 2016 at 03:13:03PM +0000, Li, Liang Z wrote:
> > > Maybe I am not clear enough.
> > >
> > > I mean if we inflate the balloon before live migration, for an 8GB guest
> > > it takes about 5 seconds for the inflating operation to finish.
> >
> > And these 5 seconds are spent where?
> >
> The time is spent on allocating the pages and sending the allocated pages'
> pfns to QEMU through virtio.

What if we skip allocating pages but use the existing interface to send
pfns to QEMU?

> > > For the PV solution, there is no need to inflate the balloon before live
> > > migration; the only cost is traversing the free_list to construct the
> > > free pages bitmap, which takes about 20ms for an 8GB idle guest (less if
> > > there are fewer free pages). Passing the free pages info to the host
> > > takes about an extra 3ms.
> > >
> > > Liang
> >
> > So now let's please stop talking about solutions at a high level and
> > discuss the interface changes you make in detail.
> > What makes it faster? A better host/guest interface? No need to go through
> > the buddy allocator within the guest? Fewer interrupts? Something else?
> >
> I assume you are familiar with the current virtio-balloon and how it works.
> The new interface is very simple: QEMU sends a request to the virtio-balloon
> driver, the driver traverses '&zone->free_area[order].free_list[t]' to
> construct a 'free_page_bitmap', and then sends the content of
> 'free_page_bitmap' back to QEMU. That's all the new interface does; there
> are no 'alloc_page'-related operations involved, so it's faster.
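
To make the gain concrete, here is a minimal sketch of how the host side
could consume such a bitmap (illustration only, not code from the patch
series; the helper name and parameters are made up): pages the guest reports
as free are simply cleared from the migration dirty bitmap, so they are never
read or transferred.

----------------------------------------------
/* Illustrative only; not the actual QEMU code from the patch series. */
static void filter_out_guest_free_pages(unsigned long *migration_bitmap,
                                        const unsigned long *free_page_bitmap,
                                        unsigned long nr_pfns)
{
    const unsigned long bits_per_long = 8 * sizeof(unsigned long);
    unsigned long i;

    /* Clear the dirty bit of every pfn the guest reported as free. */
    for (i = 0; i < nr_pfns / bits_per_long; i++)
        migration_bitmap[i] &= ~free_page_bitmap[i];
}
----------------------------------------------

With a scheme like this the host-side cost is a single pass over the two
bitmaps, independent of how many pages happen to be free.
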
>
> Some code snippet:
> ----------------------------------------------
> +static void mark_free_pages_bitmap(struct zone *zone,
> +		unsigned long *free_page_bitmap, unsigned long pfn_gap)
> +{
> +	unsigned long pfn, flags, i;
> +	unsigned int order, t;
> +	struct list_head *curr;
> +
> +	if (zone_is_empty(zone))
> +		return;
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +
> +	/* Walk every free block of every order and migratetype. */
> +	for_each_migratetype_order(order, t) {
> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> +			for (i = 0; i < (1UL << order); i++) {
> +				/*
> +				 * Pfns above the 4GB boundary are shifted
> +				 * down by the size of the memory hole
> +				 * (pfn_gap) before being recorded.
> +				 */
> +				if ((pfn + i) >= PFN_4G)
> +					set_bit_le(pfn + i - pfn_gap,
> +						   free_page_bitmap);
> +				else
> +					set_bit_le(pfn + i, free_page_bitmap);
> +			}
> +		}
> +	}
> +
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +}
> ----------------------------------------------------
> Sorry for my poor English and expression; if it is still not clear, you
> could glance at the patch, which is about 400 lines in total.
>

-- 
MST
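
As a closing illustration, the guest-to-host handoff of the bitmap can follow
the same pattern as tell_host() in drivers/virtio/virtio_balloon.c. The
sketch below is an assumption made for illustration: the dedicated virtqueue
and the way the bitmap length is passed are invented here and are not
necessarily what the patch implements.

----------------------------------------------
#include <linux/virtio.h>
#include <linux/scatterlist.h>
#include <linux/gfp.h>

/*
 * Illustrative sketch modelled on tell_host() in
 * drivers/virtio/virtio_balloon.c; the dedicated virtqueue and the
 * bitmap size parameter are assumptions, not the actual patch code.
 */
static void tell_host_free_page_bitmap(struct virtqueue *free_pages_vq,
				       unsigned long *free_page_bitmap,
				       unsigned int bitmap_bytes)
{
	struct scatterlist sg;

	sg_init_one(&sg, free_page_bitmap, bitmap_bytes);

	/* Hand the whole bitmap to the host in a single out-buffer. */
	if (virtqueue_add_outbuf(free_pages_vq, &sg, 1,
				 free_page_bitmap, GFP_KERNEL) < 0)
		return;

	virtqueue_kick(free_pages_vq);
}
----------------------------------------------
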