From: "Li, Liang Z"
To: "Michael S. Tsirkin"
CC: "Dr. David Alan Gilbert", Roman Kagan, ehabkost@redhat.com, kvm@vger.kernel.org, quintela@redhat.com, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, linux-mm@kvack.org, amit.shah@redhat.com, pbonzini@redhat.com, akpm@linux-foundation.org, virtualization@lists.linux-foundation.org, rth@twiddle.net
Subject: RE: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization
Date: Fri, 4 Mar 2016 15:13:03 +0000
References: <1457001868-15949-1-git-send-email-liang.z.li@intel.com> <20160303174615.GF2115@work-vm> <20160304075538.GC9100@rkaganb.sw.ru> <20160304083550.GE9100@rkaganb.sw.ru> <20160304090820.GA2149@work-vm> <20160304114519-mutt-send-email-mst@redhat.com> <20160304122456-mutt-send-email-mst@redhat.com>
In-Reply-To: <20160304122456-mutt-send-email-mst@redhat.com>

> > Maybe I am not clear enough.
> >
> > I mean if we inflate the balloon before live migration, for an 8GB guest it
> > takes about 5 seconds for the inflating operation to finish.
>
> And these 5 seconds are spent where?
>

The time is spent on allocating the pages and sending the PFNs of the allocated
pages to QEMU through virtio.

> > For the PV solution, there is no need to inflate the balloon before live
> > migration; the only cost is traversing the free_list to construct the free
> > pages bitmap, and that takes about 20ms for an 8GB idle guest (less if there
> > are fewer free pages). Passing the free pages info to the host takes about an
> > extra 3ms.
> >
> > Liang
>
> So now let's please stop talking about solutions at a high level and discuss
> the interface changes you make in detail.
> What makes it faster? A better host/guest interface? No need to go through the
> buddy allocator within the guest? Fewer interrupts? Something else?
>

I assume you are familiar with the current virtio-balloon and how it works.
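(For reference, a simplified sketch, not the actual driver code, of what the
current inflate path does and why it costs seconds for gigabytes: every page is
taken from the buddy allocator one at a time, and its PFN is reported to QEMU in
small batches over the inflate virtqueue. The name 'inflate_some_pages' and the
batch size below are illustrative only.)
----------------------------------------------
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/virtio.h>

#define PFN_BATCH	256	/* illustrative batch size per virtqueue kick */

/* Hypothetical helper: allocate 'nr' pages and report their PFNs to the host
 * over the given inflate virtqueue, one batch at a time. */
static unsigned long inflate_some_pages(struct virtqueue *inflate_vq,
					unsigned long nr)
{
	static u32 pfns[PFN_BATCH];	/* simplified: a real driver keeps this in its device state */
	struct scatterlist sg;
	unsigned long done = 0;
	unsigned int batch = 0;

	while (done < nr) {
		/* One buddy-allocator call per page: this is the slow part. */
		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NOMEMALLOC |
					       __GFP_NORETRY);
		if (!page)
			break;
		pfns[batch++] = page_to_pfn(page);
		done++;

		if (batch == PFN_BATCH || done == nr) {
			/* Hand the PFN array to QEMU and notify it. */
			sg_init_one(&sg, pfns, batch * sizeof(pfns[0]));
			if (virtqueue_add_outbuf(inflate_vq, &sg, 1, pfns,
						 GFP_KERNEL) == 0)
				virtqueue_kick(inflate_vq);
			batch = 0;
		}
	}
	return done;
}
----------------------------------------------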
The new interface is very simple: QEMU sends a request to the virtio-balloon
driver, the driver traverses '&zone->free_area[order].free_list[t]' to
construct a 'free_page_bitmap', and then the driver sends the content of
'free_page_bitmap' back to QEMU. That is all the new interface does; there is
no 'alloc_page' related work involved, so it's faster. Some code snippet:
----------------------------------------------
+/* Walk all free lists of the zone and mark every free PFN in the bitmap. */
+static void mark_free_pages_bitmap(struct zone *zone,
+		unsigned long *free_page_bitmap, unsigned long pfn_gap)
+{
+	unsigned long pfn, flags, i;
+	unsigned int order, t;
+	struct list_head *curr;
+
+	if (zone_is_empty(zone))
+		return;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	for_each_migratetype_order(order, t) {
+		list_for_each(curr, &zone->free_area[order].free_list[t]) {
+			pfn = page_to_pfn(list_entry(curr, struct page, lru));
+			for (i = 0; i < (1UL << order); i++) {
+				/* PFNs at or above 4GB are shifted down by
+				 * pfn_gap before being set in the bitmap. */
+				if ((pfn + i) >= PFN_4G)
+					set_bit_le(pfn + i - pfn_gap,
+						   free_page_bitmap);
+				else
+					set_bit_le(pfn + i, free_page_bitmap);
+			}
+		}
+	}
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
----------------------------------------------------
Sorry for my poor English and expression; if you still can't understand, you
could glance at the patch, about 400 lines in total.

>
> > > --
> > > MST
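For illustration only (this is not code from the posted patch): the sketch below
shows the guest-side send path described above, i.e. fill the bitmap from every
populated zone with mark_free_pages_bitmap() and then hand the finished buffer
to QEMU over a virtqueue. The names 'free_page_reporter', 'free_pages_vq' and
'tell_host_free_pages', and the way 'bitmap_len' and 'pfn_gap' are obtained, are
assumptions, not the actual driver interface.
----------------------------------------------
#include <linux/virtio.h>
#include <linux/scatterlist.h>
#include <linux/mm.h>

/* Hypothetical per-device state; the real virtio-balloon structure differs. */
struct free_page_reporter {
	struct virtqueue *free_pages_vq;	/* assumed queue for the bitmap */
	unsigned long *free_page_bitmap;	/* preallocated bitmap buffer */
	unsigned long bitmap_len;		/* bitmap size in bytes */
	unsigned long pfn_gap;			/* PFN offset applied above 4GB */
};

static void tell_host_free_pages(struct free_page_reporter *fpr)
{
	struct scatterlist sg;
	struct zone *zone;

	/* Fill the bitmap from every populated zone's free lists, using
	 * mark_free_pages_bitmap() from the snippet above. */
	for_each_populated_zone(zone)
		mark_free_pages_bitmap(zone, fpr->free_page_bitmap,
				       fpr->pfn_gap);

	/* Post the bitmap buffer to the host and notify it; QEMU can then
	 * skip these pages in the first pass of RAM migration. */
	sg_init_one(&sg, fpr->free_page_bitmap, fpr->bitmap_len);
	if (virtqueue_add_outbuf(fpr->free_pages_vq, &sg, 1, fpr,
				 GFP_KERNEL) == 0)
		virtqueue_kick(fpr->free_pages_vq);
}
----------------------------------------------
The point of the single-bitmap interface is that the guest-side cost is just the
free_list walk (about 20ms for an 8GB idle guest, per the numbers above) plus
one buffer handed to the host, instead of a buddy-allocator call per page as
with balloon inflation.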