Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753787AbdC1B6Y (ORCPT ); Mon, 27 Mar 2017 21:58:24 -0400
Received: from userp1040.oracle.com ([156.151.31.81]:29672 "EHLO
        userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752664AbdC1B6X (ORCPT );
        Mon, 27 Mar 2017 21:58:23 -0400
Subject: Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's
        initial state to number of existing RAM pages"
To: Dan Streetman
References: <0628e2af-f7e7-056a-82ec-68860f9c4f29@oracle.com>
        <20170324211016.GG9755@char.us.oracle.com>
Cc: Konrad Rzeszutek Wilk, Juergen Gross,
        xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
From: Boris Ostrovsky
Message-ID: <9b134234-5b38-c325-b3c2-f37b4c45c2cf@oracle.com>
Date: Mon, 27 Mar 2017 21:57:58 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
        Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Source-IP: userv0021.oracle.com [156.151.31.71]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5859
Lines: 125

On 03/27/2017 03:57 PM, Dan Streetman wrote:
> On Fri, Mar 24, 2017 at 9:33 PM, Boris Ostrovsky wrote:
>>
>>>
>>> I think we can all agree that the *ideal* situation would be, for the
>>> balloon driver to not immediately hotplug memory so it can add 11 more
>>> pages, so maybe I just need to figure out why the balloon driver
>>> thinks it needs 11 more pages, and fix that.
>>
>>
>>
>> How does the new memory appear in the guest? Via online_pages()?
>>
>> Or is ballooning triggered from watch_target()?
>
> yes, it's triggered from watch_target() which then calls
> online_pages() with the new memory.
> I added some debug (all numbers
> are in hex):
>
> [ 0.500080] xen:balloon: Initialising balloon driver
> [ 0.503027] xen:balloon: balloon_init: current/target pages 1fff9d
> [ 0.504044] xen_balloon: Initialising balloon driver
> [ 0.508046] xen_balloon: watch_target: new target 800000 kb
> [ 0.508046] xen:balloon: balloon_set_new_target: target 200000
> [ 0.524024] xen:balloon: current_credit: target pages 200000 current pages 1fff9d credit 63
> [ 0.567055] xen:balloon: balloon_process: current_credit 63
> [ 0.568005] xen:balloon: reserve_additional_memory: adding memory resource for 8000 pages
> [ 3.694443] online_pages: pfn 210000 nr_pages 8000 type 0
> [ 3.701072] xen:balloon: current_credit: target pages 200000 current pages 1fff9d credit 63
> [ 3.701074] xen:balloon: balloon_process: current_credit 63
> [ 3.701075] xen:balloon: increase_reservation: nr_pages 63
> [ 3.701170] xen:balloon: increase_reservation: done, current_pages 1fffa8
> [ 3.701172] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
> [ 3.701173] xen:balloon: balloon_process: current_credit 58
> [ 3.701173] xen:balloon: increase_reservation: nr_pages 58
> [ 3.701180] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
> [ 5.708085] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
> [ 5.708088] xen:balloon: balloon_process: current_credit 58
> [ 5.708089] xen:balloon: increase_reservation: nr_pages 58
> [ 5.708106] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
> [ 9.716065] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58
> [ 9.716068] xen:balloon: balloon_process: current_credit 58
> [ 9.716069] xen:balloon: increase_reservation: nr_pages 58
> [ 9.716087] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0
>
>
> and that continues forever at the max interval (32), since
> max_retry_count is unlimited.
> So I think I understand things now;
> first, the current_pages is set properly based on the e820 map:
>
> $ dmesg|grep -i e820
> [ 0.000000] e820: BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
> [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
> [ 0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
> [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
> [ 0.000000] e820: last_pfn = 0x210000 max_arch_pfn = 0x400000000
> [ 0.000000] e820: last_pfn = 0xf0000 max_arch_pfn = 0x400000000
> [ 0.000000] e820: [mem 0xf0000000-0xfbffffff] available for PCI devices
> [ 0.528007] e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]
> ubuntu@ip-172-31-60-112:~$ printf "%x\n" $[ 0x210000 - 0x100000 + 0xf0000 - 0x100 + 0x9e - 1 ]
> 1fff9d
>
>
> then, the xen balloon notices its target has been set to 200000 by the
> hypervisor. That target does account for the hole at 0xf0000 to
> 0x100000, but it doesn't account for the hole at 0xe0 to 0x100 ( 0x20
> pages), nor the hole at 0x9e to 0xa0 ( 2 pages ), nor the unlisted
> hole (that the kernel removes) at 0xa0 to 0xe0 ( 0x40 pages). That's
> 0x62 pages, plus the 1-page hole at addr 0 that the kernel always
> reserves, is 0x63 pages of holes, which aren't accounted for in the
> hypervisor's target.
>
> so the balloon driver hotplugs the memory, and tries to increase its
> reservation to provide the needed pages to get the current_pages up to
> the target.
> However, when it calls the hypervisor to populate the
> physmap, the hypervisor only allows 11 (0xb) pages to be populated;
> all calls after that get back 0 from the hypervisor.
>
> Do you think the hypervisor's balloon target should account for the
> e820 holes (and for the kernel's added hole at addr 0)?
> Alternately/additionally, if the hypervisor doesn't want to support
> ballooning, should it just return error from the call to populate the
> physmap, and not allow those 11 pages?
>
> At this point, it doesn't seem to me like the kernel is doing anything
> wrong, correct?
>

I think there is indeed a disconnect between target memory (provided by
the toolstack) and current memory (i.e., the actual pages available to
the guest).

For example

[ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved

are missed in the target calculation. The hvmloader marks them as
RESERVED (in build_e820_table()) but the target value is not aware of
this. The same problem then repeats when the kernel removes the
0x000a0000-0x000fffff chunk.

(BTW, this all happens before the new 0x8000 pages are onlined, which
takes place much later and looks to me like a separate, unrelated
event.)

-boris
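
For anyone who wants to re-check the page accounting discussed in this thread, the arithmetic can be reproduced with a short script. This is only a sketch, not the balloon driver's code: the E820 table is copied from the quoted dmesg output, the 0x800000 kB target is taken from the watch_target log line, and 4 KiB pages are assumed. The names E820, usable_pages() and PAGE_SHIFT here are made up for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages

# (start, inclusive end, type), copied from the BIOS-e820 lines quoted above
E820 = [
    (0x0000000000000000, 0x000000000009dfff, "usable"),
    (0x000000000009e000, 0x000000000009ffff, "reserved"),
    (0x00000000000e0000, 0x00000000000fffff, "reserved"),
    (0x0000000000100000, 0x00000000efffffff, "usable"),
    (0x00000000fc000000, 0x00000000ffffffff, "reserved"),
    (0x0000000100000000, 0x000000020fffffff, "usable"),
]


def usable_pages(e820):
    """Count the pages covered by 'usable' e820 entries."""
    total = 0
    for start, end, kind in e820:
        if kind == "usable":
            total += (end + 1 - start) >> PAGE_SHIFT
    return total


# The kernel turns page 0 into a reserved page ("update ... usable ==> reserved").
current_pages = usable_pages(E820) - 1

# Target reported by watch_target, in kB (hex in the log), converted to pages.
target_pages = (0x800000 * 1024) >> PAGE_SHIFT

# Pages the balloon driver believes it still has to populate.
credit = target_pages - current_pages

# Holes below 1 MiB (0x100 pages) that the target does not account for.
low_holes = 0x100 - (0x9e - 1)

print(f"current_pages {current_pages:x}")  # 1fff9d
print(f"target_pages  {target_pages:x}")   # 200000
print(f"credit        {credit:x}")         # 63
print(f"low_holes     {low_holes:x}")      # 63
```

The credit and the sum of the low-memory holes both come out to 0x63, which matches the numbers in the debug output and supports the reading that the target simply does not know about those holes.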