Subject: Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's
 initial state to number of existing RAM pages"
To: Dan Streetman
Cc: Boris Ostrovsky, Konrad Rzeszutek Wilk, xen-devel@lists.xenproject.org,
 linux-kernel@vger.kernel.org
From: Juergen Gross
Message-ID: <1bf56d75-4ffb-ba41-4c96-76c120c7800c@suse.com>
Date: Thu, 23 Mar 2017 08:56:20 +0100
In-Reply-To: <0628e2af-f7e7-056a-82ec-68860f9c4f29@oracle.com>

On 23/03/17 03:13, Boris Ostrovsky wrote:
>
> On 03/22/2017 05:16 PM, Dan Streetman wrote:
>> I have a question about a problem introduced by this commit:
>> c275a57f5ec3056f732843b11659d892235faff7
>> "xen/balloon: Set balloon's initial state to number of existing RAM
>> pages"
>>
>> It changed the xen balloon current_pages calculation to start with the
>> number of physical pages in the system, instead of max_pfn. Since
>> get_num_physpages() does not include holes, it is always less than the
>> e820 map's max_pfn.
>>
>> However, the problem that commit introduced is this: if the hypervisor
>> sets the balloon target equal to the e820 map's max_pfn, the balloon
>> target will *always* be higher than the initial current pages.
>> Even if the hypervisor sets the target to (e820 max_pfn - holes), the
>> balloon target will still be higher than the current pages whenever
>> the OS adds any holes of its own. This is the situation, for example,
>> on Amazon AWS instances. The result is that the xen balloon always
>> hotplugs some memory immediately at boot, but then makes only
>> (max_pfn - get_num_physpages()) of it available to the system.
>>
>> This balloon-hotplugged memory can cause problems if the hypervisor
>> wasn't expecting it. Specifically, the system's physical page
>> addresses will now exceed the e820 map's max_pfn because of the
>> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>> DMA to/from those physical pages above the e820 max_pfn, this causes
>> problems. For example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>>
>> The additional small amount of balloon memory can cause other problems
>> as well, for example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>>
>> Anyway, I'd like to ask: was the original commit added because
>> hypervisors are supposed to set their balloon target to the guest
>> system's number of phys pages (max_pfn - holes)? The mailing list
>> discussion and commit description seem to indicate that.
>
> IIRC the problem this was trying to fix was that, since max_pfn
> includes holes, upon booting we'd immediately balloon down by the
> (typically MMIO) hole size.
>
> If you boot a guest with ~4+ GB of memory you should see this.
>
>> However, I'm not sure how that is possible, because the kernel
>> reserves its own holes, regardless of any predefined holes in the
>> e820 map; for example, the kernel reserves 64k (by default) at phys
>> addr 0 (the amount of the reservation is configurable via
>> CONFIG_X86_RESERVE_LOW).
>> So the hypervisor really has no way to know what the "right" target
>> to specify is; unless it knows the exact guest OS, kernel version and
>> kernel config values, it will never be able to correctly specify its
>> target as exactly (e820 max_pfn - all holes).
>>
>> Should this commit be reverted? Or should the xen balloon target be
>> adjusted based on kernel-added e820 holes?
>
> I think the second one, but shouldn't current_pages be updated rather
> than the target? The latter is set by Xen (toolstack, via xenstore
> usually).

Right.

Looking into a HVM domU I can't see any problem related to
CONFIG_X86_RESERVE_LOW: it is set to 64 on my system. The domU is
configured with 2048 MB of RAM, 8 MB of which being video RAM. Looking
into /sys/devices/system/xen_memory/xen_memory0 I can see that the
current size and target size match: both are 2088960 kB (2 GB - 8 MB).

Ballooning down and back up to 2048 MB doesn't change the picture.

So which additional holes are added by the kernel on AWS, and via which
functions?

Juergen