Subject: Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's
 initial state to number of existing RAM pages"
To: Dan Streetman
Cc: Boris Ostrovsky, Konrad Rzeszutek Wilk, xen-devel@lists.xenproject.org,
 linux-kernel@vger.kernel.org
From: Juergen Gross
Message-ID: <1bf56d75-4ffb-ba41-4c96-76c120c7800c@suse.com>
Date: Thu, 23 Mar 2017 08:56:20 +0100
In-Reply-To: <0628e2af-f7e7-056a-82ec-68860f9c4f29@oracle.com>

On 23/03/17 03:13, Boris Ostrovsky wrote:
>
> On 03/22/2017 05:16 PM, Dan Streetman wrote:
>> I have a question about a problem introduced by this commit:
>> c275a57f5ec3056f732843b11659d892235faff7
>> "xen/balloon: Set balloon's initial state to number of existing RAM
>> pages"
>>
>> It changed the xen balloon current_pages calculation to start with the
>> number of physical pages in the system, instead of max_pfn. Since
>> get_num_physpages() does not include holes, it is always less than the
>> e820 map's max_pfn.
>>
>> However, the problem that commit introduced is this: if the hypervisor
>> sets the balloon target equal to the e820 map's max_pfn, the balloon
>> target will *always* be higher than the initial current pages.
>> Even if the hypervisor sets the target to (e820 max_pfn - holes), the
>> balloon target will still be higher than the current pages whenever
>> the OS adds any holes of its own. This is the situation, for example,
>> on Amazon AWS instances. The result is that the xen balloon always
>> hotplugs some memory immediately at boot, but then makes only
>> (max_pfn - get_num_physpages()) of it available to the system.
>>
>> This balloon-hotplugged memory can cause problems if the hypervisor
>> wasn't expecting it. Specifically, the system's physical page
>> addresses will now exceed the e820 map's max_pfn because of the
>> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>> DMA to/from those physical pages above the e820 max_pfn, this causes
>> problems. For example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>>
>> The additional small amount of balloon memory can cause other problems
>> as well, for example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>>
>> Anyway, I'd like to ask: was the original commit added because
>> hypervisors are supposed to set their balloon target to the guest
>> system's number of phys pages (max_pfn - holes)? The mailing list
>> discussion and commit description seem to indicate that.
>
> IIRC the problem this was trying to fix was that, since max_pfn
> includes holes, upon booting we'd immediately balloon down by the
> (typically MMIO) hole size.
>
> If you boot a guest with ~4+ GB of memory you should see this.
>
>> However, I'm not sure how that is possible, because the kernel
>> reserves its own holes, regardless of any predefined holes in the
>> e820 map; for example, the kernel reserves 64k (by default) at phys
>> addr 0 (the amount of the reservation is configurable via
>> CONFIG_X86_RESERVE_LOW).
>> So the hypervisor really has no way to know what the "right" target
>> to specify is; unless it knows the exact guest OS, kernel version and
>> kernel config values, it will never be able to correctly specify its
>> target as exactly (e820 max_pfn - all holes).
>>
>> Should this commit be reverted? Or should the xen balloon target be
>> adjusted based on kernel-added e820 holes?
>
> I think the second one, but shouldn't current_pages be updated rather
> than the target? The latter is set by Xen (toolstack, via xenstore
> usually).

Right.

Looking into a HVM domU I can't see any problem related to
CONFIG_X86_RESERVE_LOW: it is set to 64 on my system. The domU is
configured with 2048 MB of RAM, 8 MB of which being video RAM. Looking
into /sys/devices/system/xen_memory/xen_memory0 I can see that the
current size and target size match: both are 2088960 kB (2 GB - 8 MB).

Ballooning down and back up to 2048 MB doesn't change the picture.

So which additional holes are added by the kernel on AWS, and via which
functions?

Juergen