Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755561AbdCWCNv (ORCPT ); Wed, 22 Mar 2017 22:13:51 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:24150 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751333AbdCWCNl (ORCPT ); Wed, 22 Mar 2017 22:13:41 -0400 Subject: Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" To: Dan Streetman , Konrad Rzeszutek Wilk References: Cc: David Vrabel , Juergen Gross , xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org From: Boris Ostrovsky Message-ID: <0628e2af-f7e7-056a-82ec-68860f9c4f29@oracle.com> Date: Wed, 22 Mar 2017 22:13:15 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Source-IP: userv0021.oracle.com [156.151.31.71] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3711 Lines: 81 On 03/22/2017 05:16 PM, Dan Streetman wrote: > I have a question about a problem introduced by this commit: > c275a57f5ec3056f732843b11659d892235faff7 > "xen/balloon: Set balloon's initial state to number of existing RAM pages" > > It changed the xen balloon current_pages calculation to start with the > number of physical pages in the system, instead of max_pfn. Since > get_num_physpages() does not include holes, it's always less than the > e820 map's max_pfn. > > However, the problem that commit introduced is, if the hypervisor sets > the balloon target to equal to the e820 map's max_pfn, then the > balloon target will *always* be higher than the initial current pages. > Even if the hypervisor sets the target to (e820 max_pfn - holes), if > the OS adds any holes, the balloon target will be higher than the > current pages. This is the situation, for example, for Amazon AWS > instances. The result is, the xen balloon will always immediately > hotplug some memory at boot, but then make only (max_pfn - > get_num_physpages()) available to the system. > > This balloon-hotplugged memory can cause problems, if the hypervisor > wasn't expecting it; specifically, the system's physical page > addresses now will exceed the e820 map's max_pfn, due to the > balloon-hotplugged pages; if the hypervisor isn't expecting pt-device > DMA to/from those physical pages above the e820 max_pfn, it causes > problems. For example: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 > > The additional small amount of balloon memory can cause other problems > as well, for example: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 > > Anyway, I'd like to ask, was the original commit added because > hypervisors are supposed to set their balloon target to the guest > system's number of phys pages (max_pfn - holes)? The mailing list > discussion and commit description seem to indicate that. IIRC the problem that this was trying to fix was that since max_pfn includes holes, upon booting we'd immediately balloon down by the (typically, MMIO) hole size. If you boot a guest with ~4+GB memory you should see this. > However I'm > not sure how that is possible, because the kernel reserves its own > holes, regardless of any predefined holes in the e820 map; for > example, the kernel reserves 64k (by default) at phys addr 0 (the > amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So > the hypervisor really has no way to know what the "right" target to > specify is; unless it knows the exact guest OS and kernel version, and > kernel config values, it will never be able to correctly specify its > target to be exactly (e820 max_pfn - all holes). > > Should this commit be reverted? Should the xen balloon target be > adjusted based on kernel-added e820 holes? I think the second one but shouldn't current_pages be updated, and not the target? The latter is set by Xen (toolstack, via xenstore usually). Also, the bugs above (at least one of them) talk about NVMe and I wonder whether the memory that they add is of RAM type --- I believe it has its own type and so perhaps that introduces additional inconsistencies. AWS may have added their own support for that, which we don't have upstream yet. -boris > Should something else be > done? > > For context, Amazon Linux has simply disabled Xen ballooning > completely. Likewise, we're planning to disable Xen ballooning in the > Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS > Ubuntu kernels). However, if reverting this patch makes sense in a > bigger context (i.e. Xen users besides AWS), that would allow more > Ubuntu kernels to work correctly in AWS instances. >