Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966391AbdCXVLh (ORCPT ); Fri, 24 Mar 2017 17:11:37 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:32885 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754867AbdCXVLb (ORCPT ); Fri, 24 Mar 2017 17:11:31 -0400 Date: Fri, 24 Mar 2017 17:10:16 -0400 From: Konrad Rzeszutek Wilk To: Dan Streetman Cc: Boris Ostrovsky , David Vrabel , Juergen Gross , xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org Subject: Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" Message-ID: <20170324211016.GG9755@char.us.oracle.com> References: <0628e2af-f7e7-056a-82ec-68860f9c4f29@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.7.1 (2016-10-04) X-Source-IP: aserv0022.oracle.com [141.146.126.234] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5157 Lines: 115 On Fri, Mar 24, 2017 at 04:34:23PM -0400, Dan Streetman wrote: > On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky > wrote: > > > > > > On 03/22/2017 05:16 PM, Dan Streetman wrote: > >> > >> I have a question about a problem introduced by this commit: > >> c275a57f5ec3056f732843b11659d892235faff7 > >> "xen/balloon: Set balloon's initial state to number of existing RAM pages" > >> > >> It changed the xen balloon current_pages calculation to start with the > >> number of physical pages in the system, instead of max_pfn. Since > >> get_num_physpages() does not include holes, it's always less than the > >> e820 map's max_pfn. > >> > >> However, the problem that commit introduced is, if the hypervisor sets > >> the balloon target to equal to the e820 map's max_pfn, then the > >> balloon target will *always* be higher than the initial current pages. > >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if > >> the OS adds any holes, the balloon target will be higher than the > >> current pages. This is the situation, for example, for Amazon AWS > >> instances. The result is, the xen balloon will always immediately > >> hotplug some memory at boot, but then make only (max_pfn - > >> get_num_physpages()) available to the system. > >> > >> This balloon-hotplugged memory can cause problems, if the hypervisor > >> wasn't expecting it; specifically, the system's physical page > >> addresses now will exceed the e820 map's max_pfn, due to the > >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device > >> DMA to/from those physical pages above the e820 max_pfn, it causes > >> problems. For example: > >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 > >> > >> The additional small amount of balloon memory can cause other problems > >> as well, for example: > >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 > >> > >> Anyway, I'd like to ask, was the original commit added because > >> hypervisors are supposed to set their balloon target to the guest > >> system's number of phys pages (max_pfn - holes)? The mailing list > >> discussion and commit description seem to indicate that. > > > > > > > > IIRC the problem that this was trying to fix was that since max_pfn includes > > holes, upon booting we'd immediately balloon down by the (typically, MMIO) > > hole size. > > > > If you boot a guest with ~4+GB memory you should see this. > > > > > >> However I'm > >> not sure how that is possible, because the kernel reserves its own > >> holes, regardless of any predefined holes in the e820 map; for > >> example, the kernel reserves 64k (by default) at phys addr 0 (the > >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So > >> the hypervisor really has no way to know what the "right" target to > >> specify is; unless it knows the exact guest OS and kernel version, and > >> kernel config values, it will never be able to correctly specify its > >> target to be exactly (e820 max_pfn - all holes). > >> > >> Should this commit be reverted? Should the xen balloon target be > >> adjusted based on kernel-added e820 holes? > > > > > > I think the second one but shouldn't current_pages be updated, and not the > > target? The latter is set by Xen (toolstack, via xenstore usually). > > > > Also, the bugs above (at least one of them) talk about NVMe and I wonder > > whether the memory that they add is of RAM type --- I believe it has its own > > type and so perhaps that introduces additional inconsistencies. AWS may have > > added their own support for that, which we don't have upstream yet. > > The type of memory doesn't have anything to do with it. > > The problem with NVMe is it's a passthrough device, so the guest talks > directly to the NVMe controller and does DMA with it. But the > hypervisor does swiotlb translation between the guest physical memory, Um, the hypervisor does not have SWIOTLB support, only IOMMU support. > and the host physical memory, so that the NVMe device can correctly > DMA to the right memory in the host. > > However, the hypervisor only has the guest's physical memory up to the > max e820 pfn mapped; it didn't expect the balloon driver to hotplug > any additional memory above the e820 max pfn, so when the NVMe driver > in the guest tries to tell the NVMe controller to DMA to that > balloon-hotplugged memory, the hypervisor fails the NVMe request, But when the memory hotplug happens the hypercalls are done to raise the max pfn. > because it can't do the guest-to-host phys mem mapping, since the > guest phys address is outside the expected max range. > > > > > > > -boris > > > > > > > >> Should something else be > >> done? > >> > >> For context, Amazon Linux has simply disabled Xen ballooning > >> completely. Likewise, we're planning to disable Xen ballooning in the > >> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS > >> Ubuntu kernels). However, if reverting this patch makes sense in a > >> bigger context (i.e. Xen users besides AWS), that would allow more > >> Ubuntu kernels to work correctly in AWS instances. > >> > >