Subject: Re: [PATCH] xen: allocate page for shared info page from low memory
From: Boris Ostrovsky
To: Juergen Gross, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org
References: <20170612115356.8312-1-jgross@suse.com> <75df36ae-bcc2-62e9-b585-cc2b74a682de@oracle.com>
In-Reply-To: <75df36ae-bcc2-62e9-b585-cc2b74a682de@oracle.com>
Date: Mon, 24 Jul 2017 23:58:51 -0400

On 07/23/2017 04:25 PM, Boris Ostrovsky wrote:
>
>
> On 06/14/2017 01:11 PM, Juergen Gross wrote:
>> On 14/06/17 18:58, Boris Ostrovsky wrote:
>>> On 06/12/2017 07:53 AM, Juergen Gross wrote:
>>>> In a HVM guest the kernel allocates the page for mapping the shared
>>>> info structure via extend_brk() today. This will lead to a drop of
>>>> performance as the underlying EPT entry will have to be split up into
>>>> 4kB entries as the single shared info page is located in hypervisor
>>>> memory.
>>>>
>>>> The issue has been detected by using the libmicro munmap test:
>>>> unmapping 8kB of memory was faster by nearly a factor of two when no
>>>> pv interfaces were active in the HVM guest.
>>>>
>>>> So instead of taking a page from memory which might be mapped via
>>>> large EPT entries use a page which is already mapped via a 4kB EPT
>>>> entry: we can take a page from the first 1MB of memory as the video
>>>> memory at 640kB disallows using larger EPT entries.
>>>>
>>>> Signed-off-by: Juergen Gross
>>>> ---
>>>>  arch/x86/xen/enlighten_hvm.c | 31 ++++++++++++++++++++++++-------
>>>>  arch/x86/xen/enlighten_pv.c  |  2 --
>>>>  2 files changed, 24 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/arch/x86/xen/enlighten_hvm.c
>>>> b/arch/x86/xen/enlighten_hvm.c
>>>> index a6d014f47e52..c19477b6e43a 100644
>>>> --- a/arch/x86/xen/enlighten_hvm.c
>>>> +++ b/arch/x86/xen/enlighten_hvm.c
>>>> @@ -1,5 +1,6 @@
>>>>  #include
>>>>  #include
>>>> +#include
>>>>  #include
>>>>  #include
>>>> @@ -10,9 +11,11 @@
>>>>  #include
>>>>  #include
>>>>  #include
>>>> +#include
>>>>  #include
>>>>  #include
>>>> +#include
>>>>  #include "xen-ops.h"
>>>>  #include "mmu.h"
>>>> @@ -22,20 +25,34 @@ void __ref xen_hvm_init_shared_info(void)
>>>>  {
>>>>  	int cpu;
>>>>  	struct xen_add_to_physmap xatp;
>>>> -	static struct shared_info *shared_info_page;
>>>> +	u64 pa;
>>>> +
>>>> +	if (HYPERVISOR_shared_info == &xen_dummy_shared_info) {
>>>> +		/*
>>>> +		 * Search for a free page starting at 4kB physical address.
>>>> +		 * Low memory is preferred to avoid an EPT large page split up
>>>> +		 * by the mapping.
>>>> +		 * Starting below X86_RESERVE_LOW (usually 64kB) is fine as
>>>> +		 * the BIOS used for HVM guests is well behaved and won't
>>>> +		 * clobber memory other than the first 4kB.
>>>> +		 */
>>>> +		for (pa = PAGE_SIZE;
>>>> +		     !e820__mapped_all(pa, pa + PAGE_SIZE, E820_TYPE_RAM) ||
>>>> +		     memblock_is_reserved(pa);
>>>> +		     pa += PAGE_SIZE)
>>>> +			;
>>>
>>> Is it possible to never find a page here?
>>
>> Only if there is no memory available at all. :-)
>>
>> TBH: I expect this to _always_ succeed at the first loop iteration.
>
> This patch seems to break (64-bit only) guests on dumpdata here. No
> problems on other machines.
>
> So far all I know is that we did get the first page (0x1000) but not
> much more. I will poke at this more on Monday.

So the problem is due to KASLR --- we can't use __va() before
kernel_randomize_memory() is called since it will change __PAGE_OFFSET.

(Setting CONFIG_RANDOMIZE_BASE will cause failure.)

-boris
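
To illustrate the failure mode described above, here is a minimal userspace
sketch. It is not kernel code: every sim_* name is hypothetical, and both base
addresses are illustrative values only. It assumes that __va() amounts to
adding the current direct-map base to a physical address, so a virtual address
computed before the base is re-randomized no longer refers to the same page
afterwards.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the direct-map base that __PAGE_OFFSET
 * resolves to. The initial value is illustrative only.
 */
uint64_t sim_page_offset = 0xffff880000000000ULL;

/* Simplified model of __va(): physical address plus the current base. */
uint64_t sim_va(uint64_t pa)
{
	return sim_page_offset + pa;
}

/*
 * Simplified model of memory randomization: the direct-map base moves.
 * The new base is passed in so the sketch stays deterministic.
 */
void sim_randomize_memory(uint64_t new_base)
{
	sim_page_offset = new_base;
}

/*
 * Compute a virtual address early, move the base, then compare with the
 * address computed afterwards. Returns 1 if the early address is stale,
 * 0 if it still matches.
 */
int sim_early_va_is_stale(uint64_t pa, uint64_t new_base)
{
	uint64_t early = sim_va(pa);

	sim_randomize_memory(new_base);
	/* After the move, translation must use the new base. */
	assert(sim_va(pa) == new_base + pa);

	return early != sim_va(pa);
}
```

In this model, calling sim_early_va_is_stale(0x1000, new_base) with a base
different from the initial one reports a stale address for the page at 0x1000,
which matches the symptom of getting the first page but not much more. One
plausible way out, under the same assumption, is to keep only the physical
address early on and derive the virtual address after the base is final.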