Message-ID: <54644E48.3040506@suse.com>
Date: Thu, 13 Nov 2014 07:23:04 +0100
From: Juergen Gross <jgross@suse.com>
To: Konrad Rzeszutek Wilk
CC: linux-kernel@vger.kernel.org, xen-devel@lists.xensource.com,
    david.vrabel@citrix.com, boris.ostrovsky@oracle.com, x86@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com
Subject: Re: [PATCH V3 2/8] xen: Delay remapping memory of pv-domain
References: <1415684626-18590-1-git-send-email-jgross@suse.com>
            <1415684626-18590-3-git-send-email-jgross@suse.com>
            <20141112214506.GA5922@laptop.dumpdata.com>
In-Reply-To: <20141112214506.GA5922@laptop.dumpdata.com>

On 11/12/2014 10:45 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 11, 2014 at 06:43:40AM +0100, Juergen Gross wrote:
>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>> index a8a1a3d..d3e492b 100644
>> --- a/arch/x86/xen/mmu.c
>> +++ b/arch/x86/xen/mmu.c
>> @@ -1223,6 +1223,10 @@ static void __init xen_pagetable_init(void)
>>  	/* Allocate and initialize top and mid mfn levels for p2m structure */
>>  	xen_build_mfn_list_list();
>>
>> +	/* Remap memory freed because of conflicts with E820 map */
>
> s/because of/due to

Okay.

>>  	/* Boundary cross-over for the edges: */
>> -	p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
>> +	p2m = alloc_p2m_page();
>>
>>  	p2m_init(p2m);
>>
>> @@ -640,7 +651,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
>>
>>  	mid = p2m_top[topidx];
>>  	if (mid == p2m_mid_missing) {
>> -		mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
>> +		mid = alloc_p2m_page();
>>
>>  		p2m_mid_init(mid, p2m_missing);
>>
>> @@ -649,100 +660,6 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
>>  	return true;
>>  }
>>
>
> I would split this patch in two - one for the extend_brk/alloc_page
> conversion to alloc_p2m_page and free_page to free_p2m_page.

Okay.

>> -/* Buffer used to remap identity mapped pages */
>> -unsigned long xen_remap_buf[P2M_PER_PAGE] __initdata;
>> +/*
>> + * Buffer used to remap identity mapped pages. We only need the virtual space.
>
> Could you expand on the 'need the virtual space'?

I'll update the comment to:

/*
 * Buffer used to remap identity mapped pages. We only need the virtual
 * space. The physical page behind this address is remapped as needed to
 * different buffer pages.
 */
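Maybe it helps to show the buffer in condensed form (an illustration
only, not the literal patch code; the REMAP_SIZE value assumes the
three header members share one page with the mfns[] array):

#define REMAP_SIZE	(P2M_PER_PAGE - 3)

static struct {
	unsigned long next_area_mfn;	/* MFN holding the next chunk's info */
	unsigned long target_pfn;	/* first PFN the chunk is remapped to */
	unsigned long size;		/* number of valid entries in mfns[] */
	unsigned long mfns[REMAP_SIZE];	/* the MFNs to be remapped */
} xen_remap_buf __initdata __aligned(PAGE_SIZE);

/* Head of the chunk list, terminated by INVALID_P2M_ENTRY. */
static unsigned long xen_remap_mfn __initdata = INVALID_P2M_ENTRY;

So only one page of virtual address space is consumed; the physical
page behind it is exchanged via set_pte_mfn() whenever another chunk's
information has to be read or written.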
> .. snip ..
>>  /*
>>   * This function updates the p2m and m2p tables with an identity map from
>> - * start_pfn to start_pfn+size and remaps the underlying RAM of the original
>> - * allocation at remap_pfn. It must do so carefully in P2M_PER_PAGE sized blocks
>> - * to not exhaust the reserved brk space. Doing it in properly aligned blocks
>> - * ensures we only allocate the minimum required leaf pages in the p2m table. It
>> - * copies the existing mfns from the p2m table under the 1:1 map, overwrites
>> - * them with the identity map and then updates the p2m and m2p tables with the
>> - * remapped memory.
>> + * start_pfn to start_pfn+size and prepares remapping the underlying RAM of the
>> + * original allocation at remap_pfn. The information needed for remapping is
>> + * saved in the memory itself to avoid the need for allocating buffers. The
>> + * complete remap information is contained in a list of MFNs each containing
>> + * up to REMAP_SIZE MFNs and the start target PFN for doing the remap.
>> + * This enables to preserve the original mfn sequence while doing the remapping
>
> us to

Yep.

>> + * at a time when the memory management is capable of allocating virtual and
>> + * physical memory in arbitrary amounts.
>
> You might want to add, see 'xen_remap_memory' and its callers.

Okay.

>> -	/* These two checks move from the start to end boundaries */
>> -	if (ident_boundary_pfn == ident_start_pfn_align)
>> -		ident_boundary_pfn = ident_pfn_iter;
>> -	if (remap_boundary_pfn == remap_start_pfn_align)
>> -		remap_boundary_pfn = remap_pfn_iter;
>> +	/* Map first pfn to xen_remap_buf */
>> +	mfn = pfn_to_mfn(ident_pfn_iter);
>> +	set_pte_mfn(buf, mfn, PAGE_KERNEL);
>
> So you set buf to point to 'mfn'.

Correct.

>>
>> -	/* Check we aren't past the end */
>> -	BUG_ON(ident_boundary_pfn >= start_pfn + size);
>> -	BUG_ON(remap_boundary_pfn >= remap_pfn + size);
>> +	/* Save mapping information in page */
>> +	xen_remap_buf.next_area_mfn = xen_remap_mfn;
>> +	xen_remap_buf.target_pfn = remap_pfn_iter;
>> +	xen_remap_buf.size = chunk;
>> +	for (i = 0; i < chunk; i++)
>> +		xen_remap_buf.mfns[i] = pfn_to_mfn(ident_pfn_iter + i);
>>
>> -	mfn = pfn_to_mfn(ident_boundary_pfn);
>> +	/* New element first in list */
>
> I don't get that comment. Don't you mean the MFN of the last chunk in
> which you had stashed the 'xen_remap_buf' structure?
>
> The 'xen_remap_mfn' ends up being the tail value of this "list".

I'll redo the comment:

/* Put remap buf into list. */

>> +/*
>> + * Remap the memory prepared in xen_do_set_identity_and_remap_chunk().
>> + */
>> +void __init xen_remap_memory(void)
>> +{
>> +	unsigned long buf = (unsigned long)&xen_remap_buf;
>> +	unsigned long mfn_save, mfn, pfn;
>> +	unsigned long remapped = 0, released = 0;
>> +	unsigned int i, free;
>> +	unsigned long pfn_s = ~0UL;
>> +	unsigned long len = 0;
>> +
>> +	mfn_save = virt_to_mfn(buf);
>> +
>> +	while (xen_remap_mfn != INVALID_P2M_ENTRY) {
>
> So the 'list' is constructed by going forward - that is from low-numbered
> PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the
> other way - from the highest PFN to the lowest PFN.
>
> Won't that mean we will restore the chunks of memory in the wrong
> order? That is, we will still restore them in chunk-sized blocks, but the
> chunks will be in descending order instead of ascending?

No, the information where to put each chunk is contained in the chunk
data. I can add a comment explaining this.

>
>> +		/* Map the remap information */
>> +		set_pte_mfn(buf, xen_remap_mfn, PAGE_KERNEL);
>> +
>> +		BUG_ON(xen_remap_mfn != xen_remap_buf.mfns[0]);
>> +
>> +		free = 0;
>> +		pfn = xen_remap_buf.target_pfn;
>> +		for (i = 0; i < xen_remap_buf.size; i++) {
>> +			mfn = xen_remap_buf.mfns[i];
>> +			if (!released && xen_update_mem_tables(pfn, mfn)) {
>> +				remapped++;
>
> If 'xen_update_mem_tables' fails, we will keep on freeing pages from the
> next entry on (so i+1) instead of trying to remap. Is that intentional?
> Could we try to remap?

Hmm, I'm not sure this is worth the effort.

What could lead to failure here? I suspect we could even just BUG() on
failure. What do you think?
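Regarding the ordering question above: perhaps a stripped-down version
of the loop makes it more obvious why the traversal order doesn't
matter (statistics, releasing and error handling omitted):

	while (xen_remap_mfn != INVALID_P2M_ENTRY) {
		/* Map the current chunk's info at the fixed virtual buffer. */
		set_pte_mfn(buf, xen_remap_mfn, PAGE_KERNEL);

		/* Each chunk records its own target, ... */
		pfn = xen_remap_buf.target_pfn;
		for (i = 0; i < xen_remap_buf.size; i++)
			xen_update_mem_tables(pfn + i, xen_remap_buf.mfns[i]);

		/* ... so the chunks can be processed in any order. */
		xen_remap_mfn = xen_remap_buf.next_area_mfn;
	}

	/* Restore the original page behind the buffer. */
	set_pte_mfn(buf, mfn_save, PAGE_KERNEL);

Note that updating the p2m/m2p tables doesn't touch the page table
entry backing buf, so reading next_area_mfn after the inner loop is
still fine.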
Juergen