LinuxLists.cc - Re: [RFC PATCH 06/18] ARM: LPAE: Introduce the 3-level page table format definitions

2010-12-06 09:27:53

Subject: Re: [RFC PATCH 06/18] ARM: LPAE: Introduce the 3-level page table format definitions

Sorry for jumping in here at such a late hour...

>> You can look at the IPA as the virtual address translation set up by the
>> hypervisor (stage 2 translation). The guest OS only sets up stage 1
>> translations but can use 40-bit physical addresses (via stage 1) with or
>> without the hypervisor. The input to the stage 1 translations is always
>> 32-bit.
>
>
> Right, that's what I thought.
>
>> > Are there any significant differences to Linux between setting up page
>> > tables for a 32 bit VA space or a 40 bit IPA space, other than the
>> > size of the PGD?
>>
>> I think I get what you were asking :).
>>
>> From KVM you could indeed set up stage 2 translations that a guest OS
>> can use (you need some code running in hypervisor mode to turn this on).
>> The format is pretty close to the stage 1 tables, so the Linux macros
>> could be reused. The PGD size would be different (depending on whether
>> you want to emulate 40-bit physical address space or a 32-bit one).
>> There are also a few bits (memory attributes) that may differ but you
>> could handle them in KVM.
>>
>> If KVM would reuse the existing pgd/pmd/pte Linux macros, it would
>> indeed be restricted to 32-bit IPA (sizeof(long)). You may need to
>> define different macros to use either a pfn or long long as address
>> input.

I'm not even sure it would be a big advantage to re-use the macros for
KVM. Sure, creating separate macros may duplicate some bit-shifting
logic, but my guess is that code will be easier to read if using
separate macros for the 2-nd stage translation in KVM. One might also
imagine specific virtualization-oriented bits which could be
explicitly named or directly targeted in macros that don't have to
handle both standard non-virt tables and 2-nd stage translation
tables.

At least from my experience writing KVM code, it's difficult enough to
make it clear to anyone reading the code which address space exactly
is being referenced at which time.
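
A minimal sketch of what separate stage 2 macros might look like, assuming
4KB pages and the three-level LPAE layout (9 index bits per level below the
top one). Every name here (s2_ipa_t, S2_*, s2_*_index) is hypothetical and
not taken from the patch or from the kernel. The point is only that the
address input is a 64-bit type, so a 40-bit IPA is not truncated to
sizeof(long), and that the top-level table size follows from the IPA width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these exist in the kernel. */
typedef uint64_t s2_ipa_t;        /* wide enough for a 40-bit IPA */

#define S2_PAGE_SHIFT      12     /* 4KB pages */
#define S2_PMD_SHIFT       21     /* one level 2 entry covers 2MB */
#define S2_PGD_SHIFT       30     /* one level 1 entry covers 1GB */
#define S2_PTRS_PER_TABLE  512u   /* 4KB table / 8-byte descriptors */

/* Top-level entry count depends on the IPA width: 4 for 32-bit, 1024 for 40-bit. */
static inline unsigned int s2_ptrs_per_pgd(unsigned int ipa_bits)
{
	assert(ipa_bits > S2_PGD_SHIFT && ipa_bits <= 40);
	return 1u << (ipa_bits - S2_PGD_SHIFT);
}

/* Index into the top-level (level 1) table. */
static inline unsigned int s2_pgd_index(s2_ipa_t ipa, unsigned int ipa_bits)
{
	return (unsigned int)((ipa >> S2_PGD_SHIFT) &
			      (s2_ptrs_per_pgd(ipa_bits) - 1));
}

/* Index into a level 2 table. */
static inline unsigned int s2_pmd_index(s2_ipa_t ipa)
{
	return (unsigned int)((ipa >> S2_PMD_SHIFT) & (S2_PTRS_PER_TABLE - 1));
}

/* Index into a level 3 table. */
static inline unsigned int s2_pte_index(s2_ipa_t ipa)
{
	return (unsigned int)((ipa >> S2_PAGE_SHIFT) & (S2_PTRS_PER_TABLE - 1));
}
```

Passing the same address through a 32-bit type first (as an unsigned long
would on 32-bit ARM) drops the upper bits before the index is computed,
which is the restriction mentioned above.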

>> But if KVM uses qemu for platform emulation, this may only support
>> 32-bit physical address space so the guest OS could only generate 32-bit
>> IPA.
>
> Good point. At the very least, qemu would need a way to get at the highmem
> portion of the guest that is not normally part of the qemu virtual address
> space. In fact this would already be required without LPAE in order to run
> a VM with 4GB guest physical addressing.
>
> There are probably (slow) ways of doing that, e.g. remap_file_pages or
> a new syscall for accessing high guest memory. It's not entirely clear
> to me how useful that is; the most sensible way to start here is certainly
> to start out with a 32-bit IPA as you suggested and see how badly that
> limits guests in real-world setups.

So this depends on what the use would be. True, if you wanted a guest
that used more than 4GB of memory AND you wanted QEMU to be able to
readily access all of that, then yes, it would be difficult on a
32-bit architecture.

But QEMU doesn't really use the mmap'ed areas backing physical memory
for anything - it's merely a way of telling KVM how much physical
memory should be given to the guest, and the kernel side conveniently
uses get_user_pages() to access that memory. Instead, QEMU could
simply call an IOCTL to KVM telling it something like
register_user_memory(long long base_phys_addr, long long size); and
KVM could just allocate physical pages to back that without them being
mapped on the host side. An individual page could be mapped in as
needed for emulation and mapped out again. I don't see a huge
performance hit for such a solution.
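
To make the idea concrete, here is a rough sketch of the argument such a
call might take and the sanity checks the kernel side would probably want
before allocating anything. This is not an existing KVM interface: the
struct name, the helpers, and the GUEST_* constants are all hypothetical,
and unsigned 64-bit fields stand in for the "long long" above so that the
range checks are well defined.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE  4096ULL  /* assumption: 4KB guest pages */
#define GUEST_IPA_BITS   40       /* assumption: 40-bit guest physical space */

/* Hypothetical ioctl argument: a guest physical range with no host mapping. */
struct register_user_memory_args {
	uint64_t base_phys_addr;  /* guest physical base, page aligned */
	uint64_t size;            /* length in bytes, page aligned, non-zero */
};

/* Check alignment and that the range fits inside the guest physical space. */
static bool register_user_memory_args_valid(
		const struct register_user_memory_args *a)
{
	const uint64_t limit = 1ULL << GUEST_IPA_BITS;

	if (a->size == 0)
		return false;
	if ((a->base_phys_addr | a->size) & (GUEST_PAGE_SIZE - 1))
		return false;
	if (a->base_phys_addr >= limit)
		return false;
	/* Written this way so base + size cannot overflow. */
	if (a->size > limit - a->base_phys_addr)
		return false;
	return true;
}

/* Number of pages the kernel would have to allocate to back the range. */
static uint64_t register_user_memory_pages(
		const struct register_user_memory_args *a)
{
	return a->size / GUEST_PAGE_SIZE;
}
```

For example, a 1GB region starting at the 4GB boundary passes these checks
and needs 262144 pages, none of which has to appear in the QEMU address
space until emulation touches one of them.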

But as you both suggest, 32-bit physical address space is probably
going to be more than needed for initial uses of ARM virtual machines.

-Christoffer

2010-12-06 14:21:22

by Arnd Bergmann


Subject: Re: [RFC PATCH 06/18] ARM: LPAE: Introduce the 3-level page table format definitions

On Monday 06 December 2010, Christoffer Dall wrote:
> Sorry for jumping in here at such a late hour...
> >
> >> > Are there any significant differences to Linux between setting up page
> >> > tables for a 32 bit VA space or a 40 bit IPA space, other than the
> >> > size of the PGD?
> >>
> >> ...
> >>
> >> If KVM would reuse the existing pgd/pmd/pte Linux macros, it would
> >> indeed be restricted to 32-bit IPA (sizeof(long)). You may need to
> >> define different macros to use either a pfn or long long as address
> >> input.
>
> I'm not even sure it would be a big advantage to re-use the macros for
> KVM. Sure, creating separate macros may duplicate some bit-shifting
> logic, but my guess is that code will be easier to read if using
> separate macros for the 2-nd stage translation in KVM. One might also
> imagine specific virtualization-oriented bits which could be
> explicitly named or directly targeted in macros that don't have to
> handle both standard non-virt tables and 2-nd stage translation
> tables.
>
> At least from my experience writing KVM code, it's difficult enough to
> make it clear to anyone reading the code which address space exactly
> is being referenced at which time.

Good point. My thoughts were that we basically treat KVM guests as
processes with 40 bit virtual address space though, so the kernel
would be using them directly.

> So this depends on what the use would be. True, if you wanted a guest
> that used more than 4GB of memory AND you wanted QEMU to be able to
> readily access all of that, then yes, it would be difficult on a
> 32-bit architecture.
>
> But QEMU doesn't really use the mmap'ed areas backing physical memory
> for anything - it's merely a way of telling KVM how much physical
> memory should be given to the guest, and the kernel side conveniently
> uses get_user_pages() to access that memory. Instead, QEMU could
> simply call an IOCTL to KVM telling it something like
> register_user_memory(long long base_phys_addr, long long size); and
> KVM could just allocate physical pages to back that without them being
> mapped on the host side. An individual page could be mapped in as
> needed for emulation and mapped out again. I don't see a huge
> performance hit for such a solution.

Ok.

> But as you both suggest, 32-bit physical address space is probably
> going to be more than needed for initial uses of ARM virtual machines.

Right. Note that as long as we keep the guest mapped into the qemu
address space, we're limited to something between 0.5 and 3 GB of
guest physical address space, but even that is likely enough for the
near future.

Arnd