From: Arnd Bergmann <arnd@arndb.de>
To: Avi Kivity <avi@qumranet.com>
Subject: Re: [PATCH 6/13] KVM: memory slot management
Date: Fri, 27 Oct 2006 09:37:11 +0200
User-Agent: KMail/1.9.5
Cc: linux-kernel@vger.kernel.org, kvm-devel@lists.sourceforge.net
References: <4540EE2B.9020606@qumranet.com> <200610270044.31382.arnd@arndb.de> <45419D73.1070106@qumranet.com>
In-Reply-To: <45419D73.1070106@qumranet.com>
Message-Id: <200610270937.11646.arnd@arndb.de>

On Friday 27 October 2006 07:47, Avi Kivity wrote:
> Arnd Bergmann wrote:
> > - no need to preallocate memory that the guest doesn't actually use.
> >   
> 
> Well, a fully virtualized guest will likely use all the memory it gets.
> Linux certainly will.

Only if it does lots of disk accesses that load stuff into
page/inode/dentry cache. Single-application guests don't necessarily
do that.

> > - guest memory can be paged to disk.
> > - you can mmap files into multiple guests for fast communication
> > - you can mmap host files as backing store for guest block devices,
> >   including ext2 with the -o xip mount option to avoid double paging
> >   
> 
> What do you mean exactly? to respond to a block device read by mmap()ing 
> the backing file into the pages the host requested?
> 
> (e.g. turn a host bio read into a guest mmap)

The idea would be to mmap the file into the guest real address space.
With -o xip, the page cache for the virtual device would basically
reside in that high address range.

Guest users reading/writing files on it cause a memcopy between guest
user space and the host file mapping, done by the guest file system
implementation.
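
As a rough userspace illustration of that setup (not actual KVM or
guest file system code), the sketch below maps a host file MAP_SHARED
and serves a read with a bounds-checked copy out of the mapping. The
names struct backing, backing_map() and backing_read() are made up for
this example, and the mapping stands in for the guest real address
range that holds the virtual device.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical descriptor for a host file mapped as a guest block device. */
struct backing {
    void *base;   /* start of the file mapping */
    size_t len;   /* length of the mapping in bytes */
};

/* Map the whole host file shared and writable. Returns 0 or -1. */
int backing_map(struct backing *b, const char *path)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    b->base = p;
    b->len = (size_t)st.st_size;
    return 0;
}

/* Copy n bytes at offset off out of the mapping, as an xip-style
 * file system would do instead of going through a page cache copy. */
int backing_read(const struct backing *b, size_t off, void *dst, size_t n)
{
    assert(b != NULL && dst != NULL);
    if (off > b->len || n > b->len - off)
        return -1;
    memcpy(dst, (const char *)b->base + off, n);
    return 0;
}
```

A write would be the same memcpy in the other direction, and the host
page cache behind the shared mapping is the only copy of the data.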

The interesting point here is how to handle a host page fault on the
file mapping. The z/VM solution is to deliver a special exception to
the guest kernel, telling it to wait until the page is there. The guest
kernel can then put the current thread to sleep and do something else
until a second exception tells it that the host has loaded the page.
The guest then wakes up the sleeping thread.
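
The guest side of that two-exception handshake could be sketched
roughly as below. This is not the z/VM interface or any existing KVM
code: the thread structure, the token used to identify the missing
page, and both handler names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum thread_state { THREAD_RUNNABLE, THREAD_WAITING };

/* Hypothetical guest thread bookkeeping. */
struct guest_thread {
    enum thread_state state;
    unsigned long wait_token;   /* identifies the page being waited for */
};

/* First exception: the host reports that the page is not resident.
 * Park the faulting thread so the guest can schedule something else. */
void on_page_not_present(struct guest_thread *t, unsigned long token)
{
    assert(t != NULL);
    t->state = THREAD_WAITING;
    t->wait_token = token;
}

/* Second exception: the host has loaded the page. Wake every thread
 * that was waiting on this token and return how many were woken. */
size_t on_page_ready(struct guest_thread *threads, size_t n,
                     unsigned long token)
{
    size_t i, woken = 0;

    for (i = 0; i < n; i++) {
        if (threads[i].state == THREAD_WAITING &&
            threads[i].wait_token == token) {
            threads[i].state = THREAD_RUNNABLE;
            woken++;
        }
    }
    return woken;
}
```

The host keeps servicing the fault in the background, so the vcpu is
never blocked while the guest still has other runnable threads.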

This can work the same way for host file backed (guest block device)
and host anonymous (guest RAM) memory.

> If we allow the pages to be writable, the guest could write into the 
> virtual block device just by modifying a read page (which might have been
> discarded and no longer related to the block device)

In your virtual mmu (or nested page table), you need to make sure that
the page is mapped into guest user space with the intersection of the
guest vm_prot and host vm_prot.
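
In code the rule is just a bitwise AND of the two permission sets. A
minimal sketch, using the userspace PROT_* bits from <sys/mman.h> in
place of the kernel's protection bits; effective_prot() is a made-up
helper name, not an existing function:

```c
#include <assert.h>
#include <sys/mman.h>

/* Hypothetical helper: the access the shadow (or nested) page table
 * entry may grant is whatever both the guest and the host allow. */
int effective_prot(int guest_prot, int host_prot)
{
    const int all = PROT_READ | PROT_WRITE | PROT_EXEC;

    assert((guest_prot & ~all) == 0 && (host_prot & ~all) == 0);
    return guest_prot & host_prot;
}
```

So a guest read-write mapping of a page the host maps read-only ends
up read-only, and the first guest write faults into the monitor.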

> 2. The next mmu implementation, which caches guest translations.
> 
> The potential problem above now becomes acute.  The guest will have 
> kernel mappings for every page, and after a short while they'll all be 
> faulted in and locked.  This defeats the swap integration which is IMO a 
> very strong point.
> 
> We can work around that by periodically forcing out translations (some 
> kind of clock algorithm) at some rate so the host vm can have a go at 
> them.  That can turn out to be expensive as we'll need to interrupt all 
> running vcpus to flush (real) tlb entries.

Don't understand. Can't one CPU cause a TLB entry to be flushed on all
CPUs?

>   a.  we need the guest physical memory to start at virtual offset 0 
> (can probably be achieved by dynamic linker tricks)

Yes, dosemu does the same for vm86.

>   b.  we need to hide the userspace portion of the monitor from the 
> guest physical address space

That depends on your trust model. You could simply say that you expect
the guest real mode to have the same privileges as the host application
(your monitor), and not care if a guest can shoot itself in the foot
by overwriting the monitor.

>   c.  we need to extend host tlb invalidations to invalidate tlbs on guests

I don't understand much about the x86 specific memory management,
but shouldn't a TLB invalidate of a given page do the right thing
on all CPUs, even if they are currently running a guest?

	Arnd <><