by Michael S. Tsirkin

Subject: Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

On Wed, Mar 09, 2016 at 08:04:39PM +0300, Roman Kagan wrote:
> On Wed, Mar 09, 2016 at 05:41:39PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Mar 09, 2016 at 05:28:54PM +0300, Roman Kagan wrote:
> > > For (1) I've been trying to make a point that skipping clean pages is
> > > much more likely to result in noticeable benefit than free pages only.
> >
> > I guess when you say clean you mean zero?
>
> No I meant clean, i.e. those that could be evicted from RAM without
> causing I/O.

They must be migrated unless guest actually evicts them.
It's not at all clear to me that it's always preferable
to drop all clean pages from pagecache. It is clearly
going to slow the guest down significantly.

> > Yea. In fact, one can zero out any number of pages
> > quickly by putting them in balloon and immediately
> > taking them out.
> >
> > Access will fault a zero page in, then COW kicks in.
>
> I must be missing something obvious, but how is that different from
> inflating and then immediately deflating the balloon?

It's exactly the same except
- we do not initiate this from host - it's guest doing
things for its own reasons
- a bit less guest/host interaction this way

> > We could have a new zero VQ (or some other option)
> > to pass these pages guest to host, but this only
> > works well if page size matches the host page size.
>
> I'm afraid I don't yet understand what kind of pages that would be and
> how they are different from ballooned pages.
>
> I still tend to think that ballooning is a sensible solution to the
> problem at hand;

I think it is, too. This does not mean we can't improve things though.
This patchset is reported to improve things, it should be
split up so we improve them for everyone and not just
one specific workload.

> it's just the granularity that makes things slow and
> stands in the way.

So we could request a specific page size/alignment from guest.
Send the guest a request to give us memory in aligned units of 2 MB,
and then host can treat each of these as a single huge page.
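
To illustrate the alignment idea, here is a minimal sketch of how a list of
free 4 KiB page frame numbers could be reduced to the 2 MB-aligned units that
are entirely free. The 4 KiB base page size and the function name
full_huge_units are assumptions made for this example, not part of any
existing balloon interface.

```python
PAGE_SIZE = 4096                          # assumed guest base page size
HUGE_SIZE = 2 * 1024 * 1024               # 2 MB unit requested by the host
PAGES_PER_HUGE = HUGE_SIZE // PAGE_SIZE   # 512 base pages per unit


def full_huge_units(free_pfns):
    """Return the base PFNs of 2 MB-aligned units whose pages are all free.

    Hypothetical helper: free_pfns is any iterable of free 4 KiB PFNs.
    """
    counts = {}
    for pfn in set(free_pfns):
        base = pfn - (pfn % PAGES_PER_HUGE)   # round down to unit boundary
        counts[base] = counts.get(base, 0) + 1
    # Only units with every one of their 512 pages free qualify.
    return sorted(base for base, n in counts.items() if n == PAGES_PER_HUGE)
```

Partially free units are dropped here; a real implementation would also have
to decide what to do with them.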

> Roman.
--
MST

2016-03-09 19:38:58

> > > > > > I'm just catching back up on this thread; so without
> > > > > > reference to any particular previous mail in the thread.
> > > > > >
> > > > > > 1) How many of the free pages do we tell the host about?
> > > > > > Your main change is telling the host about all the
> > > > > > free pages.
> > > > >
> > > > > Yes, all the guest's free pages.
> > > > >
> > > > > > If we tell the host about all the free pages, then we might
> > > > > > end up needing to allocate more pages and update the host
> > > > > > with pages we now want to use; that would have to wait for the
> > > > > > host to acknowledge that use of these pages, since if we don't
> > > > > > wait for it then it might have skipped migrating a page we
> > > > > > just started using (I don't understand how your series solves that).
> > > > > > So the guest probably needs to keep some free pages - how many?
> > > > >
> > > > > Actually, there is no need to care about whether the free pages
> > > > > will be used by the host.
> > > > > We only care about whether some of the free pages we get are
> > > > > reused by the guest, right?
> > > > >
> > > > > The dirty page logging can be used to solve this, by starting the
> > > > > dirty page logging before getting the free pages information from
> > > > > the guest. Even if some of the free pages are modified by the guest
> > > > > during the process of getting the free pages information, these
> > > > > modified pages will be traced by the dirty page logging mechanism.
> > > > > So in the following migration_bitmap_sync() call, the pages that
> > > > > are in the free pages bitmap but were later modified will be reset
> > > > > to dirty. We won't omit any dirtied pages.
> > > > >
> > > > > So, guest doesn't need to keep any free pages.
> > > >
> > > > OK, yes, that works; so we do:
> > > > * enable dirty logging
> > > > * ask guest for free pages
> > > > * initialise the migration bitmap as everything-free
> > > > * then later we do the normal sync-dirty bitmap stuff and it all
> > > >   just works.
> > > >
> > > > That's nice and simple.
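
The one-shot sequence above can be sketched with plain sets standing in for
the bitmaps. The function and parameter names are hypothetical; in QEMU the
merge would happen in migration_bitmap_sync(). The sketch assumes that dirty
logging was enabled before the guest was asked for its free pages, which is
what makes a stale free-page snapshot safe.

```python
def initial_migration_set(num_pages, guest_free, dirtied_since_logging):
    """Pages to send after the first bitmap sync in the one-shot scheme.

    guest_free: PFNs the guest reported as free (may already be stale).
    dirtied_since_logging: PFNs recorded by dirty logging, which is assumed
    to have been enabled before the free-page query was issued.
    """
    # Start from "everything except the reported free pages".
    to_send = set(range(num_pages)) - set(guest_free)
    # A page reported free but written afterwards shows up in the dirty
    # log, so the sync puts it back and it is not lost.
    to_send |= set(dirtied_since_logging)
    return to_send
```

For example, with 8 pages, free pages {2, 3, 5} and a dirty log of {3, 6},
page 3 is sent even though the guest reported it as free.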
> > >
> > > This works once, sure. But there's an issue: you have to
> > > defer migration until you get the free page list, and this only
> > > works once. So you end up with heuristics about how long to wait.
> > >
> > > Instead I propose:
> > >
> > > - mark all pages dirty as we do now.
> > >
> > > - at start of migration, start tracking dirty
> > >   pages in kvm, and tell guest to start tracking free pages
> > >
> > >   we can now introduce any kind of delay, for example wait for ack
> > >   from guest, or do whatever else, or even just start migrating pages
> > >
> > > - repeatedly:
> > >   - get list of free pages from guest
> > >   - clear them in migration bitmap
> > >   - get dirty list from kvm
> > >
> > > - at end of migration, stop tracking writes in kvm,
> > >   and tell guest to stop tracking free pages
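
A minimal sketch of the repeated scheme proposed above, again using sets in
place of bitmaps. The per-round (free, dirty) pairs are hypothetical stand-ins
for the guest's free-page report and the KVM dirty log; nothing here is an
existing QEMU interface. The order inside the loop follows the list: free
pages are cleared first, and the dirty list is fetched afterwards so that a
page reused after the free report is still picked up.

```python
def migrate(num_pages, rounds):
    """Return the set of pages sent in each round.

    rounds: iterable of (free_pfns, dirty_pfns) pairs, one per iteration.
    Both are assumed inputs for this illustration.
    """
    pending = set(range(num_pages))    # mark all pages dirty as we do now
    sent_log = []
    for free_pfns, dirty_pfns in rounds:
        pending -= set(free_pfns)      # clear free pages in migration bitmap
        pending |= set(dirty_pfns)     # then merge the dirty list from kvm
        sent_log.append(set(pending))  # "send" whatever is still pending
        pending.clear()
    return sent_log
```

Because the first round does not depend on the free list having arrived, an
empty free list only means more pages are sent, not that migration is wrong.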
> >
> > I had thought of filtering out the free pages in each migration bitmap
> > synchronization.
> > The advantage is that we can skip processing as many free pages as
> > possible, not just once.
> > The disadvantage is that we would have to change the current memory
> > management code to track the free pages, instead of traversing the free
> > page list to construct the free pages bitmap, in order to reduce the
> > overhead of getting the free pages bitmap.
> > I am not sure if the kernel people would like it.
> >
> > If we keep the traversing mechanism, then because of the overhead it may
> > not be worth filtering out the free pages repeatedly.
>
> Well, Michael's idea of not waiting for the dirty bitmap to be filled does make
> that idea of constantly using the free-bitmap better.
>

Not waiting is a good idea.
Actually, we could shorten the waiting time by pre-allocating the free pages
bitmap and updating it whenever the guest allocates or frees pages. This
requires modifying the mm-related code. I don't know whether the kernel people
would like this.
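
As a rough sketch of what such a pre-allocated bitmap could look like: the
class below keeps one bit per page and updates it on every allocation and
free, so a snapshot is a plain copy instead of a walk over the free lists.
The class and its on_free/on_alloc hooks are hypothetical and only model the
bookkeeping; they do not correspond to existing kernel mm code.

```python
class FreePageBitmap:
    """One bit per page: 1 = free, 0 = in use (hypothetical model)."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)  # pre-allocated once

    def on_free(self, pfn):
        # Would be called from the page-free path.
        self.bits[pfn // 8] |= 1 << (pfn % 8)

    def on_alloc(self, pfn):
        # Would be called from the page-allocation path.
        self.bits[pfn // 8] &= ~(1 << (pfn % 8)) & 0xFF

    def snapshot(self):
        # Cost is proportional to the bitmap size, not to list traversal.
        return bytes(self.bits)

    def free_pfns(self):
        return [p for p in range(self.num_pages)
                if (self.bits[p // 8] >> (p % 8)) & 1]
```

The trade-off is the one mentioned above: the snapshot gets cheap, but every
allocation and free pays a small extra cost.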

> In that case, is it easier if something (guest/host?) allocates some memory in
> the guest's physical RAM space and just points the host to it, rather than
> having an explicit 'send'?
>

Good idea too.

Liang
> Dave