LinuxLists.cc - [patch 0/4] [RFC] MMU Notifiers V1

2008-01-25 05:58:18

Subject: [patch 0/4] [RFC] MMU Notifiers V1

This is a patchset implementing MMU notifier callbacks based on Andrea's
earlier work. These are needed if Linux pages are referenced from something
else than tracked by the rmaps of the kernel.

To do:

- Make locking requirements for the callbacks consistent and
document them accurately.
- Insure that the callbacks are complete
- Feedback from uses of the callbacks for KVM, RDMA, XPmem and GRU

Andrea's mmu_notifier #4 -> RFC V1

- Merge subsystem rmap based with Linux rmap based approach
- Move Linux rmap based notifiers out of macro
- Try to account for what locks are held while the notifiers are
called.
- Develop a patch sequence that separates out the different types of
hooks so that it is easier to review their use.
- Avoid adding #include to linux/mm_types.h
- Integrate RCU logic suggested by Peter.

--

2008-01-25 11:42:49

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Thu, Jan 24, 2008 at 09:56:06PM -0800, Christoph Lameter wrote:
> Andrea's mmu_notifier #4 -> RFC V1
>
> - Merge subsystem rmap based with Linux rmap based approach
> - Move Linux rmap based notifiers out of macro
> - Try to account for what locks are held while the notifiers are
> called.
> - Develop a patch sequence that separates out the different types of
> hooks so that it is easier to review their use.
> - Avoid adding #include to linux/mm_types.h
> - Integrate RCU logic suggested by Peter.

I'm glad you're converging on something a bit saner and much much
closer to my code, plus perfectly usable by KVM optimal rmap design
too. It would have preferred if you would have sent me patches like
Peter did for review and merging etc... that would have made review
especially easier. Anyway I'm used to that on lkml so it's ok, I just
need this patch to be included in mainline, everything else is
irrelevant to me.

On a technical merit this still partially makes me sick and I think
it's the last issue to debate.

@@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int
else
ret = try_to_unmap_file(page, migration);

+ if (unlikely(PageExternalRmap(page)))
+ mmu_rmap_notifier(invalidate_page, page);
+
if (!page_mapped(page))
ret = SWAP_SUCCESS;
return ret;

I find the above hard to accept, because the moment you work with
physical pages and not "mm+address" I think you couldn't possibly care
if page_mapped is true or false, and I think the above notifier should
be called _outside_ try_to_unmap. Infact I'd call
mmu_rmap_notifier(invalidate_page, page); only if page_unmapped is
false and the linux pte is gone already (practically just before the
page_count == 2 check and after try_to_unmap).

I also think it's still worth to debate the rmap based on virtual or
physical index. By supporting both secondary-rmap designs at the same
time you seem to agree current KVM lightweight rmap implementation is
a superior design at least for KVM. But by insisting on your rmap
based on physical for your usage, you're implicitly telling us that is
a superior design for you. But we know very little of why you can't
exactly build rmap on virtual like KVM does! (especially now that you
implicitly admitted KVM rmap design is superior at least for KVM it'd
be interesting to know why you can't do the same exactly) You said
something on that, but I certainly don't have a clear picture of why
it can't work or why it would be less efficient.

Like you said by PM I'd also like comments from Hugh, Nick and others
about this issue.

Nevertheless I'm very glad we already fully converged on the
set_page_dirty, invalidate-page after ptep_clear_flush/young,
etc... and furthermore that you only made very minor modification to
my code to add a pair of hooks for the page-based rmap notifiers on
top of my patch. So from a functionality POV this is 100% workable
already from KVM side!

Thanks!

2008-01-25 12:43:50

by Robin Holt

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Fri, Jan 25, 2008 at 12:42:29PM +0100, Andrea Arcangeli wrote:
> On a technical merit this still partially makes me sick and I think
> it's the last issue to debate.
>
> @@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int
> else
> ret = try_to_unmap_file(page, migration);
>
> + if (unlikely(PageExternalRmap(page)))
> + mmu_rmap_notifier(invalidate_page, page);
> +
> if (!page_mapped(page))
> ret = SWAP_SUCCESS;
> return ret;
>
> I find the above hard to accept, because the moment you work with
> physical pages and not "mm+address" I think you couldn't possibly care
> if page_mapped is true or false, and I think the above notifier should
> be called _outside_ try_to_unmap. Infact I'd call
> mmu_rmap_notifier(invalidate_page, page); only if page_unmapped is
> false and the linux pte is gone already (practically just before the
> page_count == 2 check and after try_to_unmap).

How does the called process sleep or how does it coordinate async work
with try_to_unmap? We need to sleep.

On a seperate note, I think the page flag needs to be set by the process
when it is acquiring the page for export. But since the same page could
be acquired by multiple export mechanisms, we should not clear it in the
exporting driver, but rather here after all exportors have been called
to invalidate_page.

That lead me to believe we should add a flag to get_user_pages() that
indicates this is an export with external rmap. We could then set the
page flag in get_user_pages.

Thanks,
Robin

2008-01-25 18:31:35

by Christoph Lameter

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Fri, 25 Jan 2008, Andrea Arcangeli wrote:

> On a technical merit this still partially makes me sick and I think
> it's the last issue to debate.
>
> @@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int
> else
> ret = try_to_unmap_file(page, migration);
>
> + if (unlikely(PageExternalRmap(page)))
> + mmu_rmap_notifier(invalidate_page, page);
> +
> if (!page_mapped(page))
> ret = SWAP_SUCCESS;
> return ret;
>
> I find the above hard to accept, because the moment you work with
> physical pages and not "mm+address" I think you couldn't possibly care
> if page_mapped is true or false, and I think the above notifier should
> be called _outside_ try_to_unmap. Infact I'd call
> mmu_rmap_notifier(invalidate_page, page); only if page_unmapped is
> false and the linux pte is gone already (practically just before the
> page_count == 2 check and after try_to_unmap).

try_to_unmap is called from multiple places. The placement here
also covers f.e. page migration.

We also need to do this in the page_mkclean case because the permissions
on an external pte are restricted there. So we need a refault to update
the pte.

> I also think it's still worth to debate the rmap based on virtual or
> physical index. By supporting both secondary-rmap designs at the same
> time you seem to agree current KVM lightweight rmap implementation is
> a superior design at least for KVM. But by insisting on your rmap
> based on physical for your usage, you're implicitly telling us that is
> a superior design for you. But we know very little of why you can't

We actually need both version. We have hardware that has a driver without
rmap that does not sleep. On the other hand XPmem has rmap capability and
needs to sleep for its notifications.

> Nevertheless I'm very glad we already fully converged on the
> set_page_dirty, invalidate-page after ptep_clear_flush/young,
> etc... and furthermore that you only made very minor modification to
> my code to add a pair of hooks for the page-based rmap notifiers on
> top of my patch. So from a functionality POV this is 100% workable
> already from KVM side!

Well we still have to review this stuff more and I have a vague feeling
that not all the multiple hooks that came about because I took the
mmu_notifier(invalidate_page, ...) out of the macro need to be kept
because some of them are already covered by the range operations.

2008-01-25 21:19:59

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Fri, 2008-01-25 at 12:42 +0100, Andrea Arcangeli wrote:
> On Thu, Jan 24, 2008 at 09:56:06PM -0800, Christoph Lameter wrote:
> > Andrea's mmu_notifier #4 -> RFC V1
> >
> > - Merge subsystem rmap based with Linux rmap based approach
> > - Move Linux rmap based notifiers out of macro
> > - Try to account for what locks are held while the notifiers are
> > called.
> > - Develop a patch sequence that separates out the different types of
> > hooks so that it is easier to review their use.
> > - Avoid adding #include to linux/mm_types.h
> > - Integrate RCU logic suggested by Peter.
>
> I'm glad you're converging on something a bit saner and much much
> closer to my code, plus perfectly usable by KVM optimal rmap design
> too. It would have preferred if you would have sent me patches like
> Peter did for review and merging etc... that would have made review
> especially easier. Anyway I'm used to that on lkml so it's ok, I just
> need this patch to be included in mainline, everything else is
> irrelevant to me.

Also, wouldn't there be a problem with something trying to use that
interface to keep in sync a secondary device MMU such as the DRM or
other accelerators, which might need virtual address based
invalidation ?

Ben.

2008-01-25 21:26:00

by Christoph Lameter

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Sat, 26 Jan 2008, Benjamin Herrenschmidt wrote:

> Also, wouldn't there be a problem with something trying to use that
> interface to keep in sync a secondary device MMU such as the DRM or
> other accelerators, which might need virtual address based
> invalidation ?

Yes just doing the rmap based solution would have required DRM etc to
maintain their own rmaps. So it looks that we need to go with both
variants. Note that secondary device MMUs that need to run code outside of
atomic context may still need to create their own rmaps.

2008-01-28 16:39:58

by izik eidus

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

Andrea Arcangeli wrote:
> On Thu, Jan 24, 2008 at 09:56:06PM -0800, Christoph Lameter wrote:
>
>> Andrea's mmu_notifier #4 -> RFC V1
>>
>> - Merge subsystem rmap based with Linux rmap based approach
>> - Move Linux rmap based notifiers out of macro
>> - Try to account for what locks are held while the notifiers are
>> called.
>> - Develop a patch sequence that separates out the different types of
>> hooks so that it is easier to review their use.
>> - Avoid adding #include to linux/mm_types.h
>> - Integrate RCU logic suggested by Peter.
>>
>
> I'm glad you're converging on something a bit saner and much much
> closer to my code, plus perfectly usable by KVM optimal rmap design
> too. It would have preferred if you would have sent me patches like
> Peter did for review and merging etc... that would have made review
> especially easier. Anyway I'm used to that on lkml so it's ok, I just
> need this patch to be included in mainline, everything else is
> irrelevant to me.
>
> On a technical merit this still partially makes me sick and I think
> it's the last issue to debate.
>
> @@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int
> else
> ret = try_to_unmap_file(page, migration);
>
> + if (unlikely(PageExternalRmap(page)))
> + mmu_rmap_notifier(invalidate_page, page);
> +
> if (!page_mapped(page))
> ret = SWAP_SUCCESS;
> return ret;
>
> I find the above hard to accept, because the moment you work with
> physical pages and not "mm+address" I think you couldn't possibly care
> if page_mapped is true or false, and I think the above notifier should
> be called _outside_ try_to_unmap. Infact I'd call
> mmu_rmap_notifier(invalidate_page, page); only if page_unmapped is
> false and the linux pte is gone already (practically just before the
> page_count == 2 check and after try_to_unmap).
>

i dont understand how is that better than notification on tlb flush?
i mean cpus have very smiler problem as we do,
so why not deal with the notification at the same place as they do (tlb
flush) ?

moreover notification on tlb flush allow unmodified applications to work
with us
for example the memory merging driver that i wrote was able to work with kvm
without any change to its source code.

2008-01-28 17:25:32

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Mon, Jan 28, 2008 at 06:10:39PM +0200, Izik Eidus wrote:
> i dont understand how is that better than notification on tlb flush?

I certainly agree. The quoted call wasn't actually the one that could
be moved in a single place in the .h file though. But the 4/4 patch
could be reduced to a few liner patch and it would not require any
change to ksm and mm/*.c internals that way etc...

If Christoph prefers to expand the two bits I added in the .h file,
all over the .c files that's ok with us. There is no good excuse for
that IMHO because a __ptep_clear_flush could be added to optimize away
the notifiers if it's followed by the invalidate_range (not the common
case), and if we want to just mark a spte readonly instead of
invalidating it we could implement new ptep_ function that would
invoke a new mmu notifier method. So there's full room for
optimizations without requiring the expansion.

In the end it's only a coding style issue but one that will become
visible in the VM (i.e. ksm has to be modified to add a mmu_notifier
call after ptep_clear_flush, it won't work automatically without
changes anymore).

So I'd like to know what can we do to help to merge the 4 patches from
Christoph in mainline, I'd appreciate comments on them so we can help
to address any outstanding issue!

It's very important to have this code in 2.6.25 final. KVM requires
mmu notifiers for reliable swapping, madvise/ballooning, ksm etc... so
it's high priority to get something merged to provide this
functionality regardless of its coding style ;).

Thanks!
Andrea

2008-01-28 19:04:55

by Christoph Lameter

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Mon, 28 Jan 2008, Andrea Arcangeli wrote:

> So I'd like to know what can we do to help to merge the 4 patches from
> Christoph in mainline, I'd appreciate comments on them so we can help
> to address any outstanding issue!

There are still some pending issues (RCU troubles). I will post V2 today.

> It's very important to have this code in 2.6.25 final. KVM requires
> mmu notifiers for reliable swapping, madvise/ballooning, ksm etc... so
> it's high priority to get something merged to provide this
> functionality regardless of its coding style ;).

We also need this urgently.

2008-01-28 19:40:22

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Mon, Jan 28, 2008 at 11:04:43AM -0800, Christoph Lameter wrote:
> On Mon, 28 Jan 2008, Andrea Arcangeli wrote:
>
> > So I'd like to know what can we do to help to merge the 4 patches from
> > Christoph in mainline, I'd appreciate comments on them so we can help
> > to address any outstanding issue!
>
> There are still some pending issues (RCU troubles). I will post V2 today.

With regard to the synchronize_rcu troubles they also be left to the
notifier-user to solve. Certainly having the synchronize_rcu like in
your last incremental patches in _release, will require less
complications (kvm pins the mm so I suppose we could batch the
call_rcu externally too). But _release is not a fast-path for KVM
usage so your V2 is sure ok (and simpler to deal with) too.

For registration synchronize_rcu is the only way, if the notifiers
have to fire synchronously before mmu_notifier_register returns but
that also can be left up to the caller if required (for example KVM
doesn't need that). Otherwise there could be two mmu_notifier_register
and mmu_notifier_register_rcu where the latter calls synchronize_rcu
before returning.

2008-01-28 20:16:49

by Christoph Lameter

[permalink] [raw]

Subject: Re: [patch 0/4] [RFC] MMU Notifiers V1

On Mon, 28 Jan 2008, Andrea Arcangeli wrote:

> With regard to the synchronize_rcu troubles they also be left to the
> notifier-user to solve. Certainly having the synchronize_rcu like in

Ahh. Ok.