On Fri, 9 May 2008, Andrea Arcangeli wrote:
>
> At least for KVM without this patch it's impossible to swap guests
> reliably. And having this feature and removing the page pin allows
> several other optimizations that simplify life considerably.
Ok, this looks ok as far as I'm concerned. I did not look at any details,
so obviously other VM people need to ack the parts they care about, but at
least I think this one is fine from a "big picture".
I do have some small nits that are just about trivial stuff.
> 1) Introduces list_del_init_rcu and documents it (fixes a comment for
> list_del_rcu too)
I think this should go in separately, and be split up into a patch of its
own, just because it's really an independent area. So make it [1/3].
> 2) mm_take_all_locks() to register the mmu notifier when the whole VM
> isn't doing anything with "mm". This allows mmu notifier users to
> keep track if the VM is in the middle of the
> invalidate_range_begin/end critical section with an atomic counter
> incraese in range_begin and decreased in range_end.
Similarly, even without any users, I think this can be posted as an
independent patch, just for setting things up, and to make the whole thing
easier to look through and review. So make this [2/3].
But before doing that, can you split up the low-level single-vma anon/file
locking/unlocking, please?
In other words, your 'mm_take_all_locks()' rigth now looks like it _works_
correctly, but it nests too deeply considering the complexity of it.
There's really subtle things going on inside that for-loop, and I think it
would be much better to split those low-level locking rules out.
IOW, instead of:
> +int mm_take_all_locks(struct mm_struct *mm)
> +{
> + struct vm_area_struct *vma;
> + int ret = -EINTR;
> +
> + BUG_ON(down_read_trylock(&mm->mmap_sem));
> +
> + mutex_lock(&mm_all_locks_mutex);
> +
> + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> + struct file *filp;
> + if (signal_pending(current))
> + goto out_unlock;
> + if (vma->anon_vma && !test_bit(0, (unsigned long *)
> + &vma->anon_vma->head.next)) {
> + /*
> + * The LSB of head.next can't change from
> + * under us because we hold the
> + * global_mm_spinlock.
> + */
> + spin_lock(&vma->anon_vma->lock);
...
ie, can you please make it be
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (signal_pending(current))
goto out_unlock;
if (vma->anon_vma)
vm_lock_anon_vma(vma->anon_vma);
if (vma->vm_file && vma->vm_file->f_mapping)
vm_lock_mapping(vma->vm_file->f_mapping);
}
and the same thing for unlocking.. Doesn't that look more obvious and
easier to understand from a high-level standpoing (and then the individual
locking rules for mappings/anon_vma's will also be more obvious, just
because they are separated from the higher-level code).
The comments are fine, but even with the comments I'd prefer you to write
the code so that you don't need to break up the conditionals over multiple
lines etc.
Anyway - I didn't look very much at the actual _notifier_ stuff (ie the
thing that I think should be [patch 3/3]), so I don't have any real
comments about that part - but I don't really care either. Becasue as long
as it doesn't mess up the core VM logic, I no longer have any real
objections.
I'd obviously want to see ack's by people like Andrew, Hugh and Nick, but
as far as I am concerned, if you just do the trivial cleanup/split, you
can add an "Acked-by: Linus Torvalds <[email protected]>" to
at least the two first patches of the split-up series.
Linus
On Tue, Jun 03, 2008 at 09:26:05AM -0700, Linus Torvalds wrote:
> I'd obviously want to see ack's by people like Andrew, Hugh and Nick, but
> as far as I am concerned, if you just do the trivial cleanup/split, you
I'll do all cleanups/splits very shortly and submit a v18 with 3
patches, all good, thanks a lot!