2010-11-26 08:10:53

by Jeremy Fitzhardinge

Subject: Could we do immediate pte zaps in vunmap?

What if vm_unmap_ram() and co. immediately zeroed out the ptes, but
lazily deferred the tlb flushes? It seems to me there's no benefit in
batching up the pte clearing since that can't be amortized like the tlb
flush.

I think that would solve the problem we have with the interactions
between lazy unmap and Xen. The issue is having stray pte entries
around (because Xen keeps track of those as part of its page-type
mechanism), but stale tlb entries are no problem.
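
Concretely, the sort of split I have in mind (a sketch only - the
record_lazy_flush_range() helper is hypothetical, standing in for
whatever bookkeeping the existing lazy purge machinery does):

	static void unmap_and_defer_flush(unsigned long addr, unsigned long size)
	{
		/*
		 * Zap the ptes right away, so no stray kernel mappings
		 * are left behind for Xen's page-type tracking to trip
		 * over...
		 */
		vunmap_page_range(addr, addr + size);

		/*
		 * ...but only record the stale-TLB range here; the real
		 * flush_tlb_kernel_range() happens later, in one batch,
		 * to amortize the cross-CPU IPIs.
		 */
		record_lazy_flush_range(addr, addr + size);	/* hypothetical */
	}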

Thanks,
J


2010-11-27 10:37:00

by Nick Piggin

Subject: Re: Could we do immediate pte zaps in vunmap?

On Fri, Nov 26, 2010 at 12:10:51AM -0800, Jeremy Fitzhardinge wrote:
> What if vm_unmap_ram() and co. immediately zeroed out the ptes, but
> lazily deferred the tlb flushes? It seems to me there's no benefit in
> batching up the pte clearing since that can't be amortized like the tlb
> flush.

Yes, that can be done.


> I think that would solve the problem we have with the interactions
> between lazy unmap and Xen. The issue is having stray pte entries
> around (because Xen keeps track of those as part of its page-type
> mechanism), but stale tlb entries are no problem.

It would be great if that solves it.

2010-11-29 20:32:16

by Jeremy Fitzhardinge

Subject: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap

When unmapping a region in the vmalloc space, clear the ptes immediately.
There's no point in deferring this because there's no amortization
benefit.

The TLBs are left dirty, and they are flushed lazily to amortize the
cost of the IPIs.

The specific motivation for this patch is a regression since 2.6.36 when
using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee. XFS also uses
vm_map_ram() and could cause similar problems.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: Nick Piggin <[email protected]>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a3d66b3..9960644 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -566,7 +566,6 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
if (va->va_end > *end)
*end = va->va_end;
nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
- unmap_vmap_area(va);
list_add_tail(&va->purge_list, &valist);
va->flags |= VM_LAZY_FREEING;
va->flags &= ~VM_LAZY_FREE;
@@ -616,6 +615,8 @@ static void purge_vmap_area_lazy(void)
*/
static void free_unmap_vmap_area_noflush(struct vmap_area *va)
{
+ unmap_vmap_area(va);
+
va->flags |= VM_LAZY_FREE;
atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
if (unlikely(atomic_read(&vmap_lazy_nr) > lazy_max_pages()))
@@ -944,8 +945,10 @@ static void vb_free(const void *addr, unsigned long size)
BUG_ON(vb->free);
spin_unlock(&vb->lock);
free_vmap_block(vb);
- } else
+ } else {
spin_unlock(&vb->lock);
+ vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+ }
}

/**
@@ -988,7 +991,6 @@ void vm_unmap_aliases(void)

s = vb->va->va_start + (i << PAGE_SHIFT);
e = vb->va->va_start + (j << PAGE_SHIFT);
- vunmap_page_range(s, e);
flush = 1;

if (s < start)

2010-11-30 12:42:56

by Nick Piggin

Subject: Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap

On Mon, Nov 29, 2010 at 12:32:11PM -0800, Jeremy Fitzhardinge wrote:
> When unmapping a region in the vmalloc space, clear the ptes immediately.
> There's no point in deferring this because there's no amortization
> benefit.
>
> The TLBs are left dirty, and they are flushed lazily to amortize the
> cost of the IPIs.
>
> The specific motivation for this patch is a regression since 2.6.36 when
> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee. XFS also uses
> vm_map_ram() and could cause similar problems.

I do wonder whether there are cache benefits from batching page table
updates, especially the batched per cpu maps (and in your version they
get double-cleared as well). I think this patch is good, but I think
perhaps making it configurable would be nice.

So... main question, does it allow Xen to use lazy flushing and avoid
vm_unmap_aliases() calls?


>
> Signed-off-by: Jeremy Fitzhardinge <[email protected]>
> Cc: Nick Piggin <[email protected]>
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a3d66b3..9960644 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -566,7 +566,6 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
> if (va->va_end > *end)
> *end = va->va_end;
> nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
> - unmap_vmap_area(va);
> list_add_tail(&va->purge_list, &valist);
> va->flags |= VM_LAZY_FREEING;
> va->flags &= ~VM_LAZY_FREE;
> @@ -616,6 +615,8 @@ static void purge_vmap_area_lazy(void)
> */
> static void free_unmap_vmap_area_noflush(struct vmap_area *va)
> {
> + unmap_vmap_area(va);
> +
> va->flags |= VM_LAZY_FREE;
> atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
> if (unlikely(atomic_read(&vmap_lazy_nr) > lazy_max_pages()))
> @@ -944,8 +945,10 @@ static void vb_free(const void *addr, unsigned long size)
> BUG_ON(vb->free);
> spin_unlock(&vb->lock);
> free_vmap_block(vb);
> - } else
> + } else {
> spin_unlock(&vb->lock);
> + vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
> + }
> }
>
> /**
> @@ -988,7 +991,6 @@ void vm_unmap_aliases(void)
>
> s = vb->va->va_start + (i << PAGE_SHIFT);
> e = vb->va->va_start + (j << PAGE_SHIFT);
> - vunmap_page_range(s, e);
> flush = 1;
>
> if (s < start)
>

2010-11-30 17:45:52

by Jeremy Fitzhardinge

Subject: Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap

On 11/30/2010 04:42 AM, Nick Piggin wrote:
> On Mon, Nov 29, 2010 at 12:32:11PM -0800, Jeremy Fitzhardinge wrote:
>> When unmapping a region in the vmalloc space, clear the ptes immediately.
>> There's no point in deferring this because there's no amortization
>> benefit.
>>
>> The TLBs are left dirty, and they are flushed lazily to amortize the
>> cost of the IPIs.
>>
>> The specific motivation for this patch is a regression since 2.6.36 when
>> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
>> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee. XFS also uses
>> vm_map_ram() and could cause similar problems.
> I do wonder whether there are cache benefits from batching page table
> updates, especially the batched per cpu maps

Perhaps. But perhaps there are cache benefits in clearing early because
the ptes are still in cache from when they were set?
> (and in your version they
> get double-cleared as well).

I thought I'd avoided that. Oh, right: once in vb_free(), and again -
eventually - in free_vmap_block()->free_unmap_vmap_area_noflush().

Delta patch below.

> I think this patch is good, but I think
> perhaps making it configurable would be nice.

I'd rather not unless there's a strong reason to do so.

It occurs to me that once the ptes are cleared eagerly, all that code
is doing is keeping track of ranges of addresses with dirty tlb
entries. But on x86 at least, any kernel tlb flush is a global one,
so keeping track of fine-grained address information is overkill. I
wonder if the overall code can be simplified as a result?
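
To illustrate (not a patch): with the address tracking gone, the lazy
path could in principle shrink to the existing vmap_lazy_nr counter
plus one global flush. A sketch only, with locking elided:

	static void try_purge_lazy_tlbs(void)
	{
		if (atomic_read(&vmap_lazy_nr) > lazy_max_pages()) {
			/* on x86 any kernel tlb flush is global anyway */
			flush_tlb_all();
			atomic_set(&vmap_lazy_nr, 0);
		}
	}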

On a more concrete level, vmap_page_range_noflush() and
vunmap_page_range() could be implemented with apply_to_page_range(),
which would remove a chunk of boilerplate code (however, it would
result in a callback per pte rather than one per pte page - but I'll
fix that now).
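
For concreteness, a sketch of the vunmap side done that way (against
the 2.6.36-era pte_fn_t signature; this is illustration only, not part
of the posted patch):

	static int vunmap_pte_fn(pte_t *pte, pgtable_t token,
				 unsigned long addr, void *data)
	{
		pte_t ptent = ptep_get_and_clear(&init_mm, addr, pte);

		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
		return 0;	/* called once per pte - hence the cost note above */
	}

	static void vunmap_page_range(unsigned long addr, unsigned long end)
	{
		apply_to_page_range(&init_mm, addr, end - addr,
				    vunmap_pte_fn, NULL);
	}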

> So... main question, does it allow Xen to use lazy flushing and avoid
> vm_unmap_aliases() calls?

Yes, it seems to.

Thanks,
J

Subject: [PATCH] vmalloc: avoid double-unmapping percpu blocks

The area has always been unmapped by the time free_vmap_block() is
called, so there's no need to unmap it again.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9551316..ade3302 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -520,13 +520,12 @@ static void purge_vmap_area_lazy(void)
}

/*
- * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
- * called for the correct range previously.
+ * Free a vmap area, caller ensuring that the area has been unmapped
+ * and flush_cache_vunmap had been called for the correct range
+ * previously.
*/
-static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+static void free_vmap_area_noflush(struct vmap_area *va)
{
- unmap_vmap_area(va);
-
va->flags |= VM_LAZY_FREE;
atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
if (unlikely(atomic_read(&vmap_lazy_nr) > lazy_max_pages()))
@@ -534,6 +533,16 @@ static void free_unmap_vmap_area_noflush(struct vmap_area *va)
}

/*
+ * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
+ * called for the correct range previously.
+ */
+static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+{
+ unmap_vmap_area(va);
+ free_vmap_area_noflush(va);
+}
+
+/*
* Free and unmap a vmap area
*/
static void free_unmap_vmap_area(struct vmap_area *va)
@@ -709,7 +718,7 @@ static void free_vmap_block(struct vmap_block *vb)
spin_unlock(&vmap_block_tree_lock);
BUG_ON(tmp != vb);

- free_unmap_vmap_area_noflush(vb->va);
+ free_vmap_area_noflush(vb->va);
call_rcu(&vb->rcu_head, rcu_free_vb);
}


2010-12-01 00:30:36

by Andrew Morton

Subject: Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap

On Mon, 29 Nov 2010 12:32:11 -0800
Jeremy Fitzhardinge <[email protected]> wrote:

> When unmapping a region in the vmalloc space, clear the ptes immediately.
> There's no point in deferring this because there's no amortization
> benefit.
>
> The TLBs are left dirty, and they are flushed lazily to amortize the
> cost of the IPIs.
>
> The specific motivation for this patch is a regression since 2.6.36 when
> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee. XFS also uses
> vm_map_ram() and could cause similar problems.
>

Do we have any quantitative info on that regression? The patch fixed
it, I assume?

2010-12-01 03:09:47

by Jeremy Fitzhardinge

Subject: Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap

On 11/30/2010 04:29 PM, Andrew Morton wrote:
> On Mon, 29 Nov 2010 12:32:11 -0800
> Jeremy Fitzhardinge <[email protected]> wrote:
>
>> When unmapping a region in the vmalloc space, clear the ptes immediately.
>> There's no point in deferring this because there's no amortization
>> benefit.
>>
>> The TLBs are left dirty, and they are flushed lazily to amortize the
>> cost of the IPIs.
>>
>> The specific motivation for this patch is a regression since 2.6.36 when
>> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
>> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee. XFS also uses
>> vm_map_ram() and could cause similar problems.
>>
> Do we have any quantitative info on that regression?

It's pretty easy to reproduce - you get oopses very quickly while using
NFS. I haven't got any lying around right now, but I could easily
generate one if you want to decorate the changelog a bit.

> The patch fixed
> it, I assume?

Yes, the patch fixes it, and I think it is just luck that xfs doesn't
also trigger the same problem. Here's a followup patch to disable the
previous hack.

J

Subject: [PATCH] vmalloc: remove vmap_lazy_unmap flag

Now that vunmap no longer leaves stray ptes lying around, we don't need
the vmap_lazy_unmap flag any more.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 21ed8d7..0e4ecac 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2358,8 +2358,6 @@ void __init xen_init_mmu_ops(void)
x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
pv_mmu_ops = xen_mmu_ops;

- vmap_lazy_unmap = false;
-
memset(dummy_mapping, 0xff, PAGE_SIZE);
}

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index a03dcf6..44b54f6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -7,8 +7,6 @@

struct vm_area_struct; /* vma defining user mapping in mm_types.h */

-extern bool vmap_lazy_unmap;
-
/* bits in flags of vmalloc's vm_struct below */
#define VM_IOREMAP 0x00000001 /* ioremap() and friends */
#define VM_ALLOC 0x00000002 /* vmalloc() */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ffefe70..828d95e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,8 +31,6 @@
#include <asm/tlbflush.h>
#include <asm/shmparam.h>

-bool vmap_lazy_unmap __read_mostly = true;
-
/*** Page table manipulation functions ***/

static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -503,9 +501,6 @@ static unsigned long lazy_max_pages(void)
{
unsigned int log;

- if (!vmap_lazy_unmap)
- return 0;
-
log = fls(num_online_cpus());

return log * (32UL * 1024 * 1024 / PAGE_SIZE);

2010-12-01 03:24:28

by Andrew Morton

Subject: Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap

On Tue, 30 Nov 2010 19:09:43 -0800 Jeremy Fitzhardinge <[email protected]> wrote:

> On 11/30/2010 04:29 PM, Andrew Morton wrote:
> > On Mon, 29 Nov 2010 12:32:11 -0800
> > Jeremy Fitzhardinge <[email protected]> wrote:
> >
> >> When unmapping a region in the vmalloc space, clear the ptes immediately.
> >> There's no point in deferring this because there's no amortization
> >> benefit.
> >>
> >> The TLBs are left dirty, and they are flushed lazily to amortize the
> >> cost of the IPIs.
> >>
> >> The specific motivation for this patch is a regression since 2.6.36 when
> >> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
> >> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee. XFS also uses
> >> vm_map_ram() and could cause similar problems.
> >>
> > Do we have any quantitative info on that regression?
>
> It's pretty easy to reproduce - you get oopses very quickly while using
> NFS.

Bah. I'd assumed that it was a performance regression and had vaguely
queued it for 2.6.37.

> I haven't got any lying around right now, but I could easily
> generate one if you want to decorate the changelog a bit.

You owe me that much ;)


Here's the current rollup of the three patches. Please check.



From: Jeremy Fitzhardinge <[email protected]>

When unmapping a region in the vmalloc space, clear the ptes immediately.
There's no point in deferring this because there's no amortization
benefit.

The TLBs are left dirty, and they are flushed lazily to amortize the cost
of the IPIs.

The specific motivation for this patch is an oops-causing regression
since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
of vm_map_ram() introduced in 56e4ebf877b60 ("NFS: readdir with vmapped
pages") . XFS also uses vm_map_ram() and could cause similar problems.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Bryan Schumaker <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Alex Elder <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/x86/xen/mmu.c | 2 --
include/linux/vmalloc.h | 2 --
mm/vmalloc.c | 30 ++++++++++++++++++------------
3 files changed, 18 insertions(+), 16 deletions(-)

diff -puN mm/vmalloc.c~vmalloc-eagerly-clear-ptes-on-vunmap mm/vmalloc.c
--- a/mm/vmalloc.c~vmalloc-eagerly-clear-ptes-on-vunmap
+++ a/mm/vmalloc.c
@@ -31,8 +31,6 @@
#include <asm/tlbflush.h>
#include <asm/shmparam.h>

-bool vmap_lazy_unmap __read_mostly = true;
-
/*** Page table manipulation functions ***/

static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -503,9 +501,6 @@ static unsigned long lazy_max_pages(void
{
unsigned int log;

- if (!vmap_lazy_unmap)
- return 0;
-
log = fls(num_online_cpus());

return log * (32UL * 1024 * 1024 / PAGE_SIZE);
@@ -566,7 +561,6 @@ static void __purge_vmap_area_lazy(unsig
if (va->va_end > *end)
*end = va->va_end;
nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
- unmap_vmap_area(va);
list_add_tail(&va->purge_list, &valist);
va->flags |= VM_LAZY_FREEING;
va->flags &= ~VM_LAZY_FREE;
@@ -611,10 +605,11 @@ static void purge_vmap_area_lazy(void)
}

/*
- * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
- * called for the correct range previously.
+ * Free a vmap area, caller ensuring that the area has been unmapped
+ * and flush_cache_vunmap had been called for the correct range
+ * previously.
*/
-static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+static void free_vmap_area_noflush(struct vmap_area *va)
{
va->flags |= VM_LAZY_FREE;
atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
@@ -623,6 +618,16 @@ static void free_unmap_vmap_area_noflush
}

/*
+ * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
+ * called for the correct range previously.
+ */
+static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+{
+ unmap_vmap_area(va);
+ free_vmap_area_noflush(va);
+}
+
+/*
* Free and unmap a vmap area
*/
static void free_unmap_vmap_area(struct vmap_area *va)
@@ -798,7 +803,7 @@ static void free_vmap_block(struct vmap_
spin_unlock(&vmap_block_tree_lock);
BUG_ON(tmp != vb);

- free_unmap_vmap_area_noflush(vb->va);
+ free_vmap_area_noflush(vb->va);
call_rcu(&vb->rcu_head, rcu_free_vb);
}

@@ -944,8 +949,10 @@ static void vb_free(const void *addr, un
BUG_ON(vb->free);
spin_unlock(&vb->lock);
free_vmap_block(vb);
- } else
+ } else {
spin_unlock(&vb->lock);
+ vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+ }
}

/**
@@ -988,7 +995,6 @@ void vm_unmap_aliases(void)

s = vb->va->va_start + (i << PAGE_SHIFT);
e = vb->va->va_start + (j << PAGE_SHIFT);
- vunmap_page_range(s, e);
flush = 1;

if (s < start)
diff -puN arch/x86/xen/mmu.c~vmalloc-eagerly-clear-ptes-on-vunmap arch/x86/xen/mmu.c
--- a/arch/x86/xen/mmu.c~vmalloc-eagerly-clear-ptes-on-vunmap
+++ a/arch/x86/xen/mmu.c
@@ -2415,8 +2415,6 @@ void __init xen_init_mmu_ops(void)
x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
pv_mmu_ops = xen_mmu_ops;

- vmap_lazy_unmap = false;
-
memset(dummy_mapping, 0xff, PAGE_SIZE);
}

diff -puN include/linux/vmalloc.h~vmalloc-eagerly-clear-ptes-on-vunmap include/linux/vmalloc.h
--- a/include/linux/vmalloc.h~vmalloc-eagerly-clear-ptes-on-vunmap
+++ a/include/linux/vmalloc.h
@@ -7,8 +7,6 @@

struct vm_area_struct; /* vma defining user mapping in mm_types.h */

-extern bool vmap_lazy_unmap;
-
/* bits in flags of vmalloc's vm_struct below */
#define VM_IOREMAP 0x00000001 /* ioremap() and friends */
#define VM_ALLOC 0x00000002 /* vmalloc() */
_

2010-12-01 08:16:43

by Jeremy Fitzhardinge

Subject: Re: [PATCH RFC] vmalloc: eagerly clear ptes on vunmap

On 11/30/2010 07:23 PM, Andrew Morton wrote:
> On Tue, 30 Nov 2010 19:09:43 -0800 Jeremy Fitzhardinge <[email protected]> wrote:
>
>> On 11/30/2010 04:29 PM, Andrew Morton wrote:
>>> On Mon, 29 Nov 2010 12:32:11 -0800
>>> Jeremy Fitzhardinge <[email protected]> wrote:
>>>
>>>> When unmapping a region in the vmalloc space, clear the ptes immediately.
>>>> There's no point in deferring this because there's no amortization
>>>> benefit.
>>>>
>>>> The TLBs are left dirty, and they are flushed lazily to amortize the
>>>> cost of the IPIs.
>>>>
>>>> The specific motivation for this patch is a regression since 2.6.36 when
>>>> using NFS under Xen, triggered by the NFS client's use of vm_map_ram()
>>>> introduced in 56e4ebf877b6043c289bda32a5a7385b80c17dee. XFS also uses
>>>> vm_map_ram() and could cause similar problems.
>>>>
>>> Do we have any quantitative info on that regression?
>> It's pretty easy to reproduce - you get oopses very quickly while using
>> NFS.
> Bah. I'd assumed that it was a performance regression and had vaguely
> queued it for 2.6.37.

I have a bunch of other cleanups for mm/vmalloc.c which I'll propose for
.37.

>> I haven't got any lying around right now, but I could easily
>> generate one if you want to decorate the changelog a bit.
> You owe me that much ;)

On stock 2.6.37-rc4, running:

# mount lilith:/export /mnt/lilith
# find /mnt/lilith/ -type f -print0 | xargs -0 file

crashes the machine fairly quickly under Xen. Often it results in oops
messages, but the couple of times I tried just now, it just hung quietly
and made Xen print some rude messages:

(XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 1d7058 (pfn 18fa7)
(XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
(XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp 1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
(XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04

That means the domain tried to map a pagetable page RW, which would
allow it to map arbitrary memory, so Xen stopped it. This is because
vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
finished with them, and those pages got recycled as pagetable pages
while still having these RW aliases.

Removing those mappings immediately removes the Xen-visible aliases, and
so it has no problem with those pages being reused as pagetable pages.
Deferring the TLB flush doesn't upset Xen because it can flush the TLB
itself as needed to maintain its invariants.

> Here's the current rollup of the three patches. Please check.

It looks like "vmalloc: avoid double-unmapping percpu blocks" wasn't
complete. Fix below.

Thanks,
J


From: Jeremy Fitzhardinge <[email protected]>
Date: Tue, 30 Nov 2010 23:53:28 -0800
Subject: [PATCH] vmalloc: always unmap in vb_free()

free_vmap_block() doesn't unmap anything, so just unconditionally unmap
the region.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ffefe70..a582491 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -946,6 +946,8 @@ static void vb_free(const void *addr, unsigned long size)
rcu_read_unlock();
BUG_ON(!vb);

+ vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+
spin_lock(&vb->lock);
BUG_ON(bitmap_allocate_region(vb->dirty_map, offset >> PAGE_SHIFT, order));

@@ -954,10 +956,8 @@ static void vb_free(const void *addr, unsigned long size)
BUG_ON(vb->free);
spin_unlock(&vb->lock);
free_vmap_block(vb);
- } else {
+ } else
spin_unlock(&vb->lock);
- vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
- }
}

/**