2008-01-28 20:31:33

by Christoph Lameter

Subject: [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps

These notifiers use the Linux rmaps to perform the callbacks.
In order to walk the rmaps, locks must be held, so the callbacks can
only operate in an atomic context.
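
For illustration only, here is a rough sketch of what a secondary-MMU
driver's callback has to look like under that constraint. The ops
layout, the exact callback signature, the my_smmu structure and its
helpers are assumptions made up for this sketch, not part of the series:

#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/mmu_notifier.h>	/* header added earlier in this series */

/* Hypothetical driver state for a secondary MMU (e.g. a device TLB). */
struct my_smmu {
	spinlock_t lock;
	struct mmu_notifier notifier;
	/* ... shadow page table state ... */
};

/*
 * Invoked from the rmap walks below while the pte lock (and anon_vma
 * or i_mmap lock) is held, so it must not sleep: spinlocks only, no
 * mutexes, no GFP_KERNEL allocations, no waiting for I/O.
 */
static void my_invalidate_page(struct mmu_notifier *mn,
			       struct mm_struct *mm,
			       unsigned long address)
{
	struct my_smmu *smmu = container_of(mn, struct my_smmu, notifier);

	spin_lock(&smmu->lock);
	my_smmu_drop_spte(smmu, address);	/* hypothetical helpers */
	my_smmu_flush_secondary_tlb(smmu);
	spin_unlock(&smmu->lock);
}

static struct mmu_notifier_ops my_smmu_notifier_ops = {
	.invalidate_page	= my_invalidate_page,
	/* .age_page, .invalidate_range_* ... not shown */
};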

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/rmap.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c 2008-01-25 14:27:01.000000000 -0800
+++ linux-2.6/mm/rmap.c 2008-01-25 14:27:04.000000000 -0800
@@ -288,6 +288,9 @@ static int page_referenced_one(struct pa
if (ptep_clear_flush_young(vma, address, pte))
referenced++;

+ if (mmu_notifier_age_page(mm, address))
+ referenced++;
+
/* Pretend the page is referenced if the task has the
swap token and is in the middle of a page fault. */
if (mm != current->mm && has_swap_token(mm) &&
@@ -435,6 +438,7 @@ static int page_mkclean_one(struct page

flush_cache_page(vma, address, pte_pfn(*pte));
entry = ptep_clear_flush(vma, address, pte);
+ mmu_notifier(invalidate_page, mm, address);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
set_pte_at(mm, address, pte, entry);
@@ -680,7 +684,8 @@ static int try_to_unmap_one(struct page
* skipped over this mm) then we should reactivate it.
*/
if (!migration && ((vma->vm_flags & VM_LOCKED) ||
- (ptep_clear_flush_young(vma, address, pte)))) {
+ (ptep_clear_flush_young(vma, address, pte) ||
+ mmu_notifier_age_page(mm, address)))) {
ret = SWAP_FAIL;
goto out_unmap;
}
@@ -688,6 +693,7 @@ static int try_to_unmap_one(struct page
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
pteval = ptep_clear_flush(vma, address, pte);
+ mmu_notifier(invalidate_page, mm, address);

/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
@@ -815,9 +821,13 @@ static void try_to_unmap_cluster(unsigne
if (ptep_clear_flush_young(vma, address, pte))
continue;

+ if (mmu_notifier_age_page(mm, address))
+ continue;
+
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
+ mmu_notifier(invalidate_page, mm, address);

/* If nonlinear, store the file page offset in the pte. */
if (page->index != linear_page_index(vma, address))

--


2008-01-29 14:03:58

by Andrea Arcangeli

Subject: Re: [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps

On Mon, Jan 28, 2008 at 12:28:44PM -0800, Christoph Lameter wrote:
> if (!migration && ((vma->vm_flags & VM_LOCKED) ||
> - (ptep_clear_flush_young(vma, address, pte)))) {
> + (ptep_clear_flush_young(vma, address, pte) ||
> + mmu_notifier_age_page(mm, address)))) {

Here is an example of how inferior and error prone it is to have
mmu_notifier_age_page and invalidate_page outside of pgtable.h: you
just managed to break it again with the above ||, go figure.
mmu_notifier_age_page has to be called unconditionally, regardless of
the ptep_clear_flush_young return value. We want to give referenced
pages only one additional LRU scan, not more than that, or the KVM
guest pages will get tons more priority than regular Linux anonymous
memory.
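
To spell the point out (annotated excerpt, not a patch): the two
variants of the aging test differ only in short-circuiting, but that
difference decides whether the secondary MMU is aged at all:

/* || from the patch: if the pte was young, mmu_notifier_age_page() is
 * never reached, the secondary MMU's accessed bit stays set, and the
 * page keeps looking referenced on later scans. */
(ptep_clear_flush_young(vma, address, pte) ||
 mmu_notifier_age_page(mm, address))

/* |: both accessed bits are cleared on every scan, so KVM guest pages
 * age at the same rate as ordinary anonymous memory. */
(ptep_clear_flush_young(vma, address, pte) |
 mmu_notifier_age_page(mm, address))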

> ret = SWAP_FAIL;
> goto out_unmap;
> }
> @@ -688,6 +693,7 @@ static int try_to_unmap_one(struct page
> /* Nuke the page table entry. */
> flush_cache_page(vma, address, page_to_pfn(page));
> pteval = ptep_clear_flush(vma, address, pte);
> + mmu_notifier(invalidate_page, mm, address);
>
> /* Move the dirty bit to the physical page now the pte is gone. */
> if (pte_dirty(pteval))
> @@ -815,9 +821,13 @@ static void try_to_unmap_cluster(unsigne
> if (ptep_clear_flush_young(vma, address, pte))
> continue;
>
> + if (mmu_notifier_age_page(mm, address))
> + continue;
> +

Here is the exact same aging regression again, compared to my code.

2008-01-29 14:25:39

by Andrea Arcangeli

Subject: Re: [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps

This should fix the aging bugs you introduced through the faulty cpp
expansion. This is hard for me to write: any time somebody does a
ptep_clear_flush_young without manually cpp-expanding
"| mmu_notifier_age_page" after it, it is a bug that needs fixing, and
similar bugs can emerge over time for ptep_clear_flush too. What will
happen is that somebody will clean this up in 2.6.26+ and we'll be
left with a #ifdef KERNEL_VERSION() < 2.6.26 in ksm.c to call
mmu_notifier(invalidate_page) explicitly. Performance optimizations or
avoiding an unnecessary invalidate_page are a red herring; it can be
fully optimized both ways. 99% of the time, when somebody calls
ptep_clear_flush or ptep_clear_flush_young, the respective mmu
notifier must not be forgotten (and calling it once more, even if a
later invalidate_range is invoked, is always safer and preferable to
not calling it at all), so I fail to see how this will not be cleaned
up eventually, the same way the tlb flushes have already been cleaned
up. Nevertheless I back your implementation and I'm not going to try
changing it at the risk of slowing down the merge.
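
As a rough sketch of the kind of cleanup being argued for here, using
the interfaces from the patch above: cpp-expand the notifier right
next to the primary-MMU helper so no caller can forget it. The
wrapper names and their placement are made up for illustration and
are not code from either patch:

/*
 * Hypothetical wrappers: if every caller used these instead of the
 * bare helpers, the secondary-MMU notifier could never be forgotten.
 */
#define ptep_clear_flush_young_notify(__vma, __address, __ptep)	\
({									\
	int __young = ptep_clear_flush_young(__vma, __address, __ptep);\
	/* '|', not '||': age the secondary MMU unconditionally */	\
	__young |= mmu_notifier_age_page((__vma)->vm_mm, __address);	\
	__young;							\
})

#define ptep_clear_flush_notify(__vma, __address, __ptep)		\
({									\
	pte_t __pte = ptep_clear_flush(__vma, __address, __ptep);	\
	mmu_notifier(invalidate_page, (__vma)->vm_mm, __address);	\
	__pte;								\
})

With wrappers like these, rmap.c would call
ptep_clear_flush_young_notify()/ptep_clear_flush_notify() and the
open-coded pairs patched below would disappear.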

Signed-off-by: Andrea Arcangeli <[email protected]>

diff --git a/mm/rmap.c b/mm/rmap.c
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -285,10 +285,8 @@ static int page_referenced_one(struct pa
if (!pte)
goto out;

- if (ptep_clear_flush_young(vma, address, pte))
- referenced++;
-
- if (mmu_notifier_age_page(mm, address))
+ if (ptep_clear_flush_young(vma, address, pte) |
+ mmu_notifier_age_page(mm, address))
referenced++;

/* Pretend the page is referenced if the task has the
@@ -684,7 +682,7 @@ static int try_to_unmap_one(struct page
* skipped over this mm) then we should reactivate it.
*/
if (!migration && ((vma->vm_flags & VM_LOCKED) ||
- (ptep_clear_flush_young(vma, address, pte) ||
+ (ptep_clear_flush_young(vma, address, pte) |
mmu_notifier_age_page(mm, address)))) {
ret = SWAP_FAIL;
goto out_unmap;
@@ -818,10 +816,8 @@ static void try_to_unmap_cluster(unsigne
page = vm_normal_page(vma, address, *pte);
BUG_ON(!page || PageAnon(page));

- if (ptep_clear_flush_young(vma, address, pte))
- continue;
-
- if (mmu_notifier_age_page(mm, address))
+ if (ptep_clear_flush_young(vma, address, pte) |
+ mmu_notifier_age_page(mm, address))
continue;

/* Nuke the page table entry. */