In rmap.c::try_to_unmap_one() of 2.6.16.29, there is the following code:
.....
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
pteval = ptep_clear_flush(vma, address, pte);
// >>> The above line is expanded as below
// >>> pte_t __pte;
// >>> __pte = ptep_get_and_clear((__vma)->vm_mm, __address, __ptep);
// >>> flush_tlb_page(__vma, __address);
// >>> __pte;
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
set_page_dirty(page);
.....
It seems that this can only work on a UP system.
On SMP, suppose the pte was clean. After CPU A has executed
ptep_get_and_clear, CPU B makes the pte dirty. Isn't that a fatal error
for CPU A, since the pteval it holds is now stale?
From: "yunfeng zhang" <[email protected]>
Date: Fri, 20 Oct 2006 10:47:49 +0800
> In rmap.c::try_to_unmap_one() of 2.6.16.29, there is the following code:
>
> .....
> /* Nuke the page table entry. */
> flush_cache_page(vma, address, page_to_pfn(page));
> pteval = ptep_clear_flush(vma, address, pte);
> // >>> The above line is expanded as below
> // >>> pte_t __pte;
> // >>> __pte = ptep_get_and_clear((__vma)->vm_mm, __address, __ptep);
> // >>> flush_tlb_page(__vma, __address);
> // >>> __pte;
>
> /* Move the dirty bit to the physical page now the pte is gone. */
> if (pte_dirty(pteval))
> set_page_dirty(page);
> .....
>
>
> It seems that this can only work on a UP system.
>
> On SMP, suppose the pte was clean. After CPU A has executed
> ptep_get_and_clear, CPU B makes the pte dirty. Isn't that a fatal error
> for CPU A, since the pteval it holds is now stale?
B can't make it dirty because it's been cleared to zero
and flush_tlb_page() has removed the TLB cached copy of
the PTE. B can therefore only see the new cleared PTE.
Maybe the solution is the following:
...
// >>> ptep_clear((__vma)->vm_mm, __address, __ptep);
// >>> flush_tlb_page(__vma, __address);
// >>> __ptep;
...
Even so, we would still end up with an odd pte: present = 0 but dirty = 1.
Remember that B dirtied the pte before A executed flush_tlb_page.
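The interleaving described above can be replayed in user space as a rough
sketch. This is not kernel code: the bit layout, the model_* names, and above
all the assumption that CPU B ORs in the dirty bit without re-checking the
present bit are hypothetical and only illustrate the concern being raised.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define PTE_PRESENT 0x1u
#define PTE_DIRTY   0x2u

typedef _Atomic uint32_t model_pte_t;

/* CPU A: models ptep_get_and_clear() as an atomic exchange with zero. */
static uint32_t model_get_and_clear(model_pte_t *ptep)
{
    return atomic_exchange(ptep, 0u);
}

/*
 * CPU B: models a write through a stale TLB entry that sets the dirty bit
 * WITHOUT checking whether the entry is still present. Whether any real
 * CPU behaves this way is exactly the open question in this thread.
 */
static void model_blind_set_dirty(model_pte_t *ptep)
{
    atomic_fetch_or(ptep, PTE_DIRTY);
}

/*
 * Replays the order: A clears, then B dirties before A's TLB flush.
 * Returns the value left in the page table; *seen_by_a gets A's pteval.
 */
static uint32_t model_replay_race(uint32_t *seen_by_a)
{
    model_pte_t pte = PTE_PRESENT;          /* clean, present */

    *seen_by_a = model_get_and_clear(&pte); /* A: pteval has no dirty bit */
    model_blind_set_dirty(&pte);            /* B: late, unchecked update */
    return atomic_load(&pte);
}
```

Under that assumption, A's pteval is clean while the entry left behind is
non-present but dirty, i.e. the "odd pte" above, and the dirty state never
reaches set_page_dirty().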
On Fri, 2006-10-20 at 13:10 +0800, yunfeng zhang wrote:
> Maybe the solution is the following:
>
> ...
> // >>> ptep_clear((__vma)->vm_mm, __address, __ptep);
> // >>> flush_tlb_page(__vma, __address);
> // >>> __ptep;
> ...
>
> Even so, we would still end up with an odd pte: present = 0 but dirty = 1.
>
> Remember that B dirtied the pte before A executed flush_tlb_page.
It's very much architecture specific. I suppose x86 must have some HW
requirement to check atomically that the PTE is still present when
setting the dirty bit, but I can't tell for sure :)
On PowerPC, we don't use HW dirty bits, we use SW for that, so the
ptep_get_and_clear is enough to prevent any further dirty bit from being
set.
Ben.
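A minimal user-space sketch of the "check present atomically when setting
dirty" idea follows. It is not the PowerPC or x86 implementation: the sw_*
names and the bit layout are hypothetical, and the dirty update is simply
modelled as a compare-and-swap loop that gives up once the entry has been
cleared.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define SW_PTE_PRESENT 0x1u
#define SW_PTE_DIRTY   0x2u

typedef _Atomic uint32_t sw_pte_t;

/* Unmapping CPU: models ptep_get_and_clear() as an atomic exchange. */
static uint32_t sw_get_and_clear(sw_pte_t *ptep)
{
    return atomic_exchange(ptep, 0u);
}

/*
 * Writing CPU: set the dirty bit only while the entry is still present.
 * Returns true if the dirty bit was set, false if the entry was gone.
 */
static bool sw_mark_dirty_if_present(sw_pte_t *ptep)
{
    uint32_t old = atomic_load(ptep);

    while (old & SW_PTE_PRESENT) {
        /* On failure, `old` is refreshed and the present check repeats. */
        if (atomic_compare_exchange_weak(ptep, &old, old | SW_PTE_DIRTY))
            return true;
    }
    return false;
}

/* Order 1: the clear happens first, then the write is attempted. */
static uint32_t sw_clear_then_write(uint32_t *pteval, bool *marked)
{
    sw_pte_t pte = SW_PTE_PRESENT;

    *pteval = sw_get_and_clear(&pte);
    *marked = sw_mark_dirty_if_present(&pte);
    return atomic_load(&pte);
}

/* Order 2: the write happens first, then the clear. */
static uint32_t sw_write_then_clear(uint32_t *pteval, bool *marked)
{
    sw_pte_t pte = SW_PTE_PRESENT;

    *marked = sw_mark_dirty_if_present(&pte);
    *pteval = sw_get_and_clear(&pte);
    return atomic_load(&pte);
}
```

In this model either the dirty bit lands before the exchange and shows up in
the returned pteval, or the writer finds the entry cleared and sets nothing,
so no non-present dirty entry is left behind.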