2012-10-25 16:45:18

by Will Deacon

[permalink] [raw]
Subject: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On x86 memory accesses to pages without the ACCESSED flag set result in the
ACCESSED flag being set automatically. With the ARM architecture a page access
fault is raised instead (and it will continue to be raised until the ACCESSED
flag is set for the appropriate PTE/PMD).

For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
be called for a write fault.

This patch ensures that faults on transparent hugepages which do not result
in a CoW update the access flags for the faulting pmd.

Cc: Chris Metcalf <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---

Ok chaps, I rebased this thing onto today's next (which basically
necessitated a rewrite) so I've reluctantly dropped my acks and kindly
ask if you could eyeball the new code, especially where the locking is
concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
that the page is not splitting, but I can't see why that is required.

Cheers,

Will

include/linux/huge_mm.h | 4 ++++
mm/huge_memory.c | 22 ++++++++++++++++++++++
mm/memory.c | 7 ++++++-
3 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4f0f948..766fb27 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -8,6 +8,10 @@ extern int do_huge_pmd_anonymous_page(struct mm_struct *mm,
extern int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
struct vm_area_struct *vma);
+extern void huge_pmd_set_accessed(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address, pmd_t *pmd,
+ pmd_t orig_pmd, int dirty);
extern int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd,
pmd_t orig_pmd);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3c14a96..f024d98 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -932,6 +932,28 @@ out:
return ret;
}

+void huge_pmd_set_accessed(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ pmd_t *pmd, pmd_t orig_pmd,
+ int dirty)
+{
+ pmd_t entry;
+ unsigned long haddr;
+
+ spin_lock(&mm->page_table_lock);
+ if (unlikely(!pmd_same(*pmd, orig_pmd)))
+ goto unlock;
+
+ entry = pmd_mkyoung(orig_pmd);
+ haddr = address & HPAGE_PMD_MASK;
+ if (pmdp_set_access_flags(vma, haddr, pmd, entry, dirty))
+ update_mmu_cache_pmd(vma, address, pmd);
+
+unlock:
+ spin_unlock(&mm->page_table_lock);
+}
+
static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
struct vm_area_struct *vma,
unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index f21ac1c..bcbc084 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3650,12 +3650,14 @@ retry:

barrier();
if (pmd_trans_huge(orig_pmd) && !pmd_trans_splitting(orig_pmd)) {
+ unsigned int dirty = flags & FAULT_FLAG_WRITE;
+
if (pmd_numa(vma, orig_pmd)) {
do_huge_pmd_numa_page(mm, vma, address, pmd,
flags, orig_pmd);
}

- if ((flags & FAULT_FLAG_WRITE) && !pmd_write(orig_pmd)) {
+ if (dirty && !pmd_write(orig_pmd)) {
ret = do_huge_pmd_wp_page(mm, vma, address, pmd,
orig_pmd);
/*
@@ -3665,6 +3667,9 @@ retry:
*/
if (unlikely(ret & VM_FAULT_OOM))
goto retry;
+ } else {
+ huge_pmd_set_accessed(mm, vma, address, pmd,
+ orig_pmd, dirty);
}

return ret;
--
1.7.4.1


2012-10-25 19:51:27

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On Thu, Oct 25, 2012 at 05:44:31PM +0100, Will Deacon wrote:
> On x86 memory accesses to pages without the ACCESSED flag set result in the
> ACCESSED flag being set automatically. With the ARM architecture a page access
> fault is raised instead (and it will continue to be raised until the ACCESSED
> flag is set for the appropriate PTE/PMD).
>
> For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
> setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
> be called for a write fault.
>
> This patch ensures that faults on transparent hugepages which do not result
> in a CoW update the access flags for the faulting pmd.
>
> Cc: Chris Metcalf <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

> Ok chaps, I rebased this thing onto today's next (which basically
> necessitated a rewrite) so I've reluctantly dropped my acks and kindly
> ask if you could eyeball the new code, especially where the locking is
> concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
> that the page is not splitting, but I can't see why that is required.

I don't either. If the thing was splitting when the fault happened,
that path is not taken. And the locked pmd_same() check should rule
out splitting setting in after testing pmd_trans_huge_splitting().

Peter?

2012-10-26 03:08:13

by Ni zhan Chen

[permalink] [raw]
Subject: Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On 10/26/2012 03:51 AM, Johannes Weiner wrote:
> On Thu, Oct 25, 2012 at 05:44:31PM +0100, Will Deacon wrote:
>> On x86 memory accesses to pages without the ACCESSED flag set result in the
>> ACCESSED flag being set automatically. With the ARM architecture a page access
>> fault is raised instead (and it will continue to be raised until the ACCESSED
>> flag is set for the appropriate PTE/PMD).
>>
>> For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
>> setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
>> be called for a write fault.
>>
>> This patch ensures that faults on transparent hugepages which do not result
>> in a CoW update the access flags for the faulting pmd.
>>
>> Cc: Chris Metcalf <[email protected]>
>> Cc: Kirill A. Shutemov <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Will Deacon <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
>
>> Ok chaps, I rebased this thing onto today's next (which basically
>> necessitated a rewrite) so I've reluctantly dropped my acks and kindly
>> ask if you could eyeball the new code, especially where the locking is
>> concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
>> that the page is not splitting, but I can't see why that is required.
> I don't either. If the thing was splitting when the fault happened,
> that path is not taken. And the locked pmd_same() check should rule
> out splitting setting in after testing pmd_trans_huge_splitting().

Why I can't find function pmd_trans_huge_splitting() you mentioned in
latest mainline codes and linux-next?

>
> Peter?
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

2012-10-26 06:20:11

by Ni zhan Chen

[permalink] [raw]
Subject: Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On 10/26/2012 12:44 AM, Will Deacon wrote:
> On x86 memory accesses to pages without the ACCESSED flag set result in the
> ACCESSED flag being set automatically. With the ARM architecture a page access
> fault is raised instead (and it will continue to be raised until the ACCESSED
> flag is set for the appropriate PTE/PMD).
>
> For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
> setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
> be called for a write fault.
>
> This patch ensures that faults on transparent hugepages which do not result
> in a CoW update the access flags for the faulting pmd.

Could you write changlog?

>
> Cc: Chris Metcalf <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
>
> Ok chaps, I rebased this thing onto today's next (which basically
> necessitated a rewrite) so I've reluctantly dropped my acks and kindly
> ask if you could eyeball the new code, especially where the locking is
> concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
> that the page is not splitting, but I can't see why that is required.
>
> Cheers,
>
> Will

Could you explain why you not call pmd_trans_huge_lock to confirm the
pmd is splitting or stable as Andrea point out?

>
> include/linux/huge_mm.h | 4 ++++
> mm/huge_memory.c | 22 ++++++++++++++++++++++
> mm/memory.c | 7 ++++++-
> 3 files changed, 32 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 4f0f948..766fb27 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -8,6 +8,10 @@ extern int do_huge_pmd_anonymous_page(struct mm_struct *mm,
> extern int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
> struct vm_area_struct *vma);
> +extern void huge_pmd_set_accessed(struct mm_struct *mm,
> + struct vm_area_struct *vma,
> + unsigned long address, pmd_t *pmd,
> + pmd_t orig_pmd, int dirty);
> extern int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long address, pmd_t *pmd,
> pmd_t orig_pmd);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 3c14a96..f024d98 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -932,6 +932,28 @@ out:
> return ret;
> }
>
> +void huge_pmd_set_accessed(struct mm_struct *mm,
> + struct vm_area_struct *vma,
> + unsigned long address,
> + pmd_t *pmd, pmd_t orig_pmd,
> + int dirty)
> +{
> + pmd_t entry;
> + unsigned long haddr;
> +
> + spin_lock(&mm->page_table_lock);
> + if (unlikely(!pmd_same(*pmd, orig_pmd)))
> + goto unlock;
> +
> + entry = pmd_mkyoung(orig_pmd);
> + haddr = address & HPAGE_PMD_MASK;
> + if (pmdp_set_access_flags(vma, haddr, pmd, entry, dirty))
> + update_mmu_cache_pmd(vma, address, pmd);
> +
> +unlock:
> + spin_unlock(&mm->page_table_lock);
> +}
> +
> static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
> struct vm_area_struct *vma,
> unsigned long address,
> diff --git a/mm/memory.c b/mm/memory.c
> index f21ac1c..bcbc084 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3650,12 +3650,14 @@ retry:
>
> barrier();
> if (pmd_trans_huge(orig_pmd) && !pmd_trans_splitting(orig_pmd)) {
> + unsigned int dirty = flags & FAULT_FLAG_WRITE;
> +
> if (pmd_numa(vma, orig_pmd)) {
> do_huge_pmd_numa_page(mm, vma, address, pmd,
> flags, orig_pmd);
> }
>
> - if ((flags & FAULT_FLAG_WRITE) && !pmd_write(orig_pmd)) {
> + if (dirty && !pmd_write(orig_pmd)) {
> ret = do_huge_pmd_wp_page(mm, vma, address, pmd,
> orig_pmd);
> /*
> @@ -3665,6 +3667,9 @@ retry:
> */
> if (unlikely(ret & VM_FAULT_OOM))
> goto retry;
> + } else {
> + huge_pmd_set_accessed(mm, vma, address, pmd,
> + orig_pmd, dirty);
> }
>
> return ret;

2012-10-26 07:43:18

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On Thu, Oct 25, 2012 at 05:44:31PM +0100, Will Deacon wrote:
> On x86 memory accesses to pages without the ACCESSED flag set result in the
> ACCESSED flag being set automatically. With the ARM architecture a page access
> fault is raised instead (and it will continue to be raised until the ACCESSED
> flag is set for the appropriate PTE/PMD).
>
> For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
> setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
> be called for a write fault.
>
> This patch ensures that faults on transparent hugepages which do not result
> in a CoW update the access flags for the faulting pmd.
>
> Cc: Chris Metcalf <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
>
> Ok chaps, I rebased this thing onto today's next (which basically
> necessitated a rewrite) so I've reluctantly dropped my acks and kindly
> ask if you could eyeball the new code, especially where the locking is
> concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
> that the page is not splitting, but I can't see why that is required.

In handle_mm_fault() we check if the pmd is under splitting without
page_table_lock. It's kind of speculative cheap check. We need to re-check
if the PMD is really not under splitting after taking page_table_lock.

See section "Locking in hugepage aware code" in Documentation/vm/transhuge.txt

--
Kirill A. Shutemov

2012-10-26 09:07:45

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On Fri, Oct 26, 2012 at 08:44:35AM +0100, Kirill A. Shutemov wrote:
> On Thu, Oct 25, 2012 at 05:44:31PM +0100, Will Deacon wrote:
> > On x86 memory accesses to pages without the ACCESSED flag set result in the
> > ACCESSED flag being set automatically. With the ARM architecture a page access
> > fault is raised instead (and it will continue to be raised until the ACCESSED
> > flag is set for the appropriate PTE/PMD).
> >
> > For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
> > setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
> > be called for a write fault.
> >
> > This patch ensures that faults on transparent hugepages which do not result
> > in a CoW update the access flags for the faulting pmd.
> >
> > Cc: Chris Metcalf <[email protected]>
> > Cc: Kirill A. Shutemov <[email protected]>
> > Cc: Andrea Arcangeli <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> >
> > Ok chaps, I rebased this thing onto today's next (which basically
> > necessitated a rewrite) so I've reluctantly dropped my acks and kindly
> > ask if you could eyeball the new code, especially where the locking is
> > concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
> > that the page is not splitting, but I can't see why that is required.
>
> In handle_mm_fault() we check if the pmd is under splitting without
> page_table_lock. It's kind of speculative cheap check. We need to re-check
> if the PMD is really not under splitting after taking page_table_lock.

I appreciate the need to check whether the thing is splitting, but I thought
that the pmd_same(*pmd, orig_pmd) check after taking the page_table_lock
would be sufficient, because we know that the entry hasn't changed and that
it wasn't splitting before we took the lock. This also mirrors the approach
taken by do_huge_pmd_wp_page.

Is there something I'm missing?

Cheers,

Will

2012-10-26 09:34:32

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On Fri, Oct 26, 2012 at 07:19:55AM +0100, Ni zhan Chen wrote:
> On 10/26/2012 12:44 AM, Will Deacon wrote:
> > On x86 memory accesses to pages without the ACCESSED flag set result in the
> > ACCESSED flag being set automatically. With the ARM architecture a page access
> > fault is raised instead (and it will continue to be raised until the ACCESSED
> > flag is set for the appropriate PTE/PMD).
> >
> > For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
> > setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
> > be called for a write fault.
> >
> > This patch ensures that faults on transparent hugepages which do not result
> > in a CoW update the access flags for the faulting pmd.
>
> Could you write changlog?

>From v2? I included something below my SoB. The code should do exactly the
same as before, it's just rebased onto next so that I can play nicely with
Peter's patches.

> >
> > Cc: Chris Metcalf <[email protected]>
> > Cc: Kirill A. Shutemov <[email protected]>
> > Cc: Andrea Arcangeli <[email protected]>
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> >
> > Ok chaps, I rebased this thing onto today's next (which basically
> > necessitated a rewrite) so I've reluctantly dropped my acks and kindly
> > ask if you could eyeball the new code, especially where the locking is
> > concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
> > that the page is not splitting, but I can't see why that is required.
> >
> > Cheers,
> >
> > Will
>
> Could you explain why you not call pmd_trans_huge_lock to confirm the
> pmd is splitting or stable as Andrea point out?

The way handle_mm_fault is now structured after the numa changes means that
we only enter the huge pmd page aging code if the entry wasn't splitting
before taking the lock, so it seemed a bit gratuitous to jump through those
hoops again in pmd_trans_huge_lock.

Will

2012-10-26 09:49:25

by Ni zhan Chen

[permalink] [raw]
Subject: Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On 10/26/2012 05:34 PM, Will Deacon wrote:
> On Fri, Oct 26, 2012 at 07:19:55AM +0100, Ni zhan Chen wrote:
>> On 10/26/2012 12:44 AM, Will Deacon wrote:
>>> On x86 memory accesses to pages without the ACCESSED flag set result in the
>>> ACCESSED flag being set automatically. With the ARM architecture a page access
>>> fault is raised instead (and it will continue to be raised until the ACCESSED
>>> flag is set for the appropriate PTE/PMD).
>>>
>>> For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
>>> setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
>>> be called for a write fault.
>>>
>>> This patch ensures that faults on transparent hugepages which do not result
>>> in a CoW update the access flags for the faulting pmd.
>> Could you write changlog?
> >From v2? I included something below my SoB. The code should do exactly the
> same as before, it's just rebased onto next so that I can play nicely with
> Peter's patches.
>
>>> Cc: Chris Metcalf <[email protected]>
>>> Cc: Kirill A. Shutemov <[email protected]>
>>> Cc: Andrea Arcangeli <[email protected]>
>>> Signed-off-by: Will Deacon <[email protected]>
>>> ---
>>>
>>> Ok chaps, I rebased this thing onto today's next (which basically
>>> necessitated a rewrite) so I've reluctantly dropped my acks and kindly
>>> ask if you could eyeball the new code, especially where the locking is
>>> concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
>>> that the page is not splitting, but I can't see why that is required.
>>>
>>> Cheers,
>>>
>>> Will
>> Could you explain why you not call pmd_trans_huge_lock to confirm the
>> pmd is splitting or stable as Andrea point out?
> The way handle_mm_fault is now structured after the numa changes means that
> we only enter the huge pmd page aging code if the entry wasn't splitting

Why you call it huge pmd page *aging* code?

Regards,
Chen

> before taking the lock, so it seemed a bit gratuitous to jump through those
> hoops again in pmd_trans_huge_lock.
>
> Will
>

2012-10-26 10:14:04

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v3] mm: thp: Set the accessed flag for old pages on access fault.

On Fri, Oct 26, 2012 at 10:07:15AM +0100, Will Deacon wrote:
> On Fri, Oct 26, 2012 at 08:44:35AM +0100, Kirill A. Shutemov wrote:
> > On Thu, Oct 25, 2012 at 05:44:31PM +0100, Will Deacon wrote:
> > > On x86 memory accesses to pages without the ACCESSED flag set result in the
> > > ACCESSED flag being set automatically. With the ARM architecture a page access
> > > fault is raised instead (and it will continue to be raised until the ACCESSED
> > > flag is set for the appropriate PTE/PMD).
> > >
> > > For normal memory pages, handle_pte_fault will call pte_mkyoung (effectively
> > > setting the ACCESSED flag). For transparent huge pages, pmd_mkyoung will only
> > > be called for a write fault.
> > >
> > > This patch ensures that faults on transparent hugepages which do not result
> > > in a CoW update the access flags for the faulting pmd.
> > >
> > > Cc: Chris Metcalf <[email protected]>
> > > Cc: Kirill A. Shutemov <[email protected]>
> > > Cc: Andrea Arcangeli <[email protected]>
> > > Signed-off-by: Will Deacon <[email protected]>
> > > ---
> > >
> > > Ok chaps, I rebased this thing onto today's next (which basically
> > > necessitated a rewrite) so I've reluctantly dropped my acks and kindly
> > > ask if you could eyeball the new code, especially where the locking is
> > > concerned. In the numa code (do_huge_pmd_prot_none), Peter checks again
> > > that the page is not splitting, but I can't see why that is required.
> >
> > In handle_mm_fault() we check if the pmd is under splitting without
> > page_table_lock. It's kind of speculative cheap check. We need to re-check
> > if the PMD is really not under splitting after taking page_table_lock.
>
> I appreciate the need to check whether the thing is splitting, but I thought
> that the pmd_same(*pmd, orig_pmd) check after taking the page_table_lock
> would be sufficient, because we know that the entry hasn't changed and that
> it wasn't splitting before we took the lock. This also mirrors the approach
> taken by do_huge_pmd_wp_page.
>
> Is there something I'm missing?

Hm.. You're correct from my POV.

Acked-by: Kirill A. Shutemov <[email protected]>

I think the check in do_huge_pmd_prot_none() is redundant. It only add
latency. I'll prepare patch to remove it.

--
Kirill A. Shutemov