by Alex Thorlton

[permalink] [raw]

Subject: Re: [PATCH 2/2] thp: support split page table lock

On Mon, Sep 09, 2013 at 10:34:45AM +0800, Daniel J Blueman wrote:
> On Saturday, 7 September 2013 02:10:02 UTC+8, Naoya Horiguchi wrote:
> >Hi Alex,
> >
> >On Fri, Sep 06, 2013 at 11:04:23AM -0500, Alex Thorlton wrote:
> >> On Thu, Sep 05, 2013 at 05:27:46PM -0400, Naoya Horiguchi wrote:
> >> > Thp related code also uses per process mm->page_table_lock now.
> >> > So making it fine-grained can provide better performance.
> >> >
> >> > This patch makes thp support split page table lock by using page->ptl
> >> > of the pages storing "pmd_trans_huge" pmds.
> >> >
> >> > Some functions like pmd_trans_huge_lock() and
> page_check_address_pmd()
> >> > are expected by their caller to pass back the pointer of ptl, so this
> >> > patch adds to those functions new arguments for that. Rather than
> that,
> >> > this patch gives only straightforward replacement.
> >> >
> >> > ChangeLog v3:
> >> > - fixed argument of huge_pmd_lockptr() in copy_huge_pmd()
> >> > - added missing declaration of ptl in do_huge_pmd_anonymous_page()
> >>
> >> I've applied these and tested them using the same tests program that I
> >> used when I was working on the same issue, and I'm running into some
> >> bugs. Here's a stack trace:
> >
> >Thank you for helping testing. This bug is new to me.
>
> With 3.11, this patch series and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS,
> I consistently hit the same failure when exiting one of my
> stress-testers [1] when using eg 24 cores.
>
> Doesn't happen with 8 cores, so likely needs enough virtual memory
> to use multiple split locks. Otherwise, this is very promising work!

Daniel,

Hillf Danton ([email protected]) suggested putting the big
page_table_lock back into the two functions seen below. I re-tested
with this change and it seems to resolve the issue.

- Alex

--- a/mm/pgtable-generic.c Sat Sep 7 15:17:52 2013
+++ b/mm/pgtable-generic.c Sat Sep 7 15:20:28 2013
@@ -127,12 +127,14 @@ void pmdp_splitting_flush(struct vm_area
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pgtable)
{
+ spin_lock(&mm->page_table_lock);
/* FIFO */
if (!mm->pmd_huge_pte)
INIT_LIST_HEAD(&pgtable->lru);
else
list_add(&pgtable->lru, &mm->pmd_huge_pte->lru);
mm->pmd_huge_pte = pgtable;
+ spin_unlock(&mm->page_table_lock);
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#endif
@@ -144,6 +146,7 @@ pgtable_t pgtable_trans_huge_withdraw(st
{
pgtable_t pgtable;

+ spin_lock(&mm->page_table_lock);
/* FIFO */
pgtable = mm->pmd_huge_pte;
if (list_empty(&pgtable->lru))
@@ -153,6 +156,7 @@ pgtable_t pgtable_trans_huge_withdraw(st
struct page, lru);
list_del(&pgtable->lru);
}
+ spin_unlock(&mm->page_table_lock);
return pgtable;
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
--

2013-09-09 16:26:41

by Naoya Horiguchi

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: support split page table lock

On Sun, Sep 08, 2013 at 10:23:55PM +0530, Aneesh Kumar K.V wrote:
> Naoya Horiguchi <[email protected]> writes:
>
> > Currently all of page table handling by hugetlbfs code are done under
> > mm->page_table_lock. So when a process have many threads and they heavily
> > access to the memory, lock contention happens and impacts the performance.
> >
> > This patch makes hugepage support split page table lock so that we use
> > page->ptl of the leaf node of page table tree which is pte for normal pages
> > but can be pmd and/or pud for hugepages of some architectures.
> >
> > ChangeLog v3:
> > - disable split ptl for ppc with USE_SPLIT_PTLOCKS_HUGETLB.
> > - remove replacement in some architecture dependent code. This is justified
> > because an allocation of pgd/pud/pmd/pte entry can race with other
> > allocation, not with read/write access, so we can use different locks.
> > http://thread.gmane.org/gmane.linux.kernel.mm/106292/focus=106458
> >
> > ChangeLog v2:
> > - add split ptl on other archs missed in v1
> > - drop changes on arch/{powerpc,tile}/mm/hugetlbpage.c
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > ---
> > include/linux/hugetlb.h | 20 +++++++++++
> > include/linux/mm_types.h | 2 ++
> > mm/hugetlb.c | 92 +++++++++++++++++++++++++++++-------------------
> > mm/mempolicy.c | 5 +--
> > mm/migrate.c | 4 +--
> > mm/rmap.c | 2 +-
> > 6 files changed, 84 insertions(+), 41 deletions(-)
> >
> > diff --git v3.11-rc3.orig/include/linux/hugetlb.h v3.11-rc3/include/linux/hugetlb.h
> > index 0393270..5cb8a4e 100644
> > --- v3.11-rc3.orig/include/linux/hugetlb.h
> > +++ v3.11-rc3/include/linux/hugetlb.h
> > @@ -80,6 +80,24 @@ extern const unsigned long hugetlb_zero, hugetlb_infinity;
> > extern int sysctl_hugetlb_shm_group;
> > extern struct list_head huge_boot_pages;
> >
> > +#if USE_SPLIT_PTLOCKS_HUGETLB
> > +#define huge_pte_lockptr(mm, ptep) ({__pte_lockptr(virt_to_page(ptep)); })
> > +#else /* !USE_SPLIT_PTLOCKS_HUGETLB */
> > +#define huge_pte_lockptr(mm, ptep) ({&(mm)->page_table_lock; })
> > +#endif /* USE_SPLIT_PTLOCKS_HUGETLB */
> > +
> > +#define huge_pte_offset_lock(mm, address, ptlp) \
> > +({ \
> > + pte_t *__pte = huge_pte_offset(mm, address); \
> > + spinlock_t *__ptl = NULL; \
> > + if (__pte) { \
> > + __ptl = huge_pte_lockptr(mm, __pte); \
> > + *(ptlp) = __ptl; \
> > + spin_lock(__ptl); \
> > + } \
> > + __pte; \
> > +})
>
>
> why not a static inline function ?

I'm not sure how these two make change in runtime (maybe optimized to
similar code?).
I just used the same pattern with pte_offset_map_lock(). I think that
this may show that this function is the variant of pte_offset_map_lock().
But if we have a good reason to make it static inline, I'm glad to do this.
Do you have any specific ideas?

>
> > +
> > /* arch callbacks */
> >
> > pte_t *huge_pte_alloc(struct mm_struct *mm,> @@ -164,6 +182,8 @@ static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
> > BUG();
> > }
> >
> > +#define huge_pte_lockptr(mm, ptep) 0
> > +
> > #endif /* !CONFIG_HUGETLB_PAGE */
> >
> > #define HUGETLB_ANON_FILE "anon_hugepage"
> > diff --git v3.11-rc3.orig/include/linux/mm_types.h v3.11-rc3/include/linux/mm_types.h
> > index fb425aa..cfb8c6f 100644
> > --- v3.11-rc3.orig/include/linux/mm_types.h
> > +++ v3.11-rc3/include/linux/mm_types.h
> > @@ -24,6 +24,8 @@
> > struct address_space;
> >
> > #define USE_SPLIT_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
> > +#define USE_SPLIT_PTLOCKS_HUGETLB \
> > + (USE_SPLIT_PTLOCKS && !defined(CONFIG_PPC))
> >
>
> Is that a common pattern ? Don't we generally use
> HAVE_ARCH_SPLIT_PTLOCKS in arch config file ?

Do you mean that HAVE_ARCH_SPLIT_PTLOCKS enables/disables whole split
ptl mechanism (not only split ptl for hugepages) depending on archs?
If you do, we can do this with adding a line 'default "999999" if PPC'.
(Note that if we do this, we can lose the performance benefit from
existing split ptl for normal pages. So I'm not sure it's acceptible
for users of the arch.)

> Also are we sure this is
> only an issue with PPC ?

Not confirmed yet (so let me take some time to check this).
I think that split ptl for hugepage should work on any archtectures
where every entry pointing to hugepage is stored in 4kB page allocated
for page table trees. So I'll check it.

Thanks,
Naoya Horiguchi