2014-04-29 09:42:25

by Jianyu Zhan

Subject: [PATCH 1/3] mm/swap.c: introduce put_[un]refcounted_compound_page helpers for splitting put_compound_page

Currently, put_compound_page() must carefully handle tricky cases
to avoid racing with compound page release or splitting. This has
made it grow quite lengthy (200+ lines) with deep indentation,
which makes it hard to follow and maintain.

This patch (and the next one) refactors the function. It is a
preparatory patch.

Based on the code skeleton of put_compound_page:

put_compound_page:
if !PageTail(page)
put head page fastpath;
return;

/* else PageTail */
page_head = compound_head(page)
if !__compound_tail_refcounted(page_head)
put head page optimal path; <--- (1)
return;
else
put head page slowpath; <--- (2)
return;

This patch introduces two helpers, put_[un]refcounted_compound_page,
handling code paths (1) and (2), respectively. Both are tagged
__always_inline, which eliminates function call overhead, so they
operate exactly as before.

They are copied almost verbatim (except in one place, where a
"goto out_put_single" is expanded), with some comments rephrased.

Signed-off-by: Jianyu Zhan <[email protected]>
---
mm/swap.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 142 insertions(+)

diff --git a/mm/swap.c b/mm/swap.c
index c0cd7d0..a576449 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -79,6 +79,148 @@ static void __put_compound_page(struct page *page)
(*dtor)(page);
}

+/*
+ * Two special cases here: we could avoid taking compound_lock_irqsave
+ * and could skip the tail refcounting (in _mapcount).
+ *
+ * 1. Hugetlbfs page:
+ *
+ * PageHeadHuge will remain true until the compound page
+ * is released and enters the buddy allocator, and it could
+ * not be split by __split_huge_page_refcount().
+ *
+ * So if we see PageHeadHuge set, and we have the tail page pin,
+ * then we could safely put head page.
+ *
+ * 2. Slab compound page:
+ *
+ * PG_slab is cleared before the slab frees the head page, and
+ * tail pin cannot be the last reference left on the head page,
+ * because the slab code is free to reuse the compound page
+ * after a kfree/kmem_cache_free without having to check if
+ * there's any tail pin left. In turn all tail pins must always be
+ * released while the head is still pinned by the slab code
+ * and so we know PG_slab will be still set too.
+ *
+ * So if we see PageSlab set, and we have the tail page pin,
+ * then we could safely put head page.
+ */
+static __always_inline
+void put_unrefcounted_compound_page(struct page *page_head, struct page *page)
+{
+ /*
+ * If @page is a THP tail, we must read the tail page
+ * flags after the head page flags. The
+ * __split_huge_page_refcount side enforces write memory barriers
+ * between clearing PageTail and before the head page
+ * can be freed and reallocated.
+ */
+ smp_rmb();
+ if (likely(PageTail(page))) {
+ /*
+ * __split_huge_page_refcount cannot race
+ * here, see the comment above this function.
+ */
+ VM_BUG_ON_PAGE(!PageHead(page_head), page_head);
+ VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
+ if (put_page_testzero(page_head)) {
+ /*
+ * If this is the tail of a slab compound page,
+ * the tail pin must not be the last reference
+ * held on the page, because the PG_slab cannot
+ * be cleared before all tail pins (which skips
+ * the _mapcount tail refcounting) have been
+ * released.
+ *
+ * If this is the tail of a hugetlbfs page,
+ * the tail pin may be the last reference on
+ * the page instead, because PageHeadHuge will
+ * not go away until the compound page enters
+ * the buddy allocator.
+ */
+ VM_BUG_ON_PAGE(PageSlab(page_head), page_head);
+ __put_compound_page(page_head);
+ }
+ } else
+ /*
+ * __split_huge_page_refcount run before us,
+ * @page was a THP tail. The split @page_head
+ * has been freed and reallocated as slab or
+ * hugetlbfs page of smaller order (only
+ * possible if reallocated as slab on x86).
+ */
+ if (put_page_testzero(page))
+ __put_single_page(page);
+}
+
+static __always_inline
+void put_refcounted_compound_page(struct page *page_head, struct page *page)
+{
+ if (likely(page != page_head && get_page_unless_zero(page_head))) {
+ unsigned long flags;
+
+ /*
+ * @page_head wasn't a dangling pointer but it may not
+ * be a head page anymore by the time we obtain the
+ * lock. That is ok as long as it can't be freed from
+ * under us.
+ */
+ flags = compound_lock_irqsave(page_head);
+ if (unlikely(!PageTail(page))) {
+ /* __split_huge_page_refcount run before us */
+ compound_unlock_irqrestore(page_head, flags);
+ if (put_page_testzero(page_head)) {
+ /*
+ * The @page_head may have been freed
+ * and reallocated as a compound page
+ * of smaller order and then freed
+ * again. All we know is that it
+ * cannot have become: a THP page, a
+ * compound page of higher order, a
+ * tail page. That is because we
+ * still hold the refcount of the
+ * split THP tail and page_head was
+ * the THP head before the split.
+ */
+ if (PageHead(page_head))
+ __put_compound_page(page_head);
+ else
+ __put_single_page(page_head);
+ }
+out_put_single:
+ if (put_page_testzero(page))
+ __put_single_page(page);
+ return;
+ }
+ VM_BUG_ON_PAGE(page_head != page->first_page, page);
+ /*
+ * We can release the refcount taken by
+ * get_page_unless_zero() now that
+ * __split_huge_page_refcount() is blocked on the
+ * compound_lock.
+ */
+ if (put_page_testzero(page_head))
+ VM_BUG_ON_PAGE(1, page_head);
+ /* __split_huge_page_refcount will wait now */
+ VM_BUG_ON_PAGE(page_mapcount(page) <= 0, page);
+ atomic_dec(&page->_mapcount);
+ VM_BUG_ON_PAGE(atomic_read(&page_head->_count) <= 0, page_head);
+ VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
+ compound_unlock_irqrestore(page_head, flags);
+
+ if (put_page_testzero(page_head)) {
+ if (PageHead(page_head))
+ __put_compound_page(page_head);
+ else
+ __put_single_page(page_head);
+ }
+ } else {
+ /* @page_head is a dangling pointer */
+ VM_BUG_ON_PAGE(PageTail(page), page);
+ goto out_put_single;
+ }
+}
+
static void put_compound_page(struct page *page)
{
struct page *page_head;
--
2.0.0-rc1


2014-04-29 09:42:41

by Jianyu Zhan

Subject: [PATCH 2/3] mm/swap.c: split put_compound_page function

Currently, put_compound_page() must carefully handle tricky cases
to avoid racing with compound page release or splitting. This has
made it grow quite lengthy (200+ lines) with deep indentation,
which makes it hard to follow and maintain.

Based on the two helpers introduced in the previous patch
("mm/swap.c: introduce put_[un]refcounted_compound_page helpers
for splitting put_compound_page"), this patch simply replaces those
two lengthy code paths with calls to the helpers, respectively.
It also rephrases some comments.

After this patch, put_compound_page() is very compact and thus easy
to read and maintain.

After the split, the object file is the same size as the original.
I have also diffed the disassembly of the original and the patched
put_compound_page(); they are 100% identical. This confirms that the
split makes no functional change while improving readability.

This patch and the previous one grow the source by 32 lines, mostly
due to comments.

Signed-off-by: Jianyu Zhan <[email protected]>
---
mm/swap.c | 142 +++++++-------------------------------------------------------
1 file changed, 16 insertions(+), 126 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index a576449..d8654d8 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -225,6 +225,11 @@ static void put_compound_page(struct page *page)
{
struct page *page_head;

+ /*
+ * We see PageCompound set and PageTail not set, so @page may be
+ * a hugetlbfs head page, a THP head page, or the head of a slab
+ * compound page.
+ */
if (likely(!PageTail(page))) {
if (put_page_testzero(page)) {
/*
@@ -239,135 +244,20 @@ static void put_compound_page(struct page *page)
return;
}

- /* __split_huge_page_refcount can run under us */
- page_head = compound_head(page);
-
/*
- * THP can not break up slab pages so avoid taking
- * compound_lock() and skip the tail page refcounting (in
- * _mapcount) too. Slab performs non-atomic bit ops on
- * page->flags for better performance. In particular
- * slab_unlock() in slub used to be a hot path. It is still
- * hot on arches that do not support
- * this_cpu_cmpxchg_double().
+ * We see PageCompound set and PageTail set, so @page may be:
+ * 1. a tail hugetlbfs page or the tail of a slab compound page, or
+ * 2. a tail THP page, or
+ * 3. a split THP page.
*
- * If "page" is part of a slab or hugetlbfs page it cannot be
- * splitted and the head page cannot change from under us. And
- * if "page" is part of a THP page under splitting, if the
- * head page pointed by the THP tail isn't a THP head anymore,
- * we'll find PageTail clear after smp_rmb() and we'll treat
- * it as a single page.
+ * Case 3 is possible, as we may race with
+ * __split_huge_page_refcount tearing down a THP page.
*/
- if (!__compound_tail_refcounted(page_head)) {
- /*
- * If "page" is a THP tail, we must read the tail page
- * flags after the head page flags. The
- * split_huge_page side enforces write memory barriers
- * between clearing PageTail and before the head page
- * can be freed and reallocated.
- */
- smp_rmb();
- if (likely(PageTail(page))) {
- /*
- * __split_huge_page_refcount cannot race
- * here.
- */
- VM_BUG_ON_PAGE(!PageHead(page_head), page_head);
- VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
- if (put_page_testzero(page_head)) {
- /*
- * If this is the tail of a slab
- * compound page, the tail pin must
- * not be the last reference held on
- * the page, because the PG_slab
- * cannot be cleared before all tail
- * pins (which skips the _mapcount
- * tail refcounting) have been
- * released. For hugetlbfs the tail
- * pin may be the last reference on
- * the page instead, because
- * PageHeadHuge will not go away until
- * the compound page enters the buddy
- * allocator.
- */
- VM_BUG_ON_PAGE(PageSlab(page_head), page_head);
- __put_compound_page(page_head);
- }
- return;
- } else
- /*
- * __split_huge_page_refcount run before us,
- * "page" was a THP tail. The split page_head
- * has been freed and reallocated as slab or
- * hugetlbfs page of smaller order (only
- * possible if reallocated as slab on x86).
- */
- goto out_put_single;
- }
-
- if (likely(page != page_head && get_page_unless_zero(page_head))) {
- unsigned long flags;
-
- /*
- * page_head wasn't a dangling pointer but it may not
- * be a head page anymore by the time we obtain the
- * lock. That is ok as long as it can't be freed from
- * under us.
- */
- flags = compound_lock_irqsave(page_head);
- if (unlikely(!PageTail(page))) {
- /* __split_huge_page_refcount run before us */
- compound_unlock_irqrestore(page_head, flags);
- if (put_page_testzero(page_head)) {
- /*
- * The head page may have been freed
- * and reallocated as a compound page
- * of smaller order and then freed
- * again. All we know is that it
- * cannot have become: a THP page, a
- * compound page of higher order, a
- * tail page. That is because we
- * still hold the refcount of the
- * split THP tail and page_head was
- * the THP head before the split.
- */
- if (PageHead(page_head))
- __put_compound_page(page_head);
- else
- __put_single_page(page_head);
- }
-out_put_single:
- if (put_page_testzero(page))
- __put_single_page(page);
- return;
- }
- VM_BUG_ON_PAGE(page_head != page->first_page, page);
- /*
- * We can release the refcount taken by
- * get_page_unless_zero() now that
- * __split_huge_page_refcount() is blocked on the
- * compound_lock.
- */
- if (put_page_testzero(page_head))
- VM_BUG_ON_PAGE(1, page_head);
- /* __split_huge_page_refcount will wait now */
- VM_BUG_ON_PAGE(page_mapcount(page) <= 0, page);
- atomic_dec(&page->_mapcount);
- VM_BUG_ON_PAGE(atomic_read(&page_head->_count) <= 0, page_head);
- VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
- compound_unlock_irqrestore(page_head, flags);
-
- if (put_page_testzero(page_head)) {
- if (PageHead(page_head))
- __put_compound_page(page_head);
- else
- __put_single_page(page_head);
- }
- } else {
- /* page_head is a dangling pointer */
- VM_BUG_ON_PAGE(PageTail(page), page);
- goto out_put_single;
- }
+ page_head = compound_head(page);
+ if (!__compound_tail_refcounted(page_head))
+ put_unrefcounted_compound_page(page_head, page);
+ else
+ put_refcounted_compound_page(page_head, page);
}

void put_page(struct page *page)
--
2.0.0-rc1

2014-04-29 09:43:16

by Jianyu Zhan

Subject: [PATCH 3/3] mm: introduce compound_head_by_tail()

Currently, in put_compound_page(), we have code like this:

--- snip ---
if (likely(!PageTail(page))) { <------ (1)
if (put_page_testzero(page)) {
/*
* By the time all refcounts have been released
* split_huge_page cannot run anymore from under us.
*/
if (PageHead(page))
__put_compound_page(page);
else
__put_single_page(page);
}
return;
}

/* __split_huge_page_refcount can run under us */
page_head = compound_head(page); <------------ (2)
--- snip ---

If the check at (1) fails, it means page is *likely* a tail page.

Then at (2), since compound_head(page) is inlined, it is:

--- snip ---
static inline struct page *compound_head(struct page *page)
{
if (unlikely(PageTail(page))) { <----------- (3)
struct page *head = page->first_page;

smp_rmb();
if (likely(PageTail(page)))
return head;
}
return page;
}
--- snip ---

Here, the unlikely at (3) is a counterproductive hint, because in
this case page is *likely* a tail page. So check (3) is poorly
hinted here, and I introduce a helper for this case.

This patch introduces compound_head_by_tail(), which deals with a
probable tail page (though it could still be split by a racing
thread), and makes compound_head() a wrapper around it.

This patch has no functional change, and it reduces the object
size slightly:
text data bss dec hex filename
11003 1328 16 12347 303b mm/swap.o.orig
10971 1328 16 12315 301b mm/swap.o.patched

I ran "perf top -e branch-miss" to observe branch misses in this
case. As Michael points out, this is a slow path, so this case occurs
only rarely. But I grepped the code base and found some other call
sites that could benefit from this helper. Given that it grows the
source by only 5 lines while reducing the object size, I believe this
helper deserves to exist.

Signed-off-by: Jianyu Zhan <[email protected]>
---
include/linux/mm.h | 29 +++++++++++++++++------------
mm/swap.c | 2 +-
2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf9811e..a606a3e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -405,20 +405,25 @@ static inline void compound_unlock_irqrestore(struct page *page,
#endif
}

+static inline struct page *compound_head_by_tail(struct page *tail)
+{
+ struct page *head = tail->first_page;
+
+ /*
+ * page->first_page may be a dangling pointer to an old
+ * compound page, so recheck that it is still a tail
+ * page before returning.
+ */
+ smp_rmb();
+ if (likely(PageTail(tail)))
+ return head;
+ return tail;
+}
+
static inline struct page *compound_head(struct page *page)
{
- if (unlikely(PageTail(page))) {
- struct page *head = page->first_page;
-
- /*
- * page->first_page may be a dangling pointer to an old
- * compound page, so recheck that it is still a tail
- * page before returning.
- */
- smp_rmb();
- if (likely(PageTail(page)))
- return head;
- }
+ if (unlikely(PageTail(page)))
+ return compound_head_by_tail(page);
return page;
}

diff --git a/mm/swap.c b/mm/swap.c
index d8654d8..a061107 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -253,7 +253,7 @@ static void put_compound_page(struct page *page)
* Case 3 is possible, as we may race with
* __split_huge_page_refcount tearing down a THP page.
*/
- page_head = compound_head(page);
+ page_head = compound_head_by_tail(page);
if (!__compound_tail_refcounted(page_head))
put_unrefcounted_compound_page(page_head, page);
else
--
2.0.0-rc1

2014-04-30 20:50:24

by Kirill A. Shutemov

Subject: Re: [PATCH 1/3] mm/swap.c: introduce put_[un]refcounted_compound_page helpers for splitting put_compound_page

On Tue, Apr 29, 2014 at 05:42:07PM +0800, Jianyu Zhan wrote:
> Currently, put_compound_page should carefully handle tricky case
> to avoid racing with compound page releasing or spliting, which
> makes it growing quite lenthy(about 200+ lines) and need deep
> tab indention, which makes it quite hard to follow and maintain.
>
> This patch(and the next patch) tries to refactor this function.
> It is a prepared patch.
>
> Based on the code skeleton of put_compound_page:
>
> put_compound_pge:

Typo.

> if !PageTail(page)
> put head page fastpath;
> return;
>
> /* else PageTail */
> page_head = compound_head(page)
> if !__compound_tail_refcounted(page_head)
> put head page optimal path; <---(1)
> return;
> else
> put head page slowpath; <--- (2)
> return;
>
> This patch introduces two helpers, put_[un]refcounted_compound_page,
> handling the code path (1) and code path (2), respectively. They both
> are tagged __always_inline, thus it elmiates function call overhead,
> making them operating the same way as before.
>
> They are almost copied verbatim(except one place, a "goto out_put_single"
> is expanded), with some comments rephrasing.
>
> Signed-off-by: Jianyu Zhan <[email protected]>
> ---
> mm/swap.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 142 insertions(+)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index c0cd7d0..a576449 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -79,6 +79,148 @@ static void __put_compound_page(struct page *page)
> (*dtor)(page);
> }
>
> +/**
> + * Two special cases here: we could avoid taking compound_lock_irqsave
> + * and could skip the tail refcounting(in _mapcount).
> + *
> + * 1. Hugetlbfs page:
> + *
> + * PageHeadHuge will remain true until the compound page
> + * is released and enters the buddy allocator, and it could
> + * not be split by __split_huge_page_refcount().
> + *
> + * So if we see PageHeadHuge set, and we have the tail page pin,
> + * then we could safely put head page.
> + *
> + * 2. Slab THP page:

There's no such thing. It's called a slab compound page.

> + *
> + * PG_slab is cleared before the slab frees the head page, and
> + * tail pin cannot be the last reference left on the head page,
> + * because the slab code is free to reuse the compound page
> + * after a kfree/kmem_cache_free without having to check if
> + * there's any tail pin left. In turn all tail pinsmust be always
> + * released while the head is still pinned by the slab code
> + * and so we know PG_slab will be still set too.
> + *
> + * So if we see PageSlab set, and we have the tail page pin,
> + * then we could safely put head page.
> + */
--
Kirill A. Shutemov

2014-04-30 20:54:23

by Kirill A. Shutemov

Subject: Re: [PATCH 2/3] mm/swap.c: split put_compound_page function

On Tue, Apr 29, 2014 at 05:42:28PM +0800, Jianyu Zhan wrote:
> Currently, put_compound_page should carefully handle tricky case
> to avoid racing with compound page releasing or spliting, which
> makes it growing quite lenthy(about 200+ lines) and need deep
> tab indention, which makes it quite hard to follow and maintain.
>
> Now based on two helpers introduced in the previous patch
> ("mm/swap.c: introduce put_[un]refcounted_compound_page helpers
> for spliting put_compound_page"), this patch just replaces those
> two lenthy code path with these two helpers, respectively.
> Also, it has some comment rephrasing.
>
> After this patch, the put_compound_page() will be very compact,
> thus easy to read and maintain.
>
> After spliting, the object file is of same size as the original one.
> Actually, I've diff'ed put_compound_page()'s orginal disassemble code
> and the patched disassemble code, the are 100% the same!
>
> This fact shows that this spliting has no functinal change,
> but it brings readability.
>
> This patch and the previous one blow the code by 32 lines, which
> mostly credits to comments.
>
> Signed-off-by: Jianyu Zhan <[email protected]>
> ---
> mm/swap.c | 142 +++++++-------------------------------------------------------
> 1 file changed, 16 insertions(+), 126 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index a576449..d8654d8 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -225,6 +225,11 @@ static void put_compound_page(struct page *page)
> {
> struct page *page_head;
>
> + /*
> + * We see the PageCompound set and PageTail not set, so @page maybe:
> + * 1. hugetlbfs head page, or
> + * 2. THP head page.

3. Head of slab compound page.

> + */
> if (likely(!PageTail(page))) {
> if (put_page_testzero(page)) {
> /*
> @@ -239,135 +244,20 @@ static void put_compound_page(struct page *page)
> return;
> }
>

...

> + * We see the PageCompound set and PageTail set, so @page maybe:
> + * 1. a tail hugetlbfs page, or
> + * 2. a tail THP page, or
> + * 3. a split THP page.

4. Tail of slab compound page

--
Kirill A. Shutemov