2014-04-27 13:36:04

by Jianyu Zhan

[permalink] [raw]
Subject: [PATCH RFC 1/2] mm/swap.c: split put_compound_page function

Currently, put_compound_page should carefully handle tricky case
to avoid racing with compound page releasing or spliting, which
makes it growing quite lenthy(about 200+ lines) and need deep
tab indention, which makes it quite hard to follow and maintain.

This patch tries to refactor this function, by extracting out the
fundamental logics into helper functions, making the main code path
more compact, thus easy to read and maintain. Two helper funcitons
are introduced, and are marked __always_inline, thus this patch
has no functional change(actually, the output object file is the
same size with the original one).

Besides, this patch rearranges/rewrites some comments(hope I don't
do it wrong).

Signed-off-by: Jianyu Zhan <[email protected]>
---
mm/swap.c | 227 ++++++++++++++++++++++++++++++++++++--------------------------
1 file changed, 131 insertions(+), 96 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index c0cd7d0..0d8d891 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -79,106 +79,100 @@ static void __put_compound_page(struct page *page)
(*dtor)(page);
}

-static void put_compound_page(struct page *page)
-{
- struct page *page_head;
-
- if (likely(!PageTail(page))) {
- if (put_page_testzero(page)) {
- /*
- * By the time all refcounts have been released
- * split_huge_page cannot run anymore from under us.
- */
- if (PageHead(page))
- __put_compound_page(page);
- else
- __put_single_page(page);
- }
- return;
- }
-
- /* __split_huge_page_refcount can run under us */
- page_head = compound_head(page);

+/**
+ * Two special cases here: we could avoid taking compound_lock_irqsave
+ * and could skip the tail refcounting(in _mapcount).
+ *
+ * 1. Hugetlbfs page:
+ *
+ * PageHeadHuge will remain true until the compound page
+ * is released and enters the buddy allocator, and it could
+ * not be split by __split_huge_page_refcount().
+ *
+ * So if we see PageHeadHuge set, and we have the tail page pin,
+ * then we could safely put head page.
+ *
+ * 2. Slab THP page:
+ *
+ * PG_slab is cleared before the slab frees the head page, and
+ * tail pin cannot be the last reference left on the head page,
+ * because the slab code is free to reuse the compound page
+ * after a kfree/kmem_cache_free without having to check if
+ * there's any tail pin left. In turn all tail pinsmust be always
+ * released while the head is still pinned by the slab code
+ * and so we know PG_slab will be still set too.
+ *
+ * So if we see PageSlab set, and we have the tail page pin,
+ * then we could safely put head page.
+ */
+static __always_inline void put_unrefcounted_compound_page(struct page *head_page,
+ struct page *page)
+{
/*
- * THP can not break up slab pages so avoid taking
- * compound_lock() and skip the tail page refcounting (in
- * _mapcount) too. Slab performs non-atomic bit ops on
- * page->flags for better performance. In particular
- * slab_unlock() in slub used to be a hot path. It is still
- * hot on arches that do not support
- * this_cpu_cmpxchg_double().
- *
- * If "page" is part of a slab or hugetlbfs page it cannot be
- * splitted and the head page cannot change from under us. And
- * if "page" is part of a THP page under splitting, if the
- * head page pointed by the THP tail isn't a THP head anymore,
- * we'll find PageTail clear after smp_rmb() and we'll treat
- * it as a single page.
+ * If @page is a THP tail, we must read the tail page
+ * flags after the head page flags. The
+ * __split_huge_page_refcount side enforces write memory barriers
+ * between clearing PageTail and before the head page
+ * can be freed and reallocated.
*/
- if (!__compound_tail_refcounted(page_head)) {
+ smp_rmb();
+ if (likely(PageTail(page))) {
/*
- * If "page" is a THP tail, we must read the tail page
- * flags after the head page flags. The
- * split_huge_page side enforces write memory barriers
- * between clearing PageTail and before the head page
- * can be freed and reallocated.
+ * __split_huge_page_refcount cannot race
+ * here, see the comment above this function.
*/
- smp_rmb();
- if (likely(PageTail(page))) {
- /*
- * __split_huge_page_refcount cannot race
- * here.
- */
- VM_BUG_ON_PAGE(!PageHead(page_head), page_head);
- VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
- if (put_page_testzero(page_head)) {
- /*
- * If this is the tail of a slab
- * compound page, the tail pin must
- * not be the last reference held on
- * the page, because the PG_slab
- * cannot be cleared before all tail
- * pins (which skips the _mapcount
- * tail refcounting) have been
- * released. For hugetlbfs the tail
- * pin may be the last reference on
- * the page instead, because
- * PageHeadHuge will not go away until
- * the compound page enters the buddy
- * allocator.
- */
- VM_BUG_ON_PAGE(PageSlab(page_head), page_head);
- __put_compound_page(page_head);
- }
- return;
- } else
+ VM_BUG_ON_PAGE(!PageHead(head_page), head_page);
+ VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
+ if (put_page_testzero(head_page)) {
/*
- * __split_huge_page_refcount run before us,
- * "page" was a THP tail. The split page_head
- * has been freed and reallocated as slab or
- * hugetlbfs page of smaller order (only
- * possible if reallocated as slab on x86).
+ * If this is the tail of a slab THP page,
+ * the tail pin must not be the last reference
+ * held on the page, because the PG_slab cannot
+ * be cleared before all tail pins (which skips
+ * the _mapcount tail refcounting) have been
+ * released.
+ *
+ * If this is the tail of a hugetlbfs page,
+ * the tail pin may be the last reference on
+ * the page instead, because PageHeadHuge will
+ * not go away until the compound page enters
+ * the buddy allocator.
*/
- goto out_put_single;
- }
+ VM_BUG_ON_PAGE(PageSlab(head_page), head_page);
+ __put_compound_page(head_page);
+ }
+ } else
+ /*
+ * __split_huge_page_refcount run before us,
+ * @page was a THP tail. The split @head_page
+ * has been freed and reallocated as slab or
+ * hugetlbfs page of smaller order (only
+ * possible if reallocated as slab on x86).
+ */
+ if (put_page_testzero(page))
+ __put_single_page(page);
+}

- if (likely(page != page_head && get_page_unless_zero(page_head))) {
+static __always_inline void put_refcounted_compound_page(struct page *head_page,
+ struct page *page)
+{
+ if (likely(page != head_page && get_page_unless_zero(head_page))) {
unsigned long flags;

/*
- * page_head wasn't a dangling pointer but it may not
+ * @page_head wasn't a dangling pointer but it may not
* be a head page anymore by the time we obtain the
* lock. That is ok as long as it can't be freed from
* under us.
*/
- flags = compound_lock_irqsave(page_head);
+ flags = compound_lock_irqsave(head_page);
if (unlikely(!PageTail(page))) {
/* __split_huge_page_refcount run before us */
- compound_unlock_irqrestore(page_head, flags);
- if (put_page_testzero(page_head)) {
+ compound_unlock_irqrestore(head_page, flags);
+ if (put_page_testzero(head_page)) {
/*
- * The head page may have been freed
+ * The @head_page may have been freed
* and reallocated as a compound page
* of smaller order and then freed
* again. All we know is that it
@@ -186,48 +180,89 @@ static void put_compound_page(struct page *page)
* compound page of higher order, a
* tail page. That is because we
* still hold the refcount of the
- * split THP tail and page_head was
+ * split THP tail and head_page was
* the THP head before the split.
*/
- if (PageHead(page_head))
- __put_compound_page(page_head);
+ if (PageHead(head_page))
+ __put_compound_page(head_page);
else
- __put_single_page(page_head);
+ __put_single_page(head_page);
}
out_put_single:
if (put_page_testzero(page))
__put_single_page(page);
return;
}
- VM_BUG_ON_PAGE(page_head != page->first_page, page);
+ VM_BUG_ON_PAGE(head_page != page->first_page, page);
/*
* We can release the refcount taken by
* get_page_unless_zero() now that
* __split_huge_page_refcount() is blocked on the
* compound_lock.
*/
- if (put_page_testzero(page_head))
- VM_BUG_ON_PAGE(1, page_head);
+ if (put_page_testzero(head_page))
+ VM_BUG_ON_PAGE(1, head_page);
/* __split_huge_page_refcount will wait now */
VM_BUG_ON_PAGE(page_mapcount(page) <= 0, page);
atomic_dec(&page->_mapcount);
- VM_BUG_ON_PAGE(atomic_read(&page_head->_count) <= 0, page_head);
+ VM_BUG_ON_PAGE(atomic_read(&head_page->_count) <= 0, head_page);
VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
- compound_unlock_irqrestore(page_head, flags);
+ compound_unlock_irqrestore(head_page, flags);

- if (put_page_testzero(page_head)) {
- if (PageHead(page_head))
- __put_compound_page(page_head);
+ if (put_page_testzero(head_page)) {
+ if (PageHead(head_page))
+ __put_compound_page(head_page);
else
- __put_single_page(page_head);
+ __put_single_page(head_page);
}
} else {
- /* page_head is a dangling pointer */
+ /* @head_page is a dangling pointer */
VM_BUG_ON_PAGE(PageTail(page), page);
goto out_put_single;
}
}

+
+static void put_compound_page(struct page *page)
+{
+ struct page *head_page;
+
+ /*
+ * We see the PageCompound set and PageTail not set, so @page maybe:
+ * 1. hugetlbfs head page, or
+ * 2. THP head page, or
+ */
+ if (likely(!PageTail(page))) {
+ if (put_page_testzero(page)) {
+ /*
+ * By the time all refcounts have been released
+ * __split_huge_page_refcount cannot run anymore
+ * from under us.
+ */
+ if (PageHead(page))
+ __put_compound_page(page);
+ else
+ __put_single_page(page);
+ }
+ return;
+ }
+
+ /*
+ * We see the PageCompound set and PageTail set, so @page maybe:
+ * 1. a tail hugetlbfs page, or
+ * 2. a tail THP page, or
+ * 3. a split THP page.
+ *
+ * Case 3 is possible, as we may race with
+ * __split_huge_page_refcount tearing down a THP page.
+ */
+ head_page = compound_head(page);
+ if (!__compound_tail_refcounted(head_page))
+ put_unrefcounted_compound_page(head_page, page);
+ else
+ put_refcounted_compound_page(head_page, page);
+}
+
void put_page(struct page *page)
{
if (unlikely(PageCompound(page)))
--
2.0.0-rc1


2014-04-27 13:36:56

by Jianyu Zhan

[permalink] [raw]
Subject: [PATCH RFC 2/2] mm: introdule compound_head_by_tail()

In put_comound_page(), we call compound_head() after !PageTail
check fails, so in compound_head() PageTail is quite likely to
be true, but instead it is checked with:

if (unlikely(PageTail(page)))

in this case, this unlikely macro is a negative hint for compiler.

So this patch introduce compound_head_by_tail() which deal with
a possible tail page(though it could be spilt by a racy thread),
and make compound_head() a wrapper on it.

Signed-off-by: Jianyu Zhan <[email protected]>
---
include/linux/mm.h | 34 ++++++++++++++++++++++------------
mm/swap.c | 2 +-
2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf9811e..1bc7baf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -405,20 +405,30 @@ static inline void compound_unlock_irqrestore(struct page *page,
#endif
}

+/**
+ * Note: this function must be called on a possible tail page,
+ * this tail page may not be tail anymore upon we calling this funciton,
+ * because we may race with __split_huge_page_refcount tearing down it.
+ */
+static inline struct page *compound_head_by_tail(struct page *page)
+{
+ struct page *head = page->first_page;
+
+ /*
+ * page->first_page may be a dangling pointer to an old
+ * compound page, so recheck that it is still a tail
+ * page before returning.
+ */
+ smp_rmb();
+ if (likely(PageTail(page)))
+ return head;
+ return page;
+}
+
static inline struct page *compound_head(struct page *page)
{
- if (unlikely(PageTail(page))) {
- struct page *head = page->first_page;
-
- /*
- * page->first_page may be a dangling pointer to an old
- * compound page, so recheck that it is still a tail
- * page before returning.
- */
- smp_rmb();
- if (likely(PageTail(page)))
- return head;
- }
+ if (unlikely(PageTail(page)))
+ return compound_head_by_tail(page);
return page;
}

diff --git a/mm/swap.c b/mm/swap.c
index 0d8d891..0b05355 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -256,7 +256,7 @@ static void put_compound_page(struct page *page)
* Case 3 is possible, as we may race with
* __split_huge_page_refcount tearing down a THP page.
*/
- head_page = compound_head(page);
+ head_page = compound_head_by_tail(page);
if (!__compound_tail_refcounted(head_page))
put_unrefcounted_compound_page(head_page, page);
else
--
2.0.0-rc1

2014-04-28 14:54:45

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH RFC 2/2] mm: introdule compound_head_by_tail()

On Sun 27-04-14 21:36:24, Jianyu Zhan wrote:
> In put_comound_page(), we call compound_head() after !PageTail
> check fails, so in compound_head() PageTail is quite likely to
> be true, but instead it is checked with:
>
> if (unlikely(PageTail(page)))
>
> in this case, this unlikely macro is a negative hint for compiler.
>
> So this patch introduce compound_head_by_tail() which deal with
> a possible tail page(though it could be spilt by a racy thread),
> and make compound_head() a wrapper on it.
>
> Signed-off-by: Jianyu Zhan <[email protected]>

I really fail to see how that helps. compound_head is inlined and the
compiler should be clever enough to optimize the code properly. I
haven't tried that to be honest but this looks like it only adds a code
without any good reason. And I really hate the new name as well. What
does it suppose to mean?

> ---
> include/linux/mm.h | 34 ++++++++++++++++++++++------------
> mm/swap.c | 2 +-
> 2 files changed, 23 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bf9811e..1bc7baf 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -405,20 +405,30 @@ static inline void compound_unlock_irqrestore(struct page *page,
> #endif
> }
>
> +/**
> + * Note: this function must be called on a possible tail page,
> + * this tail page may not be tail anymore upon we calling this funciton,
> + * because we may race with __split_huge_page_refcount tearing down it.
> + */
> +static inline struct page *compound_head_by_tail(struct page *page)
> +{
> + struct page *head = page->first_page;
> +
> + /*
> + * page->first_page may be a dangling pointer to an old
> + * compound page, so recheck that it is still a tail
> + * page before returning.
> + */
> + smp_rmb();
> + if (likely(PageTail(page)))
> + return head;
> + return page;
> +}
> +
> static inline struct page *compound_head(struct page *page)
> {
> - if (unlikely(PageTail(page))) {
> - struct page *head = page->first_page;
> -
> - /*
> - * page->first_page may be a dangling pointer to an old
> - * compound page, so recheck that it is still a tail
> - * page before returning.
> - */
> - smp_rmb();
> - if (likely(PageTail(page)))
> - return head;
> - }
> + if (unlikely(PageTail(page)))
> + return compound_head_by_tail(page);
> return page;
> }
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 0d8d891..0b05355 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -256,7 +256,7 @@ static void put_compound_page(struct page *page)
> * Case 3 is possible, as we may race with
> * __split_huge_page_refcount tearing down a THP page.
> */
> - head_page = compound_head(page);
> + head_page = compound_head_by_tail(page);
> if (!__compound_tail_refcounted(head_page))
> put_unrefcounted_compound_page(head_page, page);
> else
> --
> 2.0.0-rc1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Michal Hocko
SUSE Labs

2014-04-28 15:00:41

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] mm/swap.c: split put_compound_page function

On Sun 27-04-14 21:35:50, Jianyu Zhan wrote:
> Currently, put_compound_page should carefully handle tricky case
> to avoid racing with compound page releasing or spliting, which
> makes it growing quite lenthy(about 200+ lines) and need deep
> tab indention, which makes it quite hard to follow and maintain.
>
> This patch tries to refactor this function, by extracting out the
> fundamental logics into helper functions, making the main code path
> more compact, thus easy to read and maintain. Two helper funcitons
> are introduced, and are marked __always_inline, thus this patch
> has no functional change(actually, the output object file is the
> same size with the original one).
>
> Besides, this patch rearranges/rewrites some comments(hope I don't
> do it wrong).

This is a big change and really hard to review to be honest. Maybe a
split up would make it easier to follow.

> Signed-off-by: Jianyu Zhan <[email protected]>
> ---
> mm/swap.c | 227 ++++++++++++++++++++++++++++++++++++--------------------------
> 1 file changed, 131 insertions(+), 96 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index c0cd7d0..0d8d891 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -79,106 +79,100 @@ static void __put_compound_page(struct page *page)
> (*dtor)(page);
> }
>
> -static void put_compound_page(struct page *page)
> -{
> - struct page *page_head;
> -
> - if (likely(!PageTail(page))) {
> - if (put_page_testzero(page)) {
> - /*
> - * By the time all refcounts have been released
> - * split_huge_page cannot run anymore from under us.
> - */
> - if (PageHead(page))
> - __put_compound_page(page);
> - else
> - __put_single_page(page);
> - }
> - return;
> - }
> -
> - /* __split_huge_page_refcount can run under us */
> - page_head = compound_head(page);
>
> +/**
> + * Two special cases here: we could avoid taking compound_lock_irqsave
> + * and could skip the tail refcounting(in _mapcount).
> + *
> + * 1. Hugetlbfs page:
> + *
> + * PageHeadHuge will remain true until the compound page
> + * is released and enters the buddy allocator, and it could
> + * not be split by __split_huge_page_refcount().
> + *
> + * So if we see PageHeadHuge set, and we have the tail page pin,
> + * then we could safely put head page.
> + *
> + * 2. Slab THP page:
> + *
> + * PG_slab is cleared before the slab frees the head page, and
> + * tail pin cannot be the last reference left on the head page,
> + * because the slab code is free to reuse the compound page
> + * after a kfree/kmem_cache_free without having to check if
> + * there's any tail pin left. In turn all tail pinsmust be always
> + * released while the head is still pinned by the slab code
> + * and so we know PG_slab will be still set too.
> + *
> + * So if we see PageSlab set, and we have the tail page pin,
> + * then we could safely put head page.
> + */
> +static __always_inline void put_unrefcounted_compound_page(struct page *head_page,
> + struct page *page)
> +{
> /*
> - * THP can not break up slab pages so avoid taking
> - * compound_lock() and skip the tail page refcounting (in
> - * _mapcount) too. Slab performs non-atomic bit ops on
> - * page->flags for better performance. In particular
> - * slab_unlock() in slub used to be a hot path. It is still
> - * hot on arches that do not support
> - * this_cpu_cmpxchg_double().
> - *
> - * If "page" is part of a slab or hugetlbfs page it cannot be
> - * splitted and the head page cannot change from under us. And
> - * if "page" is part of a THP page under splitting, if the
> - * head page pointed by the THP tail isn't a THP head anymore,
> - * we'll find PageTail clear after smp_rmb() and we'll treat
> - * it as a single page.
> + * If @page is a THP tail, we must read the tail page
> + * flags after the head page flags. The
> + * __split_huge_page_refcount side enforces write memory barriers
> + * between clearing PageTail and before the head page
> + * can be freed and reallocated.
> */
> - if (!__compound_tail_refcounted(page_head)) {
> + smp_rmb();
> + if (likely(PageTail(page))) {
> /*
> - * If "page" is a THP tail, we must read the tail page
> - * flags after the head page flags. The
> - * split_huge_page side enforces write memory barriers
> - * between clearing PageTail and before the head page
> - * can be freed and reallocated.
> + * __split_huge_page_refcount cannot race
> + * here, see the comment above this function.
> */
> - smp_rmb();
> - if (likely(PageTail(page))) {
> - /*
> - * __split_huge_page_refcount cannot race
> - * here.
> - */
> - VM_BUG_ON_PAGE(!PageHead(page_head), page_head);
> - VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
> - if (put_page_testzero(page_head)) {
> - /*
> - * If this is the tail of a slab
> - * compound page, the tail pin must
> - * not be the last reference held on
> - * the page, because the PG_slab
> - * cannot be cleared before all tail
> - * pins (which skips the _mapcount
> - * tail refcounting) have been
> - * released. For hugetlbfs the tail
> - * pin may be the last reference on
> - * the page instead, because
> - * PageHeadHuge will not go away until
> - * the compound page enters the buddy
> - * allocator.
> - */
> - VM_BUG_ON_PAGE(PageSlab(page_head), page_head);
> - __put_compound_page(page_head);
> - }
> - return;
> - } else
> + VM_BUG_ON_PAGE(!PageHead(head_page), head_page);
> + VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
> + if (put_page_testzero(head_page)) {
> /*
> - * __split_huge_page_refcount run before us,
> - * "page" was a THP tail. The split page_head
> - * has been freed and reallocated as slab or
> - * hugetlbfs page of smaller order (only
> - * possible if reallocated as slab on x86).
> + * If this is the tail of a slab THP page,
> + * the tail pin must not be the last reference
> + * held on the page, because the PG_slab cannot
> + * be cleared before all tail pins (which skips
> + * the _mapcount tail refcounting) have been
> + * released.
> + *
> + * If this is the tail of a hugetlbfs page,
> + * the tail pin may be the last reference on
> + * the page instead, because PageHeadHuge will
> + * not go away until the compound page enters
> + * the buddy allocator.
> */
> - goto out_put_single;
> - }
> + VM_BUG_ON_PAGE(PageSlab(head_page), head_page);
> + __put_compound_page(head_page);
> + }
> + } else
> + /*
> + * __split_huge_page_refcount run before us,
> + * @page was a THP tail. The split @head_page
> + * has been freed and reallocated as slab or
> + * hugetlbfs page of smaller order (only
> + * possible if reallocated as slab on x86).
> + */
> + if (put_page_testzero(page))
> + __put_single_page(page);
> +}
>
> - if (likely(page != page_head && get_page_unless_zero(page_head))) {
> +static __always_inline void put_refcounted_compound_page(struct page *head_page,
> + struct page *page)
> +{
> + if (likely(page != head_page && get_page_unless_zero(head_page))) {
> unsigned long flags;
>
> /*
> - * page_head wasn't a dangling pointer but it may not
> + * @page_head wasn't a dangling pointer but it may not
> * be a head page anymore by the time we obtain the
> * lock. That is ok as long as it can't be freed from
> * under us.
> */
> - flags = compound_lock_irqsave(page_head);
> + flags = compound_lock_irqsave(head_page);
> if (unlikely(!PageTail(page))) {
> /* __split_huge_page_refcount run before us */
> - compound_unlock_irqrestore(page_head, flags);
> - if (put_page_testzero(page_head)) {
> + compound_unlock_irqrestore(head_page, flags);
> + if (put_page_testzero(head_page)) {
> /*
> - * The head page may have been freed
> + * The @head_page may have been freed
> * and reallocated as a compound page
> * of smaller order and then freed
> * again. All we know is that it
> @@ -186,48 +180,89 @@ static void put_compound_page(struct page *page)
> * compound page of higher order, a
> * tail page. That is because we
> * still hold the refcount of the
> - * split THP tail and page_head was
> + * split THP tail and head_page was
> * the THP head before the split.
> */
> - if (PageHead(page_head))
> - __put_compound_page(page_head);
> + if (PageHead(head_page))
> + __put_compound_page(head_page);
> else
> - __put_single_page(page_head);
> + __put_single_page(head_page);
> }
> out_put_single:
> if (put_page_testzero(page))
> __put_single_page(page);
> return;
> }
> - VM_BUG_ON_PAGE(page_head != page->first_page, page);
> + VM_BUG_ON_PAGE(head_page != page->first_page, page);
> /*
> * We can release the refcount taken by
> * get_page_unless_zero() now that
> * __split_huge_page_refcount() is blocked on the
> * compound_lock.
> */
> - if (put_page_testzero(page_head))
> - VM_BUG_ON_PAGE(1, page_head);
> + if (put_page_testzero(head_page))
> + VM_BUG_ON_PAGE(1, head_page);
> /* __split_huge_page_refcount will wait now */
> VM_BUG_ON_PAGE(page_mapcount(page) <= 0, page);
> atomic_dec(&page->_mapcount);
> - VM_BUG_ON_PAGE(atomic_read(&page_head->_count) <= 0, page_head);
> + VM_BUG_ON_PAGE(atomic_read(&head_page->_count) <= 0, head_page);
> VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
> - compound_unlock_irqrestore(page_head, flags);
> + compound_unlock_irqrestore(head_page, flags);
>
> - if (put_page_testzero(page_head)) {
> - if (PageHead(page_head))
> - __put_compound_page(page_head);
> + if (put_page_testzero(head_page)) {
> + if (PageHead(head_page))
> + __put_compound_page(head_page);
> else
> - __put_single_page(page_head);
> + __put_single_page(head_page);
> }
> } else {
> - /* page_head is a dangling pointer */
> + /* @head_page is a dangling pointer */
> VM_BUG_ON_PAGE(PageTail(page), page);
> goto out_put_single;
> }
> }
>
> +
> +static void put_compound_page(struct page *page)
> +{
> + struct page *head_page;
> +
> + /*
> + * We see the PageCompound set and PageTail not set, so @page maybe:
> + * 1. hugetlbfs head page, or
> + * 2. THP head page, or
> + */
> + if (likely(!PageTail(page))) {
> + if (put_page_testzero(page)) {
> + /*
> + * By the time all refcounts have been released
> + * __split_huge_page_refcount cannot run anymore
> + * from under us.
> + */
> + if (PageHead(page))
> + __put_compound_page(page);
> + else
> + __put_single_page(page);
> + }
> + return;
> + }
> +
> + /*
> + * We see the PageCompound set and PageTail set, so @page maybe:
> + * 1. a tail hugetlbfs page, or
> + * 2. a tail THP page, or
> + * 3. a split THP page.
> + *
> + * Case 3 is possible, as we may race with
> + * __split_huge_page_refcount tearing down a THP page.
> + */
> + head_page = compound_head(page);
> + if (!__compound_tail_refcounted(head_page))
> + put_unrefcounted_compound_page(head_page, page);
> + else
> + put_refcounted_compound_page(head_page, page);
> +}
> +
> void put_page(struct page *page)
> {
> if (unlikely(PageCompound(page)))
> --
> 2.0.0-rc1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Michal Hocko
SUSE Labs

2014-04-28 15:54:14

by Jianyu Zhan

[permalink] [raw]
Subject: Re: [PATCH RFC 2/2] mm: introdule compound_head_by_tail()

Hi, Michal,

On Mon, Apr 28, 2014 at 10:54 PM, Michal Hocko <[email protected]> wrote:
> I really fail to see how that helps. compound_head is inlined and the
> compiler should be clever enough to optimize the code properly. I
> haven't tried that to be honest but this looks like it only adds a code
> without any good reason. And I really hate the new name as well. What
> does it suppose to mean?

the code in question is as below:

--- snipt ----
if (likely(!PageTail(page))) { <------ (1)
if (put_page_testzero(page)) {
/*
¦* By the time all refcounts have been released
¦* split_huge_page cannot run anymore from under us.
¦*/
if (PageHead(page))
__put_compound_page(page);
else
__put_single_page(page);
}
return;
}

/* __split_huge_page_refcount can run under us */
page_head = compound_head(page); <------------ (2)
--- snipt ---

if at (1) , we fail the check, this means page is *likely* a tail page.

Then at (2), yes, compoud_head(page) is inlined, it is :

--- snipt ---
static inline struct page *compound_head(struct page *page)
{
if (unlikely(PageTail(page))) { <----------- (3)
struct page *head = page->first_page;

smp_rmb();
if (likely(PageTail(page)))
return head;
}
return page;
}
--- snipt ---

here, the (3) unlikely in the case is a negative hint, because it
is *likely* a tail page. So the check (3) in this case is not good,
so I introduce a helper for this case.

Actually, I checked the assembled code, the compiler is _not_
so smart to recognize this case. It just does optimization as
the hint unlikely() told it.



Thanks,
Jianyu Zhan

2014-04-28 15:55:33

by Jianyu Zhan

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] mm/swap.c: split put_compound_page function

On Mon, Apr 28, 2014 at 11:00 PM, Michal Hocko <[email protected]> wrote:
> This is a big change and really hard to review to be honest. Maybe a
> split up would make it easier to follow.

Ok, actually it is quite simple, but the diff looks messy, I will try
to split up
this patch to several phases.

Thanks,
Jianyu Zhan

2014-04-28 15:56:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 2/2] mm: introdule compound_head_by_tail()

On Mon, Apr 28, 2014 at 11:53:28PM +0800, Jianyu Zhan wrote:
> Actually, I checked the assembled code, the compiler is _not_
> so smart to recognize this case. It just does optimization as
> the hint unlikely() told it.

What version, and why didn't your changelog include this useful
information?

2014-04-28 16:24:28

by Jianyu Zhan

[permalink] [raw]
Subject: Re: [PATCH RFC 2/2] mm: introdule compound_head_by_tail()

On Mon, Apr 28, 2014 at 11:55 PM, Peter Zijlstra <[email protected]> wrote:
> What version,

the code snipt in question is extracted from v3.15-rc3.


for the (1) check in previous email, its assembled code looks like:

--- (1) snipt ---
mov (%rdi),%rax (a)
test $0x80,%ah (b)
jne 754 <put_compound_page+0x74> (c)
--- (1) snipt ---

(a) %rdi is the struct page pointer
(b) check if PG_tail(0x80) set(likely not set, we tell the compiler)
(c) if set, jump; not set, fall through (good, credit to our hint)

===================================================

for the (3) check in previous email, its assembled code looks like:

--- (3) snipt ---
mov (%rdi),%rax (A)
mov %rdi,%r12
test $0x80,%ah (B)
jne 8f8 <put_compound_page+0x218> (C)
--- (3) snipt ---

(A) %rdi is the struct page pointer
(B) check if PG_tail(0x80) set(likely set in this case, but we
tell compiler unlikely)
(C) if set, jump; not set, fall through (god! it would better
not jump if set, but we
tell compiler unlikely, so it happily did as we told it)


# all code are compiled by gcc (GCC) 4.8.2

> and why didn't your changelog include this useful information?

Sorry, I would have done so. I will resend the patch.

Thanks,
Jianyu Zhan

2014-04-28 16:44:24

by Khalid Aziz

[permalink] [raw]
Subject: Re: [PATCH RFC 1/2] mm/swap.c: split put_compound_page function

On 04/27/2014 07:35 AM, Jianyu Zhan wrote:
> Currently, put_compound_page should carefully handle tricky case
> to avoid racing with compound page releasing or spliting, which
> makes it growing quite lenthy(about 200+ lines) and need deep
> tab indention, which makes it quite hard to follow and maintain.
>
> This patch tries to refactor this function, by extracting out the
> fundamental logics into helper functions, making the main code path
> more compact, thus easy to read and maintain. Two helper funcitons
> are introduced, and are marked __always_inline, thus this patch
> has no functional change(actually, the output object file is the
> same size with the original one).

We recently went through a clean up of this code and it was made more
readable. I do not particularly see any strong need to refactor at this
point, but I am not opposed to it either.

Some comments inline below.

--
Khalid

>
> Besides, this patch rearranges/rewrites some comments(hope I don't
> do it wrong).
>
> Signed-off-by: Jianyu Zhan <[email protected]>
> ---
> mm/swap.c | 227 ++++++++++++++++++++++++++++++++++++--------------------------
> 1 file changed, 131 insertions(+), 96 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index c0cd7d0..0d8d891 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -79,106 +79,100 @@ static void __put_compound_page(struct page *page)
> (*dtor)(page);
> }
>
> -static void put_compound_page(struct page *page)
> -{
> - struct page *page_head;
> -
> - if (likely(!PageTail(page))) {
> - if (put_page_testzero(page)) {
> - /*
> - * By the time all refcounts have been released
> - * split_huge_page cannot run anymore from under us.
> - */
> - if (PageHead(page))
> - __put_compound_page(page);
> - else
> - __put_single_page(page);
> - }
> - return;
> - }
> -
> - /* __split_huge_page_refcount can run under us */
> - page_head = compound_head(page);
>
> +/**
> + * Two special cases here: we could avoid taking compound_lock_irqsave
> + * and could skip the tail refcounting(in _mapcount).
> + *
> + * 1. Hugetlbfs page:
> + *
> + * PageHeadHuge will remain true until the compound page
> + * is released and enters the buddy allocator, and it could
> + * not be split by __split_huge_page_refcount().
> + *
> + * So if we see PageHeadHuge set, and we have the tail page pin,
> + * then we could safely put head page.
> + *
> + * 2. Slab THP page:
> + *
> + * PG_slab is cleared before the slab frees the head page, and
> + * tail pin cannot be the last reference left on the head page,
> + * because the slab code is free to reuse the compound page
> + * after a kfree/kmem_cache_free without having to check if
> + * there's any tail pin left. In turn all tail pinsmust be always
> + * released while the head is still pinned by the slab code
> + * and so we know PG_slab will be still set too.
> + *
> + * So if we see PageSlab set, and we have the tail page pin,
> + * then we could safely put head page.
> + */
> +static __always_inline void put_unrefcounted_compound_page(struct page *head_page,
> + struct page *page)
> +{
> /*
> - * THP can not break up slab pages so avoid taking
> - * compound_lock() and skip the tail page refcounting (in
> - * _mapcount) too. Slab performs non-atomic bit ops on
> - * page->flags for better performance. In particular
> - * slab_unlock() in slub used to be a hot path. It is still
> - * hot on arches that do not support
> - * this_cpu_cmpxchg_double().
> - *
> - * If "page" is part of a slab or hugetlbfs page it cannot be
> - * splitted and the head page cannot change from under us. And
> - * if "page" is part of a THP page under splitting, if the
> - * head page pointed by the THP tail isn't a THP head anymore,
> - * we'll find PageTail clear after smp_rmb() and we'll treat
> - * it as a single page.
> + * If @page is a THP tail, we must read the tail page
> + * flags after the head page flags. The
> + * __split_huge_page_refcount side enforces write memory barriers
> + * between clearing PageTail and before the head page
> + * can be freed and reallocated.
> */
> - if (!__compound_tail_refcounted(page_head)) {
> + smp_rmb();
> + if (likely(PageTail(page))) {
> /*
> - * If "page" is a THP tail, we must read the tail page
> - * flags after the head page flags. The
> - * split_huge_page side enforces write memory barriers
> - * between clearing PageTail and before the head page
> - * can be freed and reallocated.
> + * __split_huge_page_refcount cannot race
> + * here, see the comment above this function.
> */
> - smp_rmb();
> - if (likely(PageTail(page))) {
> - /*
> - * __split_huge_page_refcount cannot race
> - * here.
> - */
> - VM_BUG_ON_PAGE(!PageHead(page_head), page_head);
> - VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
> - if (put_page_testzero(page_head)) {
> - /*
> - * If this is the tail of a slab
> - * compound page, the tail pin must
> - * not be the last reference held on
> - * the page, because the PG_slab
> - * cannot be cleared before all tail
> - * pins (which skips the _mapcount
> - * tail refcounting) have been
> - * released. For hugetlbfs the tail
> - * pin may be the last reference on
> - * the page instead, because
> - * PageHeadHuge will not go away until
> - * the compound page enters the buddy
> - * allocator.
> - */
> - VM_BUG_ON_PAGE(PageSlab(page_head), page_head);
> - __put_compound_page(page_head);
> - }
> - return;
> - } else
> + VM_BUG_ON_PAGE(!PageHead(head_page), head_page);

Any reason to rename page_head to head_page? page_head makes perfect
sense. This change just causes patch to be longer.

> + VM_BUG_ON_PAGE(page_mapcount(page) != 0, page);
> + if (put_page_testzero(head_page)) {
> /*
> - * __split_huge_page_refcount run before us,
> - * "page" was a THP tail. The split page_head
> - * has been freed and reallocated as slab or
> - * hugetlbfs page of smaller order (only
> - * possible if reallocated as slab on x86).
> + * If this is the tail of a slab THP page,
> + * the tail pin must not be the last reference
> + * held on the page, because the PG_slab cannot
> + * be cleared before all tail pins (which skips
> + * the _mapcount tail refcounting) have been
> + * released.
> + *
> + * If this is the tail of a hugetlbfs page,
> + * the tail pin may be the last reference on
> + * the page instead, because PageHeadHuge will
> + * not go away until the compound page enters
> + * the buddy allocator.
> */
> - goto out_put_single;
> - }
> + VM_BUG_ON_PAGE(PageSlab(head_page), head_page);
> + __put_compound_page(head_page);
> + }
> + } else
> + /*
> + * __split_huge_page_refcount run before us,
> + * @page was a THP tail. The split @head_page
> + * has been freed and reallocated as slab or
> + * hugetlbfs page of smaller order (only
> + * possible if reallocated as slab on x86).
> + */
> + if (put_page_testzero(page))
> + __put_single_page(page);
> +}
>
> - if (likely(page != page_head && get_page_unless_zero(page_head))) {
> +static __always_inline void put_refcounted_compound_page(struct page *head_page,
> + struct page *page)
> +{
> + if (likely(page != head_page && get_page_unless_zero(head_page))) {
> unsigned long flags;
>
> /*
> - * page_head wasn't a dangling pointer but it may not
> + * @page_head wasn't a dangling pointer but it may not
^^^^^^^^^^
Your patch renames this to head_page.

> * be a head page anymore by the time we obtain the
> * lock. That is ok as long as it can't be freed from
> * under us.
> */
> - flags = compound_lock_irqsave(page_head);
> + flags = compound_lock_irqsave(head_page);
> if (unlikely(!PageTail(page))) {
> /* __split_huge_page_refcount run before us */
> - compound_unlock_irqrestore(page_head, flags);
> - if (put_page_testzero(page_head)) {
> + compound_unlock_irqrestore(head_page, flags);
> + if (put_page_testzero(head_page)) {
> /*
> - * The head page may have been freed
> + * The @head_page may have been freed
> * and reallocated as a compound page
> * of smaller order and then freed
> * again. All we know is that it
> @@ -186,48 +180,89 @@ static void put_compound_page(struct page *page)
> * compound page of higher order, a
> * tail page. That is because we
> * still hold the refcount of the
> - * split THP tail and page_head was
> + * split THP tail and head_page was
> * the THP head before the split.
> */
> - if (PageHead(page_head))
> - __put_compound_page(page_head);
> + if (PageHead(head_page))
> + __put_compound_page(head_page);
> else
> - __put_single_page(page_head);
> + __put_single_page(head_page);
> }
> out_put_single:
> if (put_page_testzero(page))
> __put_single_page(page);
> return;
> }
> - VM_BUG_ON_PAGE(page_head != page->first_page, page);
> + VM_BUG_ON_PAGE(head_page != page->first_page, page);
> /*
> * We can release the refcount taken by
> * get_page_unless_zero() now that
> * __split_huge_page_refcount() is blocked on the
> * compound_lock.
> */
> - if (put_page_testzero(page_head))
> - VM_BUG_ON_PAGE(1, page_head);
> + if (put_page_testzero(head_page))
> + VM_BUG_ON_PAGE(1, head_page);
> /* __split_huge_page_refcount will wait now */
> VM_BUG_ON_PAGE(page_mapcount(page) <= 0, page);
> atomic_dec(&page->_mapcount);
> - VM_BUG_ON_PAGE(atomic_read(&page_head->_count) <= 0, page_head);
> + VM_BUG_ON_PAGE(atomic_read(&head_page->_count) <= 0, head_page);
> VM_BUG_ON_PAGE(atomic_read(&page->_count) != 0, page);
> - compound_unlock_irqrestore(page_head, flags);
> + compound_unlock_irqrestore(head_page, flags);
>
> - if (put_page_testzero(page_head)) {
> - if (PageHead(page_head))
> - __put_compound_page(page_head);
> + if (put_page_testzero(head_page)) {
> + if (PageHead(head_page))
> + __put_compound_page(head_page);
> else
> - __put_single_page(page_head);
> + __put_single_page(head_page);
> }
> } else {
> - /* page_head is a dangling pointer */
> + /* @head_page is a dangling pointer */
> VM_BUG_ON_PAGE(PageTail(page), page);
> goto out_put_single;
> }
> }
>
> +
> +static void put_compound_page(struct page *page)
> +{
> + struct page *head_page;
> +
> + /*
> + * We see the PageCompound set and PageTail not set, so @page maybe:
> + * 1. hugetlbfs head page, or
> + * 2. THP head page, or
> + */
> + if (likely(!PageTail(page))) {
> + if (put_page_testzero(page)) {
> + /*
> + * By the time all refcounts have been released
> + * __split_huge_page_refcount cannot run anymore
> + * from under us.
> + */
> + if (PageHead(page))
> + __put_compound_page(page);
> + else
> + __put_single_page(page);
> + }
> + return;
> + }
> +
> + /*
> + * We see the PageCompound set and PageTail set, so @page maybe:
> + * 1. a tail hugetlbfs page, or
> + * 2. a tail THP page, or
> + * 3. a split THP page.
> + *
> + * Case 3 is possible, as we may race with
> + * __split_huge_page_refcount tearing down a THP page.
> + */
> + head_page = compound_head(page);
> + if (!__compound_tail_refcounted(head_page))
> + put_unrefcounted_compound_page(head_page, page);
> + else
> + put_refcounted_compound_page(head_page, page);
> +}
> +
> void put_page(struct page *page)
> {
> if (unlikely(PageCompound(page)))
>

2014-04-28 19:22:05

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH RFC 2/2] mm: introdule compound_head_by_tail()

On Mon 28-04-14 23:53:28, Jianyu Zhan wrote:
> Hi, Michal,
>
> On Mon, Apr 28, 2014 at 10:54 PM, Michal Hocko <[email protected]> wrote:
> > I really fail to see how that helps. compound_head is inlined and the
> > compiler should be clever enough to optimize the code properly. I
> > haven't tried that to be honest but this looks like it only adds a code
> > without any good reason. And I really hate the new name as well. What
> > does it suppose to mean?
>
> the code in question is as below:
>
> --- snipt ----
> if (likely(!PageTail(page))) { <------ (1)
> if (put_page_testzero(page)) {
> /*
> ?* By the time all refcounts have been released
> ?* split_huge_page cannot run anymore from under us.
> ?*/
> if (PageHead(page))
> __put_compound_page(page);
> else
> __put_single_page(page);
> }
> return;
> }
>
> /* __split_huge_page_refcount can run under us */
> page_head = compound_head(page); <------------ (2)
> --- snipt ---
>
> if at (1) , we fail the check, this means page is *likely* a tail page.
>
> Then at (2), yes, compoud_head(page) is inlined, it is :
>
> --- snipt ---
> static inline struct page *compound_head(struct page *page)
> {
> if (unlikely(PageTail(page))) { <----------- (3)
> struct page *head = page->first_page;
>
> smp_rmb();
> if (likely(PageTail(page)))
> return head;
> }
> return page;
> }
> --- snipt ---
>
> here, the (3) unlikely in the case is a negative hint, because it
> is *likely* a tail page. So the check (3) in this case is not good,
> so I introduce a helper for this case.
>
> Actually, I checked the assembled code, the compiler is _not_
> so smart to recognize this case. It just does optimization as
> the hint unlikely() told it.

OK, the generated code is sligly smaller:
11869 1328 32 13229 33ad mm/swap.o.after
11880 1328 32 13240 33b8 mm/swap.o.before

The another question is. Does this matter? You are optimizing a slow
path which is not bad in general but it would be much better if you
show numbers which tell us that it helps noticeably in some loads or it
helped with future readability and maintainability. My experience tells
me that having very specialized helper functions used at a single place
don't help in neither in readability nor maintainability.
--
Michal Hocko
SUSE Labs