Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755113Ab3I3NOk (ORCPT ); Mon, 30 Sep 2013 09:14:40 -0400
Received: from mail-bk0-f52.google.com ([209.85.214.52]:44994 "EHLO mail-bk0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754486Ab3I3NOg (ORCPT ); Mon, 30 Sep 2013 09:14:36 -0400
Message-ID: <5249794C.5050204@profitbricks.com>
Date: Mon, 30 Sep 2013 15:14:52 +0200
From: Jack Wang
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4
MIME-Version: 1.0
To: Greg Kroah-Hartman
CC: Luis Henriques, linux-kernel@vger.kernel.org, stable@vger.kernel.org, kernel-team@lists.ubuntu.com, Khalid Aziz, Pravin B Shelar, Christoph Lameter, Andrea Arcangeli, Johannes Weiner, Mel Gorman, Rik van Riel, Minchan Kim, Andi Kleen, Andrew Morton, Linus Torvalds
Subject: Re: [PATCH 092/104] mm: fix aio performance regression for database caused by THP
References: <1380535881-9239-1-git-send-email-luis.henriques@canonical.com> <1380535881-9239-93-git-send-email-luis.henriques@canonical.com>
In-Reply-To: <1380535881-9239-93-git-send-email-luis.henriques@canonical.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6849
Lines: 194

On 09/30/2013 12:11 PM, Luis Henriques wrote:
> 3.5.7.22 -stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Khalid Aziz
>
> commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
>
> I am working with a tool that simulates oracle database I/O workload.
> This tool (orion to be specific -
> )
> allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
> does aio into these pages from flash disks using various common block
> sizes used by database. I am looking at performance with two of the most
> common block sizes - 1M and 64K.
> aio performance with these two block
> sizes plunged after Transparent HugePages was introduced in the kernel.
> Here are performance numbers:
>
>                 pre-THP         2.6.39          3.11-rc5
> 1M read         8384 MB/s       5629 MB/s       6501 MB/s
> 64K read        7867 MB/s       4576 MB/s       4251 MB/s
>
> I have narrowed the performance impact down to the overheads introduced by
> THP in __get_page_tail() and put_compound_page() routines. perf top shows
>> 40% of cycles being spent in these two routines. Every time direct I/O
> to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
> the pages and calls put_page() when I/O completes to put the reference
> away. THP introduced significant amount of locking overhead to get_page()
> and put_page() when dealing with compound pages because hugepages can be
> split underneath get_page() and put_page(). It added this overhead
> irrespective of whether it is dealing with hugetlbfs pages or transparent
> hugepages. This resulted in 20%-45% drop in aio performance when using
> hugetlbfs pages.
>
> Since hugetlbfs pages can not be split, there is no reason to go through
> all the locking overhead for these pages from what I can see. I added
> code to __get_page_tail() and put_compound_page() to bypass all the
> locking code when working with hugetlbfs pages. This improved performance
> significantly. Performance numbers with this patch:
>
>                 pre-THP         3.11-rc5        3.11-rc5 + Patch
> 1M read         8384 MB/s       6501 MB/s       8371 MB/s
> 64K read        7867 MB/s       4251 MB/s       6510 MB/s
>
> Performance with 64K read is still lower than what it was before THP, but
> still a 53% improvement. It does mean there is more work to be done but I
> will take a 53% improvement for now.
>
> Please take a look at the following patch and let me know if it looks
> reasonable.
>
> [akpm@linux-foundation.org: tweak comments]
> Signed-off-by: Khalid Aziz
> Cc: Pravin B Shelar
> Cc: Christoph Lameter
> Cc: Andrea Arcangeli
> Cc: Johannes Weiner
> Cc: Mel Gorman
> Cc: Rik van Riel
> Cc: Minchan Kim
> Cc: Andi Kleen
> Signed-off-by: Andrew Morton
> Signed-off-by: Linus Torvalds
> [ luis: backported to 3.5: adjusted context ]
> Signed-off-by: Luis Henriques

Hi Greg,

I suppose this patch is also needed for 3.4, right?

Regards,
Jack

> ---
>  mm/swap.c | 77 ++++++++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 52 insertions(+), 25 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 4e7e2ec..0c833e8 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -30,6 +30,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "internal.h"
>
> @@ -77,6 +78,19 @@ static void __put_compound_page(struct page *page)
>
>  static void put_compound_page(struct page *page)
>  {
> +	/*
> +	 * hugetlbfs pages cannot be split from under us. If this is a
> +	 * hugetlbfs page, check refcount on head page and release the page if
> +	 * the refcount becomes zero.
> +	 */
> +	if (PageHuge(page)) {
> +		page = compound_head(page);
> +		if (put_page_testzero(page))
> +			__put_compound_page(page);
> +
> +		return;
> +	}
> +
>  	if (unlikely(PageTail(page))) {
>  		/* __split_huge_page_refcount can run under us */
>  		struct page *page_head = compound_trans_head(page);
> @@ -180,38 +194,51 @@ bool __get_page_tail(struct page *page)
>  	 * proper PT lock that already serializes against
>  	 * split_huge_page().
>  	 */
> -	unsigned long flags;
>  	bool got = false;
> -	struct page *page_head = compound_trans_head(page);
> +	struct page *page_head;
>
> -	if (likely(page != page_head && get_page_unless_zero(page_head))) {
> +	/*
> +	 * If this is a hugetlbfs page it cannot be split under us. Simply
> +	 * increment refcount for the head page.
> +	 */
> +	if (PageHuge(page)) {
> +		page_head = compound_head(page);
> +		atomic_inc(&page_head->_count);
> +		got = true;
> +	} else {
> +		unsigned long flags;
> +
> +		page_head = compound_trans_head(page);
> +		if (likely(page != page_head &&
> +					get_page_unless_zero(page_head))) {
> +
> +			/* Ref to put_compound_page() comment. */
> +			if (PageSlab(page_head)) {
> +				if (likely(PageTail(page))) {
> +					__get_page_tail_foll(page, false);
> +					return true;
> +				} else {
> +					put_page(page_head);
> +					return false;
> +				}
> +			}
>
> -		/* Ref to put_compound_page() comment. */
> -		if (PageSlab(page_head)) {
> +			/*
> +			 * page_head wasn't a dangling pointer but it
> +			 * may not be a head page anymore by the time
> +			 * we obtain the lock. That is ok as long as it
> +			 * can't be freed from under us.
> +			 */
> +			flags = compound_lock_irqsave(page_head);
> +			/* here __split_huge_page_refcount won't run anymore */
>  			if (likely(PageTail(page))) {
>  				__get_page_tail_foll(page, false);
> -				return true;
> -			} else {
> -				put_page(page_head);
> -				return false;
> +				got = true;
>  			}
> +			compound_unlock_irqrestore(page_head, flags);
> +			if (unlikely(!got))
> +				put_page(page_head);
>  		}
> -
> -		/*
> -		 * page_head wasn't a dangling pointer but it
> -		 * may not be a head page anymore by the time
> -		 * we obtain the lock. That is ok as long as it
> -		 * can't be freed from under us.
> -		 */
> -		flags = compound_lock_irqsave(page_head);
> -		/* here __split_huge_page_refcount won't run anymore */
> -		if (likely(PageTail(page))) {
> -			__get_page_tail_foll(page, false);
> -			got = true;
> -		}
> -		compound_unlock_irqrestore(page_head, flags);
> -		if (unlikely(!got))
> -			put_page(page_head);
>  	}
>  	return got;
>  }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
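
For anyone reading along, the hugetlbfs fast path in the quoted patch can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct model_page and the model_* helpers are made-up names for illustration, C11 atomics stand in for the kernel's atomic_t, and none of this is kernel code. The point it illustrates is that when a compound page cannot be split, a reference taken through a tail page can be counted on the head page with a plain atomic increment, with no compound lock involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only what the sketch needs. */
struct model_page {
	atomic_int count;		/* reference count, used on the head page */
	struct model_page *head;	/* NULL for a head page, else the head */
	bool freed;			/* set once the last reference is dropped */
};

/* Resolve a head or tail page to its head page. */
static struct model_page *model_compound_head(struct model_page *page)
{
	return page->head ? page->head : page;
}

/*
 * Fast-path "get": the compound page is assumed unsplittable, so the
 * reference is taken on the head page without any lock.
 */
static void model_get_huge_page(struct model_page *page)
{
	atomic_fetch_add(&model_compound_head(page)->count, 1);
}

/*
 * Fast-path "put": drop one reference on the head page and mark the
 * page freed when the count reaches zero. Returns true if it was freed.
 */
static bool model_put_huge_page(struct model_page *page)
{
	struct model_page *head = model_compound_head(page);

	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&head->count, 1) == 1) {
		head->freed = true;
		return true;
	}
	return false;
}
```

In this model, a get through a tail page followed by two puts leaves the head page released, which mirrors the PageHuge() branches the patch adds to __get_page_tail() and put_compound_page().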