Message-ID: <52497D37.9020706@oracle.com>
Date: Mon, 30 Sep 2013 07:31:35 -0600
From: Khalid Aziz
Organization: Oracle Corp
To: Greg Kroah-Hartman
CC: Jack Wang, Luis Henriques, linux-kernel@vger.kernel.org, stable@vger.kernel.org, kernel-team@lists.ubuntu.com, Pravin B Shelar, Christoph Lameter, Andrea Arcangeli, Johannes Weiner, Mel Gorman, Rik van Riel, Minchan Kim, Andi Kleen, Andrew Morton, Linus Torvalds
Subject: Re: [PATCH 092/104] mm: fix aio performance regression for database caused by THP
References: <1380535881-9239-1-git-send-email-luis.henriques@canonical.com> <1380535881-9239-93-git-send-email-luis.henriques@canonical.com> <5249794C.5050204@profitbricks.com> <20130930132642.GA7510@kroah.com>
In-Reply-To: <20130930132642.GA7510@kroah.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/30/2013 07:26 AM, Greg Kroah-Hartman wrote:
> On Mon, Sep 30, 2013 at 03:14:52PM +0200, Jack Wang wrote:
>> On 09/30/2013 12:11 PM, Luis Henriques wrote:
>>> 3.5.7.22 -stable review patch. If anyone has any objections, please let me know.
>>>
>>> ------------------
>>>
>>> From: Khalid Aziz
>>>
>>> commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
>>>
>>> I am working with a tool that simulates oracle database I/O workload.
>>> This tool (orion to be specific -
>>> )
>>> allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
>>> does aio into these pages from flash disks using various common block
>>> sizes used by database. I am looking at performance with two of the most
>>> common block sizes - 1M and 64K. aio performance with these two block
>>> sizes plunged after Transparent HugePages was introduced in the kernel.
>>> Here are performance numbers:
>>>
>>>             pre-THP      2.6.39       3.11-rc5
>>> 1M read     8384 MB/s    5629 MB/s    6501 MB/s
>>> 64K read    7867 MB/s    4576 MB/s    4251 MB/s
>>>
>>> I have narrowed the performance impact down to the overheads introduced by
>>> THP in __get_page_tail() and put_compound_page() routines. perf top shows
>>> >40% of cycles being spent in these two routines. Every time direct I/O
>>> to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
>>> the pages and calls put_page() when I/O completes to put the reference
>>> away. THP introduced significant amount of locking overhead to get_page()
>>> and put_page() when dealing with compound pages because hugepages can be
>>> split underneath get_page() and put_page(). It added this overhead
>>> irrespective of whether it is dealing with hugetlbfs pages or transparent
>>> hugepages. This resulted in 20%-45% drop in aio performance when using
>>> hugetlbfs pages.
>>>
>>> Since hugetlbfs pages can not be split, there is no reason to go through
>>> all the locking overhead for these pages from what I can see. I added
>>> code to __get_page_tail() and put_compound_page() to bypass all the
>>> locking code when working with hugetlbfs pages. This improved performance
>>> significantly. Performance numbers with this patch:
>>>
>>>             pre-THP      3.11-rc5     3.11-rc5 + Patch
>>> 1M read     8384 MB/s    6501 MB/s    8371 MB/s
>>> 64K read    7867 MB/s    4251 MB/s    6510 MB/s
>>>
>>> Performance with 64K read is still lower than what it was before THP, but
>>> still a 53% improvement.
>>> It does mean there is more work to be done but I
>>> will take a 53% improvement for now.
>>>
>>> Please take a look at the following patch and let me know if it looks
>>> reasonable.
>>>
>>> [akpm@linux-foundation.org: tweak comments]
>>> Signed-off-by: Khalid Aziz
>>> Cc: Pravin B Shelar
>>> Cc: Christoph Lameter
>>> Cc: Andrea Arcangeli
>>> Cc: Johannes Weiner
>>> Cc: Mel Gorman
>>> Cc: Rik van Riel
>>> Cc: Minchan Kim
>>> Cc: Andi Kleen
>>> Signed-off-by: Andrew Morton
>>> Signed-off-by: Linus Torvalds
>>> [ luis: backported to 3.5: adjusted context ]
>>> Signed-off-by: Luis Henriques
>> Hi Greg,
>>
>> I suppose this patch also needed for 3.4, right?
>
> As it didn't originally apply there, I didn't apply it.
>
> If people think it should be applicable for 3.4, I'll take it.
>
> thanks,
>
> greg k-h
>

Hi Greg,

I did send you a backported version of this patch to apply to 3.0, 3.2
and 3.4 last Monday and cc'd stable@vger.kernel.org. That patch should
apply cleanly to those three kernels.

--
Khalid
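
For anyone checking the figures in the quoted commit message: the "20%-45% drop" and the "53% improvement" can be re-derived from the two throughput tables. The snippet below is only a rough arithmetic check and not part of the patch; the dictionary names and the `pct_change` helper are made up for illustration, and the MB/s values are copied from the tables.

```python
# Throughput in MB/s, copied from the two tables in the quoted commit message.
# The variable names are illustrative only.
PRE_THP = {"1M read": 8384, "64K read": 7867}
V2_6_39 = {"1M read": 5629, "64K read": 4576}
V3_11_RC5 = {"1M read": 6501, "64K read": 4251}
PATCHED = {"1M read": 8371, "64K read": 6510}


def pct_change(before, after):
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for size in PRE_THP:
    # Regression relative to the pre-THP kernel (printed as a positive drop).
    drop_2639 = -pct_change(PRE_THP[size], V2_6_39[size])
    drop_311 = -pct_change(PRE_THP[size], V3_11_RC5[size])
    # Gain from the patch relative to unpatched 3.11-rc5.
    gain = pct_change(V3_11_RC5[size], PATCHED[size])
    print(f"{size}: drop {drop_2639:.1f}% (2.6.39), {drop_311:.1f}% (3.11-rc5); "
          f"patch gain {gain:.1f}%")
```

On 3.11-rc5 the drops work out to roughly 22% (1M) and 46% (64K), in line with the quoted range, and the 64K gain from the patch rounds to the quoted 53%.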