Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934948AbcKJQZq (ORCPT ); Thu, 10 Nov 2016 11:25:46 -0500
Received: from mail-wm0-f66.google.com ([74.125.82.66]:35068 "EHLO
        mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S934847AbcKJQZo (ORCPT );
        Thu, 10 Nov 2016 11:25:44 -0500
Date: Thu, 10 Nov 2016 19:25:40 +0300
From: "Kirill A. Shutemov"
To: Hugh Dickins
Cc: "Kirill A. Shutemov", Andrea Arcangeli, Andrew Morton, Andi Kleen,
        Dave Chinner, Michal Hocko, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCHv4] shmem: avoid huge pages for small files
Message-ID: <20161110162540.GA12743@node.shutemov.name>
References: <20161021185103.117938-1-kirill.shutemov@linux.intel.com>
        <20161021224629.tnwuvruhblkg22qj@black.fi.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6154
Lines: 168

On Mon, Nov 07, 2016 at 03:17:11PM -0800, Hugh Dickins wrote:
> On Sat, 22 Oct 2016, Kirill A. Shutemov wrote:
> >
> > Huge pages are detrimental for small files: they cause noticeable
> > overhead in both allocation performance and memory footprint.
> >
> > This patch aims to address this issue by avoiding huge pages until the
> > file has grown to the size of a huge page. This would cover most of the
> > cases where huge pages cause regressions in performance.
> >
> > A couple of notes:
> >
> > - if shmem_enabled is set to 'force', the limit is ignored. We still
> >   want to generate as many huge pages as possible for functional testing.
> >
> > - the limit doesn't affect khugepaged behaviour: it can still collapse
> >   pages based on its settings;
> >
> > Signed-off-by: Kirill A. Shutemov
>
> Sorry, but NAK. I was expecting a patch to tune within_size behaviour.
>
> > ---
> >  Documentation/vm/transhuge.txt | 3 +++
> >  mm/shmem.c                     | 5 +++++
> >  2 files changed, 8 insertions(+)
> >
> > diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
> > index 2ec6adb5a4ce..d1889c7c8c46 100644
> > --- a/Documentation/vm/transhuge.txt
> > +++ b/Documentation/vm/transhuge.txt
> > @@ -238,6 +238,9 @@ values:
> >    - "force":
> >      Force the huge option on for all - very useful for testing;
> >
> > +To avoid overhead for small files, we don't allocate huge pages for a file
> > +until it grows to size of huge pages.
> > +
> >  == Need of application restart ==
> >
> >  The transparent_hugepage/enabled values and tmpfs mount option only affect
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index ad7813d73ea7..49618d2d6330 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1692,6 +1692,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> >  				goto alloc_huge;
> >  			/* TODO: implement fadvise() hints */
> >  			goto alloc_nohuge;
> > +		case SHMEM_HUGE_ALWAYS:
> > +			i_size = i_size_read(inode);
> > +			if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
> > +				goto alloc_nohuge;
> > +			break;
> >  		}
> >
> >  alloc_huge:
>
> So (eliding the SHMEM_HUGE_ADVISE case in between) you now have:
>
> 		case SHMEM_HUGE_WITHIN_SIZE:
> 			off = round_up(index, HPAGE_PMD_NR);
> 			i_size = round_up(i_size_read(inode), PAGE_SIZE);
> 			if (i_size >= HPAGE_PMD_SIZE &&
> 					i_size >> PAGE_SHIFT >= off)
> 				goto alloc_huge;
> 			goto alloc_nohuge;
> 		case SHMEM_HUGE_ALWAYS:
> 			i_size = i_size_read(inode);
> 			if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
> 				goto alloc_nohuge;
> 			goto alloc_huge;
>
> I'll concede that those two conditions are not the same; but again you're
> messing with huge=always to make it, not always, but conditional on size.
>
> Please, keep huge=always as is: if I copy a 4MiB file into a huge tmpfs,
> I got ShmemHugePages 4096 kB before, which is what I wanted.
> Whereas with this change I get only 2048 kB, just like with
> huge=within_size.

I don't think it's a problem really. We don't have guarantees anyway.
And we can collapse the page later.

But okay.

> Treating the first extent differently is a hack, and does not respect
> that this is a filesystem, on which size is likely to increase.
>
> By all means refine the condition for huge=within_size, and by all means
> warn in transhuge.txt that huge=always may tend to waste valuable huge
> pages if the filesystem is used for small files without good reason

Would it be okay if I just replace the huge=within_size logic with what I
proposed here for huge=always?

That's not what I intended initially for this option, but...

> (but maybe the implementation needs to reclaim those more effectively).

It's more about the cost of allocation than memory pressure.

-----8<-----

From 287ab05c09bfd49c7356ca74b6fea36d8131edaf Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov"
Date: Mon, 17 Oct 2016 14:44:47 +0300
Subject: [PATCH] shmem: avoid huge pages for small files

Huge pages are detrimental for small files: they cause noticeable
overhead in both allocation performance and memory footprint.

This patch aims to address this issue by avoiding huge pages until the
file has grown to the size of a huge page, if the filesystem is mounted
with the huge=within_size option.

This would cover most of the cases where huge pages cause regressions in
performance.

The limit doesn't affect khugepaged behaviour: it can still collapse
pages based on its settings.

Signed-off-by: Kirill A. Shutemov
---
 Documentation/vm/transhuge.txt | 7 ++++++-
 mm/shmem.c                     | 6 ++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 2ec6adb5a4ce..14c911c56f4a 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -208,11 +208,16 @@ You can control hugepage allocation policy in tmpfs with mount option
   - "always":
     Attempt to allocate huge pages every time we need a new page;
 
+    This option can lead to significant overhead if filesystem is used to
+    store small files.
+
   - "never":
     Do not allocate huge pages;
 
   - "within_size":
-    Only allocate huge page if it will be fully within i_size.
+    Only allocate huge page if size of the file more than size of huge
+    page. This helps to avoid overhead for small files.
+
     Also respect fadvise()/madvise() hints;
 
   - "advise:
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d73ea7..3589d36c7c63 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1681,10 +1681,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		case SHMEM_HUGE_NEVER:
 			goto alloc_nohuge;
 		case SHMEM_HUGE_WITHIN_SIZE:
-			off = round_up(index, HPAGE_PMD_NR);
-			i_size = round_up(i_size_read(inode), PAGE_SIZE);
-			if (i_size >= HPAGE_PMD_SIZE &&
-					i_size >> PAGE_SHIFT >= off)
+			i_size = i_size_read(inode);
+			if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
 				goto alloc_huge;
 			/* fallthrough */
 		case SHMEM_HUGE_ADVISE:
-- 
 Kirill A. Shutemov
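
To compare the two huge=within_size checks from the diff above outside
the kernel, here is a minimal user-space sketch. It is not kernel code.
It assumes x86-64 values (4 KiB base pages, 2 MiB PMD-sized huge pages)
and ignores the fallthrough into SHMEM_HUGE_ADVISE, so fadvise()/madvise()
hints are not modelled. The names old_within_size(), new_within_size()
and round_up_pow2() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values; the kernel derives these per architecture. */
#define PAGE_SHIFT      12
#define PAGE_SIZE       (1ULL << PAGE_SHIFT)
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (1ULL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_NR    (1ULL << (HPAGE_PMD_SHIFT - PAGE_SHIFT))

/* Round x up to a multiple of y; y must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t y)
{
	assert(y != 0 && (y & (y - 1)) == 0);
	return (x + y - 1) & ~(y - 1);
}

/*
 * Hypothetical helper modelling the check removed by the patch:
 * the file must be at least one huge page long, and its size in
 * pages must reach the page index rounded up to a huge page boundary.
 */
static bool old_within_size(uint64_t index, uint64_t i_size)
{
	uint64_t off = round_up_pow2(index, HPAGE_PMD_NR);

	i_size = round_up_pow2(i_size, PAGE_SIZE);
	return i_size >= HPAGE_PMD_SIZE && (i_size >> PAGE_SHIFT) >= off;
}

/*
 * Hypothetical helper modelling the check added by the patch:
 * only the first huge-page-sized extent of a small file is refused.
 */
static bool new_within_size(uint64_t index, uint64_t i_size)
{
	return index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE;
}
```

Under these assumptions, both helpers refuse a huge page at index 0 of a
1 MiB file. They differ for a 3 MiB file at page index 600: the old check
returns false because the rounded-up index (1024) exceeds the file's 768
pages, while the new check returns true because the index is already
past the first extent.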