2016-10-21 18:51:21

by Kirill A. Shutemov

Subject: [PATCHv4] shmem: avoid huge pages for small files

Huge pages are detrimental for small files: they cause noticeable
overhead in both allocation performance and memory footprint.

This patch aims to address this issue by avoiding huge pages until the
file has grown to the size of a huge page. This would cover most of the
cases where huge pages cause performance regressions.

A couple of notes:

- if shmem_enabled is set to 'force', the limit is ignored. We still
want to generate as many huge pages as possible for functional testing.

- the limit doesn't affect khugepaged behaviour: it can still collapse
pages based on its settings.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/vm/transhuge.txt | 3 +++
mm/shmem.c | 5 +++++
2 files changed, 8 insertions(+)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 2ec6adb5a4ce..d1889c7c8c46 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -238,6 +238,9 @@ values:
- "force":
Force the huge option on for all - very useful for testing;

+To avoid overhead for small files, we don't allocate huge pages for a file
+until it grows to the size of a huge page.
+
== Need of application restart ==

The transparent_hugepage/enabled values and tmpfs mount option only affect
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d73ea7..c7b3cb5aecdc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1692,6 +1692,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
goto alloc_huge;
/* TODO: implement fadvise() hints */
goto alloc_nohuge;
+ case SHEME_HUGE_ALWAYS:
+ i_size = i_size_read(inode);
+ if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
+ goto alloc_nohuge;
+ break;
}

alloc_huge:
--
2.9.3


2016-10-21 22:46:35

by Kirill A. Shutemov

Subject: Re: [PATCHv4] shmem: avoid huge pages for small files

On Fri, Oct 21, 2016 at 09:51:03PM +0300, Kirill A. Shutemov wrote:
> + case SHEME_HUGE_ALWAYS:

Oops. Forgot to commit the fixup :-/

From 79b0a3bf4503225d0e6ba553b8496f0c4d55514e Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <[email protected]>
Date: Mon, 17 Oct 2016 14:44:47 +0300
Subject: [PATCHv4] shmem: avoid huge pages for small files

Huge pages are detrimental for small files: they cause noticeable
overhead in both allocation performance and memory footprint.

This patch aims to address this issue by avoiding huge pages until the
file has grown to the size of a huge page. This would cover most of the
cases where huge pages cause performance regressions.

A couple of notes:

- if shmem_enabled is set to 'force', the limit is ignored. We still
want to generate as many huge pages as possible for functional testing.

- the limit doesn't affect khugepaged behaviour: it can still collapse
pages based on its settings.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/vm/transhuge.txt | 3 +++
mm/shmem.c | 5 +++++
2 files changed, 8 insertions(+)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 2ec6adb5a4ce..d1889c7c8c46 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -238,6 +238,9 @@ values:
- "force":
Force the huge option on for all - very useful for testing;

+To avoid overhead for small files, we don't allocate huge pages for a file
+until it grows to the size of a huge page.
+
== Need of application restart ==

The transparent_hugepage/enabled values and tmpfs mount option only affect
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d73ea7..49618d2d6330 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1692,6 +1692,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
goto alloc_huge;
/* TODO: implement fadvise() hints */
goto alloc_nohuge;
+ case SHMEM_HUGE_ALWAYS:
+ i_size = i_size_read(inode);
+ if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
+ goto alloc_nohuge;
+ break;
}

alloc_huge:
--
Kirill A. Shutemov

2016-10-24 12:43:42

by Michal Hocko

Subject: Re: [PATCHv4] shmem: avoid huge pages for small files

On Sat 22-10-16 01:46:29, Kirill A. Shutemov wrote:
> On Fri, Oct 21, 2016 at 09:51:03PM +0300, Kirill A. Shutemov wrote:
> > + case SHEME_HUGE_ALWAYS:
>
> Oops. Forgot to commit the fixup :-/
>
> From 79b0a3bf4503225d0e6ba553b8496f0c4d55514e Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <[email protected]>
> Date: Mon, 17 Oct 2016 14:44:47 +0300
> Subject: [PATCHv4] shmem: avoid huge pages for small files
>
> Huge pages are detrimental for small files: they cause noticeable
> overhead in both allocation performance and memory footprint.
>
> This patch aims to address this issue by avoiding huge pages until the
> file has grown to the size of a huge page. This would cover most of the
> cases where huge pages cause performance regressions.
>
> A couple of notes:
>
> - if shmem_enabled is set to 'force', the limit is ignored. We still
> want to generate as many huge pages as possible for functional testing.
>
> - the limit doesn't affect khugepaged behaviour: it can still collapse
> pages based on its settings.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> Documentation/vm/transhuge.txt | 3 +++
> mm/shmem.c | 5 +++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
> index 2ec6adb5a4ce..d1889c7c8c46 100644
> --- a/Documentation/vm/transhuge.txt
> +++ b/Documentation/vm/transhuge.txt
> @@ -238,6 +238,9 @@ values:
> - "force":
> Force the huge option on for all - very useful for testing;
>
> +To avoid overhead for small files, we don't allocate huge pages for a file
> +until it grows to the size of a huge page.
> +
> == Need of application restart ==
>
> The transparent_hugepage/enabled values and tmpfs mount option only affect
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d73ea7..49618d2d6330 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1692,6 +1692,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> goto alloc_huge;
> /* TODO: implement fadvise() hints */
> goto alloc_nohuge;
> + case SHMEM_HUGE_ALWAYS:
> + i_size = i_size_read(inode);
> + if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
> + goto alloc_nohuge;
> + break;
> }
>
> alloc_huge:
> --
> Kirill A. Shutemov

--
Michal Hocko
SUSE Labs

2016-11-07 23:17:21

by Hugh Dickins

Subject: Re: [PATCHv4] shmem: avoid huge pages for small files

On Sat, 22 Oct 2016, Kirill A. Shutemov wrote:
>
> Huge pages are detrimental for small files: they cause noticeable
> overhead in both allocation performance and memory footprint.
>
> This patch aims to address this issue by avoiding huge pages until the
> file has grown to the size of a huge page. This would cover most of the
> cases where huge pages cause performance regressions.
>
> A couple of notes:
>
> - if shmem_enabled is set to 'force', the limit is ignored. We still
> want to generate as many huge pages as possible for functional testing.
>
> - the limit doesn't affect khugepaged behaviour: it can still collapse
> pages based on its settings.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>

Sorry, but NAK. I was expecting a patch to tune within_size behaviour.

> ---
> Documentation/vm/transhuge.txt | 3 +++
> mm/shmem.c | 5 +++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
> index 2ec6adb5a4ce..d1889c7c8c46 100644
> --- a/Documentation/vm/transhuge.txt
> +++ b/Documentation/vm/transhuge.txt
> @@ -238,6 +238,9 @@ values:
> - "force":
> Force the huge option on for all - very useful for testing;
>
> +To avoid overhead for small files, we don't allocate huge pages for a file
> +until it grows to the size of a huge page.
> +
> == Need of application restart ==
>
> The transparent_hugepage/enabled values and tmpfs mount option only affect
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d73ea7..49618d2d6330 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1692,6 +1692,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> goto alloc_huge;
> /* TODO: implement fadvise() hints */
> goto alloc_nohuge;
> + case SHMEM_HUGE_ALWAYS:
> + i_size = i_size_read(inode);
> + if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
> + goto alloc_nohuge;
> + break;
> }
>
> alloc_huge:

So (eliding the SHMEM_HUGE_ADVISE case in between) you now have:

	case SHMEM_HUGE_WITHIN_SIZE:
		off = round_up(index, HPAGE_PMD_NR);
		i_size = round_up(i_size_read(inode), PAGE_SIZE);
		if (i_size >= HPAGE_PMD_SIZE &&
		    i_size >> PAGE_SHIFT >= off)
			goto alloc_huge;
		goto alloc_nohuge;
	case SHMEM_HUGE_ALWAYS:
		i_size = i_size_read(inode);
		if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
			goto alloc_nohuge;
		goto alloc_huge;

I'll concede that those two conditions are not the same; but again you're
messing with huge=always to make it, not always, but conditional on size.

Please, keep huge=always as is: if I copy a 4MiB file into a huge tmpfs,
I got ShmemHugePages 4096 kB before, which is what I wanted. Whereas
with this change I get only 2048 kB, just like with huge=within_size.
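To make the 4MiB example concrete, here is a toy user-space model (not kernel code) of the three policies discussed so far. It is a sketch under stated assumptions: x86-64 geometry (4 KiB base pages, 2 MiB PMD pages), cp extending the file one page at a time so that the first page of each 2 MiB extent is allocated while i_size still equals the bytes already copied, huge page allocation that never fails, and no khugepaged. `enum policy`, `wants_huge()` and `copy_huge_kb()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. Assumed x86-64 geometry. */
#define PAGE_SHIFT     12
#define PAGE_SIZE      (1LL << PAGE_SHIFT)
#define HPAGE_PMD_NR   512LL
#define HPAGE_PMD_SIZE (PAGE_SIZE * HPAGE_PMD_NR)

enum policy {
	ALWAYS_ORIG,		/* huge=always before the patch */
	ALWAYS_V4,		/* huge=always as changed by PATCHv4 */
	WITHIN_SIZE_ORIG,	/* huge=within_size before the patch */
};

/* Round x up to a multiple of y; y must be a power of two. */
static long long round_up_ll(long long x, long long y)
{
	return (x + y - 1) & ~(y - 1);
}

/* Would a fault at page 'index' get a huge page when the file is i_size bytes? */
static bool wants_huge(enum policy p, long long index, long long i_size)
{
	long long off, size;

	switch (p) {
	case ALWAYS_ORIG:
		return true;
	case ALWAYS_V4:
		return !(index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE);
	case WITHIN_SIZE_ORIG:
		off = round_up_ll(index, HPAGE_PMD_NR);
		size = round_up_ll(i_size, PAGE_SIZE);
		return size >= HPAGE_PMD_SIZE && (size >> PAGE_SHIFT) >= off;
	}
	return false;
}

/*
 * Model copying a file of file_size bytes (a multiple of 2 MiB) with
 * sequential writes: the first page of each extent is allocated while
 * i_size equals the bytes written so far, and that allocation decides
 * the whole extent. Returns the resulting ShmemHugePages value in kB.
 */
long long copy_huge_kb(enum policy p, long long file_size)
{
	long long kb = 0;

	assert(file_size >= 0 && file_size % HPAGE_PMD_SIZE == 0);
	for (long long start = 0; start < file_size; start += HPAGE_PMD_SIZE) {
		long long index = start >> PAGE_SHIFT;	/* first page of the extent */
		long long i_size = start;		/* size before this extent is written */

		if (wants_huge(p, index, i_size))
			kb += HPAGE_PMD_SIZE / 1024;
	}
	return kb;
}
```

Under these assumptions, copy_huge_kb(ALWAYS_ORIG, 4LL << 20) gives 4096 while ALWAYS_V4 and WITHIN_SIZE_ORIG both give 2048, matching the ShmemHugePages figures above: in both of the latter cases the first extent is decided while the file is still empty.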

Treating the first extent differently is a hack, and does not respect
that this is a filesystem, on which size is likely to increase.

By all means refine the condition for huge=within_size, and by all means
warn in transhuge.txt that huge=always may tend to waste valuable huge
pages if the filesystem is used for small files without good reason
(but maybe the implementation needs to reclaim those more effectively).

Hugh

2016-11-10 16:25:46

by Kirill A. Shutemov

Subject: Re: [PATCHv4] shmem: avoid huge pages for small files

On Mon, Nov 07, 2016 at 03:17:11PM -0800, Hugh Dickins wrote:
> On Sat, 22 Oct 2016, Kirill A. Shutemov wrote:
> >
> > Huge pages are detrimental for small files: they cause noticeable
> > overhead in both allocation performance and memory footprint.
> >
> > This patch aims to address this issue by avoiding huge pages until the
> > file has grown to the size of a huge page. This would cover most of the
> > cases where huge pages cause performance regressions.
> >
> > A couple of notes:
> >
> > - if shmem_enabled is set to 'force', the limit is ignored. We still
> > want to generate as many huge pages as possible for functional testing.
> >
> > - the limit doesn't affect khugepaged behaviour: it can still collapse
> > pages based on its settings.
> >
> > Signed-off-by: Kirill A. Shutemov <[email protected]>
>
> Sorry, but NAK. I was expecting a patch to tune within_size behaviour.
>
> > ---
> > Documentation/vm/transhuge.txt | 3 +++
> > mm/shmem.c | 5 +++++
> > 2 files changed, 8 insertions(+)
> >
> > diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
> > index 2ec6adb5a4ce..d1889c7c8c46 100644
> > --- a/Documentation/vm/transhuge.txt
> > +++ b/Documentation/vm/transhuge.txt
> > @@ -238,6 +238,9 @@ values:
> > - "force":
> > Force the huge option on for all - very useful for testing;
> >
> > +To avoid overhead for small files, we don't allocate huge pages for a file
> > +until it grows to the size of a huge page.
> > +
> > == Need of application restart ==
> >
> > The transparent_hugepage/enabled values and tmpfs mount option only affect
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index ad7813d73ea7..49618d2d6330 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1692,6 +1692,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > goto alloc_huge;
> > /* TODO: implement fadvise() hints */
> > goto alloc_nohuge;
> > + case SHMEM_HUGE_ALWAYS:
> > + i_size = i_size_read(inode);
> > + if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
> > + goto alloc_nohuge;
> > + break;
> > }
> >
> > alloc_huge:
>
> So (eliding the SHMEM_HUGE_ADVISE case in between) you now have:
>
> 	case SHMEM_HUGE_WITHIN_SIZE:
> 		off = round_up(index, HPAGE_PMD_NR);
> 		i_size = round_up(i_size_read(inode), PAGE_SIZE);
> 		if (i_size >= HPAGE_PMD_SIZE &&
> 		    i_size >> PAGE_SHIFT >= off)
> 			goto alloc_huge;
> 		goto alloc_nohuge;
> 	case SHMEM_HUGE_ALWAYS:
> 		i_size = i_size_read(inode);
> 		if (index < HPAGE_PMD_NR && i_size < HPAGE_PMD_SIZE)
> 			goto alloc_nohuge;
> 		goto alloc_huge;
>
> I'll concede that those two conditions are not the same; but again you're
> messing with huge=always to make it, not always, but conditional on size.
>
> Please, keep huge=always as is: if I copy a 4MiB file into a huge tmpfs,
> I got ShmemHugePages 4096 kB before, which is what I wanted. Whereas
> with this change I get only 2048 kB, just like with huge=within_size.

I don't think it's a problem really. We don't have guarantees anyway.
And we can collapse the page later.

But okay.

> Treating the first extent differently is a hack, and does not respect
> that this is a filesystem, on which size is likely to increase.
>
> By all means refine the condition for huge=within_size, and by all means
> warn in transhuge.txt that huge=always may tend to waste valuable huge
> pages if the filesystem is used for small files without good reason

Would it be okay if I just replaced the huge=within_size logic with what I
proposed here for huge=always?

That's not what I intended initially for this option, but...

> (but maybe the implementation needs to reclaim those more effectively).

It's more about the cost of allocation than about memory pressure.

-----8<-----

From 287ab05c09bfd49c7356ca74b6fea36d8131edaf Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <[email protected]>
Date: Mon, 17 Oct 2016 14:44:47 +0300
Subject: [PATCH] shmem: avoid huge pages for small files

Huge pages are detrimental for small files: they cause noticeable
overhead in both allocation performance and memory footprint.

This patch aims to address this issue by avoiding huge pages until the
file has grown to the size of a huge page, if the filesystem is mounted
with the huge=within_size option.

This would cover most of the cases where huge pages cause performance
regressions.

The limit doesn't affect khugepaged behaviour: it can still collapse
pages based on its settings.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/vm/transhuge.txt | 7 ++++++-
mm/shmem.c | 6 ++----
2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 2ec6adb5a4ce..14c911c56f4a 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -208,11 +208,16 @@ You can control hugepage allocation policy in tmpfs with mount option
- "always":
Attempt to allocate huge pages every time we need a new page;

+ This option can lead to significant overhead if the filesystem is used to
+ store small files.
+
- "never":
Do not allocate huge pages;

- "within_size":
- Only allocate huge page if it will be fully within i_size.
+ Only allocate a huge page if the file is at least as large as a huge
+ page. This helps to avoid overhead for small files.
+
Also respect fadvise()/madvise() hints;

- "advise:
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d73ea7..3589d36c7c63 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1681,10 +1681,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
case SHMEM_HUGE_NEVER:
goto alloc_nohuge;
case SHMEM_HUGE_WITHIN_SIZE:
- off = round_up(index, HPAGE_PMD_NR);
- i_size = round_up(i_size_read(inode), PAGE_SIZE);
- if (i_size >= HPAGE_PMD_SIZE &&
- i_size >> PAGE_SHIFT >= off)
+ i_size = i_size_read(inode);
+ if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
goto alloc_huge;
/* fallthrough */
case SHMEM_HUGE_ADVISE:
--
Kirill A. Shutemov

2016-11-10 17:43:42

by kernel test robot

Subject: Re: [PATCH] shmem: avoid huge pages for small files

Hi Kirill,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.9-rc4 next-20161110]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/shmem-avoid-huge-pages-for-small-files/20161111-005428
config: i386-randconfig-s0-201645 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All warnings (new ones prefixed by >>):

mm/shmem.c: In function 'shmem_getpage_gfp':
>> mm/shmem.c:1680:12: warning: unused variable 'off' [-Wunused-variable]
pgoff_t off;
^~~

vim +/off +1680 mm/shmem.c

66d2f4d2 Hugh Dickins 2014-07-02 1664 mark_page_accessed(page);
66d2f4d2 Hugh Dickins 2014-07-02 1665
54af6042 Hugh Dickins 2011-08-03 1666 delete_from_swap_cache(page);
27ab7006 Hugh Dickins 2011-07-25 1667 set_page_dirty(page);
27ab7006 Hugh Dickins 2011-07-25 1668 swap_free(swap);
27ab7006 Hugh Dickins 2011-07-25 1669
54af6042 Hugh Dickins 2011-08-03 1670 } else {
800d8c63 Kirill A. Shutemov 2016-07-26 1671 /* shmem_symlink() */
800d8c63 Kirill A. Shutemov 2016-07-26 1672 if (mapping->a_ops != &shmem_aops)
800d8c63 Kirill A. Shutemov 2016-07-26 1673 goto alloc_nohuge;
657e3038 Kirill A. Shutemov 2016-07-26 1674 if (shmem_huge == SHMEM_HUGE_DENY || sgp_huge == SGP_NOHUGE)
800d8c63 Kirill A. Shutemov 2016-07-26 1675 goto alloc_nohuge;
800d8c63 Kirill A. Shutemov 2016-07-26 1676 if (shmem_huge == SHMEM_HUGE_FORCE)
800d8c63 Kirill A. Shutemov 2016-07-26 1677 goto alloc_huge;
800d8c63 Kirill A. Shutemov 2016-07-26 1678 switch (sbinfo->huge) {
800d8c63 Kirill A. Shutemov 2016-07-26 1679 loff_t i_size;
800d8c63 Kirill A. Shutemov 2016-07-26 @1680 pgoff_t off;
800d8c63 Kirill A. Shutemov 2016-07-26 1681 case SHMEM_HUGE_NEVER:
800d8c63 Kirill A. Shutemov 2016-07-26 1682 goto alloc_nohuge;
800d8c63 Kirill A. Shutemov 2016-07-26 1683 case SHMEM_HUGE_WITHIN_SIZE:
bb89f249 Kirill A. Shutemov 2016-11-10 1684 i_size = i_size_read(inode);
bb89f249 Kirill A. Shutemov 2016-11-10 1685 if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
800d8c63 Kirill A. Shutemov 2016-07-26 1686 goto alloc_huge;
800d8c63 Kirill A. Shutemov 2016-07-26 1687 /* fallthrough */
800d8c63 Kirill A. Shutemov 2016-07-26 1688 case SHMEM_HUGE_ADVISE:

:::::: The code at line 1680 was first introduced by commit
:::::: 800d8c63b2e989c2e349632d1648119bf5862f01 shmem: add huge pages support

:::::: TO: Kirill A. Shutemov <[email protected]>
:::::: CC: Linus Torvalds <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (2.90 kB)
.config.gz (20.97 kB)

2016-11-10 17:52:04

by Kirill A. Shutemov

Subject: Re: [PATCH] shmem: avoid huge pages for small files

On Fri, Nov 11, 2016 at 01:42:47AM +0800, kbuild test robot wrote:
> Hi Kirill,
>
> [auto build test WARNING on linus/master]
> [also build test WARNING on v4.9-rc4 next-20161110]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/shmem-avoid-huge-pages-for-small-files/20161111-005428
> config: i386-randconfig-s0-201645 (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All warnings (new ones prefixed by >>):
>
> mm/shmem.c: In function 'shmem_getpage_gfp':
> >> mm/shmem.c:1680:12: warning: unused variable 'off' [-Wunused-variable]
> pgoff_t off;


From f0a582888ac6dcb56c6134611c83edfb091bbcb6 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <[email protected]>
Date: Mon, 17 Oct 2016 14:44:47 +0300
Subject: [PATCH] shmem: avoid huge pages for small files

Huge pages are detrimental for small files: they cause noticeable
overhead in both allocation performance and memory footprint.

This patch aims to address this issue by avoiding huge pages until the
file has grown to the size of a huge page, if the filesystem is mounted
with the huge=within_size option.

This would cover most of the cases where huge pages cause performance
regressions.

The limit doesn't affect khugepaged behaviour: it can still collapse
pages based on its settings.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/vm/transhuge.txt | 7 ++++++-
mm/shmem.c | 7 ++-----
2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 2ec6adb5a4ce..14c911c56f4a 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -208,11 +208,16 @@ You can control hugepage allocation policy in tmpfs with mount option
- "always":
Attempt to allocate huge pages every time we need a new page;

+ This option can lead to significant overhead if the filesystem is used to
+ store small files.
+
- "never":
Do not allocate huge pages;

- "within_size":
- Only allocate huge page if it will be fully within i_size.
+ Only allocate a huge page if the file is at least as large as a huge
+ page. This helps to avoid overhead for small files.
+
Also respect fadvise()/madvise() hints;

- "advise:
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d73ea7..3e2c0912c587 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1677,14 +1677,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
goto alloc_huge;
switch (sbinfo->huge) {
loff_t i_size;
- pgoff_t off;
case SHMEM_HUGE_NEVER:
goto alloc_nohuge;
case SHMEM_HUGE_WITHIN_SIZE:
- off = round_up(index, HPAGE_PMD_NR);
- i_size = round_up(i_size_read(inode), PAGE_SIZE);
- if (i_size >= HPAGE_PMD_SIZE &&
- i_size >> PAGE_SHIFT >= off)
+ i_size = i_size_read(inode);
+ if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
goto alloc_huge;
/* fallthrough */
case SHMEM_HUGE_ADVISE:
--
2.9.3

--
Kirill A. Shutemov

2016-11-11 21:47:52

by Hugh Dickins

Subject: Re: [PATCHv4] shmem: avoid huge pages for small files

On Thu, 10 Nov 2016, Kirill A. Shutemov wrote:
> On Mon, Nov 07, 2016 at 03:17:11PM -0800, Hugh Dickins wrote:
>
> > Treating the first extent differently is a hack, and does not respect
> > that this is a filesystem, on which size is likely to increase.
> >
> > By all means refine the condition for huge=within_size, and by all means
> > warn in transhuge.txt that huge=always may tend to waste valuable huge
> > pages if the filesystem is used for small files without good reason
>
> Would it be okay if I just replaced the huge=within_size logic with what I
> proposed here for huge=always?

In principle yes, that would be fine with me: I just don't care very
much about this option, since we do not force "huge=always" on anyone,
so everyone is free to use it where it's useful, and not where it's not.

But perhaps your aim is to have "huge=within_size" set by default on /tmp,
and so not behave badly there: I'd never aimed for that, and I'm a bit
sceptical about it, but if you can get good enough behaviour out of it
for that, I won't stand in your way.

>
> That's not what I intended initially for this option, but...
>
> > (but maybe the implementation needs to reclaim those more effectively).
>
> It's more about the cost of allocation than about memory pressure.

Regarding that issue, I think you should reconsider the GFP flags used
in shmem_alloc_hugepage(). GFP flags, and compaction latency avoidance,
have been moving targets over the last year, and I've not rechecked;
but I got the impression that your GFP flags are still asking for the
compaction stalls that are now deprecated on the anon THP fault path?
I repeat, I've not rechecked that before writing, maybe it's a libel!

>
> -----8<-----
>
> From 287ab05c09bfd49c7356ca74b6fea36d8131edaf Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <[email protected]>
> Date: Mon, 17 Oct 2016 14:44:47 +0300
> Subject: [PATCH] shmem: avoid huge pages for small files
>
> Huge pages are detrimental for small files: they cause noticeable
> overhead in both allocation performance and memory footprint.
>
> This patch aims to address this issue by avoiding huge pages until the
> file has grown to the size of a huge page, if the filesystem is mounted
> with the huge=within_size option.
>
> This would cover most of the cases where huge pages cause performance
> regressions.

It's not a regression if "huge=always" is worse than "huge=never" in
some cases: just cases where it's better not to mount "huge=always".

>
> The limit doesn't affect khugepaged behaviour: it can still collapse
> pages based on its settings.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
> Documentation/vm/transhuge.txt | 7 ++++++-
> mm/shmem.c | 6 ++----
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
> index 2ec6adb5a4ce..14c911c56f4a 100644
> --- a/Documentation/vm/transhuge.txt
> +++ b/Documentation/vm/transhuge.txt
> @@ -208,11 +208,16 @@ You can control hugepage allocation policy in tmpfs with mount option
> - "always":
> Attempt to allocate huge pages every time we need a new page;
>

Nit: please change the semi-colon to full-stop, and delete the blank line.

> + This option can lead to significant overhead if the filesystem is used to
> + store small files.
> +
> - "never":
> Do not allocate huge pages;
>
> - "within_size":
> - Only allocate huge page if it will be fully within i_size.
> + Only allocate a huge page if the file is at least as large as a huge
> + page. This helps to avoid overhead for small files.
> +
> Also respect fadvise()/madvise() hints;
>
> - "advise:
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d73ea7..3589d36c7c63 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1681,10 +1681,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> case SHMEM_HUGE_NEVER:
> goto alloc_nohuge;
> case SHMEM_HUGE_WITHIN_SIZE:
> - off = round_up(index, HPAGE_PMD_NR);
> - i_size = round_up(i_size_read(inode), PAGE_SIZE);
> - if (i_size >= HPAGE_PMD_SIZE &&
> - i_size >> PAGE_SHIFT >= off)
> + i_size = i_size_read(inode);
> + if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
> goto alloc_huge;

I said fine in principle above, but when I look at this, I'm puzzled.

Certainly the new condition is easier to understand than the old condition:
which is a plus, even though it's hackish (I do dislike hobbling the first
extent, when it's an incomplete last extent which deserves to be hobbled -
easier said than implemented of course).

But isn't the new condition (with its ||) always weaker than the old
condition (with its &&)? Whereas I thought you were trying to change
it to be less keen to allocate hugepages, not more.
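One way to probe this question is to compare the two predicates exhaustively in a small user-space model. This is only a sketch: it assumes x86-64 geometry (4 KiB base pages, 2 MiB PMD pages), copies the two conditions from the hunk above into standalone functions, and `within_size_old()`, `within_size_new()` and `count_disagreements()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. Assumed x86-64 geometry. */
#define PAGE_SHIFT     12
#define PAGE_SIZE      (1LL << PAGE_SHIFT)
#define HPAGE_PMD_NR   512LL
#define HPAGE_PMD_SIZE (PAGE_SIZE * HPAGE_PMD_NR)

/* Round x up to a multiple of y; y must be a power of two. */
static long long round_up_ll(long long x, long long y)
{
	return (x + y - 1) & ~(y - 1);
}

/* huge=within_size condition before the patch (note the &&). */
bool within_size_old(long long index, long long i_size)
{
	long long off = round_up_ll(index, HPAGE_PMD_NR);
	long long size = round_up_ll(i_size, PAGE_SIZE);

	return size >= HPAGE_PMD_SIZE && (size >> PAGE_SHIFT) >= off;
}

/* huge=within_size condition proposed in the patch (note the ||). */
bool within_size_new(long long index, long long i_size)
{
	return index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE;
}

/*
 * Compare both predicates for every index below max_index and for
 * i_size = 0, k * PAGE_SIZE - 1 and k * PAGE_SIZE (1 <= k <= max_pages).
 * Both predicates only look at i_size through its page-rounded value and
 * a comparison with the page-aligned HPAGE_PMD_SIZE, so these sizes stand
 * in for every i_size up to max_pages * PAGE_SIZE.
 */
void count_disagreements(long long max_index, long long max_pages,
			 long *old_only, long *new_only)
{
	assert(max_index >= 0 && max_pages >= 0);
	*old_only = 0;
	*new_only = 0;
	for (long long index = 0; index < max_index; index++) {
		for (long long k = 0; k <= max_pages; k++) {
			long long sizes[2] = { k * PAGE_SIZE - 1, k * PAGE_SIZE };

			for (int j = 0; j < 2; j++) {
				bool o, n;

				if (sizes[j] < 0)
					continue;	/* k == 0: only i_size = 0 */
				o = within_size_old(index, sizes[j]);
				n = within_size_new(index, sizes[j]);
				*old_only += o && !n;
				*new_only += n && !o;
			}
		}
	}
}
```

In this model the new predicate allows a huge page in many cases where the old one does not (for example, a first fault at page 768 of an empty file), so it is indeed more permissive overall. It is not strictly weaker, though: when i_size lies in the last base page below 2 MiB, the old check rounds i_size up to 2 MiB and allows a huge page for faults in the first extent, while the new one does not; count_disagreements(1024, 1024, ...) finds 512 such pairs, one per index in the first extent.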

What the condition ought to say, I don't know: I got too confused,
and depressed by my confusion, so I'm just handing it back to you.

And then there's the SHMEM_HUGE_WITHIN_SIZE case in shmem_huge_enabled()
(for khugepaged), which you have explicitly not changed in this patch:
looks strange to me, is it doing the right thing?

> /* fallthrough */
> case SHMEM_HUGE_ADVISE:
> --
> Kirill A. Shutemov

2016-11-14 14:10:03

by Kirill A. Shutemov

Subject: Re: [PATCHv4] shmem: avoid huge pages for small files

On Fri, Nov 11, 2016 at 01:41:11PM -0800, Hugh Dickins wrote:
> On Thu, 10 Nov 2016, Kirill A. Shutemov wrote:
> > On Mon, Nov 07, 2016 at 03:17:11PM -0800, Hugh Dickins wrote:
> >
> > > Treating the first extent differently is a hack, and does not respect
> > > that this is a filesystem, on which size is likely to increase.
> > >
> > > By all means refine the condition for huge=within_size, and by all means
> > > warn in transhuge.txt that huge=always may tend to waste valuable huge
> > > pages if the filesystem is used for small files without good reason
> >
> > Would it be okay if I just replaced the huge=within_size logic with what I
> > proposed here for huge=always?
>
> In principle yes, that would be fine with me: I just don't care very
> much about this option, since we do not force "huge=always" on anyone,
> so everyone is free to use it where it's useful, and not where it's not.
>
> But perhaps your aim is to have "huge=within_size" set by default on /tmp,
> and so not behave badly there: I'd never aimed for that, and I'm a bit
> sceptical about it, but if you can get good enough behaviour out of it
> for that, I won't stand in your way.

Yeah, one day I would like to add a compile-time option to choose the
default huge= allocation policy.

> > That's not what I intended initially for this option, but...
> >
> > > (but maybe the implementation needs to reclaim those more effectively).
> >
> > It's more about the cost of allocation than about memory pressure.
>
> Regarding that issue, I think you should reconsider the GFP flags used
> in shmem_alloc_hugepage(). GFP flags, and compaction latency avoidance,
> have been moving targets over the last year, and I've not rechecked;
> but I got the impression that your GFP flags are still asking for the
> compaction stalls that are now deprecated on the anon THP fault path?
> I repeat, I've not rechecked that before writing, maybe it's a libel!

Looks like you're right: we should clear __GFP_KSWAPD_RECLAIM from the gfp
flags when allocating from the fault path.

Anon THP also takes VM_HUGEPAGE into account when choosing the gfp flags.
It's not easy to get this information into shmem_alloc_hugepage()...

> > -----8<-----
> >
> > From 287ab05c09bfd49c7356ca74b6fea36d8131edaf Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov" <[email protected]>
> > Date: Mon, 17 Oct 2016 14:44:47 +0300
> > Subject: [PATCH] shmem: avoid huge pages for small files
> >
> > Huge pages are detrimental for small files: they cause noticeable
> > overhead in both allocation performance and memory footprint.
> >
> > This patch aims to address this issue by avoiding huge pages until the
> > file has grown to the size of a huge page, if the filesystem is mounted
> > with the huge=within_size option.
> >
> > This would cover most of the cases where huge pages cause performance
> > regressions.
>
> It's not a regression if "huge=always" is worse than "huge=never" in
> some cases: just cases where it's better not to mount "huge=always".

I'm not sure what wording would be better. I mean slower compared to
small pages.

> > The limit doesn't affect khugepaged behaviour: it can still collapse
> > pages based on its settings.
> >
> > Signed-off-by: Kirill A. Shutemov <[email protected]>
> > ---
> > Documentation/vm/transhuge.txt | 7 ++++++-
> > mm/shmem.c | 6 ++----
> > 2 files changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
> > index 2ec6adb5a4ce..14c911c56f4a 100644
> > --- a/Documentation/vm/transhuge.txt
> > +++ b/Documentation/vm/transhuge.txt
> > @@ -208,11 +208,16 @@ You can control hugepage allocation policy in tmpfs with mount option
> > - "always":
> > Attempt to allocate huge pages every time we need a new page;
> >
>
> Nit: please change the semi-colon to full-stop, and delete the blank line.

Okay.

>
> > + This option can lead to significant overhead if the filesystem is used to
> > + store small files.
> > +
> > - "never":
> > Do not allocate huge pages;
> >
> > - "within_size":
> > - Only allocate huge page if it will be fully within i_size.
> > + Only allocate a huge page if the file is at least as large as a huge
> > + page. This helps to avoid overhead for small files.
> > +
> > Also respect fadvise()/madvise() hints;
> >
> > - "advise:
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index ad7813d73ea7..3589d36c7c63 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1681,10 +1681,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> > case SHMEM_HUGE_NEVER:
> > goto alloc_nohuge;
> > case SHMEM_HUGE_WITHIN_SIZE:
> > - off = round_up(index, HPAGE_PMD_NR);
> > - i_size = round_up(i_size_read(inode), PAGE_SIZE);
> > - if (i_size >= HPAGE_PMD_SIZE &&
> > - i_size >> PAGE_SHIFT >= off)
> > + i_size = i_size_read(inode);
> > + if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
> > goto alloc_huge;
>
> I said fine in principle above, but when I look at this, I'm puzzled.
>
> Certainly the new condition is easier to understand than the old condition:
> which is a plus, even though it's hackish (I do dislike hobbling the first
> extent, when it's an incomplete last extent which deserves to be hobbled -
> easier said than implemented of course).

Well, it's just a heuristic that I found useful. I don't see a reason to
make it more complex if it works.

> But isn't the new condition (with its ||) always weaker than the old
> condition (with its &&)? Whereas I thought you were trying to change
> it to be less keen to allocate hugepages, not more.

I tried to make it less keen to allocate hugepages compared to
huge=always.

The current huge=within_size is fairly restrictive: we don't allocate huge
pages to grow the file. For shmem, it means we would allocate huge pages
only if the user did truncate(2) to set the file size before touching data
in it (shared memory APIs do this). This policy would be more useful for a
filesystem with backing storage.

The patch relaxes the condition: it only requires file size >= HPAGE_PMD_SIZE.

> What the condition ought to say, I don't know: I got too confused,
> and depressed by my confusion, so I'm just handing it back to you.
>
> And then there's the SHMEM_HUGE_WITHIN_SIZE case in shmem_huge_enabled()
> (for khugepaged), which you have explicitly not changed in this patch:
> looks strange to me, is it doing the right thing?

I missed that.

-----8<-----
From b2158fdd8523e3e35a548857a1cb02fe6bcd1ea4 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <[email protected]>
Date: Mon, 17 Oct 2016 14:44:47 +0300
Subject: [PATCH] shmem: avoid huge pages for small files

Huge pages are detrimental for small files: they cause noticeable
overhead on both allocation performance and memory footprint.

This patch aims to address this issue by avoiding huge pages until
the file has grown to the size of a huge page, if the filesystem is
mounted with the huge=within_size option.

This would cover most of the cases where huge pages cause a slowdown
compared to small pages.

Later we can consider huge=within_size as the default for tmpfs.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/vm/transhuge.txt | 8 ++++++--
mm/shmem.c | 12 +++---------
2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index 2ec6adb5a4ce..7703e9c241ca 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -206,13 +206,17 @@ You can control hugepage allocation policy in tmpfs with mount option
"huge=". It can have following values:

- "always":
- Attempt to allocate huge pages every time we need a new page;
+ Attempt to allocate huge pages every time we need a new page.
+ This option can lead to significant overhead if filesystem is used to
+ store small files.

- "never":
Do not allocate huge pages;

- "within_size":
- Only allocate huge page if it will be fully within i_size.
+ Only allocate huge page if size of the file more than size of huge
+ page. This helps to avoid overhead for small files.
+
Also respect fadvise()/madvise() hints;

- "advise:
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d73ea7..ef8fdadd0626 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1677,14 +1677,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
goto alloc_huge;
switch (sbinfo->huge) {
loff_t i_size;
- pgoff_t off;
case SHMEM_HUGE_NEVER:
goto alloc_nohuge;
case SHMEM_HUGE_WITHIN_SIZE:
- off = round_up(index, HPAGE_PMD_NR);
- i_size = round_up(i_size_read(inode), PAGE_SIZE);
- if (i_size >= HPAGE_PMD_SIZE &&
- i_size >> PAGE_SHIFT >= off)
+ i_size = i_size_read(inode);
+ if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
goto alloc_huge;
/* fallthrough */
case SHMEM_HUGE_ADVISE:
@@ -3856,7 +3853,6 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
struct inode *inode = file_inode(vma->vm_file);
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
loff_t i_size;
- pgoff_t off;

if (shmem_huge == SHMEM_HUGE_FORCE)
return true;
@@ -3868,10 +3864,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
case SHMEM_HUGE_ALWAYS:
return true;
case SHMEM_HUGE_WITHIN_SIZE:
- off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
i_size = round_up(i_size_read(inode), PAGE_SIZE);
- if (i_size >= HPAGE_PMD_SIZE &&
- i_size >> PAGE_SHIFT >= off)
+ if (i_size >= HPAGE_PMD_SIZE)
return true;
case SHMEM_HUGE_ADVISE:
/* TODO: implement fadvise() hints */
--
Kirill A. Shutemov

2016-11-29 03:57:20

by Hugh Dickins

Subject: Re: [PATCHv4] shmem: avoid huge pages for small files

On Mon, 14 Nov 2016, Kirill A. Shutemov wrote:
> On Fri, Nov 11, 2016 at 01:41:11PM -0800, Hugh Dickins wrote:
> >
> > Certainly the new condition is easier to understand than the old condition:
> > which is a plus, even though it's hackish (I do dislike hobbling the first
> > extent, when it's an incomplete last extent which deserves to be hobbled -
> > easier said than implemented of course).
>
> Well, it's just a heuristic that I found useful. I don't see a reason to
> make it more complex if it works.

You like it because it allocates huge pages to some extents,
but not to all extents. I dislike it because it allocates
huge pages to the wrong extents.

You did much the same three or four years ago, in your THP-on-ramfs
series: I admired your resourcefulness, in getting the little files
to fit in memory; but it was not a solution I wanted to see again.

Consider copying a 2097153-byte file into such a filesystem: the first
2MB would be allocated with 4kB pages, the final byte with a 2MB page;
but it looks like I already pointed that out, and we just disagree.

This patch does not convince me at all: I expect you will come up with
some better strategy in a month or two, and I'd rather wait for that
than keep messing around with what we have. But if you can persuade
the filesystem guys that this heuristic would be a sensible mount
option for them, then in the end I shall not want tmpfs to diverge.

>
> > But isn't the new condition (with its ||) always weaker than the old
> > condition (with its &&)? Whereas I thought you were trying to change
> > it to be less keen to allocate hugepages, not more.
>
> I tried to make it less keen to allocate hugepages compared to
> huge=always.
>
> The current huge=within_size is fairly restrictive: we don't allocate huge
> pages to grow the file. For shmem, it means we would allocate huge pages
> only if the user did truncate(2) to set the file size before touching data
> in it (shared memory APIs do this). This policy would be more useful for a
> filesystem with backing storage.
>
> The patch relaxes the condition: it only requires file size >= HPAGE_PMD_SIZE.
>
> > What the condition ought to say, I don't know: I got too confused,
> > and depressed by my confusion, so I'm just handing it back to you.
> >
> > And then there's the SHMEM_HUGE_WITHIN_SIZE case in shmem_huge_enabled()
> > (for khugepaged), which you have explicitly not changed in this patch:
> > looks strange to me, is it doing the right thing?
>
> I missed that.
>
> -----8<-----
> From b2158fdd8523e3e35a548857a1cb02fe6bcd1ea4 Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <[email protected]>
> Date: Mon, 17 Oct 2016 14:44:47 +0300
> Subject: [PATCH] shmem: avoid huge pages for small files
>
> Huge pages are detrimental for small files: they cause noticeable
> overhead on both allocation performance and memory footprint.
>
> This patch aims to address this issue by avoiding huge pages until
> the file has grown to the size of a huge page, if the filesystem is
> mounted with the huge=within_size option.
>
> This would cover most of the cases where huge pages cause a slowdown
> compared to small pages.
>
> Later we can consider huge=within_size as the default for tmpfs.

I'm sceptical of that, and I do not think this implementation will
make a sensible default.

>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
> Documentation/vm/transhuge.txt | 8 ++++++--
> mm/shmem.c | 12 +++---------
> 2 files changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
> index 2ec6adb5a4ce..7703e9c241ca 100644
> --- a/Documentation/vm/transhuge.txt
> +++ b/Documentation/vm/transhuge.txt
> @@ -206,13 +206,17 @@ You can control hugepage allocation policy in tmpfs with mount option
> "huge=". It can have following values:
>
> - "always":
> - Attempt to allocate huge pages every time we need a new page;
> + Attempt to allocate huge pages every time we need a new page.
> + This option can lead to significant overhead if filesystem is used to
> + store small files.

Good, yes, that part I fully agree with.

>
> - "never":
> Do not allocate huge pages;
>
> - "within_size":
> - Only allocate huge page if it will be fully within i_size.
> + Only allocate huge page if size of the file more than size of huge
> + page. This helps to avoid overhead for small files.
> +
> Also respect fadvise()/madvise() hints;
>
> - "advise:
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d73ea7..ef8fdadd0626 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1677,14 +1677,11 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> goto alloc_huge;
> switch (sbinfo->huge) {
> loff_t i_size;
> - pgoff_t off;
> case SHMEM_HUGE_NEVER:
> goto alloc_nohuge;
> case SHMEM_HUGE_WITHIN_SIZE:
> - off = round_up(index, HPAGE_PMD_NR);
> - i_size = round_up(i_size_read(inode), PAGE_SIZE);
> - if (i_size >= HPAGE_PMD_SIZE &&
> - i_size >> PAGE_SHIFT >= off)

I certainly agree that the old test is obscure: I give up and cry each
time I try to work out exactly what it does. I wanted so much to offer
a constructive alternative before responding: how about

if (index < round_down(i_size_read(inode),
HPAGE_PMD_SIZE) >> PAGE_SHIFT)

Of course that does not give you any huge pages while a file is being
copied in (without a preparatory ftruncate), but it seems a more
comprehensible within_size implementation to me.

> + i_size = i_size_read(inode);
> + if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
> goto alloc_huge;
> /* fallthrough */
> case SHMEM_HUGE_ADVISE:
> @@ -3856,7 +3853,6 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
> struct inode *inode = file_inode(vma->vm_file);
> struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> loff_t i_size;
> - pgoff_t off;
>
> if (shmem_huge == SHMEM_HUGE_FORCE)
> return true;
> @@ -3868,10 +3864,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
> case SHMEM_HUGE_ALWAYS:
> return true;
> case SHMEM_HUGE_WITHIN_SIZE:
> - off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
> i_size = round_up(i_size_read(inode), PAGE_SIZE);
> - if (i_size >= HPAGE_PMD_SIZE &&
> - i_size >> PAGE_SHIFT >= off)
> + if (i_size >= HPAGE_PMD_SIZE)
> return true;

That's reasonable, given what you propose for shmem_getpage_gfp().
And given other conditions at the calling khugepaged end, it might
even be okay with my suggestion - I've not given it enough thought.
Or simply return true there, and let khugepaged work it out?
I am pretty sure the original condition was wrong.

> case SHMEM_HUGE_ADVISE:
> /* TODO: implement fadvise() hints */
> --
> Kirill A. Shutemov

2016-11-29 11:11:14

by Kirill A. Shutemov

Subject: Re: [PATCHv4] shmem: avoid huge pages for small files

On Mon, Nov 28, 2016 at 07:56:48PM -0800, Hugh Dickins wrote:
> On Mon, 14 Nov 2016, Kirill A. Shutemov wrote:
> > On Fri, Nov 11, 2016 at 01:41:11PM -0800, Hugh Dickins wrote:
> > >
> > > Certainly the new condition is easier to understand than the old condition:
> > > which is a plus, even though it's hackish (I do dislike hobbling the first
> > > extent, when it's an incomplete last extent which deserves to be hobbled -
> > > easier said than implemented of course).
> >
> > Well, it's just a heuristic that I found useful. I don't see a reason to
> > make it more complex if it works.
>
> You like it because it allocates huge pages to some extents,
> but not to all extents. I dislike it because it allocates
> huge pages to the wrong extents.
>
> You did much the same three or four years ago, in your THP-on-ramfs
> series: I admired your resourcefulness, in getting the little files
> to fit in memory; but it was not a solution I wanted to see again.
>
> Consider copying a 2097153-byte file into such a filesystem: the first
> 2MB would be allocated with 4kB pages, the final byte with a 2MB page;
> but it looks like I already pointed that out, and we just disagree.

I agree with you that it's not elegant. But it works.

Small files tend to be well within the size of an extent. And they
contribute the most to the overhead just because they are small and you
can fit a lot of them onto a filesystem of a given size.

And what you've described is the worst case. There are not that many
files whose size falls just past the boundary of one extent.

Usually files that benefit the most from huge pages are at least several
extents in size: media, databases, etc. And the ratio of "allocation of
huge pages to the wrong extents" diminishes as the file grows.

Let's agree to disagree (which means I lose, as you're the maintainer). :)

>
> This patch does not convince me at all: I expect you will come up with
> some better strategy in a month or two, and I'd rather wait for that
> than keep messing around with what we have. But if you can persuade
> the filesystem guys that this heuristic would be a sensible mount
> option for them, then in the end I shall not want tmpfs to diverge.

For a filesystem with backing storage, I think the old heuristic for
huge=within_size is more appropriate, as we don't start with an empty
filesystem every time.

> I certainly agree that the old test is obscure: I give up and cry each
> time I try to work out exactly what it does. I wanted so much to offer
> a constructive alternative before responding: how about
>
> if (index < round_down(i_size_read(inode),
> HPAGE_PMD_SIZE) >> PAGE_SHIFT)

I tried to be clever here and allocate huge pages when the size is greater
than or equal to HPAGE_PMD_SIZE - PAGE_SIZE + 1, so we would catch a few
more cases where huge page allocation makes sense.

I did a poor job of writing it clearly. Maybe something like this (untested):

i_size = round_up(i_size_read(inode), PAGE_SIZE);
if (index < round_down(i_size, HPAGE_PMD_SIZE) >>
PAGE_SHIFT)

> Of course that does not give you any huge pages while a file is being
> copied in (without a preparatory ftruncate), but it seems a more
> comprehensible within_size implementation to me.
>
> > + i_size = i_size_read(inode);
> > + if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
> > goto alloc_huge;
> > /* fallthrough */
> > case SHMEM_HUGE_ADVISE:
> > @@ -3856,7 +3853,6 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
> > struct inode *inode = file_inode(vma->vm_file);
> > struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > loff_t i_size;
> > - pgoff_t off;
> >
> > if (shmem_huge == SHMEM_HUGE_FORCE)
> > return true;
> > @@ -3868,10 +3864,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
> > case SHMEM_HUGE_ALWAYS:
> > return true;
> > case SHMEM_HUGE_WITHIN_SIZE:
> > - off = round_up(vma->vm_pgoff, HPAGE_PMD_NR);
> > i_size = round_up(i_size_read(inode), PAGE_SIZE);
> > - if (i_size >= HPAGE_PMD_SIZE &&
> > - i_size >> PAGE_SHIFT >= off)
> > + if (i_size >= HPAGE_PMD_SIZE)
> > return true;
>
> That's reasonable, given what you propose for shmem_getpage_gfp().
> And given other conditions at the calling khugepaged end, it might
> even be okay with my suggestion - I've not given it enough thought.
> Or simply return true there, and let khugepaged work it out?

Hm. Returning true seems to do the job.

> I am pretty sure the original condition was wrong.

--
Kirill A. Shutemov