2016-04-28 13:31:59

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 0/19] get rid of superfluous __GFP_REPEAT

Hi,
this is the thrid version of the patchset previously sent [1]. I have
basically only rebased it on top of next-20160428 tree and dropped
"crypto: get rid of superfluous __GFP_REPEAT" which went through crypto
tree. I have added two more md patches as I couldn't resist more alloc
related cleanups at that area.

Motivation:
While working on something unrelated I've checked the current usage
of __GFP_REPEAT in the tree. It seems that a majority of the usage is
and always has been bogus because __GFP_REPEAT has always been about
costly high order allocations while we are using it for order-0 or very
small orders very often. It seems that a big pile of them is just a
copy&paste when a code has been adopted from one arch to another.

I think it makes some sense to get rid of them because they are just
making the semantic more unclear. Please note that GFP_REPEAT is
documented as
* __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
* _might_ fail. This depends upon the particular VM implementation.
while !costly requests have basically nofail semantic. So one could
reasonably expect that order-0 request with __GFP_REPEAT will not loop
for ever. This is not implemented right now though.

I would like to move on with __GFP_REPEAT and define a better
semantic for it. One thought was to rename it to __GFP_BEST_EFFORT
which would behave consistently for all orders and guarantee that the
allocation would try as long as it seem feasible or fail eventually.
!costly request would then finally get a request context which neiter
fails too early (GFP_NORETRY) nor endlessly loops in the allocator for
ever (default behavior). Costly high order requests would keep the
current semantic.
We have discussed that at LSF/MM this year and another suggestion was
to introduce __GFP_TRYHARD instead which would be implicit for all
orders and users would opt out by ~__GFP_TRYHARD instead. I am not
sure which way is better right now but I plan to do the clean up first
before going further with semantic changes.

$ git grep __GFP_REPEAT next/master | wc -l
109
$ git grep __GFP_REPEAT | wc -l
35

So we are down to the third after this patch series. The remaining places
really seem to be relying on __GFP_REPEAT due to large allocation requests.
This still needs some double checking which I will do later after all the
simple ones are sorted out.

I am touching a lot of arch specific code here and I hope I got it right
but as a matter of fact I even didn't compile test for some archs as I
do not have cross compiler for them. Patches should be quite trivial to
review for stupid compile mistakes though. The tricky parts are usually
hidden by macro definitions and thats where I would appreciate help from
arch maintainers.

[1] http://lkml.kernel.org/r/[email protected]


2016-04-28 13:24:22

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 07/20] mips: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pte_alloc_one{_kernel}, pmd_alloc_one allocate PTE_ORDER resp. PMD_ORDER
but both are not larger than 1. This means that this flag has never
been actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Cc: John Crispin <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/mips/include/asm/pgalloc.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index b336037e8768..93c079a1cfc8 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -69,7 +69,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
{
pte_t *pte;

- pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, PTE_ORDER);
+ pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_ZERO, PTE_ORDER);

return pte;
}
@@ -79,7 +79,7 @@ static inline struct page *pte_alloc_one(struct mm_struct *mm,
{
struct page *pte;

- pte = alloc_pages(GFP_KERNEL | __GFP_REPEAT, PTE_ORDER);
+ pte = alloc_pages(GFP_KERNEL, PTE_ORDER);
if (!pte)
return NULL;
clear_highpage(pte);
@@ -113,7 +113,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
{
pmd_t *pmd;

- pmd = (pmd_t *) __get_free_pages(GFP_KERNEL|__GFP_REPEAT, PMD_ORDER);
+ pmd = (pmd_t *) __get_free_pages(GFP_KERNEL, PMD_ORDER);
if (pmd)
pmd_init((unsigned long)pmd, (unsigned long)invalid_pte_table);
return pmd;
--
2.8.0.rc3

2016-04-28 13:24:29

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 12/20] sparc: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

{pud,pmd}_alloc_one is using __GFP_REPEAT but it always allocates from
pgtable_cache which is initialzed to PAGE_SIZE objects. This means that
this flag has never been actually useful here because it has always been
used only for PAGE_ALLOC_COSTLY requests.

Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/sparc/include/asm/pgalloc_64.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h
index 5e3187185b4a..3529f1378cd8 100644
--- a/arch/sparc/include/asm/pgalloc_64.h
+++ b/arch/sparc/include/asm/pgalloc_64.h
@@ -41,8 +41,7 @@ static inline void __pud_populate(pud_t *pud, pmd_t *pmd)

static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return kmem_cache_alloc(pgtable_cache,
- GFP_KERNEL|__GFP_REPEAT);
+ return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
}

static inline void pud_free(struct mm_struct *mm, pud_t *pud)
@@ -52,8 +51,7 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)

static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return kmem_cache_alloc(pgtable_cache,
- GFP_KERNEL|__GFP_REPEAT);
+ return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
}

static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
--
2.8.0.rc3

2016-04-28 13:24:42

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 20/20] jbd2: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

jbd2_alloc is explicit about its allocation preferences wrt. the
allocation size. Sub page allocations go to the slab allocator
and larger are using either the page allocator or vmalloc. This
is all good but the logic is unnecessarily complex.
1) as per Ted, the vmalloc fallback is a left-over:
: jbd2_alloc is only passed in the bh->b_size, which can't be >
: PAGE_SIZE, so the code path that calls vmalloc() should never get
: called. When we conveted jbd2_alloc() to suppor sub-page size
: allocations in commit d2eecb039368, there was an assumption that it
: could be called with a size greater than PAGE_SIZE, but that's
: certaily not true today.
Moreover vmalloc allocation might even lead to a deadlock because
the callers expect GFP_NOFS context while vmalloc is GFP_KERNEL.

2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored
since the flag was introduced.

Let's simplify the code flow and use the slab allocator for sub-page
requests and the page allocator for others. Even though order > 0 is
not currently used as per above leave that option open.

Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
fs/jbd2/journal.c | 32 +++++++-------------------------
1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b31852f76f46..e3ca4b4cac84 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2329,18 +2329,10 @@ void *jbd2_alloc(size_t size, gfp_t flags)

BUG_ON(size & (size-1)); /* Must be a power of 2 */

- flags |= __GFP_REPEAT;
- if (size == PAGE_SIZE)
- ptr = (void *)__get_free_pages(flags, 0);
- else if (size > PAGE_SIZE) {
- int order = get_order(size);
-
- if (order < 3)
- ptr = (void *)__get_free_pages(flags, order);
- else
- ptr = vmalloc(size);
- } else
+ if (size < PAGE_SIZE)
ptr = kmem_cache_alloc(get_slab(size), flags);
+ else
+ ptr = (void *)__get_free_pages(flags, get_order(size));

/* Check alignment; SLUB has gotten this wrong in the past,
* and this can lead to user data corruption! */
@@ -2351,20 +2343,10 @@ void *jbd2_alloc(size_t size, gfp_t flags)

void jbd2_free(void *ptr, size_t size)
{
- if (size == PAGE_SIZE) {
- free_pages((unsigned long)ptr, 0);
- return;
- }
- if (size > PAGE_SIZE) {
- int order = get_order(size);
-
- if (order < 3)
- free_pages((unsigned long)ptr, order);
- else
- vfree(ptr);
- return;
- }
- kmem_cache_free(get_slab(size), ptr);
+ if (size < PAGE_SIZE)
+ kmem_cache_free(get_slab(size), ptr);
+ else
+ free_pages((unsigned long)ptr, get_order(size));
};

/*
--
2.8.0.rc3

2016-04-28 13:24:44

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 19/20] md: simplify free_params for kmalloc vs vmalloc fallback

From: Michal Hocko <[email protected]>

Use kvfree rather than DM_PARAMS_[KV]MALLOC specific param flags.

Cc: Shaohua Li <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
drivers/md/dm-ioctl.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index fe0b57d7573c..2b48c49774bc 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1670,19 +1670,14 @@ static int check_version(unsigned int cmd, struct dm_ioctl __user *user)
return r;
}

-#define DM_PARAMS_KMALLOC 0x0001 /* Params alloced with kmalloc */
-#define DM_PARAMS_VMALLOC 0x0002 /* Params alloced with vmalloc */
-#define DM_WIPE_BUFFER 0x0010 /* Wipe input buffer before returning from ioctl */
+#define DM_WIPE_BUFFER 0x0001 /* Wipe input buffer before returning from ioctl */

static void free_params(struct dm_ioctl *param, size_t param_size, int param_flags)
{
if (param_flags & DM_WIPE_BUFFER)
memset(param, 0, param_size);

- if (param_flags & DM_PARAMS_KMALLOC)
- kfree(param);
- if (param_flags & DM_PARAMS_VMALLOC)
- vfree(param);
+ kvfree(param);
}

static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kernel,
@@ -1714,17 +1709,11 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
* Use kmalloc() rather than vmalloc() when we can.
*/
dmi = NULL;
- if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
+ if (param_kernel->data_size <= KMALLOC_MAX_SIZE)
dmi = kmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
- if (dmi)
- *param_flags |= DM_PARAMS_KMALLOC;
- }

- if (!dmi) {
+ if (!dmi)
dmi = __vmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
- if (dmi)
- *param_flags |= DM_PARAMS_VMALLOC;
- }

if (!dmi) {
if (secure_data && clear_user(user, param_kernel->data_size))
--
2.8.0.rc3

2016-04-28 13:24:27

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 13/20] s390: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

page_table_alloc then uses the flag for a single page allocation. This
means that this flag has never been actually useful here because it has
always been used only for PAGE_ALLOC_COSTLY requests.

Cc: Martin Schwidefsky <[email protected]>
Cc: [email protected]
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
arch/s390/mm/pgalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index f6c3de26cda8..3f716741797a 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -198,7 +198,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
return table;
}
/* Allocate a fresh page */
- page = alloc_page(GFP_KERNEL|__GFP_REPEAT);
+ page = alloc_page(GFP_KERNEL);
if (!page)
return NULL;
if (!pgtable_page_ctor(page)) {
--
2.8.0.rc3

2016-04-28 13:26:06

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 18/20] dm: clean up GFP_NIO usage

From: Michal Hocko <[email protected]>

copy_params uses GFP_NOIO for explicit allocation requests because this
might be called from the suspend path. To quote Mikulas:
: The LVM tool calls suspend and resume ioctls on device mapper block
: devices.
:
: When a device is suspended, any bio sent to the device is held. If the
: resume ioctl did GFP_KERNEL allocation, the allocation could get stuck
: trying to write some dirty cached pages to the suspended device.
:
: The LVM tool and the dmeventd daemon use mlock to lock its address space,
: so the copy_from_user/copy_to_user call cannot trigger a page fault.

Relying on the mlock is quite fragile and we have a better way in kernel
to enfore NOIO which is already used for the vmalloc fallback. Just use
memalloc_noio_{save,restore} around the whole copy_params function which
will force the same also to the page fult paths via copy_{from,to}_user.

While we are there we can also remove __GFP_NOMEMALLOC because copy_params
is never called from MEMALLOC context (e.g. during the reclaim).

Cc: Shaohua Li <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
drivers/md/dm-ioctl.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 2c7ca258c4e4..fe0b57d7573c 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1715,16 +1715,13 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
*/
dmi = NULL;
if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
- dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+ dmi = kmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
if (dmi)
*param_flags |= DM_PARAMS_KMALLOC;
}

if (!dmi) {
- unsigned noio_flag;
- noio_flag = memalloc_noio_save();
- dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
- memalloc_noio_restore(noio_flag);
+ dmi = __vmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
if (dmi)
*param_flags |= DM_PARAMS_VMALLOC;
}
@@ -1801,6 +1798,7 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user *user)
ioctl_fn fn = NULL;
size_t input_param_size;
struct dm_ioctl param_kernel;
+ unsigned noio_flag;

/* only root can play with this */
if (!capable(CAP_SYS_ADMIN))
@@ -1832,9 +1830,12 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user *user)
}

/*
- * Copy the parameters into kernel space.
+ * Copy the parameters into kernel space. Make sure that no IO is triggered
+ * from the allocation paths because this might be called during the suspend.
*/
+ noio_flag = memalloc_noio_save();
r = copy_params(user, &param_kernel, ioctl_flags, &param, &param_flags);
+ memalloc_noio_restore(noio_flag);

if (r)
return r;
--
2.8.0.rc3

2016-04-28 13:26:34

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 16/20] unicore32: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but it is only used in pte_alloc_one,
pte_alloc_one_kernel which does order-0 request. This means that this
flag has never been actually useful here because it has always been used
only for PAGE_ALLOC_COSTLY requests.

Cc: Guan Xuetao <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/unicore32/include/asm/pgalloc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/unicore32/include/asm/pgalloc.h b/arch/unicore32/include/asm/pgalloc.h
index 2e02d1356fdf..26775793c204 100644
--- a/arch/unicore32/include/asm/pgalloc.h
+++ b/arch/unicore32/include/asm/pgalloc.h
@@ -28,7 +28,7 @@ extern void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd);
#define pgd_alloc(mm) get_pgd_slow(mm)
#define pgd_free(mm, pgd) free_pgd_slow(mm, pgd)

-#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)

/*
* Allocate one PTE table.
--
2.8.0.rc3

2016-04-28 13:26:33

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 17/20] dm: get rid of superfluous gfp flags

From: Michal Hocko <[email protected]>

copy_params seems to be little bit confused about which allocation flags
to use. It enforces GFP_NOIO even though it uses
memalloc_noio_{save,restore} which enforces GFP_NOIO at the page
allocator level automatically (via memalloc_noio_flags). It also
uses __GFP_REPEAT for the __vmalloc request which doesn't make much
sense either because vmalloc doesn't rely on costly high order
allocations. Let's just drop the __GFP_REPEAT and leave the further
cleanup to later changes.

Cc: Shaohua Li <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
drivers/md/dm-ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 2adf81d81fca..2c7ca258c4e4 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1723,7 +1723,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
if (!dmi) {
unsigned noio_flag;
noio_flag = memalloc_noio_save();
- dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_REPEAT | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
+ dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
memalloc_noio_restore(noio_flag);
if (dmi)
*param_flags |= DM_PARAMS_VMALLOC;
--
2.8.0.rc3

2016-04-28 13:27:52

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 15/20] tile: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pgtable_alloc_one uses __GFP_REPEAT flag for L2_USER_PGTABLE_ORDER but
the order is either 0 or 3 if L2_KERNEL_PGTABLE_SHIFT for HPAGE_SHIFT.
This means that this flag has never been actually useful here because it
has always been used only for PAGE_ALLOC_COSTLY requests.

Cc: Chris Metcalf <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/tile/mm/pgtable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 7bf2491a9c1f..c4d5bf841a7f 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -231,7 +231,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
struct page *pgtable_alloc_one(struct mm_struct *mm, unsigned long address,
int order)
{
- gfp_t flags = GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO;
+ gfp_t flags = GFP_KERNEL|__GFP_ZERO;
struct page *p;
int i;

--
2.8.0.rc3

2016-04-28 13:28:09

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 14/20] sh: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but {pgd,pmd}_alloc allocate from
{pgd,pmd}_cache but both caches are allocating up to PAGE_SIZE
objects. This means that this flag has never been actually useful here
because it has always been used only for PAGE_ALLOC_COSTLY requests.

Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/sh/mm/pgtable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sh/mm/pgtable.c b/arch/sh/mm/pgtable.c
index 26e03a1f7ca4..a62bd8696779 100644
--- a/arch/sh/mm/pgtable.c
+++ b/arch/sh/mm/pgtable.c
@@ -1,7 +1,7 @@
#include <linux/mm.h>
#include <linux/slab.h>

-#define PGALLOC_GFP GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO
+#define PGALLOC_GFP GFP_KERNEL | __GFP_ZERO

static struct kmem_cache *pgd_cachep;
#if PAGETABLE_LEVELS > 2
--
2.8.0.rc3

2016-04-28 13:28:29

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 11/20] powerpc: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

{pud,pmd}_alloc_one are allocating from {PGT,PUD}_CACHE initialized in
pgtable_cache_init which doesn't have larger than sizeof(void *) << 12
size and that fits into !costly allocation request size. This means that
this flag has never been actually useful here because it has always been
used only for PAGE_ALLOC_COSTLY requests.

Cc: Benjamin Herrenschmidt <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/powerpc/include/asm/pgalloc-64.h | 6 +++---
arch/powerpc/mm/hugetlbpage.c | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index 5dcfde5dc673..aaa6d3bc5d86 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -58,7 +58,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
return kmem_cache_alloc(PGT_CACHE(PUD_INDEX_SIZE),
- GFP_KERNEL|__GFP_REPEAT);
+ GFP_KERNEL);
}

static inline void pud_free(struct mm_struct *mm, pud_t *pud)
@@ -181,7 +181,7 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
return kmem_cache_alloc(PGT_CACHE(PUD_INDEX_SIZE),
- GFP_KERNEL|__GFP_REPEAT);
+ GFP_KERNEL);
}

static inline void pud_free(struct mm_struct *mm, pud_t *pud)
@@ -245,7 +245,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
return kmem_cache_alloc(PGT_CACHE(PMD_CACHE_INDEX),
- GFP_KERNEL|__GFP_REPEAT);
+ GFP_KERNEL);
}

static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index a4a90a869999..6cfc48c94d4b 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -73,7 +73,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
cachep = PGT_CACHE(pdshift - pshift);
#endif

- new = kmem_cache_zalloc(cachep, GFP_KERNEL|__GFP_REPEAT);
+ new = kmem_cache_zalloc(cachep, GFP_KERNEL);

BUG_ON(pshift > HUGEPD_SHIFT_MASK);
BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
--
2.8.0.rc3

2016-04-28 13:28:46

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 10/20] score: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pte_alloc_one{_kernel} allocate PTE_ORDER which is 0. This means that
this flag has never been actually useful here because it has always been
used only for PAGE_ALLOC_COSTLY requests.

Cc: Chen Liqin <[email protected]>
Cc: Lennox Wu <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/score/include/asm/pgalloc.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/score/include/asm/pgalloc.h b/arch/score/include/asm/pgalloc.h
index 2e067657db98..49b012d78c1a 100644
--- a/arch/score/include/asm/pgalloc.h
+++ b/arch/score/include/asm/pgalloc.h
@@ -42,8 +42,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
{
pte_t *pte;

- pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO,
- PTE_ORDER);
+ pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_ZERO, PTE_ORDER);

return pte;
}
@@ -53,7 +52,7 @@ static inline struct page *pte_alloc_one(struct mm_struct *mm,
{
struct page *pte;

- pte = alloc_pages(GFP_KERNEL | __GFP_REPEAT, PTE_ORDER);
+ pte = alloc_pages(GFP_KERNEL, PTE_ORDER);
if (!pte)
return NULL;
clear_highpage(pte);
--
2.8.0.rc3

2016-04-28 13:29:03

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 09/20] parisc: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pmd_alloc_one allocate PMD_ORDER which is 1. This means that this flag
has never been actually useful here because it has always been used only
for PAGE_ALLOC_COSTLY requests.

Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/parisc/include/asm/pgalloc.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/parisc/include/asm/pgalloc.h b/arch/parisc/include/asm/pgalloc.h
index 52c3defb40c9..f08dda3f0995 100644
--- a/arch/parisc/include/asm/pgalloc.h
+++ b/arch/parisc/include/asm/pgalloc.h
@@ -63,8 +63,7 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmd)

static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
{
- pmd_t *pmd = (pmd_t *)__get_free_pages(GFP_KERNEL|__GFP_REPEAT,
- PMD_ORDER);
+ pmd_t *pmd = (pmd_t *)__get_free_pages(GFP_KERNEL, PMD_ORDER);
if (pmd)
memset(pmd, 0, PAGE_SIZE<<PMD_ORDER);
return pmd;
--
2.8.0.rc3

2016-04-28 13:24:20

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 01/20] tree wide: get rid of __GFP_REPEAT for order-0 allocations part I

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations. Yet we have
the full kernel tree with its usage for apparently order-0 allocations.
This is really confusing because __GFP_REPEAT is explicitly documented
to allow allocation failures which is a weaker semantic than the current
order-0 has (basically nofail).

Let's simply drop __GFP_REPEAT from those places. This would allow
to identify place which really need allocator to retry harder and
formulate a more specific semantic for what the flag is supposed to do
actually.

Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/alpha/include/asm/pgalloc.h | 4 ++--
arch/arm/include/asm/pgalloc.h | 2 +-
arch/avr32/include/asm/pgalloc.h | 6 +++---
arch/cris/include/asm/pgalloc.h | 4 ++--
arch/frv/mm/pgalloc.c | 6 +++---
arch/hexagon/include/asm/pgalloc.h | 4 ++--
arch/m68k/include/asm/mcf_pgalloc.h | 4 ++--
arch/m68k/include/asm/motorola_pgalloc.h | 4 ++--
arch/m68k/include/asm/sun3_pgalloc.h | 4 ++--
arch/metag/include/asm/pgalloc.h | 5 ++---
arch/microblaze/include/asm/pgalloc.h | 4 ++--
arch/microblaze/mm/pgtable.c | 3 +--
arch/mn10300/mm/pgtable.c | 6 +++---
arch/openrisc/include/asm/pgalloc.h | 2 +-
arch/openrisc/mm/ioremap.c | 2 +-
arch/parisc/include/asm/pgalloc.h | 4 ++--
arch/powerpc/include/asm/pgalloc-64.h | 2 +-
arch/powerpc/mm/pgtable_32.c | 4 ++--
arch/powerpc/mm/pgtable_64.c | 3 +--
arch/sh/include/asm/pgalloc.h | 4 ++--
arch/sparc/mm/init_64.c | 6 ++----
arch/um/kernel/mem.c | 4 ++--
arch/x86/include/asm/pgalloc.h | 4 ++--
arch/x86/xen/p2m.c | 2 +-
arch/xtensa/include/asm/pgalloc.h | 2 +-
drivers/block/aoe/aoecmd.c | 2 +-
26 files changed, 46 insertions(+), 51 deletions(-)

diff --git a/arch/alpha/include/asm/pgalloc.h b/arch/alpha/include/asm/pgalloc.h
index aab14a019c20..c2ebb6f36c9d 100644
--- a/arch/alpha/include/asm/pgalloc.h
+++ b/arch/alpha/include/asm/pgalloc.h
@@ -40,7 +40,7 @@ pgd_free(struct mm_struct *mm, pgd_t *pgd)
static inline pmd_t *
pmd_alloc_one(struct mm_struct *mm, unsigned long address)
{
- pmd_t *ret = (pmd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ pmd_t *ret = (pmd_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
return ret;
}

@@ -53,7 +53,7 @@ pmd_free(struct mm_struct *mm, pmd_t *pmd)
static inline pte_t *
pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
- pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
return pte;
}

diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index 19cfab526d13..20febb368844 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -29,7 +29,7 @@

static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return (pmd_t *)get_zeroed_page(GFP_KERNEL | __GFP_REPEAT);
+ return (pmd_t *)get_zeroed_page(GFP_KERNEL);
}

static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
diff --git a/arch/avr32/include/asm/pgalloc.h b/arch/avr32/include/asm/pgalloc.h
index 1aba19d68c5e..db039cb368be 100644
--- a/arch/avr32/include/asm/pgalloc.h
+++ b/arch/avr32/include/asm/pgalloc.h
@@ -43,7 +43,7 @@ static inline void pgd_ctor(void *x)
*/
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
- return quicklist_alloc(QUICK_PGD, GFP_KERNEL | __GFP_REPEAT, pgd_ctor);
+ return quicklist_alloc(QUICK_PGD, GFP_KERNEL, pgd_ctor);
}

static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
@@ -54,7 +54,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- return quicklist_alloc(QUICK_PT, GFP_KERNEL | __GFP_REPEAT, NULL);
+ return quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
}

static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
@@ -63,7 +63,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
struct page *page;
void *pg;

- pg = quicklist_alloc(QUICK_PT, GFP_KERNEL | __GFP_REPEAT, NULL);
+ pg = quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
if (!pg)
return NULL;

diff --git a/arch/cris/include/asm/pgalloc.h b/arch/cris/include/asm/pgalloc.h
index 235ece437ddd..42f1affb9c2d 100644
--- a/arch/cris/include/asm/pgalloc.h
+++ b/arch/cris/include/asm/pgalloc.h
@@ -24,14 +24,14 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)

static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
- pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
return pte;
}

static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
struct page *pte;
- pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
+ pte = alloc_pages(GFP_KERNEL|__GFP_ZERO, 0);
if (!pte)
return NULL;
if (!pgtable_page_ctor(pte)) {
diff --git a/arch/frv/mm/pgalloc.c b/arch/frv/mm/pgalloc.c
index 41907d25ed38..c9ed14f6c67d 100644
--- a/arch/frv/mm/pgalloc.c
+++ b/arch/frv/mm/pgalloc.c
@@ -22,7 +22,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__((aligned(PAGE_SIZE)));

pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
- pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+ pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL);
if (pte)
clear_page(pte);
return pte;
@@ -33,9 +33,9 @@ pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
struct page *page;

#ifdef CONFIG_HIGHPTE
- page = alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT, 0);
+ page = alloc_pages(GFP_KERNEL|__GFP_HIGHMEM, 0);
#else
- page = alloc_pages(GFP_KERNEL|__GFP_REPEAT, 0);
+ page = alloc_pages(GFP_KERNEL, 0);
#endif
if (!page)
return NULL;
diff --git a/arch/hexagon/include/asm/pgalloc.h b/arch/hexagon/include/asm/pgalloc.h
index 77da3b0ae3c2..eeebf862c46c 100644
--- a/arch/hexagon/include/asm/pgalloc.h
+++ b/arch/hexagon/include/asm/pgalloc.h
@@ -64,7 +64,7 @@ static inline struct page *pte_alloc_one(struct mm_struct *mm,
{
struct page *pte;

- pte = alloc_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
+ pte = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!pte)
return NULL;
if (!pgtable_page_ctor(pte)) {
@@ -78,7 +78,7 @@ static inline struct page *pte_alloc_one(struct mm_struct *mm,
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- gfp_t flags = GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO;
+ gfp_t flags = GFP_KERNEL | __GFP_ZERO;
return (pte_t *) __get_free_page(flags);
}

diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h
index f9924fbcfe42..fb95aed5f428 100644
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -14,7 +14,7 @@ extern const char bad_pmd_string[];
extern inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- unsigned long page = __get_free_page(GFP_DMA|__GFP_REPEAT);
+ unsigned long page = __get_free_page(GFP_DMA);

if (!page)
return NULL;
@@ -51,7 +51,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t page,
static inline struct page *pte_alloc_one(struct mm_struct *mm,
unsigned long address)
{
- struct page *page = alloc_pages(GFP_DMA|__GFP_REPEAT, 0);
+ struct page *page = alloc_pages(GFP_DMA, 0);
pte_t *pte;

if (!page)
diff --git a/arch/m68k/include/asm/motorola_pgalloc.h b/arch/m68k/include/asm/motorola_pgalloc.h
index 24bcba496c75..c895b987202c 100644
--- a/arch/m68k/include/asm/motorola_pgalloc.h
+++ b/arch/m68k/include/asm/motorola_pgalloc.h
@@ -11,7 +11,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long ad
{
pte_t *pte;

- pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
if (pte) {
__flush_page_to_ram(pte);
flush_tlb_kernel_page(pte);
@@ -32,7 +32,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addres
struct page *page;
pte_t *pte;

- page = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
+ page = alloc_pages(GFP_KERNEL|__GFP_ZERO, 0);
if(!page)
return NULL;
if (!pgtable_page_ctor(page)) {
diff --git a/arch/m68k/include/asm/sun3_pgalloc.h b/arch/m68k/include/asm/sun3_pgalloc.h
index 0931388de47f..1901f61f926f 100644
--- a/arch/m68k/include/asm/sun3_pgalloc.h
+++ b/arch/m68k/include/asm/sun3_pgalloc.h
@@ -37,7 +37,7 @@ do { \
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- unsigned long page = __get_free_page(GFP_KERNEL|__GFP_REPEAT);
+ unsigned long page = __get_free_page(GFP_KERNEL);

if (!page)
return NULL;
@@ -49,7 +49,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
unsigned long address)
{
- struct page *page = alloc_pages(GFP_KERNEL|__GFP_REPEAT, 0);
+ struct page *page = alloc_pages(GFP_KERNEL, 0);

if (page == NULL)
return NULL;
diff --git a/arch/metag/include/asm/pgalloc.h b/arch/metag/include/asm/pgalloc.h
index 3104df0a4822..c2caa1ee4360 100644
--- a/arch/metag/include/asm/pgalloc.h
+++ b/arch/metag/include/asm/pgalloc.h
@@ -42,8 +42,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT |
- __GFP_ZERO);
+ pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
return pte;
}

@@ -51,7 +50,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
unsigned long address)
{
struct page *pte;
- pte = alloc_pages(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO, 0);
+ pte = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
if (!pte)
return NULL;
if (!pgtable_page_ctor(pte)) {
diff --git a/arch/microblaze/include/asm/pgalloc.h b/arch/microblaze/include/asm/pgalloc.h
index 61436d69775c..7c89390c0c13 100644
--- a/arch/microblaze/include/asm/pgalloc.h
+++ b/arch/microblaze/include/asm/pgalloc.h
@@ -116,9 +116,9 @@ static inline struct page *pte_alloc_one(struct mm_struct *mm,
struct page *ptepage;

#ifdef CONFIG_HIGHPTE
- int flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_REPEAT;
+ int flags = GFP_KERNEL | __GFP_HIGHMEM;
#else
- int flags = GFP_KERNEL | __GFP_REPEAT;
+ int flags = GFP_KERNEL;
#endif

ptepage = alloc_pages(flags, 0);
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 4f4520e779a5..eb99fcc76088 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -239,8 +239,7 @@ __init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
{
pte_t *pte;
if (mem_init_done) {
- pte = (pte_t *)__get_free_page(GFP_KERNEL |
- __GFP_REPEAT | __GFP_ZERO);
+ pte = (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
} else {
pte = (pte_t *)early_get_page();
if (pte)
diff --git a/arch/mn10300/mm/pgtable.c b/arch/mn10300/mm/pgtable.c
index e77a7c728081..9577cf768875 100644
--- a/arch/mn10300/mm/pgtable.c
+++ b/arch/mn10300/mm/pgtable.c
@@ -63,7 +63,7 @@ void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags)

pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
- pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+ pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL);
if (pte)
clear_page(pte);
return pte;
@@ -74,9 +74,9 @@ struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
struct page *pte;

#ifdef CONFIG_HIGHPTE
- pte = alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT, 0);
+ pte = alloc_pages(GFP_KERNEL|__GFP_HIGHMEM, 0);
#else
- pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT, 0);
+ pte = alloc_pages(GFP_KERNEL, 0);
#endif
if (!pte)
return NULL;
diff --git a/arch/openrisc/include/asm/pgalloc.h b/arch/openrisc/include/asm/pgalloc.h
index 21484e5b9e9a..87eebd185089 100644
--- a/arch/openrisc/include/asm/pgalloc.h
+++ b/arch/openrisc/include/asm/pgalloc.h
@@ -77,7 +77,7 @@ static inline struct page *pte_alloc_one(struct mm_struct *mm,
unsigned long address)
{
struct page *pte;
- pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT, 0);
+ pte = alloc_pages(GFP_KERNEL, 0);
if (!pte)
return NULL;
clear_page(page_address(pte));
diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
index 62b08ef392be..5b2a95116e8f 100644
--- a/arch/openrisc/mm/ioremap.c
+++ b/arch/openrisc/mm/ioremap.c
@@ -122,7 +122,7 @@ pte_t __init_refok *pte_alloc_one_kernel(struct mm_struct *mm,
pte_t *pte;

if (likely(mem_init_done)) {
- pte = (pte_t *) __get_free_page(GFP_KERNEL | __GFP_REPEAT);
+ pte = (pte_t *) __get_free_page(GFP_KERNEL);
} else {
pte = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
#if 0
diff --git a/arch/parisc/include/asm/pgalloc.h b/arch/parisc/include/asm/pgalloc.h
index f2fd327dce2e..52c3defb40c9 100644
--- a/arch/parisc/include/asm/pgalloc.h
+++ b/arch/parisc/include/asm/pgalloc.h
@@ -124,7 +124,7 @@ pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
static inline pgtable_t
pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
- struct page *page = alloc_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ struct page *page = alloc_page(GFP_KERNEL|__GFP_ZERO);
if (!page)
return NULL;
if (!pgtable_page_ctor(page)) {
@@ -137,7 +137,7 @@ pte_alloc_one(struct mm_struct *mm, unsigned long address)
static inline pte_t *
pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
{
- pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
return pte;
}

diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index 8d5fc3ac43da..5dcfde5dc673 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -88,7 +88,7 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
+ return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
}

static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index bf7bf32b54f8..7f922f557936 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -84,7 +84,7 @@ __init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long add
pte_t *pte;

if (slab_is_available()) {
- pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
} else {
pte = __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
if (pte)
@@ -97,7 +97,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
struct page *ptepage;

- gfp_t flags = GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO;
+ gfp_t flags = GFP_KERNEL | __GFP_ZERO;

ptepage = alloc_pages(flags, 0);
if (!ptepage)
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 347106080bb1..03cc73e86675 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -386,8 +386,7 @@ static pte_t *get_from_cache(struct mm_struct *mm)
static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
{
void *ret = NULL;
- struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK |
- __GFP_REPEAT | __GFP_ZERO);
+ struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
if (!page)
return NULL;
if (!kernel && !pgtable_page_ctor(page)) {
diff --git a/arch/sh/include/asm/pgalloc.h b/arch/sh/include/asm/pgalloc.h
index a33673b3687d..f3f42c84c40f 100644
--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -34,7 +34,7 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- return quicklist_alloc(QUICK_PT, GFP_KERNEL | __GFP_REPEAT, NULL);
+ return quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
}

static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
@@ -43,7 +43,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
struct page *page;
void *pg;

- pg = quicklist_alloc(QUICK_PT, GFP_KERNEL | __GFP_REPEAT, NULL);
+ pg = quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
if (!pg)
return NULL;
page = virt_to_page(pg);
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 09e838801e39..4b905ea1f1c8 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2716,8 +2716,7 @@ void __flush_tlb_all(void)
pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK |
- __GFP_REPEAT | __GFP_ZERO);
+ struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
pte_t *pte = NULL;

if (page)
@@ -2729,8 +2728,7 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
pgtable_t pte_alloc_one(struct mm_struct *mm,
unsigned long address)
{
- struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK |
- __GFP_REPEAT | __GFP_ZERO);
+ struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
if (!page)
return NULL;
if (!pgtable_page_ctor(page)) {
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index b2a2dff50b4e..e7437ec62710 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -204,7 +204,7 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
pte_t *pte;

- pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
return pte;
}

@@ -212,7 +212,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
struct page *pte;

- pte = alloc_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+ pte = alloc_page(GFP_KERNEL|__GFP_ZERO);
if (!pte)
return NULL;
if (!pgtable_page_ctor(pte)) {
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index bf7f8b55b0f9..574c23cf761a 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -81,7 +81,7 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
struct page *page;
- page = alloc_pages(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO, 0);
+ page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
if (!page)
return NULL;
if (!pgtable_pmd_page_ctor(page)) {
@@ -125,7 +125,7 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)

static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ return (pud_t *)get_zeroed_page(GFP_KERNEL);
}

static inline void pud_free(struct mm_struct *mm, pud_t *pud)
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index cab9f766bb06..dd2a49a8aacc 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -182,7 +182,7 @@ static void * __ref alloc_p2m_page(void)
if (unlikely(!slab_is_available()))
return alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);

- return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT);
+ return (void *)__get_free_page(GFP_KERNEL);
}

static void __ref free_p2m_page(void *p)
diff --git a/arch/xtensa/include/asm/pgalloc.h b/arch/xtensa/include/asm/pgalloc.h
index d38eb9237e64..1065bc8bcae5 100644
--- a/arch/xtensa/include/asm/pgalloc.h
+++ b/arch/xtensa/include/asm/pgalloc.h
@@ -44,7 +44,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
pte_t *ptep;
int i;

- ptep = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+ ptep = (pte_t *)__get_free_page(GFP_KERNEL);
if (!ptep)
return NULL;
for (i = 0; i < 1024; i++)
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index d597e432e195..ab19adb07a12 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -1750,7 +1750,7 @@ aoecmd_init(void)
int ret;

/* get_zeroed_page returns page with ref count 1 */
- p = (void *) get_zeroed_page(GFP_KERNEL | __GFP_REPEAT);
+ p = (void *) get_zeroed_page(GFP_KERNEL);
if (!p)
return -ENOMEM;
empty_page = virt_to_page(p);
--
2.8.0.rc3

2016-04-28 13:29:42

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 08/20] nios2: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pte_alloc_one{_kernel} allocate PTE_ORDER which is 0. This means that
this flag has never been actually useful here because it has always been
used only for PAGE_ALLOC_COSTLY requests.

Cc: Ley Foon Tan <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/nios2/include/asm/pgalloc.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/nios2/include/asm/pgalloc.h b/arch/nios2/include/asm/pgalloc.h
index 6e2985e0a7b9..bb47d08c8ef7 100644
--- a/arch/nios2/include/asm/pgalloc.h
+++ b/arch/nios2/include/asm/pgalloc.h
@@ -42,8 +42,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
{
pte_t *pte;

- pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO,
- PTE_ORDER);
+ pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_ZERO, PTE_ORDER);

return pte;
}
@@ -53,7 +52,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
{
struct page *pte;

- pte = alloc_pages(GFP_KERNEL | __GFP_REPEAT, PTE_ORDER);
+ pte = alloc_pages(GFP_KERNEL, PTE_ORDER);
if (pte) {
if (!pgtable_page_ctor(pte)) {
__free_page(pte);
--
2.8.0.rc3

2016-04-28 13:24:18

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 02/20] x86: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this
flag is for more than order-0. This means that this flag has never
been actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/x86/kernel/espfix_64.c | 2 +-
arch/x86/mm/pgtable.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 4d38416e2a7f..04f89caef9c4 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -57,7 +57,7 @@
# error "Need more than one PGD for the ESPFIX hack"
#endif

-#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)

/* This contains the *bottom* address of the espfix stack */
DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_stack);
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 4eb287e25043..aa0ff4b02a96 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -6,7 +6,7 @@
#include <asm/fixmap.h>
#include <asm/mtrr.h>

-#define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO
+#define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO

#ifdef CONFIG_HIGHPTE
#define PGALLOC_USER_GFP __GFP_HIGHMEM
--
2.8.0.rc3

2016-04-28 13:32:04

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 05/20] arm64: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

{pte,pmd,pud}_alloc_one{_kernel}, late_pgtable_alloc use PGALLOC_GFP for
__get_free_page (aka order-0).

pgd_alloc is slightly more complex because it allocates from pgd_cache
if PGD_SIZE != PAGE_SIZE and PGD_SIZE depends on the configuration
(CONFIG_ARM64_VA_BITS, PAGE_SHIFT and CONFIG_PGTABLE_LEVELS).

As per
config PGTABLE_LEVELS
int
default 2 if ARM64_16K_PAGES && ARM64_VA_BITS_36
default 2 if ARM64_64K_PAGES && ARM64_VA_BITS_42
default 3 if ARM64_64K_PAGES && ARM64_VA_BITS_48
default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47
default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48

we should have the following options

CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:4 PAGE_SIZE:4k size:4096 pages:1
CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:4 PAGE_SIZE:16k size:16 pages:1
CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:64k size:512 pages:1
CONFIG_ARM64_VA_BITS:47 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:16k size:16384 pages:1
CONFIG_ARM64_VA_BITS:42 CONFIG_PGTABLE_LEVELS:2 PAGE_SIZE:64k size:65536 pages:1
CONFIG_ARM64_VA_BITS:39 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:4k size:4096 pages:1
CONFIG_ARM64_VA_BITS:36 CONFIG_PGTABLE_LEVELS:2 PAGE_SIZE:16k size:16384 pages:1

All of them fit into a single page (aka order-0). This means that this
flag has never been actually useful here because it has always been used
only for PAGE_ALLOC_COSTLY requests.

Cc: Catalin Marinas <[email protected]>
Cc: [email protected]
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
arch/arm64/include/asm/pgalloc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index ff98585d085a..d25f4f137c2a 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -26,7 +26,7 @@

#define check_pgt_cache() do { } while (0)

-#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)
#define PGD_SIZE (PTRS_PER_PGD * sizeof(pgd_t))

#if CONFIG_PGTABLE_LEVELS > 2
--
2.8.0.rc3

2016-04-28 13:32:01

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 06/20] arc: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

pte_alloc_one_kernel uses __get_order_pte but this is obviously
always zero because BITS_FOR_PTE is not larger than 9 yet the page
size is always larger than 4K. This means that this flag has never
been actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Cc: [email protected]
Acked-by: Vineet Gupta <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
arch/arc/include/asm/pgalloc.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/pgalloc.h b/arch/arc/include/asm/pgalloc.h
index 86ed671286df..3749234b7419 100644
--- a/arch/arc/include/asm/pgalloc.h
+++ b/arch/arc/include/asm/pgalloc.h
@@ -95,7 +95,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
{
pte_t *pte;

- pte = (pte_t *) __get_free_pages(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO,
+ pte = (pte_t *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
__get_order_pte());

return pte;
@@ -107,7 +107,7 @@ pte_alloc_one(struct mm_struct *mm, unsigned long address)
pgtable_t pte_pg;
struct page *page;

- pte_pg = (pgtable_t)__get_free_pages(GFP_KERNEL | __GFP_REPEAT, __get_order_pte());
+ pte_pg = (pgtable_t)__get_free_pages(GFP_KERNEL, __get_order_pte());
if (!pte_pg)
return 0;
memzero((void *)pte_pg, PTRS_PER_PTE * sizeof(pte_t));
--
2.8.0.rc3

2016-04-28 13:33:17

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 03/20] x86/efi: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

efi_alloc_page_tables uses __GFP_REPEAT but it allocates an order-0
page. This means that this flag has never been actually useful here
because it has always been used only for PAGE_ALLOC_COSTLY requests.

Cc: [email protected]
Acked-by: Matt Fleming <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
arch/x86/platform/efi/efi_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 49e4dd4a1f58..a7ee3f08074f 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -141,7 +141,7 @@ int __init efi_alloc_page_tables(void)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;

- gfp_mask = GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO;
+ gfp_mask = GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO;
efi_pgd = (pgd_t *)__get_free_page(gfp_mask);
if (!efi_pgd)
return -ENOMEM;
--
2.8.0.rc3

2016-04-28 13:33:15

by Michal Hocko

[permalink] [raw]
Subject: [PATCH 04/20] arm: get rid of superfluous __GFP_REPEAT

From: Michal Hocko <[email protected]>

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses
this flag is for more than order-2. This means that this flag has never
been actually useful here because it has always been used only for
PAGE_ALLOC_COSTLY requests.

Cc: Russell King <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
arch/arm/include/asm/pgalloc.h | 2 +-
arch/arm/mm/pgd.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index 20febb368844..b2902a5cd780 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -57,7 +57,7 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
extern pgd_t *pgd_alloc(struct mm_struct *mm);
extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);

-#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)

static inline void clean_pte_table(pte_t *pte)
{
diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index b8d477321730..c1c1a5c67da1 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -23,7 +23,7 @@
#define __pgd_alloc() kmalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL)
#define __pgd_free(pgd) kfree(pgd)
#else
-#define __pgd_alloc() (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_REPEAT, 2)
+#define __pgd_alloc() (pgd_t *)__get_free_pages(GFP_KERNEL, 2)
#define __pgd_free(pgd) free_pages((unsigned long)pgd, 2)
#endif

--
2.8.0.rc3

2016-04-28 14:20:13

by Mikulas Patocka

[permalink] [raw]
Subject: Re: [PATCH 18/20] dm: clean up GFP_NIO usage



On Thu, 28 Apr 2016, Michal Hocko wrote:

> From: Michal Hocko <[email protected]>
>
> copy_params uses GFP_NOIO for explicit allocation requests because this
> might be called from the suspend path. To quote Mikulas:
> : The LVM tool calls suspend and resume ioctls on device mapper block
> : devices.
> :
> : When a device is suspended, any bio sent to the device is held. If the
> : resume ioctl did GFP_KERNEL allocation, the allocation could get stuck
> : trying to write some dirty cached pages to the suspended device.
> :
> : The LVM tool and the dmeventd daemon use mlock to lock its address space,
> : so the copy_from_user/copy_to_user call cannot trigger a page fault.
>
> Relying on the mlock is quite fragile and we have a better way in kernel
> to enfore NOIO which is already used for the vmalloc fallback. Just use
> memalloc_noio_{save,restore} around the whole copy_params function which
> will force the same also to the page fult paths via copy_{from,to}_user.

The userspace memory is locked, so we don't need to use memalloc_noio_save
around copy_from_user. If the memory weren't locked, memalloc_noio_save
wouldn't help us to prevent the IO.

We don't need this change (unless you show that it fixes real bug).

Mikulas

> While we are there we can also remove __GFP_NOMEMALLOC because copy_params
> is never called from MEMALLOC context (e.g. during the reclaim).
>
> Cc: Shaohua Li <[email protected]>
> Cc: Mikulas Patocka <[email protected]>
> Cc: [email protected]
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> drivers/md/dm-ioctl.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> index 2c7ca258c4e4..fe0b57d7573c 100644
> --- a/drivers/md/dm-ioctl.c
> +++ b/drivers/md/dm-ioctl.c
> @@ -1715,16 +1715,13 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
> */
> dmi = NULL;
> if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
> - dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
> + dmi = kmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
> if (dmi)
> *param_flags |= DM_PARAMS_KMALLOC;
> }
>
> if (!dmi) {
> - unsigned noio_flag;
> - noio_flag = memalloc_noio_save();
> - dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
> - memalloc_noio_restore(noio_flag);
> + dmi = __vmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
> if (dmi)
> *param_flags |= DM_PARAMS_VMALLOC;
> }
> @@ -1801,6 +1798,7 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user *user)
> ioctl_fn fn = NULL;
> size_t input_param_size;
> struct dm_ioctl param_kernel;
> + unsigned noio_flag;
>
> /* only root can play with this */
> if (!capable(CAP_SYS_ADMIN))
> @@ -1832,9 +1830,12 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user *user)
> }
>
> /*
> - * Copy the parameters into kernel space.
> + * Copy the parameters into kernel space. Make sure that no IO is triggered
> + * from the allocation paths because this might be called during the suspend.
> */
> + noio_flag = memalloc_noio_save();
> r = copy_params(user, &param_kernel, ioctl_flags, &param, &param_flags);
> + memalloc_noio_restore(noio_flag);
>
> if (r)
> return r;
> --
> 2.8.0.rc3
>

2016-04-28 14:41:19

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 18/20] dm: clean up GFP_NIO usage

On Thu 28-04-16 10:20:09, Mikulas Patocka wrote:
>
>
> On Thu, 28 Apr 2016, Michal Hocko wrote:
>
> > From: Michal Hocko <[email protected]>
> >
> > copy_params uses GFP_NOIO for explicit allocation requests because this
> > might be called from the suspend path. To quote Mikulas:
> > : The LVM tool calls suspend and resume ioctls on device mapper block
> > : devices.
> > :
> > : When a device is suspended, any bio sent to the device is held. If the
> > : resume ioctl did GFP_KERNEL allocation, the allocation could get stuck
> > : trying to write some dirty cached pages to the suspended device.
> > :
> > : The LVM tool and the dmeventd daemon use mlock to lock its address space,
> > : so the copy_from_user/copy_to_user call cannot trigger a page fault.
> >
> > Relying on the mlock is quite fragile and we have a better way in kernel
> > to enfore NOIO which is already used for the vmalloc fallback. Just use
> > memalloc_noio_{save,restore} around the whole copy_params function which
> > will force the same also to the page fult paths via copy_{from,to}_user.
>
> The userspace memory is locked, so we don't need to use memalloc_noio_save
> around copy_from_user. If the memory weren't locked, memalloc_noio_save
> wouldn't help us to prevent the IO.

OK, you are right. Got your point. You would have to read from disk to
fault memory in so this is not just about not performing IO during the
reclaim.

So scratch this patch then.

Thanks!
--
Michal Hocko
SUSE Labs

2016-04-28 14:51:25

by Michal Hocko

[permalink] [raw]
Subject: [PATCH] md: simplify free_params for kmalloc vs vmalloc fallback

From: Michal Hocko <[email protected]>

Use kvfree rather than DM_PARAMS_[KV]MALLOC specific param flags.

Cc: Shaohua Li <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: [email protected]
Signed-off-by: Michal Hocko <[email protected]>
---
Hi,
this is a rebase on top of dropped "dm: clean up GFP_NIO usage" which
should be dropped as per the feedback from Mikulas.

drivers/md/dm-ioctl.c | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 2c7ca258c4e4..e66e5b43bc18 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1670,19 +1670,14 @@ static int check_version(unsigned int cmd, struct dm_ioctl __user *user)
return r;
}

-#define DM_PARAMS_KMALLOC 0x0001 /* Params alloced with kmalloc */
-#define DM_PARAMS_VMALLOC 0x0002 /* Params alloced with vmalloc */
-#define DM_WIPE_BUFFER 0x0010 /* Wipe input buffer before returning from ioctl */
+#define DM_WIPE_BUFFER 0x0001 /* Wipe input buffer before returning from ioctl */

static void free_params(struct dm_ioctl *param, size_t param_size, int param_flags)
{
if (param_flags & DM_WIPE_BUFFER)
memset(param, 0, param_size);

- if (param_flags & DM_PARAMS_KMALLOC)
- kfree(param);
- if (param_flags & DM_PARAMS_VMALLOC)
- vfree(param);
+ kvfree(param);
}

static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kernel,
@@ -1714,19 +1709,14 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
* Use kmalloc() rather than vmalloc() when we can.
*/
dmi = NULL;
- if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
+ if (param_kernel->data_size <= KMALLOC_MAX_SIZE)
dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
- if (dmi)
- *param_flags |= DM_PARAMS_KMALLOC;
- }

if (!dmi) {
unsigned noio_flag;
noio_flag = memalloc_noio_save();
dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
memalloc_noio_restore(noio_flag);
- if (dmi)
- *param_flags |= DM_PARAMS_VMALLOC;
}

if (!dmi) {
--
2.8.0.rc3

2016-04-28 14:55:59

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 04/20] arm: get rid of superfluous __GFP_REPEAT

On Thu, Apr 28, 2016 at 03:23:50PM +0200, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> __GFP_REPEAT has a rather weak semantic but since it has been introduced
> around 2.6.12 it has been ignored for low order allocations.
>
> PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses
> this flag is for more than order-2. This means that this flag has never
> been actually useful here because it has always been used only for
> PAGE_ALLOC_COSTLY requests.

I'm unconvinced. Back in 2013, I was seeing a lot of failures, so:

commit 8c65da6dc89ccb605d73773b1dd617e72982d971
Author: Russell King <[email protected]>
Date: Sat Nov 30 12:52:31 2013 +0000

ARM: pgd allocation: retry on failure

Make pgd allocation retry on failure; we really need this to succeed
otherwise fork() can trigger OOMs.

Signed-off-by: Russell King <[email protected]>

Maybe something has changed again in the MM layer which makes this flag
unnecessary again, and it was a temporary blip around that time, I don't
know.

--
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2016-04-28 15:04:09

by Mikulas Patocka

[permalink] [raw]
Subject: Re: [PATCH] md: simplify free_params for kmalloc vs vmalloc fallback

Acked-by: Mikulas Patocka <[email protected]>

BTW. we could also use kvmalloc to complement kvfree, proposed here:
https://www.redhat.com/archives/dm-devel/2015-July/msg00046.html

Mikulas

On Thu, 28 Apr 2016, Michal Hocko wrote:

> From: Michal Hocko <[email protected]>
>
> Use kvfree rather than DM_PARAMS_[KV]MALLOC specific param flags.
>
> Cc: Shaohua Li <[email protected]>
> Cc: Mikulas Patocka <[email protected]>
> Cc: [email protected]
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> Hi,
> this is a rebase on top of dropped "dm: clean up GFP_NIO usage" which
> should be dropped as per the feedback from Mikulas.
>
> drivers/md/dm-ioctl.c | 16 +++-------------
> 1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> index 2c7ca258c4e4..e66e5b43bc18 100644
> --- a/drivers/md/dm-ioctl.c
> +++ b/drivers/md/dm-ioctl.c
> @@ -1670,19 +1670,14 @@ static int check_version(unsigned int cmd, struct dm_ioctl __user *user)
> return r;
> }
>
> -#define DM_PARAMS_KMALLOC 0x0001 /* Params alloced with kmalloc */
> -#define DM_PARAMS_VMALLOC 0x0002 /* Params alloced with vmalloc */
> -#define DM_WIPE_BUFFER 0x0010 /* Wipe input buffer before returning from ioctl */
> +#define DM_WIPE_BUFFER 0x0001 /* Wipe input buffer before returning from ioctl */
>
> static void free_params(struct dm_ioctl *param, size_t param_size, int param_flags)
> {
> if (param_flags & DM_WIPE_BUFFER)
> memset(param, 0, param_size);
>
> - if (param_flags & DM_PARAMS_KMALLOC)
> - kfree(param);
> - if (param_flags & DM_PARAMS_VMALLOC)
> - vfree(param);
> + kvfree(param);
> }
>
> static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kernel,
> @@ -1714,19 +1709,14 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
> * Use kmalloc() rather than vmalloc() when we can.
> */
> dmi = NULL;
> - if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
> + if (param_kernel->data_size <= KMALLOC_MAX_SIZE)
> dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
> - if (dmi)
> - *param_flags |= DM_PARAMS_KMALLOC;
> - }
>
> if (!dmi) {
> unsigned noio_flag;
> noio_flag = memalloc_noio_save();
> dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
> memalloc_noio_restore(noio_flag);
> - if (dmi)
> - *param_flags |= DM_PARAMS_VMALLOC;
> }
>
> if (!dmi) {
> --
> 2.8.0.rc3
>

2016-04-28 15:08:35

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 04/20] arm: get rid of superfluous __GFP_REPEAT

On Thu 28-04-16 15:55:45, Russell King - ARM Linux wrote:
> On Thu, Apr 28, 2016 at 03:23:50PM +0200, Michal Hocko wrote:
> > From: Michal Hocko <[email protected]>
> >
> > __GFP_REPEAT has a rather weak semantic but since it has been introduced
> > around 2.6.12 it has been ignored for low order allocations.
> >
> > PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses
> > this flag is for more than order-2. This means that this flag has never
> > been actually useful here because it has always been used only for
> > PAGE_ALLOC_COSTLY requests.
>
> I'm unconvinced. Back in 2013, I was seeing a lot of failures, so:
>
> commit 8c65da6dc89ccb605d73773b1dd617e72982d971
> Author: Russell King <[email protected]>
> Date: Sat Nov 30 12:52:31 2013 +0000
>
> ARM: pgd allocation: retry on failure
>
> Make pgd allocation retry on failure; we really need this to succeed
> otherwise fork() can trigger OOMs.
>
> Signed-off-by: Russell King <[email protected]>
>
> Maybe something has changed again in the MM layer which makes this flag
> unnecessary again, and it was a temporary blip around that time, I don't
> know.

PAGE_ALLOC_COSTLY_ORDER is defined to order 3 since 2007 and even before
the code was doing
- if ((order <= 3) || (gfp_mask & __GFP_REPEAT))
+ if ((order <= PAGE_ALLOC_COSTLY_ORDER) ||
+ (gfp_mask & __GFP_REPEAT))
do_retry = 1;

So an order-2 allocation which is the case for this particular code now
will trigger the OOM killer and fail only when the current task is
killed by the OOM killer. Other than that order-2 is basically
GFP_NOFAIL. Have a look at __alloc_pages_slowpath() for more details.
--
Michal Hocko
SUSE Labs

2016-04-28 15:28:17

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] md: simplify free_params for kmalloc vs vmalloc fallback

On Thu 28-04-16 11:04:05, Mikulas Patocka wrote:
> Acked-by: Mikulas Patocka <[email protected]>

Thanks!

> BTW. we could also use kvmalloc to complement kvfree, proposed here:
> https://www.redhat.com/archives/dm-devel/2015-July/msg00046.html

If there are sufficient users (I haven't checked other than quick git
grep on KMALLOC_MAX_SIZE and there do not seem that many) who are
sharing the same fallback strategy then why not. But I suspect that some
would rather fallback earlier and even do not attempt larger than e.g.
order-1 requests.
--
Michal Hocko
SUSE Labs

2016-04-28 15:37:41

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 19/20] md: simplify free_params for kmalloc vs vmalloc fallback

On Thu, Apr 28 2016 at 9:24am -0400,
Michal Hocko <[email protected]> wrote:

> From: Michal Hocko <[email protected]>
>
> Use kvfree rather than DM_PARAMS_[KV]MALLOC specific param flags.
>
> Cc: Shaohua Li <[email protected]>
> Cc: Mikulas Patocka <[email protected]>
> Cc: [email protected]
> Signed-off-by: Michal Hocko <[email protected]>

Nack, seriously, this is the 3rd time this patch has been attempted.
Did you actually test the change? It'll crash very quickly, see:

https://www.redhat.com/archives/dm-devel/2016-April/msg00103.html

2016-04-28 15:41:04

by Mikulas Patocka

[permalink] [raw]
Subject: Re: [PATCH] md: simplify free_params for kmalloc vs vmalloc fallback



On Thu, 28 Apr 2016, Michal Hocko wrote:

> On Thu 28-04-16 11:04:05, Mikulas Patocka wrote:
> > Acked-by: Mikulas Patocka <[email protected]>
>
> Thanks!
>
> > BTW. we could also use kvmalloc to complement kvfree, proposed here:
> > https://www.redhat.com/archives/dm-devel/2015-July/msg00046.html
>
> If there are sufficient users (I haven't checked other than quick git
> grep on KMALLOC_MAX_SIZE

the problem is that kmallocs with large sizes near KMALLOC_MAX_SIZE are
unreliable, they'll randomly fail if memory is too fragmented.

> and there do not seem that many) who are
> sharing the same fallback strategy then why not. But I suspect that some
> would rather fallback earlier and even do not attempt larger than e.g.
> order-1 requests.
> --
> Michal Hocko
> SUSE Labs

There are many users that use one of these patterns:

if (size <= some_threshold)
p = kmalloc(size);
else
p = vmalloc(size);

or

p = kmalloc(size);
if (!p)
p = vmalloc(size);


For example: alloc_fdmem, seq_buf_alloc, setxattr, getxattr, ipc_alloc,
pidlist_allocate, get_pages_array, alloc_bucket_locks,
frame_vector_create. If you grep the kernel for vmalloc, you'll find this
pattern over and over again.

In alloc_large_system_hash, there is
table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
- that is clearly wrong because __vmalloc doesn't respect GFP_ATOMIC

Mikulas

2016-04-28 16:00:25

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 19/20] md: simplify free_params for kmalloc vs vmalloc fallback

On Thu 28-04-16 11:37:31, Mike Snitzer wrote:
> On Thu, Apr 28 2016 at 9:24am -0400,
> Michal Hocko <[email protected]> wrote:
>
> > From: Michal Hocko <[email protected]>
> >
> > Use kvfree rather than DM_PARAMS_[KV]MALLOC specific param flags.
> >
> > Cc: Shaohua Li <[email protected]>
> > Cc: Mikulas Patocka <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Michal Hocko <[email protected]>
>
> Nack, seriously, this is the 3rd time this patch has been attempted.
> Did you actually test the change? It'll crash very quickly, see:
>
> https://www.redhat.com/archives/dm-devel/2016-April/msg00103.html

You are right! My bad I should have checked the other callers!

--
Michal Hocko
SUSE Labs

2016-04-28 16:21:32

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH 15/20] tile: get rid of superfluous __GFP_REPEAT

On 4/28/2016 9:24 AM, Michal Hocko wrote:
> From: Michal Hocko<[email protected]>
>
> __GFP_REPEAT has a rather weak semantic but since it has been introduced
> around 2.6.12 it has been ignored for low order allocations.
>
> pgtable_alloc_one uses __GFP_REPEAT flag for L2_USER_PGTABLE_ORDER but
> the order is either 0 or 3 if L2_KERNEL_PGTABLE_SHIFT for HPAGE_SHIFT.
> This means that this flag has never been actually useful here because it
> has always been used only for PAGE_ALLOC_COSTLY requests.
>
> Cc: Chris Metcalf<[email protected]>
> Cc:[email protected]
> Signed-off-by: Michal Hocko<[email protected]>
> ---
> arch/tile/mm/pgtable.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

This seems OK as far as I can tell from code review.

Acked-by: Chris Metcalf <[email protected]> [for tile]

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-04-28 16:59:38

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] md: simplify free_params for kmalloc vs vmalloc fallback

On Thu 28-04-16 11:40:59, Mikulas Patocka wrote:
[...]
> There are many users that use one of these patterns:
>
> if (size <= some_threshold)
> p = kmalloc(size);
> else
> p = vmalloc(size);
>
> or
>
> p = kmalloc(size);
> if (!p)
> p = vmalloc(size);
>
>
> For example: alloc_fdmem, seq_buf_alloc, setxattr, getxattr, ipc_alloc,
> pidlist_allocate, get_pages_array, alloc_bucket_locks,
> frame_vector_create. If you grep the kernel for vmalloc, you'll find this
> pattern over and over again.

It is certainly good to address a common pattern by a helper if it makes
to code easier to follo IMHO.

>
> In alloc_large_system_hash, there is
> table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
> - that is clearly wrong because __vmalloc doesn't respect GFP_ATOMIC

I have seen this code some time already. I guess it was Al complaining
about it but then I just forgot about it. I have no idea why GFP_ATOMIC
was used there. This predates git times but it should be
https://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10/2.6.10-mm1/broken-out/alloc_large_system_hash-numa-interleaving.patch
The changelog is quite verbose but no mention about this ugliness.

So I do agree that the above should be fixed and a common helper might
be interesting but I am afraid we are getting off topic here.

Thanks!
--
Michal Hocko
SUSE Labs

2016-04-29 09:41:56

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 04/20] arm: get rid of superfluous __GFP_REPEAT

On Thu 28-04-16 17:08:31, Michal Hocko wrote:
> On Thu 28-04-16 15:55:45, Russell King - ARM Linux wrote:
> > On Thu, Apr 28, 2016 at 03:23:50PM +0200, Michal Hocko wrote:
> > > From: Michal Hocko <[email protected]>
> > >
> > > __GFP_REPEAT has a rather weak semantic but since it has been introduced
> > > around 2.6.12 it has been ignored for low order allocations.
> > >
> > > PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses
> > > this flag is for more than order-2. This means that this flag has never
> > > been actually useful here because it has always been used only for
> > > PAGE_ALLOC_COSTLY requests.
> >
> > I'm unconvinced. Back in 2013, I was seeing a lot of failures, so:
> >
> > commit 8c65da6dc89ccb605d73773b1dd617e72982d971
> > Author: Russell King <[email protected]>
> > Date: Sat Nov 30 12:52:31 2013 +0000
> >
> > ARM: pgd allocation: retry on failure
> >
> > Make pgd allocation retry on failure; we really need this to succeed
> > otherwise fork() can trigger OOMs.
> >
> > Signed-off-by: Russell King <[email protected]>
> >
> > Maybe something has changed again in the MM layer which makes this flag
> > unnecessary again, and it was a temporary blip around that time, I don't
> > know.
>
> PAGE_ALLOC_COSTLY_ORDER is defined to order 3 since 2007 and even before
> the code was doing
> - if ((order <= 3) || (gfp_mask & __GFP_REPEAT))
> + if ((order <= PAGE_ALLOC_COSTLY_ORDER) ||
> + (gfp_mask & __GFP_REPEAT))
> do_retry = 1;
>
> So an order-2 allocation which is the case for this particular code now
> will trigger the OOM killer and fail only when the current task is
> killed by the OOM killer. Other than that order-2 is basically
> GFP_NOFAIL. Have a look at __alloc_pages_slowpath() for more details.

Does this explanation help?
--
Michal Hocko
SUSE Labs

2016-04-29 18:54:55

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 17/20] dm: get rid of superfluous gfp flags

On Thu, Apr 28 2016 at 9:24am -0400,
Michal Hocko <[email protected]> wrote:

> From: Michal Hocko <[email protected]>
>
> copy_params seems to be little bit confused about which allocation flags
> to use. It enforces GFP_NOIO even though it uses
> memalloc_noio_{save,restore} which enforces GFP_NOIO at the page
> allocator level automatically (via memalloc_noio_flags). It also
> uses __GFP_REPEAT for the __vmalloc request which doesn't make much
> sense either because vmalloc doesn't rely on costly high order
> allocations. Let's just drop the __GFP_REPEAT and leave the further
> cleanup to later changes.
>
> Cc: Shaohua Li <[email protected]>
> Cc: Mikulas Patocka <[email protected]>
> Cc: [email protected]
> Signed-off-by: Michal Hocko <[email protected]>

I've taken this patch for 4.7 but editted the header, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.7&id=0222c76e96163355620224625c1cd80991086dc7