2021-11-22 15:32:57

by Michal Hocko

Subject: [PATCH v2 0/4] extend vmalloc support for constrained allocations

Hi,
The previous version has been posted here [1]

I hope I have addressed all the feedback. There were some suggestions
for further improvements, but I would rather keep this series small as I
cannot really invest more time, and I believe further changes can be done
on top.

This version is a rebase on top of the current Linus tree. Except for
the review feedback and conflicting changes in the area, there is only
one change: filtering out __GFP_NOFAIL from the bulk allocator. This is
not strictly necessary AFAICS, but I found it less confusing because
vmalloc has its own fallback strategy and the bulk allocator is meant
only for the fast path.
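
For illustration only (a sketch, not a hunk from this series; the
variable names around the call are approximate), the filtering amounts
to something like:

    /*
     * The bulk allocator is a fast path with no fallback policy of its
     * own, so the nofail request is masked off here and honoured later
     * by vmalloc's regular single-page fallback loop.
     */
    gfp_t bulk_gfp = gfp_mask & ~__GFP_NOFAIL;

    nr_allocated = alloc_pages_bulk_array_node(bulk_gfp, nid,
                                               nr_pages_request, pages);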

Original cover:
Based on a recent discussion with Dave and Neil [2] I have tried to
implement NOFS, NOIO and NOFAIL support for vmalloc to make the
life of kvmalloc users easier.

The requirement for NOFAIL support in kvmalloc was new to me, but it
seems to be really needed by the xfs code.
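
Concretely, the sort of call this is meant to allow (an illustrative
sketch, not code taken from xfs):

    ptr = kvmalloc(size, GFP_NOFS | __GFP_NOFAIL);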

NOFS/NOIO was a known long-term problem which was hoped to be
handled by the scope API. Those scopes should have been used at the
reclaim recursion boundaries, both to document them and also to remove
the necessity of NOFS/NOIO constraints for all allocations within that
scope. Instead, workarounds were developed to wrap a single allocation
(like ceph_kvmalloc).
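
For illustration, a minimal sketch of the two patterns (not code from
this series); after these patches the second form is supported by
kvmalloc directly:

    /* Scope API: annotate the whole reclaim recursion boundary once. */
    unsigned int nofs_flag = memalloc_nofs_save();
    p = kvmalloc(size, GFP_KERNEL);
    memalloc_nofs_restore(nofs_flag);

    /* Per-allocation constraint, which wrappers like ceph_kvmalloc
     * used to emulate and which now works directly. */
    p = kvmalloc(size, GFP_NOFS);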

The first patch implements NOFS/NOIO support for vmalloc. The second one
adds NOFAIL support, and the third one bundles it all together into
kvmalloc and drops ceph_kvmalloc, whose callers can use kvmalloc directly
now.

I hope I haven't missed anything in the vmalloc allocator.

Thanks!

[1] http://lkml.kernel.org/r/[email protected]
[2] http://lkml.kernel.org/r/[email protected]




2021-11-22 15:32:59

by Michal Hocko

Subject: [PATCH v2 3/4] mm/vmalloc: be more explicit about supported gfp flags.

From: Michal Hocko <[email protected]>

b7d90e7a5ea8 ("mm/vmalloc: be more explicit about supported gfp flags")
has been merged prematurely without the rest of the series and without
the review feedback from Neil being addressed. Fix that up now. Only the
wording is changed slightly.

Signed-off-by: Michal Hocko <[email protected]>
---
mm/vmalloc.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b6aed4f94a85..b1c115ec13be 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3021,12 +3021,14 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
*
* Allocate enough pages to cover @size from the page level
* allocator with @gfp_mask flags. Please note that the full set of gfp
- * flags are not supported. GFP_KERNEL would be a preferred allocation mode
- * but GFP_NOFS and GFP_NOIO are supported as well. Zone modifiers are not
- * supported. From the reclaim modifiers__GFP_DIRECT_RECLAIM is required (aka
- * GFP_NOWAIT is not supported) and only __GFP_NOFAIL is supported (aka
- * __GFP_NORETRY and __GFP_RETRY_MAYFAIL are not supported).
- * __GFP_NOWARN can be used to suppress error messages about failures.
+ * flags are not supported. GFP_KERNEL, GFP_NOFS and GFP_NOIO are all
+ * supported.
+ * Zone modifiers are not supported. From the reclaim modifiers
+ * __GFP_DIRECT_RECLAIM is required (aka GFP_NOWAIT is not supported)
+ * and only __GFP_NOFAIL is supported (i.e. __GFP_NORETRY and
+ * __GFP_RETRY_MAYFAIL are not supported).
+ *
+ * __GFP_NOWARN can be used to suppress failure messages.
*
* Map them into contiguous kernel virtual space, using a pagetable
* protection of @prot.
--
2.30.2
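
To illustrate the documented contract (examples only, not part of the
patch):

    /* Supported: GFP_KERNEL, GFP_NOFS and GFP_NOIO, optionally with
     * __GFP_NOFAIL and/or __GFP_NOWARN. */
    p = __vmalloc(size, GFP_KERNEL);
    p = __vmalloc(size, GFP_NOFS | __GFP_NOFAIL);
    p = __vmalloc(size, GFP_NOIO | __GFP_NOWARN);

    /* Not supported: GFP_NOWAIT/GFP_ATOMIC (no __GFP_DIRECT_RECLAIM),
     * __GFP_NORETRY, __GFP_RETRY_MAYFAIL and zone modifiers. */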


2021-11-22 15:33:02

by Michal Hocko

Subject: [PATCH v2 4/4] mm: allow !GFP_KERNEL allocations for kvmalloc

From: Michal Hocko <[email protected]>

Support for GFP_NO{FS,IO} and __GFP_NOFAIL has been implemented
by previous patches, so we can allow the same for kvmalloc. This
will allow some external users to simplify or completely remove
their helpers.

GFP_NOWAIT semantics haven't been supported so far, but that has never
been explicitly documented, so let's add a note about it.

ceph_kvmalloc is the first helper to be dropped; its callers are
switched to kvmalloc directly.

Signed-off-by: Michal Hocko <[email protected]>
---
include/linux/ceph/libceph.h | 1 -
mm/util.c | 15 ++++-----------
net/ceph/buffer.c | 4 ++--
net/ceph/ceph_common.c | 27 ---------------------------
net/ceph/crypto.c | 2 +-
net/ceph/messenger.c | 2 +-
net/ceph/messenger_v2.c | 2 +-
net/ceph/osdmap.c | 12 ++++++------
8 files changed, 15 insertions(+), 50 deletions(-)

diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h
index 409d8c29bc4f..309acbcb5a8a 100644
--- a/include/linux/ceph/libceph.h
+++ b/include/linux/ceph/libceph.h
@@ -295,7 +295,6 @@ extern bool libceph_compatible(void *data);

extern const char *ceph_msg_type_name(int type);
extern int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid);
-extern void *ceph_kvmalloc(size_t size, gfp_t flags);

struct fs_parameter;
struct fc_log;
diff --git a/mm/util.c b/mm/util.c
index e58151a61255..7275f2829e3f 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -549,13 +549,10 @@ EXPORT_SYMBOL(vm_mmap);
* Uses kmalloc to get the memory but if the allocation fails then falls back
* to the vmalloc allocator. Use kvfree for freeing the memory.
*
- * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported.
+ * GFP_NOWAIT and GFP_ATOMIC are not supported, nor is the __GFP_NORETRY modifier.
* __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
* preferable to the vmalloc fallback, due to visible performance drawbacks.
*
- * Please note that any use of gfp flags outside of GFP_KERNEL is careful to not
- * fall back to vmalloc.
- *
* Return: pointer to the allocated memory of %NULL in case of failure
*/
void *kvmalloc_node(size_t size, gfp_t flags, int node)
@@ -563,13 +560,6 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
gfp_t kmalloc_flags = flags;
void *ret;

- /*
- * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
- * so the given set of flags has to be compatible.
- */
- if ((flags & GFP_KERNEL) != GFP_KERNEL)
- return kmalloc_node(size, flags, node);
-
/*
* We want to attempt a large physically contiguous block first because
* it is less likely to fragment multiple larger blocks and therefore
@@ -582,6 +572,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)

if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
kmalloc_flags |= __GFP_NORETRY;
+
+ /* nofail semantic is implemented by the vmalloc fallback */
+ kmalloc_flags &= ~__GFP_NOFAIL;
}

ret = kmalloc_node(size, kmalloc_flags, node);
diff --git a/net/ceph/buffer.c b/net/ceph/buffer.c
index 5622763ad402..7e51f128045d 100644
--- a/net/ceph/buffer.c
+++ b/net/ceph/buffer.c
@@ -7,7 +7,7 @@

#include <linux/ceph/buffer.h>
#include <linux/ceph/decode.h>
-#include <linux/ceph/libceph.h> /* for ceph_kvmalloc */
+#include <linux/ceph/libceph.h> /* for kvmalloc */

struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp)
{
@@ -17,7 +17,7 @@ struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp)
if (!b)
return NULL;

- b->vec.iov_base = ceph_kvmalloc(len, gfp);
+ b->vec.iov_base = kvmalloc(len, gfp);
if (!b->vec.iov_base) {
kfree(b);
return NULL;
diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
index 97d6ea763e32..9441b4a4912b 100644
--- a/net/ceph/ceph_common.c
+++ b/net/ceph/ceph_common.c
@@ -190,33 +190,6 @@ int ceph_compare_options(struct ceph_options *new_opt,
}
EXPORT_SYMBOL(ceph_compare_options);

-/*
- * kvmalloc() doesn't fall back to the vmalloc allocator unless flags are
- * compatible with (a superset of) GFP_KERNEL. This is because while the
- * actual pages are allocated with the specified flags, the page table pages
- * are always allocated with GFP_KERNEL.
- *
- * ceph_kvmalloc() may be called with GFP_KERNEL, GFP_NOFS or GFP_NOIO.
- */
-void *ceph_kvmalloc(size_t size, gfp_t flags)
-{
- void *p;
-
- if ((flags & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS)) {
- p = kvmalloc(size, flags);
- } else if ((flags & (__GFP_IO | __GFP_FS)) == __GFP_IO) {
- unsigned int nofs_flag = memalloc_nofs_save();
- p = kvmalloc(size, GFP_KERNEL);
- memalloc_nofs_restore(nofs_flag);
- } else {
- unsigned int noio_flag = memalloc_noio_save();
- p = kvmalloc(size, GFP_KERNEL);
- memalloc_noio_restore(noio_flag);
- }
-
- return p;
-}
-
static int parse_fsid(const char *str, struct ceph_fsid *fsid)
{
int i = 0;
diff --git a/net/ceph/crypto.c b/net/ceph/crypto.c
index 92d89b331645..051d22c0e4ad 100644
--- a/net/ceph/crypto.c
+++ b/net/ceph/crypto.c
@@ -147,7 +147,7 @@ void ceph_crypto_key_destroy(struct ceph_crypto_key *key)
static const u8 *aes_iv = (u8 *)CEPH_AES_IV;

/*
- * Should be used for buffers allocated with ceph_kvmalloc().
+ * Should be used for buffers allocated with kvmalloc().
* Currently these are encrypt out-buffer (ceph_buffer) and decrypt
* in-buffer (msg front).
*
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 57d043b382ed..7b891be799d2 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1920,7 +1920,7 @@ struct ceph_msg *ceph_msg_new2(int type, int front_len, int max_data_items,

/* front */
if (front_len) {
- m->front.iov_base = ceph_kvmalloc(front_len, flags);
+ m->front.iov_base = kvmalloc(front_len, flags);
if (m->front.iov_base == NULL) {
dout("ceph_msg_new can't allocate %d bytes\n",
front_len);
diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index cc40ce4e02fb..c4099b641b38 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -308,7 +308,7 @@ static void *alloc_conn_buf(struct ceph_connection *con, int len)
if (WARN_ON(con->v2.conn_buf_cnt >= ARRAY_SIZE(con->v2.conn_bufs)))
return NULL;

- buf = ceph_kvmalloc(len, GFP_NOIO);
+ buf = kvmalloc(len, GFP_NOIO);
if (!buf)
return NULL;

diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 75b738083523..2823bb3cff55 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -980,7 +980,7 @@ static struct crush_work *alloc_workspace(const struct crush_map *c)
work_size = crush_work_size(c, CEPH_PG_MAX_SIZE);
dout("%s work_size %zu bytes\n", __func__, work_size);

- work = ceph_kvmalloc(work_size, GFP_NOIO);
+ work = kvmalloc(work_size, GFP_NOIO);
if (!work)
return NULL;

@@ -1190,9 +1190,9 @@ static int osdmap_set_max_osd(struct ceph_osdmap *map, u32 max)
if (max == map->max_osd)
return 0;

- state = ceph_kvmalloc(array_size(max, sizeof(*state)), GFP_NOFS);
- weight = ceph_kvmalloc(array_size(max, sizeof(*weight)), GFP_NOFS);
- addr = ceph_kvmalloc(array_size(max, sizeof(*addr)), GFP_NOFS);
+ state = kvmalloc(array_size(max, sizeof(*state)), GFP_NOFS);
+ weight = kvmalloc(array_size(max, sizeof(*weight)), GFP_NOFS);
+ addr = kvmalloc(array_size(max, sizeof(*addr)), GFP_NOFS);
if (!state || !weight || !addr) {
kvfree(state);
kvfree(weight);
@@ -1222,7 +1222,7 @@ static int osdmap_set_max_osd(struct ceph_osdmap *map, u32 max)
if (map->osd_primary_affinity) {
u32 *affinity;

- affinity = ceph_kvmalloc(array_size(max, sizeof(*affinity)),
+ affinity = kvmalloc(array_size(max, sizeof(*affinity)),
GFP_NOFS);
if (!affinity)
return -ENOMEM;
@@ -1503,7 +1503,7 @@ static int set_primary_affinity(struct ceph_osdmap *map, int osd, u32 aff)
if (!map->osd_primary_affinity) {
int i;

- map->osd_primary_affinity = ceph_kvmalloc(
+ map->osd_primary_affinity = kvmalloc(
array_size(map->max_osd, sizeof(*map->osd_primary_affinity)),
GFP_NOFS);
if (!map->osd_primary_affinity)
--
2.30.2
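
The conversion pattern is mechanical; an illustrative before/after,
mirroring the messenger_v2.c hunk above:

    /* Before: a wrapper was needed because kvmalloc silently skipped
     * the vmalloc fallback for anything narrower than GFP_KERNEL. */
    buf = ceph_kvmalloc(len, GFP_NOIO);

    /* After: the constrained context is passed directly and the
     * vmalloc fallback still applies. */
    buf = kvmalloc(len, GFP_NOIO);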


2021-11-23 18:58:01

by Uladzislau Rezki (Sony)

Subject: Re: [PATCH v2 4/4] mm: allow !GFP_KERNEL allocations for kvmalloc

On Mon, Nov 22, 2021 at 04:32:33PM +0100, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> Support for GFP_NO{FS,IO} and __GFP_NOFAIL has been implemented
> by previous patches, so we can allow the same for kvmalloc. This
> will allow some external users to simplify or completely remove
> their helpers.
>
> GFP_NOWAIT semantics haven't been supported so far, but that has never
> been explicitly documented, so let's add a note about it.
>
> ceph_kvmalloc is the first helper to be dropped; its callers are
> switched to kvmalloc directly.
>
> Signed-off-by: Michal Hocko <[email protected]>
> [...]
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>

--
Vlad Rezki

2021-11-23 18:59:02

by Uladzislau Rezki (Sony)

Subject: Re: [PATCH v2 3/4] mm/vmalloc: be more explicit about supported gfp flags.

On Mon, Nov 22, 2021 at 04:32:32PM +0100, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> b7d90e7a5ea8 ("mm/vmalloc: be more explicit about supported gfp flags")
> has been merged prematurely without the rest of the series and without
> the review feedback from Neil being addressed. Fix that up now. Only the
> wording is changed slightly.
>
> Signed-off-by: Michal Hocko <[email protected]>
> [...]
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>

--
Vlad Rezki

2021-11-24 22:55:33

by Dave Chinner

Subject: Re: [PATCH v2 0/4] extend vmalloc support for constrained allocations

On Mon, Nov 22, 2021 at 04:32:29PM +0100, Michal Hocko wrote:
> [...]
> I hope I haven't missed anything in the vmalloc allocator.

Correct __GFP_NOLOCKDEP support is also needed. See:

https://lore.kernel.org/linux-mm/[email protected]/

Cheers,

Dave.
--
Dave Chinner
[email protected]

2021-11-25 09:00:35

by Michal Hocko

Subject: Re: [PATCH v2 0/4] extend vmalloc support for constrained allocations

On Thu 25-11-21 09:55:26, Dave Chinner wrote:
[...]
> Correct __GFP_NOLOCKDEP support is also needed. See:
>
> https://lore.kernel.org/linux-mm/[email protected]/

I will have a closer look. This will require changes on both vmalloc and
sl?b sides.
--
Michal Hocko
SUSE Labs

2021-11-25 09:32:36

by Michal Hocko

Subject: Re: [PATCH v2 0/4] extend vmalloc support for constrained allocations

[Cc Sebastian and Vlastimil]

On Thu 25-11-21 09:58:31, Michal Hocko wrote:
> On Thu 25-11-21 09:55:26, Dave Chinner wrote:
> [...]
> > Correct __GFP_NOLOCKDEP support is also needed. See:
> >
> > https://lore.kernel.org/linux-mm/[email protected]/
>
> I will have a closer look. This will require changes on both vmalloc and
> sl?b sides.

This should hopefully do the trick
---
From 0082d29c771d831e5d1b9bb4c0a61d39bac017f0 Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Thu, 25 Nov 2021 10:20:16 +0100
Subject: [PATCH] mm: make slab and vmalloc allocators __GFP_NOLOCKDEP aware

The sl?b and vmalloc allocators reduce the given gfp mask for their
internal needs. For that they use GFP_RECLAIM_MASK to preserve the
reclaim behavior and constraints.

__GFP_NOLOCKDEP is not a part of that mask because, strictly speaking,
it doesn't really control the reclaim behavior. On the other hand
it tells the underlying page allocator to disable reclaim recursion
detection, so arguably it should be part of the mask.

Having __GFP_NOLOCKDEP in the mask will not alter the behavior in any
form, so this change is safe pretty much by definition. It also adds
support for this flag to the SL?B and vmalloc allocators, which will in
turn allow its use from kvmalloc as well. A lack of this support has been
noticed recently in http://lkml.kernel.org/r/[email protected]

Reported-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index 3b79a5c9427a..2ceea20b5b2a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -21,7 +21,7 @@
#define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
- __GFP_ATOMIC)
+ __GFP_ATOMIC|__GFP_NOLOCKDEP)

/* The GFP flags allowed during early boot */
#define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
--
2.30.2
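
As an illustration of what this enables (a sketch, not part of the
patch): a caller that uses __GFP_NOLOCKDEP to silence known-false
reclaim-recursion lockdep reports can now pass the flag through
kvmalloc and have it preserved for both the slab attempt and the
vmalloc fallback:

    ptr = kvmalloc(size, GFP_KERNEL | __GFP_NOLOCKDEP);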

--
Michal Hocko
SUSE Labs

2021-11-25 21:32:40

by Dave Chinner

Subject: Re: [PATCH v2 0/4] extend vmalloc support for constrained allocations

On Thu, Nov 25, 2021 at 10:30:28AM +0100, Michal Hocko wrote:
> [Cc Sebastian and Vlastimil]
>
> On Thu 25-11-21 09:58:31, Michal Hocko wrote:
> > On Thu 25-11-21 09:55:26, Dave Chinner wrote:
> > [...]
> > > Correct __GFP_NOLOCKDEP support is also needed. See:
> > >
> > > https://lore.kernel.org/linux-mm/[email protected]/
> >
> > I will have a closer look. This will require changes on both vmalloc and
> > sl?b sides.
>
> This should hopefully do the trick
> ---
> From 0082d29c771d831e5d1b9bb4c0a61d39bac017f0 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Thu, 25 Nov 2021 10:20:16 +0100
> Subject: [PATCH] mm: make slab and vmalloc allocators __GFP_NOLOCKDEP aware
>
> [...]

Looks reasonable to me.

Acked-by: Dave Chinner <[email protected]>

Cheers,

Dave.
--
Dave Chinner
[email protected]

2021-11-26 09:22:14

by Vlastimil Babka

Subject: Re: [PATCH v2 0/4] extend vmalloc support for constrained allocations

On 11/25/21 10:30, Michal Hocko wrote:
> [Cc Sebastian and Vlastimil]
>
> On Thu 25-11-21 09:58:31, Michal Hocko wrote:
>> On Thu 25-11-21 09:55:26, Dave Chinner wrote:
>> [...]
>> > Correct __GFP_NOLOCKDEP support is also needed. See:
>> >
>> > https://lore.kernel.org/linux-mm/[email protected]/
>>
>> I will have a closer look. This will require changes on both vmalloc and
>> sl?b sides.
>
> This should hopefully do the trick
> ---
> From 0082d29c771d831e5d1b9bb4c0a61d39bac017f0 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Thu, 25 Nov 2021 10:20:16 +0100
> Subject: [PATCH] mm: make slab and vmalloc allocators __GFP_NOLOCKDEP aware
>
> [...]
>
> Reported-by: Sebastian Andrzej Siewior <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>


2021-11-26 15:41:39

by Vlastimil Babka

Subject: Re: [PATCH v2 3/4] mm/vmalloc: be more explicit about supported gfp flags.

On 11/22/21 16:32, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> b7d90e7a5ea8 ("mm/vmalloc: be more explicit about supported gfp flags")
> has been merged prematurely without the rest of the series and without
> the review feedback from Neil being addressed. Fix that up now. Only the
> wording is changed slightly.
>
> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>


2021-11-26 15:52:23

by Vlastimil Babka

Subject: Re: [PATCH v2 4/4] mm: allow !GFP_KERNEL allocations for kvmalloc

On 11/22/21 16:32, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> Support for GFP_NO{FS,IO} and __GFP_NOFAIL has been implemented
> by previous patches, so we can allow the same for kvmalloc. This
> will allow some external users to simplify or completely remove
> their helpers.
>
> GFP_NOWAIT semantics haven't been supported so far, but that has never
> been explicitly documented, so let's add a note about it.
>
> ceph_kvmalloc is the first helper to be dropped; its callers are
> switched to kvmalloc directly.
>
> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>