2022-09-23 20:36:41

by Kees Cook

Subject: [PATCH v2 00/16] slab: Introduce kmalloc_size_roundup()

Hi,

The main details on this series are in patch #2's commit log. It's long,
so I won't repeat it here for v2. As before, I've tried to trim the
CC list.

v2:
- _keep_ ksize(), but remove instrumentation (makes patch series smaller)
- reorganized skbuff logic to avoid yet more copy/paste code
- added a WARN to a separate skbuff ksize usage
- add new refactorings: bpf, openvswitch, devres, mempool, kasan
- dropped "independent" patches: iwlwifi, x86/microcode/AMD (sent separately)
v1: https://lore.kernel.org/lkml/[email protected]

Notes:

Originally, when I was going to entirely remove ksize(), there were a
handful of refactorings that just needed to do ksize() -> __ksize(). In
the end, it was cleaner to actually leave ksize() as a real function,
just without the kasan instrumentation. I wonder, however, if it should
be converted into a static inline now?
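
Something like the following is what I have in mind, based on where
patch #15 leaves ksize() (untested sketch, and it would mean slab.h
needing the kasan/kfence helpers, which may be reason enough to keep
it out of line):

	static inline size_t ksize(const void *objp)
	{
		if (unlikely(ZERO_OR_NULL_PTR(objp)) || !kasan_check_byte(objp))
			return 0;
		return kfence_ksize(objp) ?: __ksize(objp);
	}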

I dropped Jakub's Ack because I refactored that code a bunch more.

The 2 patches that didn't need to call kmalloc_size_roundup() don't need
to be part of this series. (One is already in -next, actually.)

I'd like to land at least the first two patches in the coming v6.1 merge
window so that the per-subsystem patches can be sent to their various
subsystems directly. Vlastimil, what do you think?

Thanks!

-Kees


Kees Cook (16):
slab: Remove __malloc attribute from realloc functions
slab: Introduce kmalloc_size_roundup()
skbuff: Proactively round up to kmalloc bucket size
skbuff: Phase out ksize() fallback for frag_size
net: ipa: Proactively round up to kmalloc bucket size
igb: Proactively round up to kmalloc bucket size
btrfs: send: Proactively round up to kmalloc bucket size
dma-buf: Proactively round up to kmalloc bucket size
coredump: Proactively round up to kmalloc bucket size
openvswitch: Use kmalloc_size_roundup() to match ksize() usage
bpf: Use kmalloc_size_roundup() to match ksize() usage
devres: Use kmalloc_size_roundup() to match ksize() usage
mempool: Use kmalloc_size_roundup() to match ksize() usage
kasan: Remove ksize()-related tests
mm: Make ksize() a reporting-only function
slab: Restore __alloc_size attribute to __kmalloc_track_caller

drivers/base/devres.c | 3 +
drivers/dma-buf/dma-resv.c | 9 ++-
drivers/net/ethernet/intel/igb/igb_main.c | 5 +-
drivers/net/ipa/gsi_trans.c | 7 +-
fs/btrfs/send.c | 11 +--
fs/coredump.c | 7 +-
include/linux/compiler_types.h | 13 ++--
include/linux/skbuff.h | 5 +-
include/linux/slab.h | 46 +++++++++++--
kernel/bpf/verifier.c | 49 +++++++++-----
lib/test_kasan.c | 42 ------------
mm/kasan/shadow.c | 4 +-
mm/mempool.c | 2 +-
mm/slab.c | 9 ++-
mm/slab_common.c | 62 ++++++++++-------
net/core/skbuff.c | 82 ++++++++++++-----------
net/openvswitch/flow_netlink.c | 2 +-
17 files changed, 192 insertions(+), 166 deletions(-)

--
2.34.1


2022-09-23 20:36:45

by Kees Cook

Subject: [PATCH v2 02/16] slab: Introduce kmalloc_size_roundup()

In the effort to help the compiler reason about buffer sizes, the
__alloc_size attribute was added to allocators. This improves the scope
of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
as the vast majority of callers are not expecting to use more memory
than what they asked for.
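
As a reminder, the hint is just an attribute on the allocator's
prototype tying the size of the returned object to one of its
arguments, e.g. (simplified from the real declarations):

	void *__kmalloc(size_t size, gfp_t flags) __alloc_size(1);

which lets the compiler treat __builtin_dynamic_object_size() of the
returned pointer as @size bytes -- the value UBSAN_BOUNDS and
FORTIFY_SOURCE check accesses against.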

There is, however, one common exception to this: anticipatory resizing
of kmalloc allocations. These cases all use ksize() to determine the
actual bucket size of a given allocation (e.g. 128 when 126 was asked
for). This comes in two styles in the kernel:

1) An allocation has been determined to be too small, and needs to be
resized. Instead of the caller choosing its own next best size, it
wants to minimize the number of calls to krealloc(), so it just uses
ksize() plus some additional bytes, forcing the realloc into the next
bucket size, from which it can learn how large it is now. For example:

data = krealloc(data, ksize(data) + 1, gfp);
data_len = ksize(data);

2) The minimum size of an allocation is calculated, but since it may
grow in the future, just use all the space available in the chosen
bucket immediately, to avoid needing to reallocate later. A good
example of this is skbuff's allocators:

data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
...
/* kmalloc(size) might give us more room than requested.
* Put skb_shared_info exactly at the end of allocated zone,
* to allow max possible filling before reallocation.
*/
osize = ksize(data);
size = SKB_WITH_OVERHEAD(osize);

In both cases, the "how much was actually allocated?" question is answered
_after_ the allocation, at a point where the compiler hint can no longer
make the association. This mismatch between the compiler's
view of the buffer length and the code's intention about how much it is
going to actually use has already caused problems[1]. It is possible to
fix this by reordering the use of the "actual size" information.

We can serve the needs of users of ksize() and still have accurate buffer
length hinting for the compiler by doing the bucket size calculation
_before_ the allocation. Code can instead ask "how large an allocation
would I get for a given size?".

Introduce kmalloc_size_roundup() to serve this purpose, so we can start
replacing the "anticipatory resizing" uses of ksize().
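
For example, the "grow into the next bucket" pattern in style 1 above
could become something like this (sketch, assuming data_len tracks the
currently usable size):

	data_len = kmalloc_size_roundup(data_len + 1);
	data = krealloc(data, data_len, gfp);

so the size the compiler associates with the buffer matches what the
code will actually use.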

[1] https://github.com/ClangBuiltLinux/linux/issues/1599
https://github.com/KSPP/linux/issues/183

Cc: Vlastimil Babka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
include/linux/slab.h | 31 +++++++++++++++++++++++++++++++
mm/slab.c | 9 ++++++---
mm/slab_common.c | 20 ++++++++++++++++++++
3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41bd036e7551..727640173568 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __r
void kfree(const void *objp);
void kfree_sensitive(const void *objp);
size_t __ksize(const void *objp);
+
+/**
+ * ksize - Report actual allocation size of associated object
+ *
+ * @objp: Pointer returned from a prior kmalloc()-family allocation.
+ *
+ * This should not be used for writing beyond the originally requested
+ * allocation size. Either use krealloc() or round up the allocation size
+ * with kmalloc_size_roundup() prior to allocation. If this is used to
+ * access beyond the originally requested allocation size, UBSAN_BOUNDS
+ * and/or FORTIFY_SOURCE may trip, since they only know about the
+ * originally allocated size via the __alloc_size attribute.
+ */
size_t ksize(const void *objp);
+
#ifdef CONFIG_PRINTK
bool kmem_valid_obj(void *object);
void kmem_dump_obj(void *object);
@@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
extern void kvfree_sensitive(const void *addr, size_t len);

unsigned int kmem_cache_size(struct kmem_cache *s);
+
+/**
+ * kmalloc_size_roundup - Report allocation bucket size for the given size
+ *
+ * @size: Number of bytes to round up from.
+ *
+ * This returns the number of bytes that would be available in a kmalloc()
+ * allocation of @size bytes. For example, a 126 byte request would be
+ * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
+ * for the general-purpose kmalloc()-based allocations, and is not for the
+ * pre-sized kmem_cache_alloc()-based allocations.)
+ *
+ * Use this to kmalloc() the full bucket size ahead of time instead of using
+ * ksize() to query the size after an allocation.
+ */
+size_t kmalloc_size_roundup(size_t size);
+
void __init kmem_cache_init_late(void);

#if defined(CONFIG_SMP) && defined(CONFIG_SLAB)
diff --git a/mm/slab.c b/mm/slab.c
index 10e96137b44f..2da862bf6226 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4192,11 +4192,14 @@ void __check_heap_object(const void *ptr, unsigned long n,
#endif /* CONFIG_HARDENED_USERCOPY */

/**
- * __ksize -- Uninstrumented ksize.
+ * __ksize -- Report full size of underlying allocation
* @objp: pointer to the object
*
- * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
- * safety checks as ksize() with KASAN instrumentation enabled.
+ * This should only be used internally to query the true size of allocations.
+ * It is not meant to be a way to discover the usable size of an allocation
+ * after the fact. Instead, use kmalloc_size_roundup(). Using memory beyond
+ * the originally requested allocation size may trigger KASAN, UBSAN_BOUNDS,
+ * and/or FORTIFY_SOURCE.
*
* Return: size of the actual memory used by @objp in bytes
*/
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 457671ace7eb..d7420cf649f8 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -721,6 +721,26 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
return kmalloc_caches[kmalloc_type(flags)][index];
}

+size_t kmalloc_size_roundup(size_t size)
+{
+ struct kmem_cache *c;
+
+ /* Short-circuit the 0 size case. */
+ if (unlikely(size == 0))
+ return 0;
+ /* Short-circuit saturated "too-large" case. */
+ if (unlikely(size == SIZE_MAX))
+ return SIZE_MAX;
+ /* Above the smaller buckets, size is a multiple of page size. */
+ if (size > KMALLOC_MAX_CACHE_SIZE)
+ return PAGE_SIZE << get_order(size);
+
+ /* The flags don't matter since size_index is common to all. */
+ c = kmalloc_slab(size, GFP_KERNEL);
+ return c ? c->object_size : 0;
+}
+EXPORT_SYMBOL(kmalloc_size_roundup);
+
#ifdef CONFIG_ZONE_DMA
#define KMALLOC_DMA_NAME(sz) .name[KMALLOC_DMA] = "dma-kmalloc-" #sz,
#else
--
2.34.1

2022-09-23 20:36:47

by Kees Cook

Subject: [PATCH v2 04/16] skbuff: Phase out ksize() fallback for frag_size

All callers of APIs that allowed a 0-sized frag_size appear to be
passing actual size information already, so this use of ksize() can
be removed. However, just in case there is something still depending
on this behavior, issue a WARN and fall back to ksize() as before,
which means we'll also potentially get KASAN warnings.

Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
net/core/skbuff.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0b30fbdbd0d0..84ca89c781cd 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -195,7 +195,11 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
unsigned int frag_size)
{
struct skb_shared_info *shinfo;
- unsigned int size = frag_size ? : ksize(data);
+ unsigned int size = frag_size;
+
+ /* All callers should be setting frag size now? */
+ if (WARN_ON_ONCE(size == 0))
+ size = ksize(data);

size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

@@ -220,12 +224,10 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
/**
* __build_skb - build a network buffer
* @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data
*
* Allocate a new &sk_buff. Caller provides space holding head and
- * skb_shared_info. @data must have been allocated by kmalloc() only if
- * @frag_size is 0, otherwise data should come from the page allocator
- * or vmalloc()
+ * skb_shared_info.
* The return is the new skb buffer.
* On a failure the return is %NULL, and @data is not freed.
* Notes :
@@ -272,7 +274,7 @@ EXPORT_SYMBOL(build_skb);
* build_skb_around - build a network buffer around provided skb
* @skb: sk_buff provide by caller, must be memset cleared
* @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data
*/
struct sk_buff *build_skb_around(struct sk_buff *skb,
void *data, unsigned int frag_size)
@@ -294,7 +296,7 @@ EXPORT_SYMBOL(build_skb_around);
/**
* __napi_build_skb - build a network buffer
* @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data
*
* Version of __build_skb() that uses NAPI percpu caches to obtain
* skbuff_head instead of inplace allocation.
@@ -318,7 +320,7 @@ static struct sk_buff *__napi_build_skb(void *data, unsigned int frag_size)
/**
* napi_build_skb - build a network buffer
* @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data
*
* Version of __napi_build_skb() that takes care of skb->head_frag
* and skb->pfmemalloc when the data is a page or page fragment.
--
2.34.1

2022-09-23 20:36:57

by Kees Cook

Subject: [PATCH v2 07/16] btrfs: send: Proactively round up to kmalloc bucket size

Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: [email protected]
Acked-by: David Sterba <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Kees Cook <[email protected]>
---
fs/btrfs/send.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index e7671afcee4f..d40d65598e8f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -435,6 +435,11 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
path_len = p->end - p->start;
old_buf_len = p->buf_len;

+ /*
+ * Allocate to the next largest kmalloc bucket size, to let
+ * the fast path happen most of the time.
+ */
+ len = kmalloc_size_roundup(len);
/*
* First time the inline_buf does not suffice
*/
@@ -448,11 +453,7 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
if (!tmp_buf)
return -ENOMEM;
p->buf = tmp_buf;
- /*
- * The real size of the buffer is bigger, this will let the fast path
- * happen most of the time
- */
- p->buf_len = ksize(p->buf);
+ p->buf_len = len;

if (p->reversed) {
tmp_buf = p->buf + old_buf_len - path_len - 1;
--
2.34.1

2022-09-23 20:37:29

by Kees Cook

Subject: [PATCH v2 08/16] dma-buf: Proactively round up to kmalloc bucket size

Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: Sumit Semwal <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
drivers/dma-buf/dma-resv.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 205acb2c744d..5b0a4b8830ff 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -98,12 +98,17 @@ static void dma_resv_list_set(struct dma_resv_list *list,
static struct dma_resv_list *dma_resv_list_alloc(unsigned int max_fences)
{
struct dma_resv_list *list;
+ size_t size;

- list = kmalloc(struct_size(list, table, max_fences), GFP_KERNEL);
+ /* Round up to the next kmalloc bucket size. */
+ size = kmalloc_size_roundup(struct_size(list, table, max_fences));
+
+ list = kmalloc(size, GFP_KERNEL);
if (!list)
return NULL;

- list->max_fences = (ksize(list) - offsetof(typeof(*list), table)) /
+ /* Given the resulting bucket size, recalculate max_fences. */
+ list->max_fences = (size - offsetof(typeof(*list), table)) /
sizeof(*list->table);

return list;
--
2.34.1

2022-09-23 20:37:50

by Kees Cook

Subject: [PATCH v2 11/16] bpf: Use kmalloc_size_roundup() to match ksize() usage

Round up allocations with kmalloc_size_roundup() so that the verifier's
use of ksize() is always accurate and no special handling of the memory
is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE. Pass the new size
information back up to callers so they can use the space immediately,
and so that array resizing happens less frequently as well. Explicitly zero
any trailing bytes in new allocations.

Additionally fix a memory leak: if krealloc() fails, "arr" wasn't freed,
but NULL was returned to the caller of realloc_array(), which would then
write NULL over the lvalue, losing the reference to the original memory.

Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
kernel/bpf/verifier.c | 49 +++++++++++++++++++++++++++----------------
1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 096fdac70165..80531f8f0d36 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -978,42 +978,53 @@ static void print_insn_state(struct bpf_verifier_env *env,
*/
static void *copy_array(void *dst, const void *src, size_t n, size_t size, gfp_t flags)
{
- size_t bytes;
+ size_t src_bytes, dst_bytes;

if (ZERO_OR_NULL_PTR(src))
goto out;

- if (unlikely(check_mul_overflow(n, size, &bytes)))
+ if (unlikely(check_mul_overflow(n, size, &src_bytes)))
return NULL;

- if (ksize(dst) < bytes) {
+ dst_bytes = kmalloc_size_roundup(src_bytes);
+ if (ksize(dst) < dst_bytes) {
kfree(dst);
- dst = kmalloc_track_caller(bytes, flags);
+ dst = kmalloc_track_caller(dst_bytes, flags);
if (!dst)
return NULL;
}

- memcpy(dst, src, bytes);
+ memcpy(dst, src, src_bytes);
+ memset(dst + src_bytes, 0, dst_bytes - src_bytes);
out:
return dst ? dst : ZERO_SIZE_PTR;
}

-/* resize an array from old_n items to new_n items. the array is reallocated if it's too
- * small to hold new_n items. new items are zeroed out if the array grows.
+/* Resize an array from old_n items to *new_n items. The array is reallocated if it's too
+ * small to hold *new_n items. New items are zeroed out if the array grows. Allocation
+ * is rounded up to next kmalloc bucket size to reduce frequency of resizing. *new_n
+ * contains the new total number of items that will fit.
*
- * Contrary to krealloc_array, does not free arr if new_n is zero.
+ * Contrary to krealloc, does not free arr if new_n is zero.
*/
-static void *realloc_array(void *arr, size_t old_n, size_t new_n, size_t size)
+static void *realloc_array(void *arr, size_t old_n, size_t *new_n, size_t size)
{
- if (!new_n || old_n == new_n)
+ void *old_arr = arr;
+ size_t alloc_size;
+
+ if (!new_n || !*new_n || old_n == *new_n)
goto out;

- arr = krealloc_array(arr, new_n, size, GFP_KERNEL);
- if (!arr)
+ alloc_size = kmalloc_size_roundup(size_mul(*new_n, size));
+ arr = krealloc(old_arr, alloc_size, GFP_KERNEL);
+ if (!arr) {
+ kfree(old_arr);
return NULL;
+ }

- if (new_n > old_n)
- memset(arr + old_n * size, 0, (new_n - old_n) * size);
+ *new_n = alloc_size / size;
+ if (*new_n > old_n)
+ memset(arr + old_n * size, 0, (*new_n - old_n) * size);

out:
return arr ? arr : ZERO_SIZE_PTR;
@@ -1045,7 +1056,7 @@ static int copy_stack_state(struct bpf_func_state *dst, const struct bpf_func_st

static int resize_reference_state(struct bpf_func_state *state, size_t n)
{
- state->refs = realloc_array(state->refs, state->acquired_refs, n,
+ state->refs = realloc_array(state->refs, state->acquired_refs, &n,
sizeof(struct bpf_reference_state));
if (!state->refs)
return -ENOMEM;
@@ -1061,11 +1072,11 @@ static int grow_stack_state(struct bpf_func_state *state, int size)
if (old_n >= n)
return 0;

- state->stack = realloc_array(state->stack, old_n, n, sizeof(struct bpf_stack_state));
+ state->stack = realloc_array(state->stack, old_n, &n, sizeof(struct bpf_stack_state));
if (!state->stack)
return -ENOMEM;

- state->allocated_stack = size;
+ state->allocated_stack = n * BPF_REG_SIZE;
return 0;
}

@@ -2472,9 +2483,11 @@ static int push_jmp_history(struct bpf_verifier_env *env,
{
u32 cnt = cur->jmp_history_cnt;
struct bpf_idx_pair *p;
+ size_t size;

cnt++;
- p = krealloc(cur->jmp_history, cnt * sizeof(*p), GFP_USER);
+ size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p)));
+ p = krealloc(cur->jmp_history, size, GFP_USER);
if (!p)
return -ENOMEM;
p[cnt - 1].idx = env->insn_idx;
--
2.34.1

2022-09-23 20:38:10

by Kees Cook

Subject: [PATCH v2 10/16] openvswitch: Use kmalloc_size_roundup() to match ksize() usage

Round up allocations with kmalloc_size_roundup() so that openvswitch's
use of ksize() is always accurate and no special handling of the memory
is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.

Cc: Pravin B Shelar <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
net/openvswitch/flow_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 4c09cf8a0ab2..6621873abde2 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2309,7 +2309,7 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size)

WARN_ON_ONCE(size > MAX_ACTIONS_BUFSIZE);

- sfa = kmalloc(sizeof(*sfa) + size, GFP_KERNEL);
+ sfa = kmalloc(kmalloc_size_roundup(sizeof(*sfa) + size), GFP_KERNEL);
if (!sfa)
return ERR_PTR(-ENOMEM);

--
2.34.1

2022-09-23 20:54:24

by Kees Cook

Subject: [PATCH v2 12/16] devres: Use kmalloc_size_roundup() to match ksize() usage

Round up allocations with kmalloc_size_roundup() so that devres's use
of ksize() is always accurate and no special handling of the memory is
needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.

Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
drivers/base/devres.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 864d0b3f566e..7db20ce7ea8a 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -101,6 +101,9 @@ static bool check_dr_size(size_t size, size_t *tot_size)
size, tot_size)))
return false;

+ /* Actually allocate the full kmalloc bucket size. */
+ *tot_size = kmalloc_size_roundup(*tot_size);
+
return true;
}

--
2.34.1

2022-09-23 20:54:38

by Kees Cook

Subject: [PATCH v2 09/16] coredump: Proactively round up to kmalloc bucket size

Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
fs/coredump.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 9f4aae202109..0894b2c35d98 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -68,7 +68,10 @@ struct core_name {

static int expand_corename(struct core_name *cn, int size)
{
- char *corename = krealloc(cn->corename, size, GFP_KERNEL);
+ char *corename;
+
+ size = kmalloc_size_roundup(size);
+ corename = krealloc(cn->corename, size, GFP_KERNEL);

if (!corename)
return -ENOMEM;
@@ -76,7 +79,7 @@ static int expand_corename(struct core_name *cn, int size)
if (size > core_name_size) /* racy but harmless */
core_name_size = size;

- cn->size = ksize(corename);
+ cn->size = size;
cn->corename = corename;
return 0;
}
--
2.34.1

2022-09-23 20:54:38

by Kees Cook

Subject: [PATCH v2 14/16] kasan: Remove ksize()-related tests

In preparation for no longer unpoisoning in ksize(), remove the behavioral
self-tests for ksize().

Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
lib/test_kasan.c | 42 ------------------------------------------
mm/kasan/shadow.c | 4 +---
2 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index 58c1b01ccfe2..bdd0ced8f8d7 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -753,46 +753,6 @@ static void kasan_global_oob_left(struct kunit *test)
KUNIT_EXPECT_KASAN_FAIL(test, *(volatile char *)p);
}

-/* Check that ksize() makes the whole object accessible. */
-static void ksize_unpoisons_memory(struct kunit *test)
-{
- char *ptr;
- size_t size = 123, real_size;
-
- ptr = kmalloc(size, GFP_KERNEL);
- KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
- real_size = ksize(ptr);
-
- OPTIMIZER_HIDE_VAR(ptr);
-
- /* This access shouldn't trigger a KASAN report. */
- ptr[size] = 'x';
-
- /* This one must. */
- KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[real_size]);
-
- kfree(ptr);
-}
-
-/*
- * Check that a use-after-free is detected by ksize() and via normal accesses
- * after it.
- */
-static void ksize_uaf(struct kunit *test)
-{
- char *ptr;
- int size = 128 - KASAN_GRANULE_SIZE;
-
- ptr = kmalloc(size, GFP_KERNEL);
- KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
- kfree(ptr);
-
- OPTIMIZER_HIDE_VAR(ptr);
- KUNIT_EXPECT_KASAN_FAIL(test, ksize(ptr));
- KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[0]);
- KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size]);
-}
-
static void kasan_stack_oob(struct kunit *test)
{
char stack_array[10];
@@ -1392,8 +1352,6 @@ static struct kunit_case kasan_kunit_test_cases[] = {
KUNIT_CASE(kasan_stack_oob),
KUNIT_CASE(kasan_alloca_oob_left),
KUNIT_CASE(kasan_alloca_oob_right),
- KUNIT_CASE(ksize_unpoisons_memory),
- KUNIT_CASE(ksize_uaf),
KUNIT_CASE(kmem_cache_double_free),
KUNIT_CASE(kmem_cache_invalid_free),
KUNIT_CASE(kmem_cache_double_destroy),
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 0e3648b603a6..0895c73e9b69 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -124,9 +124,7 @@ void kasan_unpoison(const void *addr, size_t size, bool init)
addr = kasan_reset_tag(addr);

/*
- * Skip KFENCE memory if called explicitly outside of sl*b. Also note
- * that calls to ksize(), where size is not a multiple of machine-word
- * size, would otherwise poison the invalid portion of the word.
+ * Skip KFENCE memory if called explicitly outside of sl*b.
*/
if (is_kfence_address(addr))
return;
--
2.34.1

2022-09-23 20:57:08

by Kees Cook

Subject: [PATCH v2 03/16] skbuff: Proactively round up to kmalloc bucket size

Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

This will allow for kernels built with CONFIG_UBSAN_BOUNDS or the
coming dynamic bounds checking under CONFIG_FORTIFY_SOURCE to gain
back the __alloc_size() hints that were temporarily reverted in commit
93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller")

Additionally tries to normalize size variables to u32 from int. Most
interfaces are using "int", but notably __alloc_skb uses unsigned int.

Also fix some reverse Christmas tree and comments while touching nearby
code.

Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: [email protected]
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
include/linux/skbuff.h | 5 +---
net/core/skbuff.c | 64 +++++++++++++++++++++---------------------
2 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ca8afa382bf2..5a16177f38b5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1234,7 +1234,7 @@ void kfree_skb_partial(struct sk_buff *skb, bool head_stolen);
bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
bool *fragstolen, int *delta_truesize);

-struct sk_buff *__alloc_skb(unsigned int size, gfp_t priority, int flags,
+struct sk_buff *__alloc_skb(unsigned int bytes, gfp_t priority, int flags,
int node);
struct sk_buff *__build_skb(void *data, unsigned int frag_size);
struct sk_buff *build_skb(void *data, unsigned int frag_size);
@@ -1870,9 +1870,6 @@ static inline int skb_unclone(struct sk_buff *skb, gfp_t pri)

/* This variant of skb_unclone() makes sure skb->truesize
* and skb_end_offset() are not changed, whenever a new skb->head is needed.
- *
- * Indeed there is no guarantee that ksize(kmalloc(X)) == ksize(kmalloc(X))
- * when various debugging features are in place.
*/
int __skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri);
static inline int skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 974bbbbe7138..0b30fbdbd0d0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -343,19 +343,23 @@ EXPORT_SYMBOL(napi_build_skb);
* the caller if emergency pfmemalloc reserves are being used. If it is and
* the socket is later found to be SOCK_MEMALLOC then PFMEMALLOC reserves
* may be used. Otherwise, the packet data may be discarded until enough
- * memory is free
+ * memory is free.
*/
-static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
+static void *kmalloc_reserve(u32 *size, gfp_t flags, int node,
bool *pfmemalloc)
{
void *obj;
bool ret_pfmemalloc = false;

+ /* kmalloc(size) might give us more room than requested, so
+ * allocate the true bucket size up front.
+ */
+ *size = kmalloc_size_roundup(*size);
/*
* Try a regular allocation, when that fails and we're not entitled
* to the reserves, fail.
*/
- obj = kmalloc_node_track_caller(size,
+ obj = kmalloc_node_track_caller(*size,
flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
node);
if (obj || !(gfp_pfmemalloc_allowed(flags)))
@@ -363,7 +367,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,

/* Try again but now we are using pfmemalloc reserves */
ret_pfmemalloc = true;
- obj = kmalloc_node_track_caller(size, flags, node);
+ obj = kmalloc_node_track_caller(*size, flags, node);

out:
if (pfmemalloc)
@@ -380,7 +384,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,

/**
* __alloc_skb - allocate a network buffer
- * @size: size to allocate
+ * @bytes: minimum bytes to allocate
* @gfp_mask: allocation mask
* @flags: If SKB_ALLOC_FCLONE is set, allocate from fclone cache
* instead of head cache and allocate a cloned (child) skb.
@@ -395,12 +399,12 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
* Buffers may only be allocated from interrupts using a @gfp_mask of
* %GFP_ATOMIC.
*/
-struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
+struct sk_buff *__alloc_skb(unsigned int bytes, gfp_t gfp_mask,
int flags, int node)
{
struct kmem_cache *cache;
struct sk_buff *skb;
- unsigned int osize;
+ u32 size = bytes;
bool pfmemalloc;
u8 *data;

@@ -427,15 +431,13 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
*/
size = SKB_DATA_ALIGN(size);
size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
- if (unlikely(!data))
- goto nodata;
- /* kmalloc(size) might give us more room than requested.
- * Put skb_shared_info exactly at the end of allocated zone,
+ /* Put skb_shared_info exactly at the end of allocated zone,
* to allow max possible filling before reallocation.
*/
- osize = ksize(data);
- size = SKB_WITH_OVERHEAD(osize);
+ data = kmalloc_reserve(&size, gfp_mask, node, &pfmemalloc);
+ if (unlikely(!data))
+ goto nodata;
+ size = SKB_WITH_OVERHEAD(size);
prefetchw(data + size);

/*
@@ -444,7 +446,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
* the tail pointer in struct sk_buff!
*/
memset(skb, 0, offsetof(struct sk_buff, tail));
- __build_skb_around(skb, data, osize);
+ __build_skb_around(skb, data, size);
skb->pfmemalloc = pfmemalloc;

if (flags & SKB_ALLOC_FCLONE) {
@@ -1708,7 +1710,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
gfp_t gfp_mask)
{
int i, osize = skb_end_offset(skb);
- int size = osize + nhead + ntail;
+ u32 size = osize + nhead + ntail;
long off;
u8 *data;

@@ -1722,11 +1724,11 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,

if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
- gfp_mask, NUMA_NO_NODE, NULL);
+ size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
goto nodata;
- size = SKB_WITH_OVERHEAD(ksize(data));
+ size = SKB_WITH_OVERHEAD(size);

/* Copy only real data... and, alas, header. This should be
* optimized for the cases when header is void.
@@ -6060,22 +6062,21 @@ EXPORT_SYMBOL(alloc_skb_with_frags);
static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
const int headlen, gfp_t gfp_mask)
{
- int i;
- int size = skb_end_offset(skb);
+ u32 size = skb_end_offset(skb);
int new_hlen = headlen - off;
u8 *data;
+ int i;

size = SKB_DATA_ALIGN(size);

if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- data = kmalloc_reserve(size +
- SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
- gfp_mask, NUMA_NO_NODE, NULL);
+ size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
return -ENOMEM;

- size = SKB_WITH_OVERHEAD(ksize(data));
+ size = SKB_WITH_OVERHEAD(size);

/* Copy real data, and all frags */
skb_copy_from_linear_data_offset(skb, off, data, new_hlen);
@@ -6179,23 +6180,22 @@ static int pskb_carve_frag_list(struct sk_buff *skb,
static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
int pos, gfp_t gfp_mask)
{
- int i, k = 0;
- int size = skb_end_offset(skb);
- u8 *data;
const int nfrags = skb_shinfo(skb)->nr_frags;
struct skb_shared_info *shinfo;
+ u32 size = skb_end_offset(skb);
+ int i, k = 0;
+ u8 *data;

size = SKB_DATA_ALIGN(size);

if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- data = kmalloc_reserve(size +
- SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
- gfp_mask, NUMA_NO_NODE, NULL);
+ size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
return -ENOMEM;

- size = SKB_WITH_OVERHEAD(ksize(data));
+ size = SKB_WITH_OVERHEAD(size);

memcpy((struct skb_shared_info *)(data + size),
skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
--
2.34.1

2022-09-23 20:57:29

by Kees Cook

Subject: [PATCH v2 06/16] igb: Proactively round up to kmalloc bucket size

In preparation for removing the "silently change allocation size"
users of ksize(), explicitly round up all q_vector allocations so that
allocations can be correctly compared to ksize().

Additionally fix potential use-after-free in the case of new allocation
failure: only free memory if the replacement allocation succeeds.

Cc: Jesse Brandeburg <[email protected]>
Cc: Tony Nguyen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
drivers/net/ethernet/intel/igb/igb_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 2796e81d2726..eb51e531c096 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1195,15 +1195,16 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
return -ENOMEM;

ring_count = txr_count + rxr_count;
- size = struct_size(q_vector, ring, ring_count);
+ size = kmalloc_size_roundup(struct_size(q_vector, ring, ring_count));

/* allocate q_vector and rings */
q_vector = adapter->q_vector[v_idx];
if (!q_vector) {
q_vector = kzalloc(size, GFP_KERNEL);
} else if (size > ksize(q_vector)) {
- kfree_rcu(q_vector, rcu);
q_vector = kzalloc(size, GFP_KERNEL);
+ if (q_vector)
+ kfree_rcu(q_vector, rcu);
} else {
memset(q_vector, 0, size);
}
--
2.34.1

2022-09-23 21:21:12

by Kees Cook

Subject: [PATCH v2 05/16] net: ipa: Proactively round up to kmalloc bucket size

Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: [email protected]
Reviewed-by: Alex Elder <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Kees Cook <[email protected]>
---
drivers/net/ipa/gsi_trans.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ipa/gsi_trans.c b/drivers/net/ipa/gsi_trans.c
index 18e7e8c405be..eeec149b5d89 100644
--- a/drivers/net/ipa/gsi_trans.c
+++ b/drivers/net/ipa/gsi_trans.c
@@ -88,6 +88,7 @@ struct gsi_tre {
int gsi_trans_pool_init(struct gsi_trans_pool *pool, size_t size, u32 count,
u32 max_alloc)
{
+ size_t alloc_size;
void *virt;

if (!size)
@@ -104,13 +105,15 @@ int gsi_trans_pool_init(struct gsi_trans_pool *pool, size_t size, u32 count,
* If there aren't enough entries starting at the free index,
* we just allocate free entries from the beginning of the pool.
*/
- virt = kcalloc(count + max_alloc - 1, size, GFP_KERNEL);
+ alloc_size = size_mul(count + max_alloc - 1, size);
+ alloc_size = kmalloc_size_roundup(alloc_size);
+ virt = kzalloc(alloc_size, GFP_KERNEL);
if (!virt)
return -ENOMEM;

pool->base = virt;
/* If the allocator gave us any extra memory, use it */
- pool->count = ksize(pool->base) / size;
+ pool->count = alloc_size / size;
pool->free = 0;
pool->max_alloc = max_alloc;
pool->size = size;
--
2.34.1

2022-09-23 21:27:06

by Kees Cook

Subject: [PATCH v2 15/16] mm: Make ksize() a reporting-only function

With all "silently resizing" callers of ksize() refactored, remove the
logic in ksize() that would allow it to be used to effectively change
the size of an allocation (bypassing __alloc_size hints, etc). Users
wanting this feature need to either use kmalloc_size_roundup() before an
allocation, or call krealloc() directly.
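
That is, a caller that used to rely on the slack reported by ksize()
would instead either size the allocation up front, e.g. (sketch):

	len = kmalloc_size_roundup(len);
	p = kmalloc(len, gfp);

or explicitly grow the buffer with krealloc() when it runs out of room;
see the per-subsystem patches earlier in the series for real conversions.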

For kfree_sensitive(), move the unpoisoning logic inline. Replace the
open-coded ksize() in __do_krealloc with ksize() now that it doesn't
perform unpoisoning.

Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
mm/slab_common.c | 38 ++++++++++++++------------------------
1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index d7420cf649f8..60b77bcdc2e3 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1160,13 +1160,8 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
void *ret;
size_t ks;

- /* Don't use instrumented ksize to allow precise KASAN poisoning. */
- if (likely(!ZERO_OR_NULL_PTR(p))) {
- if (!kasan_check_byte(p))
- return NULL;
- ks = kfence_ksize(p) ?: __ksize(p);
- } else
- ks = 0;
+ /* How large is the allocation actually? */
+ ks = ksize(p);

/* If the object still fits, repoison it precisely. */
if (ks >= new_size) {
@@ -1232,8 +1227,10 @@ void kfree_sensitive(const void *p)
void *mem = (void *)p;

ks = ksize(mem);
- if (ks)
+ if (ks) {
+ kasan_unpoison_range(mem, ks);
memzero_explicit(mem, ks);
+ }
kfree(mem);
}
EXPORT_SYMBOL(kfree_sensitive);
@@ -1242,10 +1239,11 @@ EXPORT_SYMBOL(kfree_sensitive);
* ksize - get the actual amount of memory allocated for a given object
* @objp: Pointer to the object
*
- * kmalloc may internally round up allocations and return more memory
+ * kmalloc() may internally round up allocations and return more memory
* than requested. ksize() can be used to determine the actual amount of
- * memory allocated. The caller may use this additional memory, even though
- * a smaller amount of memory was initially specified with the kmalloc call.
+ * allocated memory. The caller may NOT use this additional memory, unless
+ * it calls krealloc(). To avoid an alloc/realloc cycle, callers can use
+ * kmalloc_size_roundup() to find the size of the associated kmalloc bucket.
* The caller must guarantee that objp points to a valid object previously
* allocated with either kmalloc() or kmem_cache_alloc(). The object
* must not be freed during the duration of the call.
@@ -1254,13 +1252,11 @@ EXPORT_SYMBOL(kfree_sensitive);
*/
size_t ksize(const void *objp)
{
- size_t size;
-
/*
- * We need to first check that the pointer to the object is valid, and
- * only then unpoison the memory. The report printed from ksize() is
- * more useful, then when it's printed later when the behaviour could
- * be undefined due to a potential use-after-free or double-free.
+ * We need to first check that the pointer to the object is valid.
+ * The KASAN report printed from ksize() is more useful than when
+ * it's printed later when the behaviour could be undefined due to
+ * a potential use-after-free or double-free.
*
* We use kasan_check_byte(), which is supported for the hardware
* tag-based KASAN mode, unlike kasan_check_read/write().
@@ -1274,13 +1270,7 @@ size_t ksize(const void *objp)
if (unlikely(ZERO_OR_NULL_PTR(objp)) || !kasan_check_byte(objp))
return 0;

- size = kfence_ksize(objp) ?: __ksize(objp);
- /*
- * We assume that ksize callers could use whole allocated area,
- * so we need to unpoison this area.
- */
- kasan_unpoison_range(objp, size);
- return size;
+ return kfence_ksize(objp) ?: __ksize(objp);
}
EXPORT_SYMBOL(ksize);

--
2.34.1

2022-09-23 21:30:28

by Kees Cook

Subject: [PATCH v2 16/16] slab: Restore __alloc_size attribute to __kmalloc_track_caller

With skbuff's post-allocation use of ksize() rearranged to use
kmalloc_size_roundup() prior to allocation, the compiler can correctly
reason about the size of these allocations. The prior mismatch had caused
buffer overflow mitigations to erroneously fire under CONFIG_UBSAN_BOUNDS,
requiring a partial revert of the __alloc_size attributes. Restore the
attribute that had been removed in commit 93dd04ab0b2b ("slab: remove
__alloc_size attribute from __kmalloc_track_caller").

Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
include/linux/slab.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 727640173568..297b85ed2c29 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -693,7 +693,8 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag
* allocator where we care about the real place the memory allocation
* request comes from.
*/
-extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller);
+extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller)
+ __alloc_size(1);
#define kmalloc_track_caller(size, flags) \
__kmalloc_track_caller(size, flags, _RET_IP_)

--
2.34.1

2022-09-23 21:31:56

by Kees Cook

Subject: [PATCH v2 13/16] mempool: Use kmalloc_size_roundup() to match ksize() usage

Round up allocations with kmalloc_size_roundup() so that mempool's use
of ksize() is always accurate and no special handling of the memory is
needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.

Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
mm/mempool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index 96488b13a1ef..0f3107b28e6b 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab);
*/
void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
{
- size_t size = (size_t)pool_data;
+ size_t size = kmalloc_size_roundup((size_t)pool_data);
return kmalloc(size, gfp_mask);
}
EXPORT_SYMBOL(mempool_kmalloc);
--
2.34.1

2022-09-24 08:53:50

by Dmitry Vyukov

Subject: Re: [PATCH v2 14/16] kasan: Remove ksize()-related tests

On Fri, 23 Sept 2022 at 22:28, Kees Cook <[email protected]> wrote:
>
> In preparation for no longer unpoisoning in ksize(), remove the behavioral
> self-tests for ksize().
>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Alexander Potapenko <[email protected]>
> Cc: Andrey Konovalov <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Vincenzo Frascino <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Kees Cook <[email protected]>
> ---
> lib/test_kasan.c | 42 ------------------------------------------
> mm/kasan/shadow.c | 4 +---
> 2 files changed, 1 insertion(+), 45 deletions(-)
>
> diff --git a/lib/test_kasan.c b/lib/test_kasan.c
> index 58c1b01ccfe2..bdd0ced8f8d7 100644
> --- a/lib/test_kasan.c
> +++ b/lib/test_kasan.c
> @@ -753,46 +753,6 @@ static void kasan_global_oob_left(struct kunit *test)
> KUNIT_EXPECT_KASAN_FAIL(test, *(volatile char *)p);
> }
>
> -/* Check that ksize() makes the whole object accessible. */
> -static void ksize_unpoisons_memory(struct kunit *test)
> -{
> - char *ptr;
> - size_t size = 123, real_size;
> -
> - ptr = kmalloc(size, GFP_KERNEL);
> - KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
> - real_size = ksize(ptr);
> -
> - OPTIMIZER_HIDE_VAR(ptr);
> -
> - /* This access shouldn't trigger a KASAN report. */
> - ptr[size] = 'x';

I would rather keep the tests and update to the new behavior. We had
bugs in ksize, we need test coverage.
I assume ptr[size] access must now produce an error even after ksize.


> - /* This one must. */
> - KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[real_size]);
> -
> - kfree(ptr);
> -}
> -
> -/*
> - * Check that a use-after-free is detected by ksize() and via normal accesses
> - * after it.
> - */
> -static void ksize_uaf(struct kunit *test)
> -{
> - char *ptr;
> - int size = 128 - KASAN_GRANULE_SIZE;
> -
> - ptr = kmalloc(size, GFP_KERNEL);
> - KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
> - kfree(ptr);
> -
> - OPTIMIZER_HIDE_VAR(ptr);
> - KUNIT_EXPECT_KASAN_FAIL(test, ksize(ptr));

This is still a bug that should be detected, right? Calling ksize on a
freed pointer is a bug.

> - KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[0]);
> - KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size]);
> -}
> -
> static void kasan_stack_oob(struct kunit *test)
> {
> char stack_array[10];
> @@ -1392,8 +1352,6 @@ static struct kunit_case kasan_kunit_test_cases[] = {
> KUNIT_CASE(kasan_stack_oob),
> KUNIT_CASE(kasan_alloca_oob_left),
> KUNIT_CASE(kasan_alloca_oob_right),
> - KUNIT_CASE(ksize_unpoisons_memory),
> - KUNIT_CASE(ksize_uaf),
> KUNIT_CASE(kmem_cache_double_free),
> KUNIT_CASE(kmem_cache_invalid_free),
> KUNIT_CASE(kmem_cache_double_destroy),
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index 0e3648b603a6..0895c73e9b69 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -124,9 +124,7 @@ void kasan_unpoison(const void *addr, size_t size, bool init)
> addr = kasan_reset_tag(addr);
>
> /*
> - * Skip KFENCE memory if called explicitly outside of sl*b. Also note
> - * that calls to ksize(), where size is not a multiple of machine-word
> - * size, would otherwise poison the invalid portion of the word.
> + * Skip KFENCE memory if called explicitly outside of sl*b.
> */
> if (is_kfence_address(addr))
> return;
> --
> 2.34.1

2022-09-24 09:21:40

by Kees Cook

Subject: Re: [PATCH v2 03/16] skbuff: Proactively round up to kmalloc bucket size

On Fri, Sep 23, 2022 at 01:28:09PM -0700, Kees Cook wrote:
> Instead of discovering the kmalloc bucket size _after_ allocation, round
> up proactively so the allocation is explicitly made for the full size,
> allowing the compiler to correctly reason about the resulting size of
> the buffer through the existing __alloc_size() hint.
>
> This will allow for kernels built with CONFIG_UBSAN_BOUNDS or the
> coming dynamic bounds checking under CONFIG_FORTIFY_SOURCE to gain
> back the __alloc_size() hints that were temporarily reverted in commit
> 93dd04ab0b2b ("slab: remove __alloc_size attribute from __kmalloc_track_caller")
>
> Additionally tries to normalize size variables to u32 from int. Most
> interfaces are using "int", but notably __alloc_skb uses unsigned int.
>
> Also fix some reverse Christmas tree and comments while touching nearby
> code.

Something in this patch is breaking things -- I've refactored it again
to avoid overwriting the incoming size argument, and instead add a
dedicated outgoing size variable. Here's what will be v3 ...

---
net/core/skbuff.c | 41 ++++++++++++++++++++++-------------------
1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 974bbbbe7138..9b5a9fb69d9d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -346,11 +346,12 @@ EXPORT_SYMBOL(napi_build_skb);
* memory is free
*/
static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
- bool *pfmemalloc)
+ bool *pfmemalloc, size_t *alloc_size)
{
void *obj;
bool ret_pfmemalloc = false;

+ size = kmalloc_size_roundup(size);
/*
* Try a regular allocation, when that fails and we're not entitled
* to the reserves, fail.
@@ -369,6 +370,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
if (pfmemalloc)
*pfmemalloc = ret_pfmemalloc;

+ *alloc_size = size;
return obj;
}

@@ -400,7 +402,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
{
struct kmem_cache *cache;
struct sk_buff *skb;
- unsigned int osize;
+ size_t alloc_size;
bool pfmemalloc;
u8 *data;

@@ -427,15 +429,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
*/
size = SKB_DATA_ALIGN(size);
size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
- if (unlikely(!data))
- goto nodata;
- /* kmalloc(size) might give us more room than requested.
+ /* kmalloc(size) might give us more room than requested, so
+ * allocate the true bucket size up front.
* Put skb_shared_info exactly at the end of allocated zone,
* to allow max possible filling before reallocation.
*/
- osize = ksize(data);
- size = SKB_WITH_OVERHEAD(osize);
+ data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc, &alloc_size);
+ if (unlikely(!data))
+ goto nodata;
+ size = SKB_WITH_OVERHEAD(alloc_size);
prefetchw(data + size);

/*
@@ -444,7 +446,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
* the tail pointer in struct sk_buff!
*/
memset(skb, 0, offsetof(struct sk_buff, tail));
- __build_skb_around(skb, data, osize);
+ __build_skb_around(skb, data, alloc_size);
skb->pfmemalloc = pfmemalloc;

if (flags & SKB_ALLOC_FCLONE) {
@@ -1709,6 +1711,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
{
int i, osize = skb_end_offset(skb);
int size = osize + nhead + ntail;
+ size_t alloc_size;
long off;
u8 *data;

@@ -1723,10 +1726,10 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
- gfp_mask, NUMA_NO_NODE, NULL);
+ gfp_mask, NUMA_NO_NODE, NULL, &alloc_size);
if (!data)
goto nodata;
- size = SKB_WITH_OVERHEAD(ksize(data));
+ size = SKB_WITH_OVERHEAD(alloc_size);

/* Copy only real data... and, alas, header. This should be
* optimized for the cases when header is void.
@@ -6063,19 +6066,19 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
int i;
int size = skb_end_offset(skb);
int new_hlen = headlen - off;
+ size_t alloc_size;
u8 *data;

size = SKB_DATA_ALIGN(size);

if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- data = kmalloc_reserve(size +
- SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
- gfp_mask, NUMA_NO_NODE, NULL);
+ data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+ gfp_mask, NUMA_NO_NODE, NULL, &alloc_size);
if (!data)
return -ENOMEM;

- size = SKB_WITH_OVERHEAD(ksize(data));
+ size = SKB_WITH_OVERHEAD(alloc_size);

/* Copy real data, and all frags */
skb_copy_from_linear_data_offset(skb, off, data, new_hlen);
@@ -6184,18 +6187,18 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
u8 *data;
const int nfrags = skb_shinfo(skb)->nr_frags;
struct skb_shared_info *shinfo;
+ size_t alloc_size;

size = SKB_DATA_ALIGN(size);

if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- data = kmalloc_reserve(size +
- SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
- gfp_mask, NUMA_NO_NODE, NULL);
+ data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+ gfp_mask, NUMA_NO_NODE, NULL, &alloc_size);
if (!data)
return -ENOMEM;

- size = SKB_WITH_OVERHEAD(ksize(data));
+ size = SKB_WITH_OVERHEAD(alloc_size);

memcpy((struct skb_shared_info *)(data + size),
skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
--
2.34.1


--
Kees Cook

2022-09-25 07:38:16

by Paolo Abeni

Subject: Re: [PATCH v2 04/16] skbuff: Phase out ksize() fallback for frag_size

On Fri, 2022-09-23 at 13:28 -0700, Kees Cook wrote:
> All callers of APIs that allowed a 0-sized frag_size appear to be
> passing actual size information already

AFAICS, not yet:

drivers/net/ethernet/qlogic/qed/qed_ll2.c:
skb = build_skb(buffer->data, 0); // -> __build_skb(..., 0) 
// -> __build_skb_around()

drivers/net/ethernet/broadcom/bnx2.c:
skb = build_skb(data, 0);

I guess some more drivers have calls leading to 

__build_skb_around(..., 0)

there are several call paths to check...


> , so this use of ksize() can
> be removed. However, just in case there is something still depending
> on this behavior, issue a WARN and fall back to ksize() as before,
> which means we'll also potentially get KASAN warnings.
>
> Cc: "David S. Miller" <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: Paolo Abeni <[email protected]>
> Cc: [email protected]
> Signed-off-by: Kees Cook <[email protected]>
> ---
> net/core/skbuff.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 0b30fbdbd0d0..84ca89c781cd 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -195,7 +195,11 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
> unsigned int frag_size)
> {
> struct skb_shared_info *shinfo;
> - unsigned int size = frag_size ? : ksize(data);
> + unsigned int size = frag_size;
> +
> + /* All callers should be setting frag size now? */
> + if (WARN_ON_ONCE(size == 0))
> + size = ksize(data);

At some point in the future, I guess we could even drop this check,
right?

Thanks!

Paolo

2022-09-26 01:30:59

by Kees Cook

Subject: Re: [PATCH v2 04/16] skbuff: Phase out ksize() fallback for frag_size

On Sun, Sep 25, 2022 at 09:17:40AM +0200, Paolo Abeni wrote:
> On Fri, 2022-09-23 at 13:28 -0700, Kees Cook wrote:
> > All callers of APIs that allowed a 0-sized frag_size appear to be
> > passing actual size information already
>
> AFAICS, not yet:
>
> drivers/net/ethernet/qlogic/qed/qed_ll2.c:
> skb = build_skb(buffer->data, 0); // -> __build_skb(..., 0)
> // -> __build_skb_around()
>
> drivers/net/ethernet/broadcom/bnx2.c:
> skb = build_skb(data, 0);
>
> I guess some more drivers have calls leading to
>
> __build_skb_around(..., 0)
>
> there are several call paths to check...

Ah-ha! Thank you. I will try to hunt these down -- I think we can't
remove the "secret resizing" effect of ksize() without fixing these.

> > [...]
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 0b30fbdbd0d0..84ca89c781cd 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -195,7 +195,11 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
> > unsigned int frag_size)
> > {
> > struct skb_shared_info *shinfo;
> > - unsigned int size = frag_size ? : ksize(data);
> > + unsigned int size = frag_size;
> > +
> > + /* All callers should be setting frag size now? */
> > + if (WARN_ON_ONCE(size == 0))
> > + size = ksize(data);
>
> At some point in the future, I guess we could even drop this check,
> right?

Alternatively, we might be able to ask the slab if "data" came from
kmalloc or a kmem_cache, and if the former, do:

data = krealloc(kmalloc_size_roundup(ksize(data), ...)

But that seems ugly...

--
Kees Cook

2022-09-26 02:12:05

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 14/16] kasan: Remove ksize()-related tests

On Sat, Sep 24, 2022 at 10:15:18AM +0200, Dmitry Vyukov wrote:
> On Fri, 23 Sept 2022 at 22:28, Kees Cook <[email protected]> wrote:
> >
> > In preparation for no longer unpoisoning in ksize(), remove the behavioral
> > self-tests for ksize().
> >
> > [...]
> > -/* Check that ksize() makes the whole object accessible. */
> > -static void ksize_unpoisons_memory(struct kunit *test)
> > -{
> > - char *ptr;
> > - size_t size = 123, real_size;
> > -
> > - ptr = kmalloc(size, GFP_KERNEL);
> > - KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
> > - real_size = ksize(ptr);
> > -
> > - OPTIMIZER_HIDE_VAR(ptr);
> > -
> > - /* This access shouldn't trigger a KASAN report. */
> > - ptr[size] = 'x';
>
> I would rather keep the tests and update to the new behavior. We had
> bugs in ksize, we need test coverage.
> I assume ptr[size] access must now produce an error even after ksize.

Good point on all these! I'll respin.
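
A rough sketch of what a behavior-updated test might look like (the size
and granule choices below are guesses on my part, not the actual respin):

	static void ksize_unpoisons_memory(struct kunit *test)
	{
		char *ptr;
		size_t size = 128 - KASAN_GRANULE_SIZE - 5;
		size_t real_size;

		ptr = kmalloc(size, GFP_KERNEL);
		KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
		real_size = ksize(ptr);
		KUNIT_EXPECT_GT(test, real_size, size);

		OPTIMIZER_HIDE_VAR(ptr);

		/* These accesses shouldn't trigger a KASAN report. */
		ptr[0] = 'x';
		ptr[size - 1] = 'x';

		/* ksize() no longer unpoisons, so these should now report. */
		if (IS_ENABLED(CONFIG_KASAN_GENERIC))
			KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size]);
		KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[size + 5]);
		KUNIT_EXPECT_KASAN_FAIL(test, ((volatile char *)ptr)[real_size - 1]);

		kfree(ptr);
	}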

--
Kees Cook

2022-09-26 09:45:48

by Christian König

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] [PATCH v2 08/16] dma-buf: Proactively round up to kmalloc bucket size

On 23.09.22 at 22:28, Kees Cook wrote:
> Instead of discovering the kmalloc bucket size _after_ allocation, round
> up proactively so the allocation is explicitly made for the full size,
> allowing the compiler to correctly reason about the resulting size of
> the buffer through the existing __alloc_size() hint.
>
> Cc: Sumit Semwal <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Kees Cook <[email protected]>

Reviewed-by: Christian König <[email protected]>

> ---
> drivers/dma-buf/dma-resv.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 205acb2c744d..5b0a4b8830ff 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -98,12 +98,17 @@ static void dma_resv_list_set(struct dma_resv_list *list,
> static struct dma_resv_list *dma_resv_list_alloc(unsigned int max_fences)
> {
> struct dma_resv_list *list;
> + size_t size;
>
> - list = kmalloc(struct_size(list, table, max_fences), GFP_KERNEL);
> + /* Round up to the next kmalloc bucket size. */
> + size = kmalloc_size_roundup(struct_size(list, table, max_fences));
> +
> + list = kmalloc(size, GFP_KERNEL);
> if (!list)
> return NULL;
>
> - list->max_fences = (ksize(list) - offsetof(typeof(*list), table)) /
> + /* Given the resulting bucket size, recalculate max_fences. */
> + list->max_fences = (size - offsetof(typeof(*list), table)) /
> sizeof(*list->table);
>
> return list;

2022-09-26 15:45:06

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v2 02/16] slab: Introduce kmalloc_size_roundup()

On 9/23/22 22:28, Kees Cook wrote:
> In the effort to help the compiler reason about buffer sizes, the
> __alloc_size attribute was added to allocators. This improves the scope
> of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
> future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
> as the vast majority of callers are not expecting to use more memory
> than what they asked for.
>
> There is, however, one common exception to this: anticipatory resizing
> of kmalloc allocations. These cases all use ksize() to determine the
> actual bucket size of a given allocation (e.g. 128 when 126 was asked
> for). This comes in two styles in the kernel:
>
> 1) An allocation has been determined to be too small, and needs to be
> resized. Instead of the caller choosing its own next best size, it
> wants to minimize the number of calls to krealloc(), so it just uses
> ksize() plus some additional bytes, forcing the realloc into the next
> bucket size, from which it can learn how large it is now. For example:
>
> data = krealloc(data, ksize(data) + 1, gfp);
> data_len = ksize(data);
>
> 2) The minimum size of an allocation is calculated, but since it may
> grow in the future, just use all the space available in the chosen
> bucket immediately, to avoid needing to reallocate later. A good
> example of this is skbuff's allocators:
>
> data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
> ...
> /* kmalloc(size) might give us more room than requested.
> * Put skb_shared_info exactly at the end of allocated zone,
> * to allow max possible filling before reallocation.
> */
> osize = ksize(data);
> size = SKB_WITH_OVERHEAD(osize);
>
> In both cases, the "how much was actually allocated?" question is answered
> _after_ the allocation, where the compiler hinting is not in an easy place
> to make the association any more. This mismatch between the compiler's
> view of the buffer length and the code's intention about how much it is
> going to actually use has already caused problems[1]. It is possible to
> fix this by reordering the use of the "actual size" information.
>
> We can serve the needs of users of ksize() and still have accurate buffer
> length hinting for the compiler by doing the bucket size calculation
> _before_ the allocation. Code can instead ask "how large an allocation
> would I get for a given size?".
>
> Introduce kmalloc_size_roundup(), to serve this function so we can start
> replacing the "anticipatory resizing" uses of ksize().
>
> [1] https://github.com/ClangBuiltLinux/linux/issues/1599
> https://github.com/KSPP/linux/issues/183
>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Signed-off-by: Kees Cook <[email protected]>

OK, added patch 1+2 to slab.git for-next branch.
Had to adjust this one a bit, see below.

> ---
> include/linux/slab.h | 31 +++++++++++++++++++++++++++++++
> mm/slab.c | 9 ++++++---
> mm/slab_common.c | 20 ++++++++++++++++++++
> 3 files changed, 57 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 41bd036e7551..727640173568 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __r
> void kfree(const void *objp);
> void kfree_sensitive(const void *objp);
> size_t __ksize(const void *objp);
> +
> +/**
> + * ksize - Report actual allocation size of associated object
> + *
> + * @objp: Pointer returned from a prior kmalloc()-family allocation.
> + *
> + * This should not be used for writing beyond the originally requested
> + * allocation size. Either use krealloc() or round up the allocation size
> + * with kmalloc_size_roundup() prior to allocation. If this is used to
> + * access beyond the originally requested allocation size, UBSAN_BOUNDS
> + * and/or FORTIFY_SOURCE may trip, since they only know about the
> + * originally allocated size via the __alloc_size attribute.
> + */
> size_t ksize(const void *objp);
> +
> #ifdef CONFIG_PRINTK
> bool kmem_valid_obj(void *object);
> void kmem_dump_obj(void *object);
> @@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
> extern void kvfree_sensitive(const void *addr, size_t len);
>
> unsigned int kmem_cache_size(struct kmem_cache *s);
> +
> +/**
> + * kmalloc_size_roundup - Report allocation bucket size for the given size
> + *
> + * @size: Number of bytes to round up from.
> + *
> + * This returns the number of bytes that would be available in a kmalloc()
> + * allocation of @size bytes. For example, a 126 byte request would be
> + * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
> + * for the general-purpose kmalloc()-based allocations, and is not for the
> + * pre-sized kmem_cache_alloc()-based allocations.)
> + *
> + * Use this to kmalloc() the full bucket size ahead of time instead of using
> + * ksize() to query the size after an allocation.
> + */
> +size_t kmalloc_size_roundup(size_t size);
> +
> void __init kmem_cache_init_late(void);
>
> #if defined(CONFIG_SMP) && defined(CONFIG_SLAB)
> diff --git a/mm/slab.c b/mm/slab.c
> index 10e96137b44f..2da862bf6226 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -4192,11 +4192,14 @@ void __check_heap_object(const void *ptr, unsigned long n,
> #endif /* CONFIG_HARDENED_USERCOPY */
>
> /**
> - * __ksize -- Uninstrumented ksize.
> + * __ksize -- Report full size of underlying allocation
> * @objp: pointer to the object
> *
> - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> - * safety checks as ksize() with KASAN instrumentation enabled.
> + * This should only be used internally to query the true size of allocations.
> + * It is not meant to be a way to discover the usable size of an allocation
> + * after the fact. Instead, use kmalloc_size_roundup(). Using memory beyond
> + * the originally requested allocation size may trigger KASAN, UBSAN_BOUNDS,
> + * and/or FORTIFY_SOURCE.
> *
> * Return: size of the actual memory used by @objp in bytes
> */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 457671ace7eb..d7420cf649f8 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -721,6 +721,26 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
> return kmalloc_caches[kmalloc_type(flags)][index];
> }
>
> +size_t kmalloc_size_roundup(size_t size)
> +{
> + struct kmem_cache *c;
> +
> + /* Short-circuit the 0 size case. */
> + if (unlikely(size == 0))
> + return 0;
> + /* Short-circuit saturated "too-large" case. */
> + if (unlikely(size == SIZE_MAX))
> + return SIZE_MAX;
> + /* Above the smaller buckets, size is a multiple of page size. */
> + if (size > KMALLOC_MAX_CACHE_SIZE)
> + return PAGE_SIZE << get_order(size);
> +
> + /* The flags don't matter since size_index is common to all. */
> + c = kmalloc_slab(size, GFP_KERNEL);
> + return c ? c->object_size : 0;
> +}
> +EXPORT_SYMBOL(kmalloc_size_roundup);

We need a SLOB version too as it's not yet removed... I added this:

diff --git a/mm/slob.c b/mm/slob.c
index 2bd4f476c340..5dbdf6ad8bcc 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -574,6 +574,20 @@ void kfree(const void *block)
}
EXPORT_SYMBOL(kfree);

+size_t kmalloc_size_roundup(size_t size)
+{
+ /* Short-circuit the 0 size case. */
+ if (unlikely(size == 0))
+ return 0;
+ /* Short-circuit saturated "too-large" case. */
+ if (unlikely(size == SIZE_MAX))
+ return SIZE_MAX;
+
+ return ALIGN(size, ARCH_KMALLOC_MINALIGN);
+}
+
+EXPORT_SYMBOL(kmalloc_size_roundup);
+
/* can't use ksize for kmem_cache_alloc memory, only kmalloc */
size_t __ksize(const void *block)
{
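
(Aside: across the series, the caller-side shape of the conversion is
essentially the following; a generic sketch only, not any particular patch.)

	/* Before: learn the bucket size only after the allocation. */
	buf = kmalloc(len, GFP_KERNEL);
	usable = ksize(buf);

	/* After: round up first, so the compiler's __alloc_size view of buf
	 * matches what the code will actually use.
	 */
	len = kmalloc_size_roundup(len);
	buf = kmalloc(len, GFP_KERNEL);
	usable = len;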


2022-09-26 15:55:51

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v2 13/16] mempool: Use kmalloc_size_roundup() to match ksize() usage

On 9/23/22 22:28, Kees Cook wrote:
> Round up allocations with kmalloc_size_roundup() so that mempool's use
> of ksize() is always accurate and no special handling of the memory is
> needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Signed-off-by: Kees Cook <[email protected]>
> ---
> mm/mempool.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> index 96488b13a1ef..0f3107b28e6b 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab);
> */
> void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
> {
> - size_t size = (size_t)pool_data;
> + size_t size = kmalloc_size_roundup((size_t)pool_data);

Hm it is kinda wasteful to call into kmalloc_size_roundup for every
allocation that has the same input. We could do it just once in
mempool_init_node() for adjusting pool->pool_data ?

But looking more closely, I wonder why poison_element() and
kasan_unpoison_element() in mm/mempool.c even have to use
ksize()/__ksize() and not just operate on the requested size (again,
pool->pool_data). If no kmalloc mempool's users use ksize() to write
beyond requested size, then we don't have to unpoison/poison that area
either?

> return kmalloc(size, gfp_mask);
> }
> EXPORT_SYMBOL(mempool_kmalloc);

2022-09-26 17:19:12

by Michael J. Ruhl

[permalink] [raw]
Subject: RE: [PATCH v2 06/16] igb: Proactively round up to kmalloc bucket size

>-----Original Message-----
>From: Kees Cook <[email protected]>
>Sent: Friday, September 23, 2022 4:28 PM
>To: Vlastimil Babka <[email protected]>
>Cc: Kees Cook <[email protected]>; Brandeburg, Jesse
><[email protected]>; Nguyen, Anthony L
><[email protected]>; David S. Miller <[email protected]>;
>Eric Dumazet <[email protected]>; Jakub Kicinski <[email protected]>;
>Paolo Abeni <[email protected]>; [email protected];
>[email protected]; Ruhl, Michael J <[email protected]>;
>Hyeonggon Yoo <[email protected]>; Christoph Lameter
><[email protected]>; Pekka Enberg <[email protected]>; David Rientjes
><[email protected]>; Joonsoo Kim <[email protected]>; Andrew
>Morton <[email protected]>; Greg Kroah-Hartman
><[email protected]>; Nick Desaulniers
><[email protected]>; Alex Elder <[email protected]>; Josef Bacik
><[email protected]>; David Sterba <[email protected]>; Sumit Semwal
><[email protected]>; Christian König <[email protected]>;
>Daniel Micay <[email protected]>; Yonghong Song <[email protected]>;
>Marco Elver <[email protected]>; Miguel Ojeda <[email protected]>; linux-
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; linaro-mm-
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected]
>Subject: [PATCH v2 06/16] igb: Proactively round up to kmalloc bucket size
>
>In preparation for removing the "silently change allocation size"
>users of ksize(), explicitly round up all q_vector allocations so that
>allocations can be correctly compared to ksize().
>
>Additionally fix potential use-after-free in the case of new allocation
>failure: only free memory if the replacement allocation succeeds.
>
>Cc: Jesse Brandeburg <[email protected]>
>Cc: Tony Nguyen <[email protected]>
>Cc: "David S. Miller" <[email protected]>
>Cc: Eric Dumazet <[email protected]>
>Cc: Jakub Kicinski <[email protected]>
>Cc: Paolo Abeni <[email protected]>
>Cc: [email protected]
>Cc: [email protected]
>Signed-off-by: Kees Cook <[email protected]>
>---
> drivers/net/ethernet/intel/igb/igb_main.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>index 2796e81d2726..eb51e531c096 100644
>--- a/drivers/net/ethernet/intel/igb/igb_main.c
>+++ b/drivers/net/ethernet/intel/igb/igb_main.c
>@@ -1195,15 +1195,16 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
> return -ENOMEM;
>
> ring_count = txr_count + rxr_count;
>- size = struct_size(q_vector, ring, ring_count);
>+ size = kmalloc_size_roundup(struct_size(q_vector, ring, ring_count));

This looks good to me...

> /* allocate q_vector and rings */
> q_vector = adapter->q_vector[v_idx];
> if (!q_vector) {
> q_vector = kzalloc(size, GFP_KERNEL);
> } else if (size > ksize(q_vector)) {
>- kfree_rcu(q_vector, rcu);
> q_vector = kzalloc(size, GFP_KERNEL);
>+ if (q_vector)
>+ kfree_rcu(q_vector, rcu);

Even though this is in the ksize part, this seems like an unrelated change?
Should this be in a different patch?

Also, the kfree_rcu will free q_vector after the RCU grace period?

Is that what you want to do?

How does rcu distinguish between the original q_vector, and the newly kzalloced one?
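
As written, q_vector already points at the new buffer by the time the
kfree_rcu() runs, so the old allocation is never freed and the new one is
the one queued for freeing (and then still used). Keeping the "only free
on success" intent while freeing the right pointer could look like this
(a sketch only, not the patch):

	struct igb_q_vector *old = q_vector;

	q_vector = kzalloc(size, GFP_KERNEL);
	/* Queue the old vector for freeing only once the replacement exists. */
	if (q_vector)
		kfree_rcu(old, rcu);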

Thanks,

Mike



> } else {
> memset(q_vector, 0, size);
> }
>--
>2.34.1

2022-09-26 18:23:47

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 02/16] slab: Introduce kmalloc_size_roundup()

On Mon, Sep 26, 2022 at 03:15:22PM +0200, Vlastimil Babka wrote:
> On 9/23/22 22:28, Kees Cook wrote:
> > In the effort to help the compiler reason about buffer sizes, the
> > __alloc_size attribute was added to allocators. This improves the scope
> > of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
> > future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
> > as the vast majority of callers are not expecting to use more memory
> > than what they asked for.
> >
> > There is, however, one common exception to this: anticipatory resizing
> > of kmalloc allocations. These cases all use ksize() to determine the
> > actual bucket size of a given allocation (e.g. 128 when 126 was asked
> > for). This comes in two styles in the kernel:
> >
> > 1) An allocation has been determined to be too small, and needs to be
> > resized. Instead of the caller choosing its own next best size, it
> > wants to minimize the number of calls to krealloc(), so it just uses
> > ksize() plus some additional bytes, forcing the realloc into the next
> > bucket size, from which it can learn how large it is now. For example:
> >
> > data = krealloc(data, ksize(data) + 1, gfp);
> > data_len = ksize(data);
> >
> > 2) The minimum size of an allocation is calculated, but since it may
> > grow in the future, just use all the space available in the chosen
> > bucket immediately, to avoid needing to reallocate later. A good
> > example of this is skbuff's allocators:
> >
> > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
> > ...
> > /* kmalloc(size) might give us more room than requested.
> > * Put skb_shared_info exactly at the end of allocated zone,
> > * to allow max possible filling before reallocation.
> > */
> > osize = ksize(data);
> > size = SKB_WITH_OVERHEAD(osize);
> >
> > In both cases, the "how much was actually allocated?" question is answered
> > _after_ the allocation, where the compiler hinting is not in an easy place
> > to make the association any more. This mismatch between the compiler's
> > view of the buffer length and the code's intention about how much it is
> > going to actually use has already caused problems[1]. It is possible to
> > fix this by reordering the use of the "actual size" information.
> >
> > We can serve the needs of users of ksize() and still have accurate buffer
> > length hinting for the compiler by doing the bucket size calculation
> > _before_ the allocation. Code can instead ask "how large an allocation
> > would I get for a given size?".
> >
> > Introduce kmalloc_size_roundup(), to serve this function so we can start
> > replacing the "anticipatory resizing" uses of ksize().
> >
> > [1] https://github.com/ClangBuiltLinux/linux/issues/1599
> > https://github.com/KSPP/linux/issues/183
> >
> > Cc: Vlastimil Babka <[email protected]>
> > Cc: Christoph Lameter <[email protected]>
> > Cc: Pekka Enberg <[email protected]>
> > Cc: David Rientjes <[email protected]>
> > Cc: Joonsoo Kim <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Kees Cook <[email protected]>
>
> OK, added patch 1+2 to slab.git for-next branch.
> Had to adjust this one a bit, see below.
>
> > ---
> > include/linux/slab.h | 31 +++++++++++++++++++++++++++++++
> > mm/slab.c | 9 ++++++---
> > mm/slab_common.c | 20 ++++++++++++++++++++
> > 3 files changed, 57 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index 41bd036e7551..727640173568 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __r
> > void kfree(const void *objp);
> > void kfree_sensitive(const void *objp);
> > size_t __ksize(const void *objp);
> > +
> > +/**
> > + * ksize - Report actual allocation size of associated object
> > + *
> > + * @objp: Pointer returned from a prior kmalloc()-family allocation.
> > + *
> > + * This should not be used for writing beyond the originally requested
> > + * allocation size. Either use krealloc() or round up the allocation size
> > + * with kmalloc_size_roundup() prior to allocation. If this is used to
> > + * access beyond the originally requested allocation size, UBSAN_BOUNDS
> > + * and/or FORTIFY_SOURCE may trip, since they only know about the
> > + * originally allocated size via the __alloc_size attribute.
> > + */
> > size_t ksize(const void *objp);
> > +
> > #ifdef CONFIG_PRINTK
> > bool kmem_valid_obj(void *object);
> > void kmem_dump_obj(void *object);
> > @@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
> > extern void kvfree_sensitive(const void *addr, size_t len);
> > unsigned int kmem_cache_size(struct kmem_cache *s);
> > +
> > +/**
> > + * kmalloc_size_roundup - Report allocation bucket size for the given size
> > + *
> > + * @size: Number of bytes to round up from.
> > + *
> > + * This returns the number of bytes that would be available in a kmalloc()
> > + * allocation of @size bytes. For example, a 126 byte request would be
> > + * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
> > + * for the general-purpose kmalloc()-based allocations, and is not for the
> > + * pre-sized kmem_cache_alloc()-based allocations.)
> > + *
> > + * Use this to kmalloc() the full bucket size ahead of time instead of using
> > + * ksize() to query the size after an allocation.
> > + */
> > +size_t kmalloc_size_roundup(size_t size);
> > +
> > void __init kmem_cache_init_late(void);
> > #if defined(CONFIG_SMP) && defined(CONFIG_SLAB)
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 10e96137b44f..2da862bf6226 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -4192,11 +4192,14 @@ void __check_heap_object(const void *ptr, unsigned long n,
> > #endif /* CONFIG_HARDENED_USERCOPY */
> > /**
> > - * __ksize -- Uninstrumented ksize.
> > + * __ksize -- Report full size of underlying allocation
> > * @objp: pointer to the object
> > *
> > - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> > - * safety checks as ksize() with KASAN instrumentation enabled.
> > + * This should only be used internally to query the true size of allocations.
> > + * It is not meant to be a way to discover the usable size of an allocation
> > + * after the fact. Instead, use kmalloc_size_roundup(). Using memory beyond
> > + * the originally requested allocation size may trigger KASAN, UBSAN_BOUNDS,
> > + * and/or FORTIFY_SOURCE.
> > *
> > * Return: size of the actual memory used by @objp in bytes
> > */
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 457671ace7eb..d7420cf649f8 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -721,6 +721,26 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
> > return kmalloc_caches[kmalloc_type(flags)][index];
> > }
> > +size_t kmalloc_size_roundup(size_t size)
> > +{
> > + struct kmem_cache *c;
> > +
> > + /* Short-circuit the 0 size case. */
> > + if (unlikely(size == 0))
> > + return 0;
> > + /* Short-circuit saturated "too-large" case. */
> > + if (unlikely(size == SIZE_MAX))
> > + return SIZE_MAX;
> > + /* Above the smaller buckets, size is a multiple of page size. */
> > + if (size > KMALLOC_MAX_CACHE_SIZE)
> > + return PAGE_SIZE << get_order(size);
> > +
> > + /* The flags don't matter since size_index is common to all. */
> > + c = kmalloc_slab(size, GFP_KERNEL);
> > + return c ? c->object_size : 0;
> > +}
> > +EXPORT_SYMBOL(kmalloc_size_roundup);
>
> We need a SLOB version too as it's not yet removed... I added this:
>
> diff --git a/mm/slob.c b/mm/slob.c
> index 2bd4f476c340..5dbdf6ad8bcc 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -574,6 +574,20 @@ void kfree(const void *block)
> }
> EXPORT_SYMBOL(kfree);
> +size_t kmalloc_size_roundup(size_t size)
> +{
> + /* Short-circuit the 0 size case. */
> + if (unlikely(size == 0))
> + return 0;
> + /* Short-circuit saturated "too-large" case. */
> + if (unlikely(size == SIZE_MAX))
> + return SIZE_MAX;
> +
> + return ALIGN(size, ARCH_KMALLOC_MINALIGN);
> +}
> +
> +EXPORT_SYMBOL(kmalloc_size_roundup);

Ah, perfect! Thanks for catching that. :)

FWIW:

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2022-09-26 18:39:14

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 13/16] mempool: Use kmalloc_size_roundup() to match ksize() usage

On Mon, Sep 26, 2022 at 03:50:43PM +0200, Vlastimil Babka wrote:
> On 9/23/22 22:28, Kees Cook wrote:
> > Round up allocations with kmalloc_size_roundup() so that mempool's use
> > of ksize() is always accurate and no special handling of the memory is
> > needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
> >
> > Cc: Andrew Morton <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Kees Cook <[email protected]>
> > ---
> > mm/mempool.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/mempool.c b/mm/mempool.c
> > index 96488b13a1ef..0f3107b28e6b 100644
> > --- a/mm/mempool.c
> > +++ b/mm/mempool.c
> > @@ -526,7 +526,7 @@ EXPORT_SYMBOL(mempool_free_slab);
> > */
> > void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
> > {
> > - size_t size = (size_t)pool_data;
> > + size_t size = kmalloc_size_roundup((size_t)pool_data);
>
> Hm it is kinda wasteful to call into kmalloc_size_roundup for every
> allocation that has the same input. We could do it just once in
> mempool_init_node() for adjusting pool->pool_data ?
>
> But looking more closely, I wonder why poison_element() and
> kasan_unpoison_element() in mm/mempool.c even have to use ksize()/__ksize()
> and not just operate on the requested size (again, pool->pool_data). If no
> kmalloc mempool's users use ksize() to write beyond requested size, then we
> don't have to unpoison/poison that area either?

Yeah, I think that's a fair point. I will adjust this.
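
For the init-time roundup idea, one possible shape (sketch only; gating on
pool->alloc is just one way to limit it to kmalloc-backed pools):

	/* In mempool_init_node(), after pool->alloc and pool->pool_data
	 * have been set: round up once so mempool_kmalloc() can use the
	 * stored value directly.
	 */
	if (pool->alloc == mempool_kmalloc)
		pool->pool_data =
			(void *)kmalloc_size_roundup((size_t)pool->pool_data);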

--
Kees Cook

2022-10-01 16:52:13

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v2 02/16] slab: Introduce kmalloc_size_roundup()

On Fri, Sep 23, 2022 at 01:28:08PM -0700, Kees Cook wrote:
> In the effort to help the compiler reason about buffer sizes, the
> __alloc_size attribute was added to allocators. This improves the scope
> of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
> future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
> as the vast majority of callers are not expecting to use more memory
> than what they asked for.
>
> There is, however, one common exception to this: anticipatory resizing
> of kmalloc allocations. These cases all use ksize() to determine the
> actual bucket size of a given allocation (e.g. 128 when 126 was asked
> for). This comes in two styles in the kernel:
>
> 1) An allocation has been determined to be too small, and needs to be
> resized. Instead of the caller choosing its own next best size, it
> wants to minimize the number of calls to krealloc(), so it just uses
> ksize() plus some additional bytes, forcing the realloc into the next
> bucket size, from which it can learn how large it is now. For example:
>
> data = krealloc(data, ksize(data) + 1, gfp);
> data_len = ksize(data);
>
> 2) The minimum size of an allocation is calculated, but since it may
> grow in the future, just use all the space available in the chosen
> bucket immediately, to avoid needing to reallocate later. A good
> example of this is skbuff's allocators:
>
> data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
> ...
> /* kmalloc(size) might give us more room than requested.
> * Put skb_shared_info exactly at the end of allocated zone,
> * to allow max possible filling before reallocation.
> */
> osize = ksize(data);
> size = SKB_WITH_OVERHEAD(osize);
>
> In both cases, the "how much was actually allocated?" question is answered
> _after_ the allocation, where the compiler hinting is not in an easy place
> to make the association any more. This mismatch between the compiler's
> view of the buffer length and the code's intention about how much it is
> going to actually use has already caused problems[1]. It is possible to
> fix this by reordering the use of the "actual size" information.
>
> We can serve the needs of users of ksize() and still have accurate buffer
> length hinting for the compiler by doing the bucket size calculation
> _before_ the allocation. Code can instead ask "how large an allocation
> would I get for a given size?".
>
> Introduce kmalloc_size_roundup(), to serve this function so we can start
> replacing the "anticipatory resizing" uses of ksize().
>
> [1] https://github.com/ClangBuiltLinux/linux/issues/1599
> https://github.com/KSPP/linux/issues/183
>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Signed-off-by: Kees Cook <[email protected]>
> ---
> include/linux/slab.h | 31 +++++++++++++++++++++++++++++++
> mm/slab.c | 9 ++++++---
> mm/slab_common.c | 20 ++++++++++++++++++++
> 3 files changed, 57 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 41bd036e7551..727640173568 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __r
> void kfree(const void *objp);
> void kfree_sensitive(const void *objp);
> size_t __ksize(const void *objp);
> +
> +/**
> + * ksize - Report actual allocation size of associated object
> + *
> + * @objp: Pointer returned from a prior kmalloc()-family allocation.
> + *
> + * This should not be used for writing beyond the originally requested
> + * allocation size. Either use krealloc() or round up the allocation size
> + * with kmalloc_size_roundup() prior to allocation. If this is used to
> + * access beyond the originally requested allocation size, UBSAN_BOUNDS
> + * and/or FORTIFY_SOURCE may trip, since they only know about the
> + * originally allocated size via the __alloc_size attribute.
> + */
> size_t ksize(const void *objp);
> +

With this, we now have two conflicting kernel-doc comments for ksize():
one in mm/slab_common.c and one in include/linux/slab.h.

> #ifdef CONFIG_PRINTK
> bool kmem_valid_obj(void *object);
> void kmem_dump_obj(void *object);
> @@ -779,6 +793,23 @@ extern void kvfree(const void *addr);
> extern void kvfree_sensitive(const void *addr, size_t len);
>
> unsigned int kmem_cache_size(struct kmem_cache *s);
> +
> +/**
> + * kmalloc_size_roundup - Report allocation bucket size for the given size
> + *
> + * @size: Number of bytes to round up from.
> + *
> + * This returns the number of bytes that would be available in a kmalloc()
> + * allocation of @size bytes. For example, a 126 byte request would be
> + * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly
> + * for the general-purpose kmalloc()-based allocations, and is not for the
> + * pre-sized kmem_cache_alloc()-based allocations.)
> + *
> + * Use this to kmalloc() the full bucket size ahead of time instead of using
> + * ksize() to query the size after an allocation.
> + */
> +size_t kmalloc_size_roundup(size_t size);
> +
> void __init kmem_cache_init_late(void);
>
> #if defined(CONFIG_SMP) && defined(CONFIG_SLAB)
> diff --git a/mm/slab.c b/mm/slab.c
> index 10e96137b44f..2da862bf6226 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -4192,11 +4192,14 @@ void __check_heap_object(const void *ptr, unsigned long n,
> #endif /* CONFIG_HARDENED_USERCOPY */
>
> /**
> - * __ksize -- Uninstrumented ksize.
> + * __ksize -- Report full size of underlying allocation
> * @objp: pointer to the object
> *
> - * Unlike ksize(), __ksize() is uninstrumented, and does not provide the same
> - * safety checks as ksize() with KASAN instrumentation enabled.
> + * This should only be used internally to query the true size of allocations.
> + * It is not meant to be a way to discover the usable size of an allocation
> + * after the fact. Instead, use kmalloc_size_roundup(). Using memory beyond
> + * the originally requested allocation size may trigger KASAN, UBSAN_BOUNDS,
> + * and/or FORTIFY_SOURCE.
> *
> * Return: size of the actual memory used by @objp in bytes
> */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 457671ace7eb..d7420cf649f8 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -721,6 +721,26 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
> return kmalloc_caches[kmalloc_type(flags)][index];
> }
>
> +size_t kmalloc_size_roundup(size_t size)
> +{
> + struct kmem_cache *c;
> +
> + /* Short-circuit the 0 size case. */
> + if (unlikely(size == 0))
> + return 0;
> + /* Short-circuit saturated "too-large" case. */
> + if (unlikely(size == SIZE_MAX))
> + return SIZE_MAX;
> + /* Above the smaller buckets, size is a multiple of page size. */
> + if (size > KMALLOC_MAX_CACHE_SIZE)
> + return PAGE_SIZE << get_order(size);
> +
> + /* The flags don't matter since size_index is common to all. */
> + c = kmalloc_slab(size, GFP_KERNEL);
> + return c ? c->object_size : 0;
> +}
> +EXPORT_SYMBOL(kmalloc_size_roundup);
> +
> #ifdef CONFIG_ZONE_DMA
> #define KMALLOC_DMA_NAME(sz) .name[KMALLOC_DMA] = "dma-kmalloc-" #sz,
> #else
> --
> 2.34.1

Otherwise looks good!

--
Thanks,
Hyeonggon