2018-02-14 18:28:16

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 0/2] Add kvzalloc_struct to complement kvzalloc_array

From: Matthew Wilcox <[email protected]>

We all know the perils of multiplying a value provided from userspace
by a constant and then allocating the resulting number of bytes. That's
why we have kvmalloc_array(), so we don't have to think about it.
This solves the same problem when we embed one of these arrays in a
struct like this:

struct {
int n;
unsigned long array[];
};

Using kvzalloc_struct() to allocate this will save precious thinking
time and reduce the possibilities of bugs.

Matthew Wilcox (2):
mm: Add kernel-doc for kvfree
mm: Add kvmalloc_ab_c and kvzalloc_struct

include/linux/mm.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
mm/util.c | 10 ++++++++++
2 files changed, 61 insertions(+)

--
2.15.1



2018-02-14 18:28:13

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

From: Matthew Wilcox <[email protected]>

We have kvmalloc_array in order to safely allocate an array with a
number of elements specified by userspace (avoiding arithmetic overflow
leading to a buffer overrun). But it's fairly common to have a header
in front of that array (eg specifying the length of the array), so we
need a helper function for that situation.

kvmalloc_ab_c() is the workhorse that does the calculation, but in spite
of our best efforts to name the arguments, it's really hard to remember
which order to put the arguments in. kvzalloc_struct() eliminates that
effort; you tell it about the struct you're allocating, and it puts the
arguments in the right order for you (and checks that the arguments
you've given are at least plausible).

For comparison between the three schemes:

sev = kvzalloc(sizeof(*sev) + sizeof(struct v4l2_kevent) * elems,
GFP_KERNEL);
sev = kvzalloc_ab_c(elems, sizeof(struct v4l2_kevent), sizeof(*sev),
GFP_KERNEL);
sev = kvzalloc_struct(sev, events, elems, GFP_KERNEL);

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/linux/mm.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 81bd7f0be286..ddf929c5aaee 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -557,6 +557,57 @@ static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
return kvmalloc(n * size, flags);
}

+/**
+ * kvmalloc_ab_c() - Allocate memory.
+ * @n: Number of elements.
+ * @size: Size of each element (should be constant).
+ * @c: Size of header (should be constant).
+ * @gfp: Memory allocation flags.
+ *
+ * Use this function to allocate @n * @size + @c bytes of memory. This
+ * function is safe to use when @n is controlled from userspace; it will
+ * return %NULL if the required amount of memory cannot be allocated.
+ * Use kvfree() to free the allocated memory.
+ *
+ * The kvzalloc_hdr_arr() function is easier to use as it has typechecking
+ * and you do not need to remember which of the arguments should be constants.
+ *
+ * Context: Process context. May sleep; the @gfp flags should be based on
+ * %GFP_KERNEL.
+ * Return: A pointer to the allocated memory or %NULL.
+ */
+static inline __must_check
+void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
+{
+ if (size != 0 && n > (SIZE_MAX - c) / size)
+ return NULL;
+
+ return kvmalloc(n * size + c, gfp);
+}
+#define kvzalloc_ab_c(a, b, c, gfp) kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)
+
+/**
+ * kvzalloc_struct() - Allocate and zero-fill a structure containing a
+ * variable length array.
+ * @p: Pointer to the structure.
+ * @member: Name of the array member.
+ * @n: Number of elements in the array.
+ * @gfp: Memory allocation flags.
+ *
+ * Allocate (and zero-fill) enough memory for a structure with an array
+ * of @n elements. This function is safe to use when @n is specified by
+ * userspace as the arithmetic will not overflow.
+ * Use kvfree() to free the allocated memory.
+ *
+ * Context: Process context. May sleep; the @gfp flags should be based on
+ * %GFP_KERNEL.
+ * Return: Zero-filled memory or a NULL pointer.
+ */
+#define kvzalloc_struct(p, member, n, gfp) \
+ (typeof(p))kvzalloc_ab_c(n, \
+ sizeof(*(p)->member) + __must_be_array((p)->member), \
+ offsetof(typeof(*(p)), member), gfp)
+
extern void kvfree(const void *addr);

static inline atomic_t *compound_mapcount_ptr(struct page *page)
--
2.15.1


2018-02-14 18:28:32

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 1/2] mm: Add kernel-doc for kvfree

From: Matthew Wilcox <[email protected]>

Signed-off-by: Matthew Wilcox <[email protected]>
---
mm/util.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/mm/util.c b/mm/util.c
index c1250501364f..dc4c7b551aaf 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -430,6 +430,16 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
}
EXPORT_SYMBOL(kvmalloc_node);

+/**
+ * kvfree() - Free memory.
+ * @addr: Pointer to allocated memory.
+ *
+ * kvfree frees memory allocated by any of vmalloc(), kmalloc() or
+ * kvmalloc(). It is slightly more efficient to use kfree() or vfree()
+ * if you are certain that you know which one to use.
+ *
+ * Context: Any context except NMI.
+ */
void kvfree(const void *addr)
{
if (is_vmalloc_addr(addr))
--
2.15.1


2018-02-14 18:49:22

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add kvzalloc_struct to complement kvzalloc_array

On Wed, 2018-02-14 at 10:26 -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> We all know the perils of multiplying a value provided from userspace
> by a constant and then allocating the resulting number of bytes. That's
> why we have kvmalloc_array(), so we don't have to think about it.
> This solves the same problem when we embed one of these arrays in a
> struct like this:
>
> struct {
> int n;
> unsigned long array[];
> };

I think expanding the number of allocation functions
is not necessary.


2018-02-14 19:34:09

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add kvzalloc_struct to complement kvzalloc_array

On Wed, 2018-02-14 at 11:23 -0800, Kees Cook wrote:
> On Wed, Feb 14, 2018 at 10:47 AM, Joe Perches <[email protected]> wrote:
> > On Wed, 2018-02-14 at 10:26 -0800, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <[email protected]>
> > >
> > > We all know the perils of multiplying a value provided from userspace
> > > by a constant and then allocating the resulting number of bytes. That's
> > > why we have kvmalloc_array(), so we don't have to think about it.
> > > This solves the same problem when we embed one of these arrays in a
> > > struct like this:
> > >
> > > struct {
> > > int n;
> > > unsigned long array[];
> > > };
> >
> > I think expanding the number of allocation functions
> > is not necessary.
>
> I think removing common mispatterns in favor of overflow-protected
> allocation functions makes sense.

Function symmetry matters too.

These allocation functions are specific to kvz<foo>
and are not symmetric for k<foo>, v<foo>, devm_<foo>
<foo>_node, and the like.


2018-02-14 19:38:34

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add kvzalloc_struct to complement kvzalloc_array

On Wed, Feb 14, 2018 at 11:32:45AM -0800, Joe Perches wrote:
> On Wed, 2018-02-14 at 11:23 -0800, Kees Cook wrote:
> > On Wed, Feb 14, 2018 at 10:47 AM, Joe Perches <[email protected]> wrote:
> > > I think expanding the number of allocation functions
> > > is not necessary.
> >
> > I think removing common mispatterns in favor of overflow-protected
> > allocation functions makes sense.
>
> Function symmetry matters too.
>
> These allocation functions are specific to kvz<foo>
> and are not symmetric for k<foo>, v<foo>, devm_<foo>
> <foo>_node, and the like.

If somebody wants them, then we can add them.

2018-02-14 19:44:45

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add kvzalloc_struct to complement kvzalloc_array

On Wed, 2018-02-14 at 11:36 -0800, Matthew Wilcox wrote:
> On Wed, Feb 14, 2018 at 11:32:45AM -0800, Joe Perches wrote:
> > On Wed, 2018-02-14 at 11:23 -0800, Kees Cook wrote:
> > > On Wed, Feb 14, 2018 at 10:47 AM, Joe Perches <[email protected]> wrote:
> > > > I think expanding the number of allocation functions
> > > > is not necessary.
> > >
> > > I think removing common mispatterns in favor of overflow-protected
> > > allocation functions makes sense.
> >
> > Function symmetry matters too.
> >
> > These allocation functions are specific to kvz<foo>
> > and are not symmetric for k<foo>, v<foo>, devm_<foo>
> > <foo>_node, and the like.
>
> If somebody wants them, then we can add them.

Yeah, but I don't think any of it is necessary.

How many of these struct+bufsize * count entries
actually exist?


2018-02-14 20:13:21

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Wed, Feb 14, 2018 at 10:26 AM, Matthew Wilcox <[email protected]> wrote:
> From: Matthew Wilcox <[email protected]>
>
> We have kvmalloc_array in order to safely allocate an array with a
> number of elements specified by userspace (avoiding arithmetic overflow
> leading to a buffer overrun). But it's fairly common to have a header
> in front of that array (eg specifying the length of the array), so we
> need a helper function for that situation.
>
> kvmalloc_ab_c() is the workhorse that does the calculation, but in spite
> of our best efforts to name the arguments, it's really hard to remember
> which order to put the arguments in. kvzalloc_struct() eliminates that
> effort; you tell it about the struct you're allocating, and it puts the
> arguments in the right order for you (and checks that the arguments
> you've given are at least plausible).
>
> For comparison between the three schemes:
>
> sev = kvzalloc(sizeof(*sev) + sizeof(struct v4l2_kevent) * elems,
> GFP_KERNEL);
> sev = kvzalloc_ab_c(elems, sizeof(struct v4l2_kevent), sizeof(*sev),
> GFP_KERNEL);
> sev = kvzalloc_struct(sev, events, elems, GFP_KERNEL);
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> ---
> include/linux/mm.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 51 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 81bd7f0be286..ddf929c5aaee 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -557,6 +557,57 @@ static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> return kvmalloc(n * size, flags);
> }
>
> +/**
> + * kvmalloc_ab_c() - Allocate memory.

Longer description, maybe? "Allocate a *b + c bytes of memory"?

> + * @n: Number of elements.
> + * @size: Size of each element (should be constant).
> + * @c: Size of header (should be constant).

If these should be constant, should we mark them as "const"? Or WARN
if __builtin_constant_p() isn't true?

> + * @gfp: Memory allocation flags.
> + *
> + * Use this function to allocate @n * @size + @c bytes of memory. This
> + * function is safe to use when @n is controlled from userspace; it will
> + * return %NULL if the required amount of memory cannot be allocated.
> + * Use kvfree() to free the allocated memory.
> + *
> + * The kvzalloc_hdr_arr() function is easier to use as it has typechecking

renaming typo? Should this be "kvzalloc_struct()"?

> + * and you do not need to remember which of the arguments should be constants.
> + *
> + * Context: Process context. May sleep; the @gfp flags should be based on
> + * %GFP_KERNEL.
> + * Return: A pointer to the allocated memory or %NULL.
> + */
> +static inline __must_check
> +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> +{
> + if (size != 0 && n > (SIZE_MAX - c) / size)
> + return NULL;
> +
> + return kvmalloc(n * size + c, gfp);
> +}
> +#define kvzalloc_ab_c(a, b, c, gfp) kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)

Nit: "(gfp) | __GFP_ZERO" just in case of insane usage.

> +
> +/**
> + * kvzalloc_struct() - Allocate and zero-fill a structure containing a
> + * variable length array.
> + * @p: Pointer to the structure.
> + * @member: Name of the array member.
> + * @n: Number of elements in the array.
> + * @gfp: Memory allocation flags.
> + *
> + * Allocate (and zero-fill) enough memory for a structure with an array
> + * of @n elements. This function is safe to use when @n is specified by
> + * userspace as the arithmetic will not overflow.
> + * Use kvfree() to free the allocated memory.
> + *
> + * Context: Process context. May sleep; the @gfp flags should be based on
> + * %GFP_KERNEL.
> + * Return: Zero-filled memory or a NULL pointer.
> + */
> +#define kvzalloc_struct(p, member, n, gfp) \
> + (typeof(p))kvzalloc_ab_c(n, \
> + sizeof(*(p)->member) + __must_be_array((p)->member), \
> + offsetof(typeof(*(p)), member), gfp)
> +
> extern void kvfree(const void *addr);
>
> static inline atomic_t *compound_mapcount_ptr(struct page *page)

It might be nice to include another patch that replaces some of the
existing/common uses of a*b+c with the new function...

Otherwise, yes, please. We could build a coccinelle rule for
additional replacements...

-Kees

--
Kees Cook
Pixel Security

2018-02-14 20:13:45

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct



On Wed, 14 Feb 2018, Kees Cook wrote:

> On Wed, Feb 14, 2018 at 10:26 AM, Matthew Wilcox <[email protected]> wrote:
> > From: Matthew Wilcox <[email protected]>
> >
> > We have kvmalloc_array in order to safely allocate an array with a
> > number of elements specified by userspace (avoiding arithmetic overflow
> > leading to a buffer overrun). But it's fairly common to have a header
> > in front of that array (eg specifying the length of the array), so we
> > need a helper function for that situation.
> >
> > kvmalloc_ab_c() is the workhorse that does the calculation, but in spite
> > of our best efforts to name the arguments, it's really hard to remember
> > which order to put the arguments in. kvzalloc_struct() eliminates that
> > effort; you tell it about the struct you're allocating, and it puts the
> > arguments in the right order for you (and checks that the arguments
> > you've given are at least plausible).
> >
> > For comparison between the three schemes:
> >
> > sev = kvzalloc(sizeof(*sev) + sizeof(struct v4l2_kevent) * elems,
> > GFP_KERNEL);
> > sev = kvzalloc_ab_c(elems, sizeof(struct v4l2_kevent), sizeof(*sev),
> > GFP_KERNEL);
> > sev = kvzalloc_struct(sev, events, elems, GFP_KERNEL);
> >
> > Signed-off-by: Matthew Wilcox <[email protected]>
> > ---
> > include/linux/mm.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 51 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 81bd7f0be286..ddf929c5aaee 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -557,6 +557,57 @@ static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> > return kvmalloc(n * size, flags);
> > }
> >
> > +/**
> > + * kvmalloc_ab_c() - Allocate memory.
>
> Longer description, maybe? "Allocate a *b + c bytes of memory"?
>
> > + * @n: Number of elements.
> > + * @size: Size of each element (should be constant).
> > + * @c: Size of header (should be constant).
>
> If these should be constant, should we mark them as "const"? Or WARN
> if __builtin_constant_p() isn't true?
>
> > + * @gfp: Memory allocation flags.
> > + *
> > + * Use this function to allocate @n * @size + @c bytes of memory. This
> > + * function is safe to use when @n is controlled from userspace; it will
> > + * return %NULL if the required amount of memory cannot be allocated.
> > + * Use kvfree() to free the allocated memory.
> > + *
> > + * The kvzalloc_hdr_arr() function is easier to use as it has typechecking
>
> renaming typo? Should this be "kvzalloc_struct()"?
>
> > + * and you do not need to remember which of the arguments should be constants.
> > + *
> > + * Context: Process context. May sleep; the @gfp flags should be based on
> > + * %GFP_KERNEL.
> > + * Return: A pointer to the allocated memory or %NULL.
> > + */
> > +static inline __must_check
> > +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> > +{
> > + if (size != 0 && n > (SIZE_MAX - c) / size)
> > + return NULL;
> > +
> > + return kvmalloc(n * size + c, gfp);
> > +}
> > +#define kvzalloc_ab_c(a, b, c, gfp) kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)
>
> Nit: "(gfp) | __GFP_ZERO" just in case of insane usage.
>
> > +
> > +/**
> > + * kvzalloc_struct() - Allocate and zero-fill a structure containing a
> > + * variable length array.
> > + * @p: Pointer to the structure.
> > + * @member: Name of the array member.
> > + * @n: Number of elements in the array.
> > + * @gfp: Memory allocation flags.
> > + *
> > + * Allocate (and zero-fill) enough memory for a structure with an array
> > + * of @n elements. This function is safe to use when @n is specified by
> > + * userspace as the arithmetic will not overflow.
> > + * Use kvfree() to free the allocated memory.
> > + *
> > + * Context: Process context. May sleep; the @gfp flags should be based on
> > + * %GFP_KERNEL.
> > + * Return: Zero-filled memory or a NULL pointer.
> > + */
> > +#define kvzalloc_struct(p, member, n, gfp) \
> > + (typeof(p))kvzalloc_ab_c(n, \
> > + sizeof(*(p)->member) + __must_be_array((p)->member), \
> > + offsetof(typeof(*(p)), member), gfp)
> > +
> > extern void kvfree(const void *addr);
> >
> > static inline atomic_t *compound_mapcount_ptr(struct page *page)
>
> It might be nice to include another patch that replaces some of the
> existing/common uses of a*b+c with the new function...
>
> Otherwise, yes, please. We could build a coccinelle rule for
> additional replacements...

Thanks for the suggestion. I will look into it.

julia

2018-02-14 20:13:55

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Wed, Feb 14, 2018 at 11:22:38AM -0800, Kees Cook wrote:
> > +/**
> > + * kvmalloc_ab_c() - Allocate memory.
>
> Longer description, maybe? "Allocate a *b + c bytes of memory"?

Done!

> > + * @n: Number of elements.
> > + * @size: Size of each element (should be constant).
> > + * @c: Size of header (should be constant).
>
> If these should be constant, should we mark them as "const"? Or WARN
> if __builtin_constant_p() isn't true?

It's only less efficient if they're not const. Theoretically they could be
variable ... and I've been bitten by __builtin_constant_p() recently
(gcc bug 83653 which I still don't really understand).

> > + * @gfp: Memory allocation flags.
> > + *
> > + * Use this function to allocate @n * @size + @c bytes of memory. This
> > + * function is safe to use when @n is controlled from userspace; it will
> > + * return %NULL if the required amount of memory cannot be allocated.
> > + * Use kvfree() to free the allocated memory.
> > + *
> > + * The kvzalloc_hdr_arr() function is easier to use as it has typechecking
>
> renaming typo? Should this be "kvzalloc_struct()"?

Urgh, yes. I swear I searched for it ... must've typoed my search string.
Anyway, fixed, because kvzalloc_hdr_arr() wasn't a good name.

> > +#define kvzalloc_ab_c(a, b, c, gfp) kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)
>
> Nit: "(gfp) | __GFP_ZERO" just in case of insane usage.

Fixed!

> It might be nice to include another patch that replaces some of the
> existing/common uses of a*b+c with the new function...

Sure! I have a few examples in my tree, I just didn't want to complicate
things by sending a patch that crossed dozens of maintainer trees.


2018-02-14 20:14:20

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add kvzalloc_struct to complement kvzalloc_array

On Wed, Feb 14, 2018 at 10:47 AM, Joe Perches <[email protected]> wrote:
> On Wed, 2018-02-14 at 10:26 -0800, Matthew Wilcox wrote:
>> From: Matthew Wilcox <[email protected]>
>>
>> We all know the perils of multiplying a value provided from userspace
>> by a constant and then allocating the resulting number of bytes. That's
>> why we have kvmalloc_array(), so we don't have to think about it.
>> This solves the same problem when we embed one of these arrays in a
>> struct like this:
>>
>> struct {
>> int n;
>> unsigned long array[];
>> };
>
> I think expanding the number of allocation functions
> is not necessary.

I think removing common mispatterns in favor of overflow-protected
allocation functions makes sense.

-Kees

--
Kees Cook
Pixel Security

Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Wed, 14 Feb 2018, Matthew Wilcox wrote:

> +#define kvzalloc_struct(p, member, n, gfp) \
> + (typeof(p))kvzalloc_ab_c(n, \
> + sizeof(*(p)->member) + __must_be_array((p)->member), \
> + offsetof(typeof(*(p)), member), gfp)
> +

Uppercase like the similar KMEM_CACHE related macros in
include/linux/slab.h?>



2018-02-14 20:14:24

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add kvzalloc_struct to complement kvzalloc_array

On Wed, Feb 14, 2018 at 11:43:46AM -0800, Joe Perches wrote:
> On Wed, 2018-02-14 at 11:36 -0800, Matthew Wilcox wrote:
> > If somebody wants them, then we can add them.
>
> Yeah, but I don't think any of it is necessary.
>
> How many of these struct+bufsize * count entries
> actually exist?

Wrong question. How many of them currently exist that don't check for
integer overflow? How many of them will be added in the future that
will fail to check for integer overflow?

I chose five at random to fix as demonstration that the API is good.
There are more; I imagine that Julia will be able to tell us how many.

2018-02-14 20:14:55

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 0/2] Add kvzalloc_struct to complement kvzalloc_array

On Wed, 2018-02-14 at 11:56 -0800, Matthew Wilcox wrote:
> On Wed, Feb 14, 2018 at 11:43:46AM -0800, Joe Perches wrote:
> > On Wed, 2018-02-14 at 11:36 -0800, Matthew Wilcox wrote:
> > > If somebody wants them, then we can add them.
> >
> > Yeah, but I don't think any of it is necessary.
> >
> > How many of these struct+bufsize * count entries
> > actually exist?
>
> Wrong question. How many of them currently exist that don't check for
> integer overflow? How many of them will be added in the future that
> will fail to check for integer overflow?
>
> I chose five at random to fix as demonstration that the API is good.
> There are more; I imagine that Julia will be able to tell us how many.

No such conversions exist in the patch series
you submitted.

What are those?


2018-02-14 20:14:56

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Wed, Feb 14, 2018 at 01:55:59PM -0600, Christopher Lameter wrote:
> On Wed, 14 Feb 2018, Matthew Wilcox wrote:
>
> > +#define kvzalloc_struct(p, member, n, gfp) \
> > + (typeof(p))kvzalloc_ab_c(n, \
> > + sizeof(*(p)->member) + __must_be_array((p)->member), \
> > + offsetof(typeof(*(p)), member), gfp)
> > +
>
> Uppercase like the similar KMEM_CACHE related macros in
> include/linux/slab.h?>

Do you think that would look better in the users? Compare:

@@ -1284,7 +1284,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
return -EOPNOTSUPP;
if (mem.nregions > max_mem_regions)
return -E2BIG;
- newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
+ newmem = kvzalloc_struct(newmem, regions, mem.nregions, GFP_KERNEL);
if (!newmem)
return -ENOMEM;

@@ -1284,7 +1284,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
return -EOPNOTSUPP;
if (mem.nregions > max_mem_regions)
return -E2BIG;
- newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
+ newmem = KVZALLOC_STRUCT(newmem, regions, mem.nregions, GFP_KERNEL);
if (!newmem)
return -ENOMEM;

Making it look like a function is more pleasing to my eye, but I'll
change it if that's the only thing keeping it from being merged.

Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Wed, 14 Feb 2018, Matthew Wilcox wrote:

> > Uppercase like the similar KMEM_CACHE related macros in
> > include/linux/slab.h?>
>
> Do you think that would look better in the users? Compare:

Does looking matter? I thought we had the convention that macros are
uppercase. There are some tricks going on with the struct. Uppercase shows
that something special is going on.

> Making it look like a function is more pleasing to my eye, but I'll
> change it if that's the only thing keeping it from being merged.

This should be consistent throughout the source.

2018-02-16 08:25:29

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Thu, Feb 15, 2018 at 09:55:11AM -0600, Christopher Lameter wrote:
> On Wed, 14 Feb 2018, Matthew Wilcox wrote:
>
> > > Uppercase like the similar KMEM_CACHE related macros in
> > > include/linux/slab.h?>
> >
> > Do you think that would look better in the users? Compare:
>
> Does looking matter? I thought we had the convention that macros are
> uppercase. There are some tricks going on with the struct. Uppercase shows
> that something special is going on.

12) Macros, Enums and RTL
-------------------------

Names of macros defining constants and labels in enums are capitalized.

.. code-block:: c

#define CONSTANT 0x12345

Enums are preferred when defining several related constants.

CAPITALIZED macro names are appreciated but macros resembling functions
may be named in lower case.

I dunno. Yes, there's macro trickery going on here, but it certainly
resembles a function. It doesn't fail any of the rules laid out in that
chapter of coding-style about unacceptable uses of macros.

Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Thu, 15 Feb 2018, Matthew Wilcox wrote:

> I dunno. Yes, there's macro trickery going on here, but it certainly
> resembles a function. It doesn't fail any of the rules laid out in that
> chapter of coding-style about unacceptable uses of macros.

It sure looks like a function but does magic things with the struct
parameter. So its not working like a function and the capitalization makes
one aware of that.




2018-02-22 01:29:47

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Thu, Feb 15, 2018 at 9:06 AM, Christopher Lameter <[email protected]> wrote:
> On Thu, 15 Feb 2018, Matthew Wilcox wrote:
>
>> I dunno. Yes, there's macro trickery going on here, but it certainly
>> resembles a function. It doesn't fail any of the rules laid out in that
>> chapter of coding-style about unacceptable uses of macros.
>
> It sure looks like a function but does magic things with the struct
> parameter. So its not working like a function and the capitalization makes
> one aware of that.

I think readability trumps that -- nearly everything else in the
kernel that hides these kinds of details is lower case.

-Kees

--
Kees Cook
Pixel Security

2018-03-07 21:20:47

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct



On Wed, 14 Feb 2018, Kees Cook wrote:

> On Wed, Feb 14, 2018 at 10:26 AM, Matthew Wilcox <[email protected]> wrote:
> > From: Matthew Wilcox <[email protected]>
> >
> > We have kvmalloc_array in order to safely allocate an array with a
> > number of elements specified by userspace (avoiding arithmetic overflow
> > leading to a buffer overrun). But it's fairly common to have a header
> > in front of that array (eg specifying the length of the array), so we
> > need a helper function for that situation.
> >
> > kvmalloc_ab_c() is the workhorse that does the calculation, but in spite
> > of our best efforts to name the arguments, it's really hard to remember
> > which order to put the arguments in. kvzalloc_struct() eliminates that
> > effort; you tell it about the struct you're allocating, and it puts the
> > arguments in the right order for you (and checks that the arguments
> > you've given are at least plausible).
> >
> > For comparison between the three schemes:
> >
> > sev = kvzalloc(sizeof(*sev) + sizeof(struct v4l2_kevent) * elems,
> > GFP_KERNEL);
> > sev = kvzalloc_ab_c(elems, sizeof(struct v4l2_kevent), sizeof(*sev),
> > GFP_KERNEL);
> > sev = kvzalloc_struct(sev, events, elems, GFP_KERNEL);
> >
> > Signed-off-by: Matthew Wilcox <[email protected]>
> > ---
> > include/linux/mm.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 51 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 81bd7f0be286..ddf929c5aaee 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -557,6 +557,57 @@ static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> > return kvmalloc(n * size, flags);
> > }
> >
> > +/**
> > + * kvmalloc_ab_c() - Allocate memory.
>
> Longer description, maybe? "Allocate a *b + c bytes of memory"?
>
> > + * @n: Number of elements.
> > + * @size: Size of each element (should be constant).
> > + * @c: Size of header (should be constant).
>
> If these should be constant, should we mark them as "const"? Or WARN
> if __builtin_constant_p() isn't true?
>
> > + * @gfp: Memory allocation flags.
> > + *
> > + * Use this function to allocate @n * @size + @c bytes of memory. This
> > + * function is safe to use when @n is controlled from userspace; it will
> > + * return %NULL if the required amount of memory cannot be allocated.
> > + * Use kvfree() to free the allocated memory.
> > + *
> > + * The kvzalloc_hdr_arr() function is easier to use as it has typechecking
>
> renaming typo? Should this be "kvzalloc_struct()"?
>
> > + * and you do not need to remember which of the arguments should be constants.
> > + *
> > + * Context: Process context. May sleep; the @gfp flags should be based on
> > + * %GFP_KERNEL.
> > + * Return: A pointer to the allocated memory or %NULL.
> > + */
> > +static inline __must_check
> > +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> > +{
> > + if (size != 0 && n > (SIZE_MAX - c) / size)
> > + return NULL;
> > +
> > + return kvmalloc(n * size + c, gfp);
> > +}
> > +#define kvzalloc_ab_c(a, b, c, gfp) kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)
>
> Nit: "(gfp) | __GFP_ZERO" just in case of insane usage.
>
> > +
> > +/**
> > + * kvzalloc_struct() - Allocate and zero-fill a structure containing a
> > + * variable length array.
> > + * @p: Pointer to the structure.
> > + * @member: Name of the array member.
> > + * @n: Number of elements in the array.
> > + * @gfp: Memory allocation flags.
> > + *
> > + * Allocate (and zero-fill) enough memory for a structure with an array
> > + * of @n elements. This function is safe to use when @n is specified by
> > + * userspace as the arithmetic will not overflow.
> > + * Use kvfree() to free the allocated memory.
> > + *
> > + * Context: Process context. May sleep; the @gfp flags should be based on
> > + * %GFP_KERNEL.
> > + * Return: Zero-filled memory or a NULL pointer.
> > + */
> > +#define kvzalloc_struct(p, member, n, gfp) \
> > + (typeof(p))kvzalloc_ab_c(n, \
> > + sizeof(*(p)->member) + __must_be_array((p)->member), \
> > + offsetof(typeof(*(p)), member), gfp)
> > +
> > extern void kvfree(const void *addr);
> >
> > static inline atomic_t *compound_mapcount_ptr(struct page *page)
>
> It might be nice to include another patch that replaces some of the
> existing/common uses of a*b+c with the new function...
>
> Otherwise, yes, please. We could build a coccinelle rule for
> additional replacements...

A potential semantic patch and the changes it generates are attached
below. Himanshu Jha helped with its development. Working on this
uncovered one bug, where the allocated array is too large, because the
size provided for it was a structure size, but actually only pointers to
that structure were to be stored in it.

Note that the rule changes both kmallocs and kzallocs to kvzalloc_struct.
If this is not wanted, it would be easy to change.

julia

------------

// Last field case

@r@
identifier i,idn;
struct i *buf;
expression e;
position p;
@@

\(kmalloc\|kzalloc\|kvmalloc\|kvzalloc\)
(sizeof(*buf) + sizeof(*buf->idn) * e, <+...GFP_KERNEL...+>)@p

@s@
identifier r.i,r.idn;
type T1;
@@

struct i {
...
(
T1 idn[0];
|
T1 idn[];
)
}

@depends on s@
identifier r.idn;
expression buf, e, flag;
position r.p;
@@

(
- kmalloc(sizeof(*buf) + sizeof(*buf->idn) * e, flag)@p
+ kvzalloc_struct(buf, idn, e, flag)
|
- kzalloc(sizeof(*buf) + sizeof(*buf->idn) * e, flag)@p
+ kvzalloc_struct(buf, idn, e, flag)
|
- kvmalloc(sizeof(*buf) + sizeof(*buf->idn) * e, flag)@p
+ kvzalloc_struct(buf, idn, e, flag)
|
- kvzalloc(sizeof(*buf) + sizeof(*buf->idn) * e, flag)@p
+ kvzalloc_struct(buf, idn, e, flag)
)

// -------------------------------------------------------
// Type case

@r1@
identifier i;
struct i *buf;
type T;
T *exp;
expression e;
position p;
@@

\(kmalloc\|kzalloc\|kvmalloc\|kvzalloc\)
(sizeof(*buf) + \(sizeof(T)\|sizeof(*exp)\) * e, <+...GFP_KERNEL...+>)@p


@s1@
identifier r1.i,fld;
type r1.T;
@@

struct i {
...
(
T fld[0];
|
T fld[];
)
}

@depends on s1@
identifier s1.fld;
type r1.T;
T *exp;
expression buf, e, flag;
position r1.p;
@@

(
- kmalloc(sizeof(*buf) + \(sizeof(T)\|sizeof(*exp)\) * e, flag)@p
+ kvzalloc_struct(buf, fld, e, flag)
|
- kzalloc(sizeof(*buf) + \(sizeof(T)\|sizeof(*exp)\) * e, flag)@p
+ kvzalloc_struct(buf, fld, e, flag)
|
- kvmalloc(sizeof(*buf) + \(sizeof(T)\|sizeof(*exp)\) * e, flag)@p
+ kvzalloc_struct(buf, fld, e, flag)
|
- kvzalloc(sizeof(*buf) + \(sizeof(T)\|sizeof(*exp)\) * e, flag)@p
+ kvzalloc_struct(buf, fld, e, flag)
)

----------------

diff -u -p a/drivers/irqchip/irq-bcm6345-l1.c b/drivers/irqchip/irq-bcm6345-l1.c
--- a/drivers/irqchip/irq-bcm6345-l1.c
+++ b/drivers/irqchip/irq-bcm6345-l1.c
@@ -255,8 +255,8 @@ static int __init bcm6345_l1_init_one(st
else if (intc->n_words != n_words)
return -EINVAL;

- cpu = intc->cpus[idx] = kzalloc(sizeof(*cpu) + n_words * sizeof(u32),
- GFP_KERNEL);
+ cpu = intc->cpus[idx] = kvzalloc_struct(cpu, enable_cache, n_words,
+ GFP_KERNEL);
if (!cpu)
return -ENOMEM;

diff -u -p a/drivers/irqchip/irq-bcm7038-l1.c b/drivers/irqchip/irq-bcm7038-l1.c
--- a/drivers/irqchip/irq-bcm7038-l1.c
+++ b/drivers/irqchip/irq-bcm7038-l1.c
@@ -263,8 +263,8 @@ static int __init bcm7038_l1_init_one(st
else if (intc->n_words != n_words)
return -EINVAL;

- cpu = intc->cpus[idx] = kzalloc(sizeof(*cpu) + n_words * sizeof(u32),
- GFP_KERNEL);
+ cpu = intc->cpus[idx] = kvzalloc_struct(cpu, mask_cache, n_words,
+ GFP_KERNEL);
if (!cpu)
return -ENOMEM;

diff -u -p a/drivers/clk/clk-efm32gg.c b/drivers/clk/clk-efm32gg.c
--- a/drivers/clk/clk-efm32gg.c
+++ b/drivers/clk/clk-efm32gg.c
@@ -25,8 +25,7 @@ static void __init efm32gg_cmu_init(stru
void __iomem *base;
struct clk_hw **hws;

- clk_data = kzalloc(sizeof(*clk_data) +
- sizeof(*clk_data->hws) * CMU_MAX_CLKS, GFP_KERNEL);
+ clk_data = kvzalloc_struct(clk_data, hws, CMU_MAX_CLKS, GFP_KERNEL);

if (!clk_data)
return;
diff -u -p a/drivers/clk/clk-gemini.c b/drivers/clk/clk-gemini.c
--- a/drivers/clk/clk-gemini.c
+++ b/drivers/clk/clk-gemini.c
@@ -399,9 +399,8 @@ static void __init gemini_cc_init(struct
int ret;
int i;

- gemini_clk_data = kzalloc(sizeof(*gemini_clk_data) +
- sizeof(*gemini_clk_data->hws) * GEMINI_NUM_CLKS,
- GFP_KERNEL);
+ gemini_clk_data = kvzalloc_struct(gemini_clk_data, hws,
+ GEMINI_NUM_CLKS, GFP_KERNEL);
if (!gemini_clk_data)
return;

diff -u -p a/drivers/clk/clk-stm32h7.c b/drivers/clk/clk-stm32h7.c
--- a/drivers/clk/clk-stm32h7.c
+++ b/drivers/clk/clk-stm32h7.c
@@ -1201,9 +1201,8 @@ static void __init stm32h7_rcc_init(stru
const char *hse_clk, *lse_clk, *i2s_clk;
struct regmap *pdrm;

- clk_data = kzalloc(sizeof(*clk_data) +
- sizeof(*clk_data->hws) * STM32H7_MAX_CLKS,
- GFP_KERNEL);
+ clk_data = kvzalloc_struct(clk_data, hws, STM32H7_MAX_CLKS,
+ GFP_KERNEL);
if (!clk_data)
return;

diff -u -p a/drivers/clk/bcm/clk-iproc-asiu.c b/drivers/clk/bcm/clk-iproc-asiu.c
--- a/drivers/clk/bcm/clk-iproc-asiu.c
+++ b/drivers/clk/bcm/clk-iproc-asiu.c
@@ -197,8 +197,8 @@ void __init iproc_asiu_setup(struct devi
if (WARN_ON(!asiu))
return;

- asiu->clk_data = kzalloc(sizeof(*asiu->clk_data->hws) * num_clks +
- sizeof(*asiu->clk_data), GFP_KERNEL);
+ asiu->clk_data = kvzalloc_struct(asiu->clk_data, hws, num_clks,
+ GFP_KERNEL);
if (WARN_ON(!asiu->clk_data))
goto err_clks;
asiu->clk_data->num = num_clks;
diff -u -p a/drivers/clk/bcm/clk-iproc-pll.c b/drivers/clk/bcm/clk-iproc-pll.c
--- a/drivers/clk/bcm/clk-iproc-pll.c
+++ b/drivers/clk/bcm/clk-iproc-pll.c
@@ -744,8 +744,7 @@ void iproc_pll_clk_setup(struct device_n
if (WARN_ON(!pll))
return;

- clk_data = kzalloc(sizeof(*clk_data->hws) * num_clks +
- sizeof(*clk_data), GFP_KERNEL);
+ clk_data = kvzalloc_struct(clk_data, hws, num_clks, GFP_KERNEL);
if (WARN_ON(!clk_data))
goto err_clk_data;
clk_data->num = num_clks;
diff -u -p a/drivers/clk/clk-asm9260.c b/drivers/clk/clk-asm9260.c
--- a/drivers/clk/clk-asm9260.c
+++ b/drivers/clk/clk-asm9260.c
@@ -273,8 +273,7 @@ static void __init asm9260_acc_init(stru
int n;
u32 accuracy = 0;

- clk_data = kzalloc(sizeof(*clk_data) +
- sizeof(*clk_data->hws) * MAX_CLKS, GFP_KERNEL);
+ clk_data = kvzalloc_struct(clk_data, hws, MAX_CLKS, GFP_KERNEL);
if (!clk_data)
return;
clk_data->num = MAX_CLKS;
diff -u -p a/drivers/clk/clk-aspeed.c b/drivers/clk/clk-aspeed.c
--- a/drivers/clk/clk-aspeed.c
+++ b/drivers/clk/clk-aspeed.c
@@ -621,9 +621,8 @@ static void __init aspeed_cc_init(struct
if (!scu_base)
return;

- aspeed_clk_data = kzalloc(sizeof(*aspeed_clk_data) +
- sizeof(*aspeed_clk_data->hws) * ASPEED_NUM_CLKS,
- GFP_KERNEL);
+ aspeed_clk_data = kvzalloc_struct(aspeed_clk_data, hws,
+ ASPEED_NUM_CLKS, GFP_KERNEL);
if (!aspeed_clk_data)
return;

diff -u -p a/drivers/clk/berlin/bg2.c b/drivers/clk/berlin/bg2.c
--- a/drivers/clk/berlin/bg2.c
+++ b/drivers/clk/berlin/bg2.c
@@ -509,8 +509,7 @@ static void __init berlin2_clock_setup(s
u8 avpll_flags = 0;
int n, ret;

- clk_data = kzalloc(sizeof(*clk_data) +
- sizeof(*clk_data->hws) * MAX_CLKS, GFP_KERNEL);
+ clk_data = kvzalloc_struct(clk_data, hws, MAX_CLKS, GFP_KERNEL);
if (!clk_data)
return;
clk_data->num = MAX_CLKS;
diff -u -p a/drivers/clk/berlin/bg2q.c b/drivers/clk/berlin/bg2q.c
--- a/drivers/clk/berlin/bg2q.c
+++ b/drivers/clk/berlin/bg2q.c
@@ -295,8 +295,7 @@ static void __init berlin2q_clock_setup(
struct clk_hw **hws;
int n, ret;

- clk_data = kzalloc(sizeof(*clk_data) +
- sizeof(*clk_data->hws) * MAX_CLKS, GFP_KERNEL);
+ clk_data = kvzalloc_struct(clk_data, hws, MAX_CLKS, GFP_KERNEL);
if (!clk_data)
return;
clk_data->num = MAX_CLKS;
diff -u -p a/drivers/input/input-leds.c b/drivers/input/input-leds.c
--- a/drivers/input/input-leds.c
+++ b/drivers/input/input-leds.c
@@ -97,8 +97,7 @@ static int input_leds_connect(struct inp
if (!num_leds)
return -ENXIO;

- leds = kzalloc(sizeof(*leds) + num_leds * sizeof(*leds->leds),
- GFP_KERNEL);
+ leds = kvzalloc_struct(leds, leds, num_leds, GFP_KERNEL);
if (!leds)
return -ENOMEM;

diff -u -p a/drivers/input/input-mt.c b/drivers/input/input-mt.c
--- a/drivers/input/input-mt.c
+++ b/drivers/input/input-mt.c
@@ -49,7 +49,7 @@ int input_mt_init_slots(struct input_dev
if (mt)
return mt->num_slots != num_slots ? -EINVAL : 0;

- mt = kzalloc(sizeof(*mt) + num_slots * sizeof(*mt->slots), GFP_KERNEL);
+ mt = kvzalloc_struct(mt, slots, num_slots, GFP_KERNEL);
if (!mt)
goto err_mem;

diff -u -p a/drivers/dax/device.c b/drivers/dax/device.c
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -586,7 +586,7 @@ struct dev_dax *devm_create_dev_dax(stru
if (!count)
return ERR_PTR(-EINVAL);

- dev_dax = kzalloc(sizeof(*dev_dax) + sizeof(*res) * count, GFP_KERNEL);
+ dev_dax = kvzalloc_struct(dev_dax, res, count, GFP_KERNEL);
if (!dev_dax)
return ERR_PTR(-ENOMEM);

diff -u -p a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -1286,9 +1286,8 @@ static struct usb_function *f_midi_alloc
}

/* allocate and initialize one new instance */
- midi = kzalloc(
- sizeof(*midi) + opts->in_ports * sizeof(*midi->in_ports_array),
- GFP_KERNEL);
+ midi = kvzalloc_struct(midi, in_ports_array, opts->in_ports,
+ GFP_KERNEL);
if (!midi) {
status = -ENOMEM;
goto setup_fail;
diff -u -p a/drivers/usb/atm/usbatm.c b/drivers/usb/atm/usbatm.c
--- a/drivers/usb/atm/usbatm.c
+++ b/drivers/usb/atm/usbatm.c
@@ -1015,7 +1015,8 @@ int usbatm_usb_probe(struct usb_interfac
unsigned int maxpacket, num_packets;

/* instance init */
- instance = kzalloc(sizeof(*instance) + sizeof(struct urb *) * (num_rcv_urbs + num_snd_urbs), GFP_KERNEL);
+ instance = kvzalloc_struct(instance, urbs,
+ (num_rcv_urbs + num_snd_urbs), GFP_KERNEL);
if (!instance)
return -ENOMEM;

diff -u -p a/drivers/hwspinlock/omap_hwspinlock.c b/drivers/hwspinlock/omap_hwspinlock.c
--- a/drivers/hwspinlock/omap_hwspinlock.c
+++ b/drivers/hwspinlock/omap_hwspinlock.c
@@ -132,7 +132,7 @@ static int omap_hwspinlock_probe(struct

num_locks = i * 32; /* actual number of locks in this device */

- bank = kzalloc(sizeof(*bank) + num_locks * sizeof(*hwlock), GFP_KERNEL);
+ bank = kvzalloc_struct(bank, lock, num_locks, GFP_KERNEL);
if (!bank) {
ret = -ENOMEM;
goto iounmap_base;
diff -u -p a/drivers/hwspinlock/u8500_hsem.c b/drivers/hwspinlock/u8500_hsem.c
--- a/drivers/hwspinlock/u8500_hsem.c
+++ b/drivers/hwspinlock/u8500_hsem.c
@@ -119,7 +119,7 @@ static int u8500_hsem_probe(struct platf
/* clear all interrupts */
writel(0xFFFF, io_base + HSEM_ICRALL);

- bank = kzalloc(sizeof(*bank) + num_locks * sizeof(*hwlock), GFP_KERNEL);
+ bank = kvzalloc_struct(bank, lock, num_locks, GFP_KERNEL);
if (!bank) {
ret = -ENOMEM;
goto iounmap_base;
diff -u -p a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -367,9 +367,9 @@ static struct mvs_info *mvs_pci_alloc(st
struct mvs_info *mvi = NULL;
struct sas_ha_struct *sha = SHOST_TO_SAS_HA(shost);

- mvi = kzalloc(sizeof(*mvi) +
- (1L << mvs_chips[ent->driver_data].slot_width) *
- sizeof(struct mvs_slot_info), GFP_KERNEL);
+ mvi = kvzalloc_struct(mvi, slot_info,
+ (1L << mvs_chips[ent->driver_data].slot_width),
+ GFP_KERNEL);
if (!mvi)
return NULL;

diff -u -p a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2633,7 +2633,7 @@ static int crypt_ctr(struct dm_target *t
return -EINVAL;
}

- cc = kzalloc(sizeof(*cc) + key_size * sizeof(u8), GFP_KERNEL);
+ cc = kvzalloc_struct(cc, key, key_size, GFP_KERNEL);
if (!cc) {
ti->error = "Cannot allocate encryption context";
return -ENOMEM;
diff -u -p a/drivers/md/md-linear.c b/drivers/md/md-linear.c
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -96,8 +96,7 @@ static struct linear_conf *linear_conf(s
int i, cnt;
bool discard_supported = false;

- conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(struct dev_info),
- GFP_KERNEL);
+ conf = kvzalloc_struct(conf, disks, raid_disks, GFP_KERNEL);
if (!conf)
return NULL;

diff -u -p a/drivers/gpu/drm/nouveau/nvkm/engine/pm/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/pm/base.c
--- a/drivers/gpu/drm/nouveau/nvkm/engine/pm/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/pm/base.c
@@ -779,8 +779,8 @@ nvkm_perfdom_new(struct nvkm_pm *pm, con

sdom = spec;
while (sdom->signal_nr) {
- dom = kzalloc(sizeof(*dom) + sdom->signal_nr *
- sizeof(*dom->signal), GFP_KERNEL);
+ dom = kvzalloc_struct(dom, signal, sdom->signal_nr,
+ GFP_KERNEL);
if (!dom)
return -ENOMEM;

diff -u -p a/drivers/gpu/drm/i915/selftests/mock_dmabuf.c b/drivers/gpu/drm/i915/selftests/mock_dmabuf.c
--- a/drivers/gpu/drm/i915/selftests/mock_dmabuf.c
+++ b/drivers/gpu/drm/i915/selftests/mock_dmabuf.c
@@ -145,8 +145,7 @@ static struct dma_buf *mock_dmabuf(int n
struct dma_buf *dmabuf;
int i;

- mock = kmalloc(sizeof(*mock) + npages * sizeof(struct page *),
- GFP_KERNEL);
+ mock = kvzalloc_struct(mock, pages, npages, GFP_KERNEL);
if (!mock)
return ERR_PTR(-ENOMEM);

diff -u -p a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -433,8 +433,7 @@ static struct port_buffer *alloc_buf(str
* Allocate buffer and the sg list. The sg list array is allocated
* directly after the port_buffer struct.
*/
- buf = kmalloc(sizeof(*buf) + sizeof(struct scatterlist) * pages,
- GFP_KERNEL);
+ buf = kvzalloc_struct(buf, sg, pages, GFP_KERNEL);
if (!buf)
goto fail;

diff -u -p a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -968,8 +968,7 @@ struct virtqueue *__vring_new_virtqueue(
unsigned int i;
struct vring_virtqueue *vq;

- vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
- GFP_KERNEL);
+ vq = kvzalloc_struct(vq, desc_state, vring.num, GFP_KERNEL);
if (!vq)
return NULL;

diff -u -p a/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.c
--- a/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.c
@@ -851,9 +851,7 @@ iwl_parse_eeprom_data(struct device *dev
if (WARN_ON(!cfg || !cfg->eeprom_params))
return NULL;

- data = kzalloc(sizeof(*data) +
- sizeof(struct ieee80211_channel) * IWL_NUM_CHANNELS,
- GFP_KERNEL);
+ data = kvzalloc_struct(data, channels, IWL_NUM_CHANNELS, GFP_KERNEL);
if (!data)
return NULL;

diff -u -p a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -421,8 +421,7 @@ struct nfp_reprs *nfp_reprs_alloc(unsign
{
struct nfp_reprs *reprs;

- reprs = kzalloc(sizeof(*reprs) +
- num_reprs * sizeof(struct net_device *), GFP_KERNEL);
+ reprs = kvzalloc_struct(reprs, reprs, num_reprs, GFP_KERNEL);
if (!reprs)
return NULL;
reprs->num_reprs = num_reprs;
diff -u -p a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -505,8 +505,7 @@ static inline struct rdma_hw_stats *rdma
{
struct rdma_hw_stats *stats;

- stats = kzalloc(sizeof(*stats) + num_counters * sizeof(u64),
- GFP_KERNEL);
+ stats = kvzalloc_struct(stats, value, num_counters, GFP_KERNEL);
if (!stats)
return NULL;
stats->names = names;
diff -u -p a/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c b/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c
--- a/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c
@@ -289,8 +289,7 @@ struct clip_tbl *t4_init_clip_tbl(unsign
if (clipt_size < CLIPT_MIN_HASH_BUCKETS)
return NULL;

- ctbl = kvzalloc(sizeof(*ctbl) +
- clipt_size*sizeof(struct list_head), GFP_KERNEL);
+ ctbl = kvzalloc_struct(ctbl, hash_list, clipt_size, GFP_KERNEL);
if (!ctbl)
return NULL;

diff -u -p a/drivers/net/ethernet/chelsio/cxgb4/l2t.c b/drivers/net/ethernet/chelsio/cxgb4/l2t.c
--- a/drivers/net/ethernet/chelsio/cxgb4/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/l2t.c
@@ -646,7 +646,7 @@ struct l2t_data *t4_init_l2t(unsigned in
if (l2t_size < L2T_MIN_HASH_BUCKETS)
return NULL;

- d = kvzalloc(sizeof(*d) + l2t_size * sizeof(struct l2t_entry), GFP_KERNEL);
+ d = kvzalloc_struct(d, l2tab, l2t_size, GFP_KERNEL);
if (!d)
return NULL;

diff -u -p a/drivers/net/ethernet/chelsio/cxgb4/sched.c b/drivers/net/ethernet/chelsio/cxgb4/sched.c
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.c
@@ -512,7 +512,7 @@ struct sched_table *t4_init_sched(unsign
struct sched_table *s;
unsigned int i;

- s = kvzalloc(sizeof(*s) + sched_size * sizeof(struct sched_class), GFP_KERNEL);
+ s = kvzalloc_struct(s, tab, sched_size, GFP_KERNEL);
if (!s)
return NULL;

diff -u -p a/drivers/net/ethernet/chelsio/cxgb4/smt.c b/drivers/net/ethernet/chelsio/cxgb4/smt.c
--- a/drivers/net/ethernet/chelsio/cxgb4/smt.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/smt.c
@@ -47,8 +47,7 @@ struct smt_data *t4_init_smt(void)

smt_size = SMT_SIZE;

- s = kvzalloc(sizeof(*s) + smt_size * sizeof(struct smt_entry),
- GFP_KERNEL);
+ s = kvzalloc_struct(s, smtab, smt_size, GFP_KERNEL);
if (!s)
return NULL;
s->smt_size = smt_size;
diff -u -p a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -505,8 +505,7 @@ static inline struct rdma_hw_stats *rdma
{
struct rdma_hw_stats *stats;

- stats = kzalloc(sizeof(*stats) + num_counters * sizeof(u64),
- GFP_KERNEL);
+ stats = kvzalloc_struct(stats, value, num_counters, GFP_KERNEL);
if (!stats)
return NULL;
stats->names = names;
diff -u -p a/drivers/misc/vexpress-syscfg.c b/drivers/misc/vexpress-syscfg.c
--- a/drivers/misc/vexpress-syscfg.c
+++ b/drivers/misc/vexpress-syscfg.c
@@ -182,8 +182,7 @@ static struct regmap *vexpress_syscfg_re
val = energy_quirk;
}

- func = kzalloc(sizeof(*func) + sizeof(*func->template) * num,
- GFP_KERNEL);
+ func = kvzalloc_struct(func, template, num, GFP_KERNEL);
if (!func)
return ERR_PTR(-ENOMEM);

diff -u -p a/drivers/cpufreq/e_powersaver.c b/drivers/cpufreq/e_powersaver.c
--- a/drivers/cpufreq/e_powersaver.c
+++ b/drivers/cpufreq/e_powersaver.c
@@ -324,9 +324,8 @@ static int eps_cpu_init(struct cpufreq_p
states = 2;

/* Allocate private data and frequency table for current cpu */
- centaur = kzalloc(sizeof(*centaur)
- + (states + 1) * sizeof(struct cpufreq_frequency_table),
- GFP_KERNEL);
+ centaur = kvzalloc_struct(centaur, freq_table, (states + 1),
+ GFP_KERNEL);
if (!centaur)
return -ENOMEM;
eps_cpu[0] = centaur;
diff -u -p a/drivers/media/v4l2-core/v4l2-event.c b/drivers/media/v4l2-core/v4l2-event.c
--- a/drivers/media/v4l2-core/v4l2-event.c
+++ b/drivers/media/v4l2-core/v4l2-event.c
@@ -215,8 +215,7 @@ int v4l2_event_subscribe(struct v4l2_fh
if (elems < 1)
elems = 1;

- sev = kvzalloc(sizeof(*sev) + sizeof(struct v4l2_kevent) * elems,
- GFP_KERNEL);
+ sev = kvzalloc_struct(sev, events, elems, GFP_KERNEL);
if (!sev)
return -ENOMEM;
for (i = 0; i < elems; i++)
diff -u -p a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -2380,9 +2380,7 @@ static void ib_sa_add_one(struct ib_devi
s = rdma_start_port(device);
e = rdma_end_port(device);

- sa_dev = kzalloc(sizeof *sa_dev +
- (e - s + 1) * sizeof (struct ib_sa_port),
- GFP_KERNEL);
+ sa_dev = kvzalloc_struct(sa_dev, port, (e - s + 1), GFP_KERNEL);
if (!sa_dev)
return;

diff -u -p a/drivers/infiniband/core/uverbs_ioctl_merge.c b/drivers/infiniband/core/uverbs_ioctl_merge.c
--- a/drivers/infiniband/core/uverbs_ioctl_merge.c
+++ b/drivers/infiniband/core/uverbs_ioctl_merge.c
@@ -297,9 +297,8 @@ static struct uverbs_method_spec *build_
if (max_attr_buckets >= 0)
num_attr_buckets = max_attr_buckets + 1;

- method = kzalloc(sizeof(*method) +
- num_attr_buckets * sizeof(*method->attr_buckets),
- GFP_KERNEL);
+ method = kvzalloc_struct(method, attr_buckets, num_attr_buckets,
+ GFP_KERNEL);
if (!method)
return ERR_PTR(-ENOMEM);

@@ -446,9 +445,8 @@ static struct uverbs_object_spec *build_
if (max_method_buckets >= 0)
num_method_buckets = max_method_buckets + 1;

- object = kzalloc(sizeof(*object) +
- num_method_buckets *
- sizeof(*object->method_buckets), GFP_KERNEL);
+ object = kvzalloc_struct(object, method_buckets, num_method_buckets,
+ GFP_KERNEL);
if (!object)
return ERR_PTR(-ENOMEM);

@@ -469,9 +467,8 @@ static struct uverbs_object_spec *build_
if (methods_max_bucket < 0)
continue;

- hash = kzalloc(sizeof(*hash) +
- sizeof(*hash->methods) * (methods_max_bucket + 1),
- GFP_KERNEL);
+ hash = kvzalloc_struct(hash, methods,
+ (methods_max_bucket + 1), GFP_KERNEL);
if (!hash) {
res = -ENOMEM;
goto free;
@@ -579,9 +576,8 @@ struct uverbs_root_spec *uverbs_alloc_sp
if (max_object_buckets >= 0)
num_objects_buckets = max_object_buckets + 1;

- root_spec = kzalloc(sizeof(*root_spec) +
- num_objects_buckets * sizeof(*root_spec->object_buckets),
- GFP_KERNEL);
+ root_spec = kvzalloc_struct(root_spec, object_buckets,
+ num_objects_buckets, GFP_KERNEL);
if (!root_spec)
return ERR_PTR(-ENOMEM);
root_spec->num_buckets = num_objects_buckets;
@@ -603,9 +599,8 @@ struct uverbs_root_spec *uverbs_alloc_sp
if (objects_max_bucket < 0)
continue;

- hash = kzalloc(sizeof(*hash) +
- sizeof(*hash->objects) * (objects_max_bucket + 1),
- GFP_KERNEL);
+ hash = kvzalloc_struct(hash, objects,
+ (objects_max_bucket + 1), GFP_KERNEL);
if (!hash) {
res = -ENOMEM;
goto free;
diff -u -p a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -815,8 +815,7 @@ static void mcast_add_one(struct ib_devi
int i;
int count = 0;

- dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
- GFP_KERNEL);
+ dev = kvzalloc_struct(dev, port, device->phys_port_cnt, GFP_KERNEL);
if (!dev)
return;

diff -u -p a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -505,8 +505,7 @@ static inline struct rdma_hw_stats *rdma
{
struct rdma_hw_stats *stats;

- stats = kzalloc(sizeof(*stats) + num_counters * sizeof(u64),
- GFP_KERNEL);
+ stats = kvzalloc_struct(stats, value, num_counters, GFP_KERNEL);
if (!stats)
return NULL;
stats->names = names;
diff -u -p a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3951,8 +3951,7 @@ static void cm_recv_handler(struct ib_ma
atomic_long_inc(&port->counter_group[CM_RECV].
counter[attr_id - CM_ATTR_ID_OFFSET]);

- work = kmalloc(sizeof(*work) + sizeof(struct sa_path_rec) * paths,
- GFP_KERNEL);
+ work = kvzalloc_struct(work, path, paths, GFP_KERNEL);
if (!work) {
ib_free_recv_mad(mad_recv_wc);
return;
diff -u -p a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1272,9 +1272,7 @@ static void ib_umad_add_one(struct ib_de
s = rdma_start_port(device);
e = rdma_end_port(device);

- umad_dev = kzalloc(sizeof *umad_dev +
- (e - s + 1) * sizeof (struct ib_umad_port),
- GFP_KERNEL);
+ umad_dev = kvzalloc_struct(umad_dev, port, (e - s + 1), GFP_KERNEL);
if (!umad_dev)
return;

diff -u -p a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1065,16 +1065,16 @@ static void ib_cache_update(struct ib_de
goto err;
}

- pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
- sizeof *pkey_cache->table, GFP_KERNEL);
+ pkey_cache = kvzalloc_struct(pkey_cache, table, tprops->pkey_tbl_len,
+ GFP_KERNEL);
if (!pkey_cache)
goto err;

pkey_cache->table_len = tprops->pkey_tbl_len;

if (!use_roce_gid_table) {
- gid_cache = kmalloc(sizeof(*gid_cache) + tprops->gid_tbl_len *
- sizeof(*gid_cache->table), GFP_KERNEL);
+ gid_cache = kvzalloc_struct(gid_cache, table,
+ tprops->gid_tbl_len, GFP_KERNEL);
if (!gid_cache)
goto err;

diff -u -p a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -155,10 +155,9 @@ static int usnic_uiom_get_pages(unsigned
off = 0;

while (ret) {
- chunk = kmalloc(sizeof(*chunk) +
- sizeof(struct scatterlist) *
- min_t(int, ret, USNIC_UIOM_PAGE_CHUNK),
- GFP_KERNEL);
+ chunk = kvzalloc_struct(chunk, page_list,
+ min_t(int, ret, USNIC_UIOM_PAGE_CHUNK),
+ GFP_KERNEL);
if (!chunk) {
ret = -ENOMEM;
goto out;
diff -u -p a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -367,7 +367,7 @@ struct mthca_icm_table *mthca_alloc_icm_
obj_per_chunk = MTHCA_TABLE_CHUNK_SIZE / obj_size;
num_icm = DIV_ROUND_UP(nobj, obj_per_chunk);

- table = kmalloc(sizeof *table + num_icm * sizeof *table->icm, GFP_KERNEL);
+ table = kvzalloc_struct(table, icm, num_icm, GFP_KERNEL);
if (!table)
return NULL;

@@ -529,7 +529,7 @@ struct mthca_user_db_table *mthca_init_u
return NULL;

npages = dev->uar_table.uarc_size / MTHCA_ICM_PAGE_SIZE;
- db_tab = kmalloc(sizeof *db_tab + npages * sizeof *db_tab->page, GFP_KERNEL);
+ db_tab = kvzalloc_struct(db_tab, page, npages, GFP_KERNEL);
if (!db_tab)
return ERR_PTR(-ENOMEM);

diff -u -p a/fs/aio.c b/fs/aio.c
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -669,8 +669,7 @@ static int ioctx_add_table(struct kioctx
new_nr = (table ? table->nr : 1) * 4;
spin_unlock(&mm->ioctx_lock);

- table = kzalloc(sizeof(*table) + sizeof(struct kioctx *) *
- new_nr, GFP_KERNEL);
+ table = kvzalloc_struct(table, table, new_nr, GFP_KERNEL);
if (!table)
return -ENOMEM;

diff -u -p a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3558,8 +3558,7 @@ static int __mem_cgroup_usage_register_e
size = thresholds->primary ? thresholds->primary->size + 1 : 1;

/* Allocate memory for new array of thresholds */
- new = kmalloc(sizeof(*new) + size * sizeof(struct mem_cgroup_threshold),
- GFP_KERNEL);
+ new = kvzalloc_struct(new, entries, size, GFP_KERNEL);
if (!new) {
ret = -ENOMEM;
goto unlock;
diff -u -p a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -505,8 +505,7 @@ static inline struct rdma_hw_stats *rdma
{
struct rdma_hw_stats *stats;

- stats = kzalloc(sizeof(*stats) + num_counters * sizeof(u64),
- GFP_KERNEL);
+ stats = kvzalloc_struct(stats, value, num_counters, GFP_KERNEL);
if (!stats)
return NULL;
stats->names = names;
diff -u -p a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -850,9 +850,7 @@ static struct rpcrdma_sendctx *rpcrdma_s
{
struct rpcrdma_sendctx *sc;

- sc = kzalloc(sizeof(*sc) +
- ia->ri_max_send_sges * sizeof(struct ib_sge),
- GFP_KERNEL);
+ sc = kvzalloc_struct(sc, sc_sges, ia->ri_max_send_sges, GFP_KERNEL);
if (!sc)
return NULL;

diff -u -p a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -61,9 +61,8 @@ svc_rdma_get_rw_ctxt(struct svcxprt_rdma
spin_unlock(&rdma->sc_rw_ctxt_lock);
} else {
spin_unlock(&rdma->sc_rw_ctxt_lock);
- ctxt = kmalloc(sizeof(*ctxt) +
- SG_CHUNK_SIZE * sizeof(struct scatterlist),
- GFP_KERNEL);
+ ctxt = kvzalloc_struct(ctxt, rw_first_sgl, SG_CHUNK_SIZE,
+ GFP_KERNEL);
if (!ctxt)
goto out;
INIT_LIST_HEAD(&ctxt->rw_list);
diff -u -p a/net/openvswitch/meter.c b/net/openvswitch/meter.c
--- a/net/openvswitch/meter.c
+++ b/net/openvswitch/meter.c
@@ -206,8 +206,7 @@ static struct dp_meter *dp_meter_create(
return ERR_PTR(-EINVAL);

/* Allocate and set up the meter before locking anything. */
- meter = kzalloc(n_bands * sizeof(struct dp_meter_band) +
- sizeof(*meter), GFP_KERNEL);
+ meter = kvzalloc_struct(meter, bands, n_bands, GFP_KERNEL);
if (!meter)
return ERR_PTR(-ENOMEM);

diff -u -p a/sound/usb/usx2y/usbusx2yaudio.c b/sound/usb/usx2y/usbusx2yaudio.c
--- a/sound/usb/usx2y/usbusx2yaudio.c
+++ b/sound/usb/usx2y/usbusx2yaudio.c
@@ -657,7 +657,7 @@ static int usX2Y_rate_set(struct usX2Yde
struct s_c2 *ra = rate == 48000 ? SetRate48000 : SetRate44100;

if (usX2Y->rate != rate) {
- us = kzalloc(sizeof(*us) + sizeof(struct urb*) * NOOF_SETRATE_URBS, GFP_KERNEL);
+ us = kvzalloc_struct(us, urb, NOOF_SETRATE_URBS, GFP_KERNEL);
if (NULL == us) {
err = -ENOMEM;
goto cleanup;
diff -u -p a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -128,7 +128,7 @@ static int add_conn_list(struct hda_code
{
struct hda_conn_list *p;

- p = kmalloc(sizeof(*p) + len * sizeof(hda_nid_t), GFP_KERNEL);
+ p = kvzalloc_struct(p, conns, len, GFP_KERNEL);
if (!p)
return -ENOMEM;
p->len = len;

2018-03-08 03:00:22

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Wed, Mar 07, 2018 at 10:18:21PM +0100, Julia Lawall wrote:
> > Otherwise, yes, please. We could build a coccinelle rule for
> > additional replacements...
>
> A potential semantic patch and the changes it generates are attached
> below. Himanshu Jha helped with its development. Working on this
> uncovered one bug, where the allocated array is too large, because the
> size provided for it was a structure size, but actually only pointers to
> that structure were to be stored in it.

This is cool! Thanks for doing the coccinelle patch! Diffstat:

50 files changed, 81 insertions(+), 124 deletions(-)

I find that pretty compelling. I'll repost the kvmalloc_struct patch
imminently.

2018-03-08 06:27:03

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct



On Wed, 7 Mar 2018, Matthew Wilcox wrote:

> On Wed, Mar 07, 2018 at 10:18:21PM +0100, Julia Lawall wrote:
> > > Otherwise, yes, please. We could build a coccinelle rule for
> > > additional replacements...
> >
> > A potential semantic patch and the changes it generates are attached
> > below. Himanshu Jha helped with its development. Working on this
> > uncovered one bug, where the allocated array is too large, because the
> > size provided for it was a structure size, but actually only pointers to
> > that structure were to be stored in it.
>
> This is cool! Thanks for doing the coccinelle patch! Diffstat:
>
> 50 files changed, 81 insertions(+), 124 deletions(-)
>
> I find that pretty compelling. I'll repost the kvmalloc_struct patch
> imminently.

Thanks. So it's OK to replace kmalloc and kzalloc, even though they
didn't previously consider vmalloc and even though kmalloc doesn't zero?

There are a few other cases that use GFP_NOFS and GFP_NOWAIT, but I didn't
transform those because the comment says that the flags should be
GFP_KERNEL based. Should those be transformed too?

julia

2018-03-08 23:06:26

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote:
> On Wed, 7 Mar 2018, Matthew Wilcox wrote:
> > On Wed, Mar 07, 2018 at 10:18:21PM +0100, Julia Lawall wrote:
> > > > Otherwise, yes, please. We could build a coccinelle rule for
> > > > additional replacements...
> > >
> > > A potential semantic patch and the changes it generates are attached
> > > below. Himanshu Jha helped with its development. Working on this
> > > uncovered one bug, where the allocated array is too large, because the
> > > size provided for it was a structure size, but actually only pointers to
> > > that structure were to be stored in it.
> >
> > This is cool! Thanks for doing the coccinelle patch! Diffstat:
> >
> > 50 files changed, 81 insertions(+), 124 deletions(-)
> >
> > I find that pretty compelling. I'll repost the kvmalloc_struct patch
> > imminently.
>
> Thanks. So it's OK to replace kmalloc and kzalloc, even though they
> didn't previously consider vmalloc and even though kmalloc doesn't zero?

We'll also need to replace the corresponding places where those structs
are freed with kvfree(). Can coccinelle handle that too?

> There are a few other cases that use GFP_NOFS and GFP_NOWAIT, but I didn't
> transform those because the comment says that the flags should be
> GFP_KERNEL based. Should those be transformed too?

The problem with non-GFP_KERNEL allocations is that vmalloc may have to
allocate page tables, which is always done with an implicit GFP_KERNEL
allocation. There's an intent to get rid of GFP_NOFS, but that's not
been realised yet (and I'm not sure of our strategy to eliminate it ...
I'll send a separate email about that). I'm not sure why anything's
trying to allocate with GFP_NOWAIT; can you send a list of those places?

2018-03-09 06:00:51

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct



On Thu, 8 Mar 2018, Matthew Wilcox wrote:

> On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote:
> > On Wed, 7 Mar 2018, Matthew Wilcox wrote:
> > > On Wed, Mar 07, 2018 at 10:18:21PM +0100, Julia Lawall wrote:
> > > > > Otherwise, yes, please. We could build a coccinelle rule for
> > > > > additional replacements...
> > > >
> > > > A potential semantic patch and the changes it generates are attached
> > > > below. Himanshu Jha helped with its development. Working on this
> > > > uncovered one bug, where the allocated array is too large, because the
> > > > size provided for it was a structure size, but actually only pointers to
> > > > that structure were to be stored in it.
> > >
> > > This is cool! Thanks for doing the coccinelle patch! Diffstat:
> > >
> > > 50 files changed, 81 insertions(+), 124 deletions(-)
> > >
> > > I find that pretty compelling. I'll repost the kvmalloc_struct patch
> > > imminently.
> >
> > Thanks. So it's OK to replace kmalloc and kzalloc, even though they
> > didn't previously consider vmalloc and even though kmalloc doesn't zero?
>
> We'll also need to replace the corresponding places where those structs
> are freed with kvfree(). Can coccinelle handle that too?

This would be harder to do 100% reliably. Coccinelle would have to rely
on the structure name or the structure type, if the free is in a different
function. But I guess that the type should be mostly reliable, since all
instances of allocations of the same type should be transformed in the
same way.

>
> > There are a few other cases that use GFP_NOFS and GFP_NOWAIT, but I didn't
> > transform those because the comment says that the flags should be
> > GFP_KERNEL based. Should those be transformed too?
>
> The problem with non-GFP_KERNEL allocations is that vmalloc may have to
> allocate page tables, which is always done with an implicit GFP_KERNEL
> allocation. There's an intent to get rid of GFP_NOFS, but that's not
> been realised yet (and I'm not sure of our strategy to eliminate it ...
> I'll send a separate email about that). I'm not sure why anything's
> trying to allocate with GFP_NOWAIT; can you send a list of those places?

drivers/dma/fsl-edma.c:

fsl_desc = kzalloc(sizeof(*fsl_desc) + sizeof(struct fsl_edma_sw_tcd) * sg_len, GFP_NOWAIT);

drivers/dma/st_fdma.c:

fdesc = kzalloc(sizeof(*fdesc) + sizeof(struct st_fdma_sw_node) * sg_len,
GFP_NOWAIT);

drivers/dma/pxa_dma.c:

sw_desc = kzalloc(sizeof(*sw_desc) + nb_hw_desc * sizeof(struct
pxad_desc_hw *), GFP_NOWAIT);

julia

2018-03-13 17:21:36

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct



On Thu, 8 Mar 2018, Matthew Wilcox wrote:

> On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote:
> > On Wed, 7 Mar 2018, Matthew Wilcox wrote:
> > > On Wed, Mar 07, 2018 at 10:18:21PM +0100, Julia Lawall wrote:
> > > > > Otherwise, yes, please. We could build a coccinelle rule for
> > > > > additional replacements...
> > > >
> > > > A potential semantic patch and the changes it generates are attached
> > > > below. Himanshu Jha helped with its development. Working on this
> > > > uncovered one bug, where the allocated array is too large, because the
> > > > size provided for it was a structure size, but actually only pointers to
> > > > that structure were to be stored in it.
> > >
> > > This is cool! Thanks for doing the coccinelle patch! Diffstat:
> > >
> > > 50 files changed, 81 insertions(+), 124 deletions(-)
> > >
> > > I find that pretty compelling. I'll repost the kvmalloc_struct patch
> > > imminently.
> >
> > Thanks. So it's OK to replace kmalloc and kzalloc, even though they
> > didn't previously consider vmalloc and even though kmalloc doesn't zero?
>
> We'll also need to replace the corresponding places where those structs
> are freed with kvfree(). Can coccinelle handle that too?

Is the use of vmalloc a necessary part of the design? Or could there be a
non vmalloc versions for call sites that are already ok with that?

julia

> > There are a few other cases that use GFP_NOFS and GFP_NOWAIT, but I didn't
> > transform those because the comment says that the flags should be
> > GFP_KERNEL based. Should those be transformed too?
>
> The problem with non-GFP_KERNEL allocations is that vmalloc may have to
> allocate page tables, which is always done with an implicit GFP_KERNEL
> allocation. There's an intent to get rid of GFP_NOFS, but that's not
> been realised yet (and I'm not sure of our strategy to eliminate it ...
> I'll send a separate email about that). I'm not sure why anything's
> trying to allocate with GFP_NOWAIT; can you send a list of those places?
>

2018-03-13 18:33:47

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Tue, Mar 13, 2018 at 06:19:51PM +0100, Julia Lawall wrote:
> On Thu, 8 Mar 2018, Matthew Wilcox wrote:
> > On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote:
> > > Thanks. So it's OK to replace kmalloc and kzalloc, even though they
> > > didn't previously consider vmalloc and even though kmalloc doesn't zero?
> >
> > We'll also need to replace the corresponding places where those structs
> > are freed with kvfree(). Can coccinelle handle that too?
>
> Is the use of vmalloc a necessary part of the design? Or could there be a
> non vmalloc versions for call sites that are already ok with that?

We can also add kmalloc_struct() along with kmalloc_ab_c that won't fall
back to vmalloc but just return NULL.


2018-03-13 18:38:08

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct



On Tue, 13 Mar 2018, Matthew Wilcox wrote:

> On Tue, Mar 13, 2018 at 06:19:51PM +0100, Julia Lawall wrote:
> > On Thu, 8 Mar 2018, Matthew Wilcox wrote:
> > > On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote:
> > > > Thanks. So it's OK to replace kmalloc and kzalloc, even though they
> > > > didn't previously consider vmalloc and even though kmalloc doesn't zero?
> > >
> > > We'll also need to replace the corresponding places where those structs
> > > are freed with kvfree(). Can coccinelle handle that too?
> >
> > Is the use of vmalloc a necessary part of the design? Or could there be a
> > non vmalloc versions for call sites that are already ok with that?
>
> We can also add kmalloc_struct() along with kmalloc_ab_c that won't fall
> back to vmalloc but just return NULL.

It could be safer than being sure to find all of the relevant kfrees.

julia

2018-04-29 17:01:29

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Tue, Mar 13, 2018 at 11:32 AM, Matthew Wilcox <[email protected]> wrote:
> On Tue, Mar 13, 2018 at 06:19:51PM +0100, Julia Lawall wrote:
>> On Thu, 8 Mar 2018, Matthew Wilcox wrote:
>> > On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote:
>> > > Thanks. So it's OK to replace kmalloc and kzalloc, even though they
>> > > didn't previously consider vmalloc and even though kmalloc doesn't zero?
>> >
>> > We'll also need to replace the corresponding places where those structs
>> > are freed with kvfree(). Can coccinelle handle that too?
>>
>> Is the use of vmalloc a necessary part of the design? Or could there be a
>> non vmalloc versions for call sites that are already ok with that?
>
> We can also add kmalloc_struct() along with kmalloc_ab_c that won't fall
> back to vmalloc but just return NULL.

Did this ever happen? I'd also like to see kmalloc_array_3d() or
something that takes three size arguments. We have a lot of this
pattern too:

kmalloc(sizeof(foo) * A * B, gfp...)

And we could turn that into:

kmalloc_array_3d(sizeof(foo), A, B, gfp...)

-Kees


--
Kees Cook
Pixel Security

2018-04-29 20:31:09

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Sun, Apr 29, 2018 at 09:59:27AM -0700, Kees Cook wrote:
> Did this ever happen?

Not yet. I brought it up at LSFMM, and I'll repost the patches soon.

> I'd also like to see kmalloc_array_3d() or
> something that takes three size arguments. We have a lot of this
> pattern too:
>
> kmalloc(sizeof(foo) * A * B, gfp...)
>
> And we could turn that into:
>
> kmalloc_array_3d(sizeof(foo), A, B, gfp...)

Are either of A or B constant? Because if so, we could just use
kmalloc_array. If not, then kmalloc_array_3d becomes a little more
expensive than kmalloc_array because we have to do a divide at runtime
instead of compile-time. that's still better than allocating too few
bytes, of course.

I'm wondering how far down the abc + ab + ac + bc + d rabbit-hole we're
going to end up going. As far as we have to, I guess.

2018-04-30 19:03:19

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Sun, Apr 29, 2018 at 1:30 PM, Matthew Wilcox <[email protected]> wrote:
> On Sun, Apr 29, 2018 at 09:59:27AM -0700, Kees Cook wrote:
>> Did this ever happen?
>
> Not yet. I brought it up at LSFMM, and I'll repost the patches soon.
>
>> I'd also like to see kmalloc_array_3d() or
>> something that takes three size arguments. We have a lot of this
>> pattern too:
>>
>> kmalloc(sizeof(foo) * A * B, gfp...)
>>
>> And we could turn that into:
>>
>> kmalloc_array_3d(sizeof(foo), A, B, gfp...)
>
> Are either of A or B constant? Because if so, we could just use
> kmalloc_array. If not, then kmalloc_array_3d becomes a little more
> expensive than kmalloc_array because we have to do a divide at runtime
> instead of compile-time. that's still better than allocating too few
> bytes, of course.

Yeah, getting the order of the division is nice. Some thoughts below...

>
> I'm wondering how far down the abc + ab + ac + bc + d rabbit-hole we're
> going to end up going. As far as we have to, I guess.

Well, the common patterns I've seen so far are:

a
ab
abc
a + bc
ab + cd

For any longer multiplications, I've only found[1]:

drivers/staging/rtl8188eu/os_dep/osdep_service.c: void **a =
kzalloc(h * sizeof(void *) + h * w * size, GFP_KERNEL);


At the end of the day, though, I don't really like having all these
different names...

kmalloc(), kmalloc_array(), kmalloc_ab_c(), kmalloc_array_3d()

with their "matching" zeroing function:

kzalloc(), kcalloc(), kzalloc_ab_c(), kmalloc_array_3d(..., gfp | __GFP_ZERO)

For the multiplication cases, I wonder if we could just have:

kmalloc_multN(gfp, a, b, c, ...)
kzalloc_multN(gfp, a, b, c, ...)

and we can replace all kcalloc() users with kzalloc_mult2(), all
kmalloc_array() users with kmalloc_mult2(), the abc uses with
kmalloc_mult3().

That said, I *do* like kmalloc_struct() as it's a very common pattern...

Or maybe, just leave the pattern in the name? kmalloc_ab(),
kmalloc_abc(), kmalloc_ab_c(), kmalloc_ab_cd() ?

Getting the constant ordering right could be part of the macro
definition, maybe? i.e.:

static inline void *kmalloc_ab(size_t a, size_t b, gfp_t flags)
{
if (__builtin_constant_p(a) && a != 0 && \
b > SIZE_MAX / a)
return NULL;
else if (__builtin_constant_p(b) && b != 0 && \
a > SIZE_MAX / b)
return NULL;

return kmalloc(a * b, flags);
}

(I just wish C had a sensible way to catch overflow...)

-Kees

[1] git grep -E 'alloc\([^,]+[^(]\*[^)][^,]+[^(]\*[^)][^,]+[^(]\*[^)][^,]+,'

--
Kees Cook
Pixel Security

2018-04-30 20:17:13

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Mon, Apr 30, 2018 at 12:02:14PM -0700, Kees Cook wrote:
> On Sun, Apr 29, 2018 at 1:30 PM, Matthew Wilcox <[email protected]> wrote:
> > On Sun, Apr 29, 2018 at 09:59:27AM -0700, Kees Cook wrote:
> >> Did this ever happen?
> >
> > Not yet. I brought it up at LSFMM, and I'll repost the patches soon.
> >
> >> I'd also like to see kmalloc_array_3d() or
> >> something that takes three size arguments. We have a lot of this
> >> pattern too:
> >>
> >> kmalloc(sizeof(foo) * A * B, gfp...)
> >>
> >> And we could turn that into:
> >>
> >> kmalloc_array_3d(sizeof(foo), A, B, gfp...)
> >
> > Are either of A or B constant? Because if so, we could just use
> > kmalloc_array. If not, then kmalloc_array_3d becomes a little more
> > expensive than kmalloc_array because we have to do a divide at runtime
> > instead of compile-time. that's still better than allocating too few
> > bytes, of course.
>
> Yeah, getting the order of the division is nice. Some thoughts below...
>
> >
> > I'm wondering how far down the abc + ab + ac + bc + d rabbit-hole we're
> > going to end up going. As far as we have to, I guess.
>
> Well, the common patterns I've seen so far are:
>
> a
> ab
> abc
> a + bc
> ab + cd
>
> For any longer multiplications, I've only found[1]:
>
> drivers/staging/rtl8188eu/os_dep/osdep_service.c: void **a =
> kzalloc(h * sizeof(void *) + h * w * size, GFP_KERNEL);

That's pretty good, although it's just an atrocious vendor driver and
it turns out all of those things are constants, and it'd be far better
off with just declaring an array. I bet they used to declare one on
the stack ...

> At the end of the day, though, I don't really like having all these
> different names...
>
> kmalloc(), kmalloc_array(), kmalloc_ab_c(), kmalloc_array_3d()
>
> with their "matching" zeroing function:
>
> kzalloc(), kcalloc(), kzalloc_ab_c(), kmalloc_array_3d(..., gfp | __GFP_ZERO)

Yes, it's not very regular.

> For the multiplication cases, I wonder if we could just have:
>
> kmalloc_multN(gfp, a, b, c, ...)
> kzalloc_multN(gfp, a, b, c, ...)
>
> and we can replace all kcalloc() users with kzalloc_mult2(), all
> kmalloc_array() users with kmalloc_mult2(), the abc uses with
> kmalloc_mult3().

I'm reluctant to do away with kcalloc() as it has the obvious heritage
from user-space calloc() with the addition of GFP flags.

> That said, I *do* like kmalloc_struct() as it's a very common pattern...

Thanks! And way harder to misuse than kmalloc_ab_c().

> Or maybe, just leave the pattern in the name? kmalloc_ab(),
> kmalloc_abc(), kmalloc_ab_c(), kmalloc_ab_cd() ?
>
> Getting the constant ordering right could be part of the macro
> definition, maybe? i.e.:
>
> static inline void *kmalloc_ab(size_t a, size_t b, gfp_t flags)
> {
> if (__builtin_constant_p(a) && a != 0 && \
> b > SIZE_MAX / a)
> return NULL;
> else if (__builtin_constant_p(b) && b != 0 && \
> a > SIZE_MAX / b)
> return NULL;
>
> return kmalloc(a * b, flags);
> }

Ooh, if neither a nor b is constant, it just didn't do a check ;-( This
stuff is hard.

> (I just wish C had a sensible way to catch overflow...)

Every CPU I ever worked with had an "overflow" bit ... do we have a
friend on the C standards ctte who might figure out a way to let us
write code that checks it?

> -Kees
>
> [1] git grep -E 'alloc\([^,]+[^(]\*[^)][^,]+[^(]\*[^)][^,]+[^(]\*[^)][^,]+,'

I'm impressed, but it's not going to catch

veryLongPointerNameThatsMeaningfulToMe = kmalloc(initialSize +
numberOfEntries * entrySize + someOtherThing * yourMum,
GFP_KERNEL);


2018-04-30 21:29:49

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On 2018-04-30 22:16, Matthew Wilcox wrote:
> On Mon, Apr 30, 2018 at 12:02:14PM -0700, Kees Cook wrote:
>>
>> Getting the constant ordering right could be part of the macro
>> definition, maybe? i.e.:
>>
>> static inline void *kmalloc_ab(size_t a, size_t b, gfp_t flags)
>> {
>> if (__builtin_constant_p(a) && a != 0 && \
>> b > SIZE_MAX / a)
>> return NULL;
>> else if (__builtin_constant_p(b) && b != 0 && \
>> a > SIZE_MAX / b)
>> return NULL;
>>
>> return kmalloc(a * b, flags);
>> }
>
> Ooh, if neither a nor b is constant, it just didn't do a check ;-( This
> stuff is hard.
>
>> (I just wish C had a sensible way to catch overflow...)
>
> Every CPU I ever worked with had an "overflow" bit ... do we have a
> friend on the C standards ctte who might figure out a way to let us
> write code that checks it?

gcc 5.1+ (I think) have the __builtin_OP_overflow checks that should
generate reasonable code. Too bad there's no completely generic
check_all_ops_in_this_expression(a+b*c+d/e, or_jump_here). Though it's
hard to define what they should be checked against - probably would
require all subexpressions (including the variables themselves) to have
the same type.

plug: https://lkml.org/lkml/2015/7/19/358

Rasmus


2018-04-30 22:31:11

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Mon, Apr 30, 2018 at 1:16 PM, Matthew Wilcox <[email protected]> wrote:
> On Mon, Apr 30, 2018 at 12:02:14PM -0700, Kees Cook wrote:
>> For any longer multiplications, I've only found[1]:
>>
>> drivers/staging/rtl8188eu/os_dep/osdep_service.c: void **a =
>> kzalloc(h * sizeof(void *) + h * w * size, GFP_KERNEL);
>
> That's pretty good, although it's just an atrocious vendor driver and
> it turns out all of those things are constants, and it'd be far better
> off with just declaring an array. I bet they used to declare one on
> the stack ...

Yeah, it was just a quick hack to look for stuff.

>
>> At the end of the day, though, I don't really like having all these
>> different names...
>>
>> kmalloc(), kmalloc_array(), kmalloc_ab_c(), kmalloc_array_3d()
>>
>> with their "matching" zeroing function:
>>
>> kzalloc(), kcalloc(), kzalloc_ab_c(), kmalloc_array_3d(..., gfp | __GFP_ZERO)
>
> Yes, it's not very regular.
>
>> For the multiplication cases, I wonder if we could just have:
>>
>> kmalloc_multN(gfp, a, b, c, ...)
>> kzalloc_multN(gfp, a, b, c, ...)
>>
>> and we can replace all kcalloc() users with kzalloc_mult2(), all
>> kmalloc_array() users with kmalloc_mult2(), the abc uses with
>> kmalloc_mult3().
>
> I'm reluctant to do away with kcalloc() as it has the obvious heritage
> from user-space calloc() with the addition of GFP flags.

But it encourages misuse with calloc(N * M, gfp) ... if we removed
calloc and kept k[mz]alloc_something(gfp, a, b, c...) I think we'd
have better adoption.

>> That said, I *do* like kmalloc_struct() as it's a very common pattern...
>
> Thanks! And way harder to misuse than kmalloc_ab_c().

Yes, quite so. It's really why I went with kmalloc_array_3d(), but now
I'm thinking better of it...

>> Or maybe, just leave the pattern in the name? kmalloc_ab(),
>> kmalloc_abc(), kmalloc_ab_c(), kmalloc_ab_cd() ?
>>
>> Getting the constant ordering right could be part of the macro
>> definition, maybe? i.e.:
>>
>> static inline void *kmalloc_ab(size_t a, size_t b, gfp_t flags)
>> {
>> if (__builtin_constant_p(a) && a != 0 && \
>> b > SIZE_MAX / a)
>> return NULL;
>> else if (__builtin_constant_p(b) && b != 0 && \
>> a > SIZE_MAX / b)
>> return NULL;
>>
>> return kmalloc(a * b, flags);
>> }
>
> Ooh, if neither a nor b is constant, it just didn't do a check ;-( This
> stuff is hard.

Yup, quite true. Obviously not the final form. ;) I meant to
illustrate that we could do compile-time tricks to reorder the
division in an efficient manner.

>> (I just wish C had a sensible way to catch overflow...)
>
> Every CPU I ever worked with had an "overflow" bit ... do we have a
> friend on the C standards ctte who might figure out a way to let us
> write code that checks it?

On the CPU it's not retained across multiple calculations. And the
type matters too. This came up recently in a separate thread too:
http://openwall.com/lists/kernel-hardening/2018/03/26/4

>> [1] git grep -E 'alloc\([^,]+[^(]\*[^)][^,]+[^(]\*[^)][^,]+[^(]\*[^)][^,]+,'
>
> I'm impressed, but it's not going to catch
>
> veryLongPointerNameThatsMeaningfulToMe = kmalloc(initialSize +
> numberOfEntries * entrySize + someOtherThing * yourMum,
> GFP_KERNEL);

Right, it wasn't meant to be exhaustive. I just included it in case
anyone wanted to go grepping around for themselves.

-Kees

--
Kees Cook
Pixel Security

2018-04-30 22:42:05

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Mon, Apr 30, 2018 at 11:29:04PM +0200, Rasmus Villemoes wrote:
> On 2018-04-30 22:16, Matthew Wilcox wrote:
> > On Mon, Apr 30, 2018 at 12:02:14PM -0700, Kees Cook wrote:
> >> (I just wish C had a sensible way to catch overflow...)
> >
> > Every CPU I ever worked with had an "overflow" bit ... do we have a
> > friend on the C standards ctte who might figure out a way to let us
> > write code that checks it?
>
> gcc 5.1+ (I think) have the __builtin_OP_overflow checks that should
> generate reasonable code. Too bad there's no completely generic
> check_all_ops_in_this_expression(a+b*c+d/e, or_jump_here). Though it's
> hard to define what they should be checked against - probably would
> require all subexpressions (including the variables themselves) to have
> the same type.

Nevertheless these generate much better code than our current safeguards!

extern void *malloc(unsigned long);

#define ULONG_MAX (~0UL)
#define SZ 8UL

void *a(unsigned long a)
{
if ((ULONG_MAX / SZ) > a)
return 0;
return malloc(a * SZ);
}

void *b(unsigned long a)
{
unsigned long c;
if (__builtin_mul_overflow(a, SZ, &c))
return 0;
return malloc(c);
}

(a lot of code uses a constant '8' as sizeof(void *)). Here's the
difference with gcc 7.3:

0: 48 b8 fe ff ff ff ff movabs $0x1ffffffffffffffe,%rax
7: ff ff 1f
a: 48 39 c7 cmp %rax,%rdi
d: 76 09 jbe 18 <a+0x18>
f: 48 c1 e7 03 shl $0x3,%rdi
13: e9 00 00 00 00 jmpq 18 <a+0x18>
14: R_X86_64_PLT32 malloc-0x4
18: 31 c0 xor %eax,%eax
1a: c3 retq

vs

20: 48 89 f8 mov %rdi,%rax
23: ba 08 00 00 00 mov $0x8,%edx
28: 48 f7 e2 mul %rdx
2b: 48 89 c7 mov %rax,%rdi
2e: 70 05 jo 35 <b+0x15>
30: e9 00 00 00 00 jmpq 35 <b+0x15>
31: R_X86_64_PLT32 malloc-0x4
35: 31 c0 xor %eax,%eax
37: c3 retq

We've traded a shl for a mul (because shl doesn't set Overflow, only
Carry, and that's only bit 65, not an OR of bits 35-n), but we lose the
movabs and cmp. I'd rather run the second code fragment than the first.

2018-05-01 17:02:29

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Mon, Apr 30, 2018 at 2:29 PM, Rasmus Villemoes
<[email protected]> wrote:
> On 2018-04-30 22:16, Matthew Wilcox wrote:
>> On Mon, Apr 30, 2018 at 12:02:14PM -0700, Kees Cook wrote:
>>>
>>> Getting the constant ordering right could be part of the macro
>>> definition, maybe? i.e.:
>>>
>>> static inline void *kmalloc_ab(size_t a, size_t b, gfp_t flags)
>>> {
>>> if (__builtin_constant_p(a) && a != 0 && \
>>> b > SIZE_MAX / a)
>>> return NULL;
>>> else if (__builtin_constant_p(b) && b != 0 && \
>>> a > SIZE_MAX / b)
>>> return NULL;
>>>
>>> return kmalloc(a * b, flags);
>>> }
>>
>> Ooh, if neither a nor b is constant, it just didn't do a check ;-( This
>> stuff is hard.
>>
>>> (I just wish C had a sensible way to catch overflow...)
>>
>> Every CPU I ever worked with had an "overflow" bit ... do we have a
>> friend on the C standards ctte who might figure out a way to let us
>> write code that checks it?
>
> gcc 5.1+ (I think) have the __builtin_OP_overflow checks that should
> generate reasonable code. Too bad there's no completely generic
> check_all_ops_in_this_expression(a+b*c+d/e, or_jump_here). Though it's
> hard to define what they should be checked against - probably would
> require all subexpressions (including the variables themselves) to have
> the same type.
>
> plug: https://lkml.org/lkml/2015/7/19/358

That's a very nice series. Why did it never get taken? It seems to do
the right things quite correctly.

Daniel, while this isn't a perfect solution, is this something you'd
use in graphics-land?

-Kees

--
Kees Cook
Pixel Security

2018-05-01 17:41:53

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct



On Tue, 1 May 2018, Kees Cook wrote:

> On Mon, Apr 30, 2018 at 2:29 PM, Rasmus Villemoes
> <[email protected]> wrote:
> > On 2018-04-30 22:16, Matthew Wilcox wrote:
> >> On Mon, Apr 30, 2018 at 12:02:14PM -0700, Kees Cook wrote:
> >>>
> >>> Getting the constant ordering right could be part of the macro
> >>> definition, maybe? i.e.:
> >>>
> >>> static inline void *kmalloc_ab(size_t a, size_t b, gfp_t flags)
> >>> {
> >>> if (__builtin_constant_p(a) && a != 0 && \
> >>> b > SIZE_MAX / a)
> >>> return NULL;
> >>> else if (__builtin_constant_p(b) && b != 0 && \
> >>> a > SIZE_MAX / b)
> >>> return NULL;
> >>>
> >>> return kmalloc(a * b, flags);
> >>> }
> >>
> >> Ooh, if neither a nor b is constant, it just didn't do a check ;-( This
> >> stuff is hard.
> >>
> >>> (I just wish C had a sensible way to catch overflow...)
> >>
> >> Every CPU I ever worked with had an "overflow" bit ... do we have a
> >> friend on the C standards ctte who might figure out a way to let us
> >> write code that checks it?
> >
> > gcc 5.1+ (I think) have the __builtin_OP_overflow checks that should
> > generate reasonable code. Too bad there's no completely generic
> > check_all_ops_in_this_expression(a+b*c+d/e, or_jump_here). Though it's
> > hard to define what they should be checked against - probably would
> > require all subexpressions (including the variables themselves) to have
> > the same type.
> >
> > plug: https://lkml.org/lkml/2015/7/19/358
>
> That's a very nice series. Why did it never get taken? It seems to do
> the right things quite correctly.
>
> Daniel, while this isn't a perfect solution, is this something you'd
> use in graphics-land?

Opportunities for this, found with the following are shown below:

@@
expression a,b;
@@

*if (a + b < a)
{ ... return ...; }

- at the beginning of a line indicates an opportunity, not a suggestion
for removal.

I haven't checked the results carefully, but most look relevant.

julia



diff -u -p /var/linuxes/linux-next/lib/zstd/decompress.c /tmp/nothing/lib/zstd/decompress.c
--- /var/linuxes/linux-next/lib/zstd/decompress.c
+++ /tmp/nothing/lib/zstd/decompress.c
@@ -343,7 +343,6 @@ unsigned long long ZSTD_findDecompressed
return ret;

/* check for overflow */
- if (totalDstSize + ret < totalDstSize)
return ZSTD_CONTENTSIZE_ERROR;
totalDstSize += ret;
}
diff -u -p /var/linuxes/linux-next/lib/scatterlist.c /tmp/nothing/lib/scatterlist.c
--- /var/linuxes/linux-next/lib/scatterlist.c
+++ /tmp/nothing/lib/scatterlist.c
@@ -503,7 +503,6 @@ struct scatterlist *sgl_alloc_order(unsi
nalloc = nent;
if (chainable) {
/* Check for integer overflow */
- if (nalloc + 1 < nalloc)
return NULL;
nalloc++;
}
diff -u -p /var/linuxes/linux-next/drivers/dma-buf/dma-buf.c /tmp/nothing/drivers/dma-buf/dma-buf.c
--- /var/linuxes/linux-next/drivers/dma-buf/dma-buf.c
+++ /tmp/nothing/drivers/dma-buf/dma-buf.c
@@ -954,7 +954,6 @@ int dma_buf_mmap(struct dma_buf *dmabuf,
return -EINVAL;

/* check for offset overflow */
- if (pgoff + vma_pages(vma) < pgoff)
return -EOVERFLOW;

/* check for overflowing the buffer's size */
diff -u -p /var/linuxes/linux-next/drivers/md/dm-verity-target.c /tmp/nothing/drivers/md/dm-verity-target.c
--- /var/linuxes/linux-next/drivers/md/dm-verity-target.c
+++ /tmp/nothing/drivers/md/dm-verity-target.c
@@ -1043,7 +1043,6 @@ static int verity_ctr(struct dm_target *
v->hash_level_block[i] = hash_position;
s = (v->data_blocks + ((sector_t)1 << ((i + 1) * v->hash_per_block_bits)) - 1)
>> ((i + 1) * v->hash_per_block_bits);
- if (hash_position + s < hash_position) {
ti->error = "Hash device offset overflow";
r = -E2BIG;
goto bad;
diff -u -p /var/linuxes/linux-next/drivers/md/dm-flakey.c /tmp/nothing/drivers/md/dm-flakey.c
--- /var/linuxes/linux-next/drivers/md/dm-flakey.c
+++ /tmp/nothing/drivers/md/dm-flakey.c
@@ -233,7 +233,6 @@ static int flakey_ctr(struct dm_target *
goto bad;
}

- if (fc->up_interval + fc->down_interval < fc->up_interval) {
ti->error = "Interval overflow";
r = -EINVAL;
goto bad;
diff -u -p /var/linuxes/linux-next/drivers/gpu/drm/vc4/vc4_validate.c /tmp/nothing/drivers/gpu/drm/vc4/vc4_validate.c
--- /var/linuxes/linux-next/drivers/gpu/drm/vc4/vc4_validate.c
+++ /tmp/nothing/drivers/gpu/drm/vc4/vc4_validate.c
@@ -306,7 +306,6 @@ validate_gl_array_primitive(VALIDATE_ARG
}
shader_state = &exec->shader_state[exec->shader_state_count - 1];

- if (length + base_index < length) {
DRM_DEBUG("primitive vertex count overflow\n");
return -EINVAL;
}
diff -u -p /var/linuxes/linux-next/drivers/vme/vme.c /tmp/nothing/drivers/vme/vme.c
--- /var/linuxes/linux-next/drivers/vme/vme.c
+++ /tmp/nothing/drivers/vme/vme.c
@@ -208,7 +208,6 @@ int vme_check_window(u32 aspace, unsigne
{
int retval = 0;

- if (vme_base + size < size)
return -EINVAL;

switch (aspace) {
diff -u -p /var/linuxes/linux-next/drivers/virtio/virtio_pci_modern.c /tmp/nothing/drivers/virtio/virtio_pci_modern.c
--- /var/linuxes/linux-next/drivers/virtio/virtio_pci_modern.c
+++ /tmp/nothing/drivers/virtio/virtio_pci_modern.c
@@ -99,7 +99,6 @@ static void __iomem *map_capability(stru

length -= start;

- if (start + offset < offset) {
dev_err(&dev->dev,
"virtio_pci: map wrap-around %u+%u\n",
start, offset);
diff -u -p /var/linuxes/linux-next/drivers/net/wireless/marvell/mwifiex/pcie.c /tmp/nothing/drivers/net/wireless/marvell/mwifiex/pcie.c
--- /var/linuxes/linux-next/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ /tmp/nothing/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -2015,7 +2015,6 @@ static int mwifiex_extract_wifi_fw(struc

switch (dnld_cmd) {
case MWIFIEX_FW_DNLD_CMD_1:
- if (offset + data_len < data_len) {
mwifiex_dbg(adapter, ERROR, "bad FW parse\n");
ret = -1;
goto done;
@@ -2039,7 +2038,6 @@ static int mwifiex_extract_wifi_fw(struc
case MWIFIEX_FW_DNLD_CMD_5:
first_cmd = true;
/* Check for integer overflow */
- if (offset + data_len < data_len) {
mwifiex_dbg(adapter, ERROR, "bad FW parse\n");
ret = -1;
goto done;
@@ -2049,7 +2047,6 @@ static int mwifiex_extract_wifi_fw(struc
case MWIFIEX_FW_DNLD_CMD_6:
first_cmd = true;
/* Check for integer overflow */
- if (offset + data_len < data_len) {
mwifiex_dbg(adapter, ERROR, "bad FW parse\n");
ret = -1;
goto done;
diff -u -p /var/linuxes/linux-next/drivers/net/ethernet/sun/niu.c /tmp/nothing/drivers/net/ethernet/sun/niu.c
--- /var/linuxes/linux-next/drivers/net/ethernet/sun/niu.c
+++ /tmp/nothing/drivers/net/ethernet/sun/niu.c
@@ -6884,7 +6884,6 @@ static int niu_get_eeprom(struct net_dev
offset = eeprom->offset;
len = eeprom->len;

- if (offset + len < offset)
return -EINVAL;
if (offset >= np->eeprom_len)
return -EINVAL;
diff -u -p /var/linuxes/linux-next/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c /tmp/nothing/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
--- /var/linuxes/linux-next/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
+++ /tmp/nothing/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
@@ -389,7 +389,6 @@ ixgb_get_eeprom(struct net_device *netde

max_len = ixgb_get_eeprom_len(netdev);

- if (eeprom->offset > eeprom->offset + eeprom->len) {
ret_val = -EINVAL;
goto geeprom_error;
}
@@ -435,7 +434,6 @@ ixgb_set_eeprom(struct net_device *netde

max_len = ixgb_get_eeprom_len(netdev);

- if (eeprom->offset > eeprom->offset + eeprom->len)
return -EINVAL;

if ((eeprom->offset + eeprom->len) > max_len)
diff -u -p /var/linuxes/linux-next/drivers/crypto/axis/artpec6_crypto.c /tmp/nothing/drivers/crypto/axis/artpec6_crypto.c
--- /var/linuxes/linux-next/drivers/crypto/axis/artpec6_crypto.c
+++ /tmp/nothing/drivers/crypto/axis/artpec6_crypto.c
@@ -1193,7 +1193,6 @@ artpec6_crypto_ctr_crypt(struct skcipher
* the whole IV is a counter. So fallback if the counter is going to
* overlow.
*/
- if (counter + nblks < counter) {
int ret;

pr_debug("counter %x will overflow (nblks %u), falling back\n",
diff -u -p /var/linuxes/linux-next/drivers/vhost/vringh.c /tmp/nothing/drivers/vhost/vringh.c
--- /var/linuxes/linux-next/drivers/vhost/vringh.c
+++ /tmp/nothing/drivers/vhost/vringh.c
@@ -123,7 +123,6 @@ static inline bool range_check(struct vr
}

/* Otherwise, don't wrap. */
- if (addr + *len < addr) {
vringh_bad("Wrapping descriptor %zu@0x%llx",
*len, (unsigned long long)addr);
return false;
diff -u -p /var/linuxes/linux-next/drivers/infiniband/hw/cxgb3/iwch_qp.c /tmp/nothing/drivers/infiniband/hw/cxgb3/iwch_qp.c
--- /var/linuxes/linux-next/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ /tmp/nothing/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -224,8 +224,6 @@ static int iwch_sgl2pbl_map(struct iwch_
pr_debug("%s %d\n", __func__, __LINE__);
return -EINVAL;
}
- if (sg_list[i].addr + ((u64) sg_list[i].length) <
- sg_list[i].addr) {
pr_debug("%s %d\n", __func__, __LINE__);
return -EINVAL;
}
diff -u -p /var/linuxes/linux-next/drivers/infiniband/hw/hfi1/eprom.c /tmp/nothing/drivers/infiniband/hw/hfi1/eprom.c
--- /var/linuxes/linux-next/drivers/infiniband/hw/hfi1/eprom.c
+++ /tmp/nothing/drivers/infiniband/hw/hfi1/eprom.c
@@ -362,7 +362,6 @@ static int read_segment_platform_config(
}

/* check for bogus offset and size that wrap when added together */
- if (entry->offset + entry->size < entry->offset) {
dd_dev_err(dd,
"Bad configuration file start + size 0x%x+0x%x\n",
entry->offset, entry->size);
diff -u -p /var/linuxes/linux-next/drivers/fsi/fsi-core.c /tmp/nothing/drivers/fsi/fsi-core.c
--- /var/linuxes/linux-next/drivers/fsi/fsi-core.c
+++ /tmp/nothing/drivers/fsi/fsi-core.c
@@ -317,7 +317,6 @@ EXPORT_SYMBOL_GPL(fsi_slave_write);
extern int fsi_slave_claim_range(struct fsi_slave *slave,
uint32_t addr, uint32_t size)
{
- if (addr + size < addr)
return -EINVAL;

if (addr + size > slave->size)
diff -u -p /var/linuxes/linux-next/virt/kvm/arm/vgic/vgic-v2.c /tmp/nothing/virt/kvm/arm/vgic/vgic-v2.c
--- /var/linuxes/linux-next/virt/kvm/arm/vgic/vgic-v2.c
+++ /tmp/nothing/virt/kvm/arm/vgic/vgic-v2.c
@@ -274,9 +274,7 @@ void vgic_v2_enable(struct kvm_vcpu *vcp
/* check for overlapping regions and for regions crossing the end of memory */
static bool vgic_v2_check_base(gpa_t dist_base, gpa_t cpu_base)
{
- if (dist_base + KVM_VGIC_V2_DIST_SIZE < dist_base)
return false;
- if (cpu_base + KVM_VGIC_V2_CPU_SIZE < cpu_base)
return false;

if (dist_base + KVM_VGIC_V2_DIST_SIZE <= cpu_base)
diff -u -p /var/linuxes/linux-next/virt/kvm/kvm_main.c /tmp/nothing/virt/kvm/kvm_main.c
--- /var/linuxes/linux-next/virt/kvm/kvm_main.c
+++ /tmp/nothing/virt/kvm/kvm_main.c
@@ -921,7 +921,6 @@ int __kvm_set_memory_region(struct kvm *
goto out;
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
goto out;
- if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
goto out;

slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
diff -u -p /var/linuxes/linux-next/virt/kvm/eventfd.c /tmp/nothing/virt/kvm/eventfd.c
--- /var/linuxes/linux-next/virt/kvm/eventfd.c
+++ /tmp/nothing/virt/kvm/eventfd.c
@@ -917,7 +917,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, st
}

/* check for range overflow */
- if (args->addr + args->len < args->addr)
return -EINVAL;

/* check for extra flags that we don't understand */
diff -u -p /var/linuxes/linux-next/virt/kvm/coalesced_mmio.c /tmp/nothing/virt/kvm/coalesced_mmio.c
--- /var/linuxes/linux-next/virt/kvm/coalesced_mmio.c
+++ /tmp/nothing/virt/kvm/coalesced_mmio.c
@@ -31,7 +31,6 @@ static int coalesced_mmio_in_range(struc
*/
if (len < 0)
return 0;
- if (addr + len < addr)
return 0;
if (addr < dev->zone.addr)
return 0;
diff -u -p /var/linuxes/linux-next/arch/x86/kernel/e820.c /tmp/nothing/arch/x86/kernel/e820.c
--- /var/linuxes/linux-next/arch/x86/kernel/e820.c
+++ /tmp/nothing/arch/x86/kernel/e820.c
@@ -306,7 +306,6 @@ int __init e820__update_table(struct e82

/* Bail out if we find any unreasonable addresses in the map: */
for (i = 0; i < table->nr_entries; i++) {
- if (entries[i].addr + entries[i].size < entries[i].addr)
return -1;
}

diff -u -p /var/linuxes/linux-next/arch/powerpc/platforms/powernv/opal-prd.c /tmp/nothing/arch/powerpc/platforms/powernv/opal-prd.c
--- /var/linuxes/linux-next/arch/powerpc/platforms/powernv/opal-prd.c
+++ /tmp/nothing/arch/powerpc/platforms/powernv/opal-prd.c
@@ -52,7 +52,6 @@ static bool opal_prd_range_is_valid(uint
struct device_node *parent, *node;
bool found;

- if (addr + size < addr)
return false;

parent = of_find_node_by_path("/reserved-memory");
diff -u -p /var/linuxes/linux-next/arch/powerpc/kernel/kexec_elf_64.c /tmp/nothing/arch/powerpc/kernel/kexec_elf_64.c
--- /var/linuxes/linux-next/arch/powerpc/kernel/kexec_elf_64.c
+++ /tmp/nothing/arch/powerpc/kernel/kexec_elf_64.c
@@ -206,7 +206,6 @@ static bool elf_is_phdr_sane(const struc
} else if (phdr->p_offset + phdr->p_filesz > buf_len) {
pr_debug("ELF segment not in file.\n");
return false;
- } else if (phdr->p_paddr + phdr->p_memsz < phdr->p_paddr) {
pr_debug("ELF segment address wraps around.\n");
return false;
}
@@ -322,7 +321,6 @@ static bool elf_is_shdr_sane(const struc
if (!size_ok) {
pr_debug("ELF section with wrong entry size.\n");
return false;
- } else if (shdr->sh_addr + shdr->sh_size < shdr->sh_addr) {
pr_debug("ELF section address wraps around.\n");
return false;
}
diff -u -p /var/linuxes/linux-next/arch/mips/kernel/setup.c /tmp/nothing/arch/mips/kernel/setup.c
--- /var/linuxes/linux-next/arch/mips/kernel/setup.c
+++ /tmp/nothing/arch/mips/kernel/setup.c
@@ -97,7 +97,6 @@ void __init add_memory_region(phys_addr_
--size;

/* Sanity check */
- if (start + size < start) {
pr_warn("Trying to add an invalid memory region, skipped\n");
return;
}
diff -u -p /var/linuxes/linux-next/arch/blackfin/kernel/setup.c /tmp/nothing/arch/blackfin/kernel/setup.c
--- /var/linuxes/linux-next/arch/blackfin/kernel/setup.c
+++ /tmp/nothing/arch/blackfin/kernel/setup.c
@@ -351,7 +351,6 @@ static int __init sanitize_memmap(struct

/* bail out if we find any unreasonable addresses in memmap */
for (i = 0; i < old_nr; i++)
- if (map[i].addr + map[i].size < map[i].addr)
return -1;

/* create pointers for initial change-point information (for sorting) */
diff -u -p /var/linuxes/linux-next/arch/blackfin/kernel/ptrace.c /tmp/nothing/arch/blackfin/kernel/ptrace.c
--- /var/linuxes/linux-next/arch/blackfin/kernel/ptrace.c
+++ /tmp/nothing/arch/blackfin/kernel/ptrace.c
@@ -123,7 +123,6 @@ is_user_addr_valid(struct task_struct *c
struct sram_list_struct *sraml;

/* overflow */
- if (start + len < start)
return -EIO;

down_read(&child->mm->mmap_sem);
diff -u -p /var/linuxes/linux-next/arch/nios2/kernel/sys_nios2.c /tmp/nothing/arch/nios2/kernel/sys_nios2.c
--- /var/linuxes/linux-next/arch/nios2/kernel/sys_nios2.c
+++ /tmp/nothing/arch/nios2/kernel/sys_nios2.c
@@ -31,7 +31,6 @@ asmlinkage int sys_cacheflush(unsigned l
return -EINVAL;

/* Check for overflow */
- if (addr + len < addr)
return -EFAULT;

/*
diff -u -p /var/linuxes/linux-next/arch/m68k/kernel/sys_m68k.c /tmp/nothing/arch/m68k/kernel/sys_m68k.c
--- /var/linuxes/linux-next/arch/m68k/kernel/sys_m68k.c
+++ /tmp/nothing/arch/m68k/kernel/sys_m68k.c
@@ -392,7 +392,6 @@ sys_cacheflush (unsigned long addr, int
struct vm_area_struct *vma;

/* Check for overflow. */
- if (addr + len < addr)
goto out;

/*
diff -u -p /var/linuxes/linux-next/arch/sh/kernel/sys_sh.c /tmp/nothing/arch/sh/kernel/sys_sh.c
--- /var/linuxes/linux-next/arch/sh/kernel/sys_sh.c
+++ /tmp/nothing/arch/sh/kernel/sys_sh.c
@@ -66,7 +66,6 @@ asmlinkage int sys_cacheflush(unsigned l
* Verify that the specified address region actually belongs
* to this process.
*/
- if (addr + len < addr)
return -EFAULT;

down_read(&current->mm->mmap_sem);
diff -u -p /var/linuxes/linux-next/mm/vmalloc.c /tmp/nothing/mm/vmalloc.c
--- /var/linuxes/linux-next/mm/vmalloc.c
+++ /tmp/nothing/mm/vmalloc.c
@@ -456,12 +456,10 @@ nocache:
addr = ALIGN(first->va_end, align);
if (addr < vstart)
goto nocache;
- if (addr + size < addr)
goto overflow;

} else {
addr = ALIGN(vstart, align);
- if (addr + size < addr)
goto overflow;

n = vmap_area_root.rb_node;
@@ -488,7 +486,6 @@ nocache:
if (addr + cached_hole_size < first->va_start)
cached_hole_size = first->va_start - addr;
addr = ALIGN(first->va_end, align);
- if (addr + size < addr)
goto overflow;

if (list_is_last(&first->list, &vmap_area_list))
diff -u -p /var/linuxes/linux-next/mm/memory.c /tmp/nothing/mm/memory.c
--- /var/linuxes/linux-next/mm/memory.c
+++ /tmp/nothing/mm/memory.c
@@ -2136,7 +2136,6 @@ int vm_iomap_memory(struct vm_area_struc
unsigned long vm_len, pfn, pages;

/* Check that the physical memory area passed in looks valid */
- if (start + len < start)
return -EINVAL;
/*
* You *really* shouldn't map things that aren't page-aligned,
@@ -2146,7 +2145,6 @@ int vm_iomap_memory(struct vm_area_struc
len += start & ~PAGE_MASK;
pfn = start >> PAGE_SHIFT;
pages = (len + ~PAGE_MASK) >> PAGE_SHIFT;
- if (pfn + pages < pfn)
return -EINVAL;

/* We start the mapping 'vm_pgoff' pages into the area */
diff -u -p /var/linuxes/linux-next/mm/mmap.c /tmp/nothing/mm/mmap.c
--- /var/linuxes/linux-next/mm/mmap.c
+++ /tmp/nothing/mm/mmap.c
@@ -2792,7 +2792,6 @@ SYSCALL_DEFINE5(remap_file_pages, unsign
return ret;

/* Does pgoff wrap? */
- if (pgoff + (size >> PAGE_SHIFT) < pgoff)
return ret;

if (down_write_killable(&mm->mmap_sem))
diff -u -p /var/linuxes/linux-next/mm/nommu.c /tmp/nothing/mm/nommu.c
--- /var/linuxes/linux-next/mm/nommu.c
+++ /tmp/nothing/mm/nommu.c
@@ -1860,7 +1860,6 @@ int access_process_vm(struct task_struct
{
struct mm_struct *mm;

- if (addr + len < addr)
return 0;

mm = get_task_mm(tsk);
diff -u -p /var/linuxes/linux-next/mm/mremap.c /tmp/nothing/mm/mremap.c
--- /var/linuxes/linux-next/mm/mremap.c
+++ /tmp/nothing/mm/mremap.c
@@ -411,7 +411,6 @@ static struct vm_area_struct *vma_to_res
/* Need to be careful about a growing mapping */
pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
pgoff += vma->vm_pgoff;
- if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
return ERR_PTR(-EINVAL);

if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))
diff -u -p /var/linuxes/linux-next/tools/perf/util/unwind-libunwind-local.c /tmp/nothing/tools/perf/util/unwind-libunwind-local.c
--- /var/linuxes/linux-next/tools/perf/util/unwind-libunwind-local.c
+++ /tmp/nothing/tools/perf/util/unwind-libunwind-local.c
@@ -517,7 +517,6 @@ static int access_mem(unw_addr_space_t _
end = start + stack->size;

/* Check overflow. */
- if (addr + sizeof(unw_word_t) < addr)
return -EINVAL;

if (addr < start || addr + sizeof(unw_word_t) >= end) {
diff -u -p /var/linuxes/linux-next/tools/perf/util/dso.c /tmp/nothing/tools/perf/util/dso.c
--- /var/linuxes/linux-next/tools/perf/util/dso.c
+++ /tmp/nothing/tools/perf/util/dso.c
@@ -964,7 +964,6 @@ static ssize_t data_read_offset(struct d
if (offset > dso->data.file_size)
return -1;

- if (offset + size < offset)
return -1;

return cached_read(dso, machine, offset, data, size);
diff -u -p /var/linuxes/linux-next/tools/perf/util/unwind-libdw.c /tmp/nothing/tools/perf/util/unwind-libdw.c
--- /var/linuxes/linux-next/tools/perf/util/unwind-libdw.c
+++ /tmp/nothing/tools/perf/util/unwind-libdw.c
@@ -145,7 +145,6 @@ static bool memory_read(Dwfl *dwfl __may
end = start + stack->size;

/* Check overflow. */
- if (addr + sizeof(Dwarf_Word) < addr)
return false;

if (addr < start || addr + sizeof(Dwarf_Word) > end) {
diff -u -p /var/linuxes/linux-next/net/netfilter/xt_u32.c /tmp/nothing/net/netfilter/xt_u32.c
--- /var/linuxes/linux-next/net/netfilter/xt_u32.c
+++ /tmp/nothing/net/netfilter/xt_u32.c
@@ -57,7 +57,6 @@ static bool u32_match_it(const struct xt
val >>= number;
break;
case XT_U32_AT:
- if (at + val < at)
return false;
at += val;
pos = number;
diff -u -p /var/linuxes/linux-next/net/netfilter/nft_limit.c /tmp/nothing/net/netfilter/nft_limit.c
--- /var/linuxes/linux-next/net/netfilter/nft_limit.c
+++ /tmp/nothing/net/netfilter/nft_limit.c
@@ -71,7 +71,6 @@ static int nft_limit_init(struct nft_lim
else
limit->burst = 0;

- if (limit->rate + limit->burst < limit->rate)
return -EOVERFLOW;

/* The token bucket size limits the number of tokens can be
diff -u -p /var/linuxes/linux-next/fs/btrfs/extent_map.c /tmp/nothing/fs/btrfs/extent_map.c
--- /var/linuxes/linux-next/fs/btrfs/extent_map.c
+++ /tmp/nothing/fs/btrfs/extent_map.c
@@ -84,7 +84,6 @@ void free_extent_map(struct extent_map *
/* simple helper to do math around the end of an extent, handling wrap */
static u64 range_end(u64 start, u64 len)
{
- if (start + len < start)
return (u64)-1;
return start + len;
}
diff -u -p /var/linuxes/linux-next/fs/btrfs/ordered-data.c /tmp/nothing/fs/btrfs/ordered-data.c
--- /var/linuxes/linux-next/fs/btrfs/ordered-data.c
+++ /tmp/nothing/fs/btrfs/ordered-data.c
@@ -31,7 +31,6 @@ static struct kmem_cache *btrfs_ordered_

static u64 entry_end(struct btrfs_ordered_extent *entry)
{
- if (entry->file_offset + entry->len < entry->file_offset)
return (u64)-1;
return entry->file_offset + entry->len;
}
diff -u -p /var/linuxes/linux-next/fs/ext4/resize.c /tmp/nothing/fs/ext4/resize.c
--- /var/linuxes/linux-next/fs/ext4/resize.c
+++ /tmp/nothing/fs/ext4/resize.c
@@ -1615,14 +1615,10 @@ int ext4_group_add(struct super_block *s
return -EPERM;
}

- if (ext4_blocks_count(es) + input->blocks_count <
- ext4_blocks_count(es)) {
ext4_warning(sb, "blocks_count overflow");
return -EINVAL;
}

- if (le32_to_cpu(es->s_inodes_count) + EXT4_INODES_PER_GROUP(sb) <
- le32_to_cpu(es->s_inodes_count)) {
ext4_warning(sb, "inodes_count overflow");
return -EINVAL;
}
@@ -1770,7 +1766,6 @@ int ext4_group_extend(struct super_block

add = EXT4_BLOCKS_PER_GROUP(sb) - last;

- if (o_blocks_count + add < o_blocks_count) {
ext4_warning(sb, "blocks_count overflow");
return -EINVAL;
}
diff -u -p /var/linuxes/linux-next/sound/core/info.c /tmp/nothing/sound/core/info.c
--- /var/linuxes/linux-next/sound/core/info.c
+++ /tmp/nothing/sound/core/info.c
@@ -109,7 +109,6 @@ static bool valid_pos(loff_t pos, size_t
{
if (pos < 0 || (long) pos != pos || (ssize_t) count < 0)
return false;
- if ((unsigned long) pos + (unsigned long) count < (unsigned long) pos)
return false;
return true;
}
diff -u -p /var/linuxes/linux-next/ipc/mqueue.c /tmp/nothing/ipc/mqueue.c
--- /var/linuxes/linux-next/ipc/mqueue.c
+++ /tmp/nothing/ipc/mqueue.c
@@ -291,7 +291,6 @@ static struct inode *mqueue_get_inode(st
min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
sizeof(struct posix_msg_tree_node);
mq_bytes = info->attr.mq_maxmsg * info->attr.mq_msgsize;
- if (mq_bytes + mq_treesize < mq_bytes)
goto out_inode;
mq_bytes += mq_treesize;
spin_lock(&mq_lock);
diff -u -p /var/linuxes/linux-next/ipc/shm.c /tmp/nothing/ipc/shm.c
--- /var/linuxes/linux-next/ipc/shm.c
+++ /tmp/nothing/ipc/shm.c
@@ -1417,7 +1417,6 @@ long do_shmat(int shmid, char __user *sh

if (addr && !(shmflg & SHM_REMAP)) {
err = -EINVAL;
- if (addr + size < addr)
goto invalid;

if (find_vma_intersection(current->mm, addr, addr + size))

2018-05-03 23:02:37

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On 2018-05-01 19:00, Kees Cook wrote:
> On Mon, Apr 30, 2018 at 2:29 PM, Rasmus Villemoes
> <[email protected]> wrote:
>>
>> gcc 5.1+ (I think) have the __builtin_OP_overflow checks that should
>> generate reasonable code. Too bad there's no completely generic
>> check_all_ops_in_this_expression(a+b*c+d/e, or_jump_here). Though it's
>> hard to define what they should be checked against - probably would
>> require all subexpressions (including the variables themselves) to have
>> the same type.
>>
>> plug: https://lkml.org/lkml/2015/7/19/358
>
> That's a very nice series. Why did it never get taken?

Well, nobody seemed particularly interested, and then
https://lkml.org/lkml/2015/10/28/215 happened... but he did later seem
to admit that it could be useful for the multiplication checking, and
that "the gcc interface for multiplication overflow is fine".

I still think even for unsigned types overflow checking can be subtle. E.g.

u32 somevar;

if (somevar + sizeof(foo) < somevar)
return -EOVERFLOW;
somevar += sizeof(this);

is broken, because the LHS is promoted to unsigned long/size_t, then so
is the RHS for the comparison, and the comparison is thus always false
(on 64bit). It gets worse if the two types are more "opaque", and in any
case it's not always easy to verify at a glance that the types are the
same, or at least that the expression of the widest type is on the RHS.

> It seems to do the right things quite correctly.

Yes, I wouldn't suggest it without the test module verifying corner
cases, and checking it has the same semantics whether used with old or
new gcc.

Would you shepherd it through if I updated the patches and resent?

Rasmus

2018-05-04 00:36:58

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Thu, May 3, 2018 at 4:00 PM, Rasmus Villemoes
<[email protected]> wrote:
> On 2018-05-01 19:00, Kees Cook wrote:
>> On Mon, Apr 30, 2018 at 2:29 PM, Rasmus Villemoes
>> <[email protected]> wrote:
>>>
>>> gcc 5.1+ (I think) have the __builtin_OP_overflow checks that should
>>> generate reasonable code. Too bad there's no completely generic
>>> check_all_ops_in_this_expression(a+b*c+d/e, or_jump_here). Though it's
>>> hard to define what they should be checked against - probably would
>>> require all subexpressions (including the variables themselves) to have
>>> the same type.
>>>
>>> plug: https://lkml.org/lkml/2015/7/19/358
>>
>> That's a very nice series. Why did it never get taken?
>
> Well, nobody seemed particularly interested, and then
> https://lkml.org/lkml/2015/10/28/215 happened... but he did later seem
> to admit that it could be useful for the multiplication checking, and
> that "the gcc interface for multiplication overflow is fine".

Oh, excellent. Thank you for that pointer! That conversation covered a
lot of ground. I need to think a little more about how to apply the
thoughts there with the kmalloc() needs and the GPU driver needs...

> I still think even for unsigned types overflow checking can be subtle. E.g.
>
> u32 somevar;
>
> if (somevar + sizeof(foo) < somevar)
> return -EOVERFLOW;
> somevar += sizeof(this);
>
> is broken, because the LHS is promoted to unsigned long/size_t, then so
> is the RHS for the comparison, and the comparison is thus always false
> (on 64bit). It gets worse if the two types are more "opaque", and in any
> case it's not always easy to verify at a glance that the types are the
> same, or at least that the expression of the widest type is on the RHS.

That's an excellent example, yes. (And likely worth including in the
commit log somewhere.)

>
>> It seems to do the right things quite correctly.
>
> Yes, I wouldn't suggest it without the test module verifying corner
> cases, and checking it has the same semantics whether used with old or
> new gcc.
>
> Would you shepherd it through if I updated the patches and resent?

Yes, though we may need reworking if we actually want to do the
try/catch style (since that was talked about with GPU stuff too...)

Either way, yes, a refresh would be lovely! :)

-Kees

--
Kees Cook
Pixel Security

2018-05-04 00:42:11

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Thu, May 3, 2018 at 5:36 PM, Kees Cook <[email protected]> wrote:
> On Thu, May 3, 2018 at 4:00 PM, Rasmus Villemoes
> <[email protected]> wrote:
>> On 2018-05-01 19:00, Kees Cook wrote:
>>> On Mon, Apr 30, 2018 at 2:29 PM, Rasmus Villemoes
>>> <[email protected]> wrote:
>>>>
>>>> gcc 5.1+ (I think) have the __builtin_OP_overflow checks that should
>>>> generate reasonable code. Too bad there's no completely generic
>>>> check_all_ops_in_this_expression(a+b*c+d/e, or_jump_here). Though it's
>>>> hard to define what they should be checked against - probably would
>>>> require all subexpressions (including the variables themselves) to have
>>>> the same type.
>>>>
>>>> plug: https://lkml.org/lkml/2015/7/19/358
>>>
>>> That's a very nice series. Why did it never get taken?
>>
>> Well, nobody seemed particularly interested, and then
>> https://lkml.org/lkml/2015/10/28/215 happened... but he did later seem
>> to admit that it could be useful for the multiplication checking, and
>> that "the gcc interface for multiplication overflow is fine".
>
> Oh, excellent. Thank you for that pointer! That conversation covered a
> lot of ground. I need to think a little more about how to apply the
> thoughts there with the kmalloc() needs and the GPU driver needs...
>
>> I still think even for unsigned types overflow checking can be subtle. E.g.
>>
>> u32 somevar;
>>
>> if (somevar + sizeof(foo) < somevar)
>> return -EOVERFLOW;
>> somevar += sizeof(this);
>>
>> is broken, because the LHS is promoted to unsigned long/size_t, then so
>> is the RHS for the comparison, and the comparison is thus always false
>> (on 64bit). It gets worse if the two types are more "opaque", and in any
>> case it's not always easy to verify at a glance that the types are the
>> same, or at least that the expression of the widest type is on the RHS.
>
> That's an excellent example, yes. (And likely worth including in the
> commit log somewhere.)
>
>>
>>> It seems to do the right things quite correctly.
>>
>> Yes, I wouldn't suggest it without the test module verifying corner
>> cases, and checking it has the same semantics whether used with old or
>> new gcc.
>>
>> Would you shepherd it through if I updated the patches and resent?
>
> Yes, though we may need reworking if we actually want to do the
> try/catch style (since that was talked about with GPU stuff too...)
>
> Either way, yes, a refresh would be lovely! :)

Whatever the case, I think we need to clean up all the kmalloc() math
anyway. As mentioned earlier, there are a handful of more complex
cases, but the vast majority are just A * B. I've put up a series here
now, and I'll send it out soon. I want to think more about 3-factor
products, addition, etc:

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=kspp/kmalloc/2-factor-products

The commit logs need more details (i.e. about making constants the
second argument for optimal compiler results, etc), but there's a
Coccinelle-generated first pass.

-Kees

--
Kees Cook
Pixel Security

2018-05-04 07:44:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Wed, Feb 14, 2018 at 8:27 AM Matthew Wilcox <[email protected]> wrote:
> +static inline __must_check
> +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> +{
> + if (size != 0 && n > (SIZE_MAX - c) / size)
> + return NULL;
> +
> + return kvmalloc(n * size + c, gfp);

Ok, so some more bikeshedding:

- I really don't want to encourage people to use kvmalloc().

In fact, the conversion I saw was buggy. You can *not* convert a GFP_ATOMIC
user of kmalloc() to use kvmalloc.

- that divide is really really expensive on many architectures.

Note that these issues kind of go hand in hand.

Normal kernel allocations are *limited*. It's simply not ok to allocate
megabytes (or gigabytes) of mmory in general. We have serious limits, and
we *should* have serious limits. If people worry about the multiply
overflowing because a user is controlling the size valus, then dammit, such
a user should not be able to do a huge gigabyte vmalloc() that exhausts
memory and then spends time clearing it all!

So the whole notion that "overflows are dangerous, let's get rid of them"
somehow fixes a bug is BULLSHIT. You literally introduced a *new* bug by
removing the normal kmalloc() size limit because you thought that pverflows
are the only problem.

So stop doing things like this. We should not do a stupid divide, because
we damn well know that it is NOT VALID to allocate arrays that have
hundreds of fthousands of elements, or where the size of one element is
very big.

So get rid of the stupid divide, and make the limits be much stricter. Like
saying "the array element size had better be smaller than one page"
(because honestly, bigger elements are not valid in the kernel), and "the
size of the array cannot be more than "pick-some-number-out-of-your-ass".

So just make the divide go the hell away, a and check the size for validity.

Something like

if (size > PAGE_SIZE)
return NULL;
if (elem > 65535)
return NULL;
if (offset > PAGE_SIZE)
return NULL;
return kzalloc(size*elem+offset);

and now you (a) guarantee it can't overflow and (b) don't make people use
crazy vmalloc() allocations when they damn well shouldn't.

And yeah, if somebody has a page size bigger than 64k, then the above can
still overflow. I'm sorry, that architecture s broken shit.

Are there cases where vmalloc() is ok? Yes. But they should be rare, and
they should have a good reason for them. And honestly, even then the above
limits really really sound quite reasonable. There is no excuse for
million-entry arrays in the kernel. You are doing something seriously wrong
if you do those.

Linus

2018-05-04 13:15:22

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Fri, May 04, 2018 at 07:42:52AM +0000, Linus Torvalds wrote:
> On Wed, Feb 14, 2018 at 8:27 AM Matthew Wilcox <[email protected]> wrote:
> > +static inline __must_check
> > +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> > +{
> > + if (size != 0 && n > (SIZE_MAX - c) / size)
> > + return NULL;
> > +
> > + return kvmalloc(n * size + c, gfp);
>
> Ok, so some more bikeshedding:

I'm putting up the bikeshed against your house ... the colour is your
choice!

> - I really don't want to encourage people to use kvmalloc().
>
> In fact, the conversion I saw was buggy. You can *not* convert a GFP_ATOMIC
> user of kmalloc() to use kvmalloc.

Not sure which conversion you're referring to; not one of mine, I hope?

> - that divide is really really expensive on many architectures.

'c' and 'size' are _supposed_ to be constant and get evaluated at
compile-time. ie you should get something like this on x86:

0: 48 b8 fe ff ff ff ff movabs $0x1ffffffffffffffe,%rax
7: ff ff 1f
a: 48 39 c7 cmp %rax,%rdi
d: 76 09 jbe 18 <a+0x18>
f: 48 c1 e7 03 shl $0x3,%rdi
13: e9 00 00 00 00 jmpq 18 <a+0x18>
14: R_X86_64_PLT32 malloc-0x4
18: 31 c0 xor %eax,%eax
1a: c3 retq

Now, if someone's an idiot, then you'll get the divide done at runtime,
and that'll be expensive.

> Normal kernel allocations are *limited*. It's simply not ok to allocate
> megabytes (or gigabytes) of mmory in general. We have serious limits, and
> we *should* have serious limits. If people worry about the multiply
> overflowing because a user is controlling the size valus, then dammit, such
> a user should not be able to do a huge gigabyte vmalloc() that exhausts
> memory and then spends time clearing it all!

I agree.

> So the whole notion that "overflows are dangerous, let's get rid of them"
> somehow fixes a bug is BULLSHIT. You literally introduced a *new* bug by
> removing the normal kmalloc() size limit because you thought that pverflows
> are the only problem.

Rather, I replaced one bug with another. The removed bug was one
where we allocated 24 bytes and then indexed into the next slab object.
The added bug was that someone can now persuade the driver to allocate
gigabytes of memory. It's a less severe bug, but I take your point.
We do have _some_ limits in vmalloc -- we fail if you're trying to
allocate more memory than is in the machine, and we fail if there's
insufficient contiguous space in the virtual address space. But, yes,
this does allow people to allocate more memory than kmalloc would allow.

> So stop doing things like this. We should not do a stupid divide, because
> we damn well know that it is NOT VALID to allocate arrays that have
> hundreds of fthousands of elements, or where the size of one element is
> very big.
>
> So get rid of the stupid divide, and make the limits be much stricter. Like
> saying "the array element size had better be smaller than one page"
> (because honestly, bigger elements are not valid in the kernel), and "the
> size of the array cannot be more than "pick-some-number-out-of-your-ass".
>
> So just make the divide go the hell away, a and check the size for validity.
>
> Something like
>
> if (size > PAGE_SIZE)
> return NULL;
> if (elem > 65535)
> return NULL;
> if (offset > PAGE_SIZE)
> return NULL;
> return kzalloc(size*elem+offset);
>
> and now you (a) guarantee it can't overflow and (b) don't make people use
> crazy vmalloc() allocations when they damn well shouldn't.

I find your faith in the size of structs in the kernel touching ;-)

struct cmp_data {
/* size: 290904, cachelines: 4546, members: 11 */
struct dec_data {
/* size: 274520, cachelines: 4290, members: 10 */
struct cpu_entry_area {
/* size: 180224, cachelines: 2816, members: 7 */
struct saved_cmdlines_buffer {
/* size: 131104, cachelines: 2049, members: 5 */
struct debug_store_buffers {
/* size: 131072, cachelines: 2048, members: 2 */
struct bunzip_data {
/* size: 42648, cachelines: 667, members: 23 */
struct inflate_workspace {
/* size: 42312, cachelines: 662, members: 2 */
struct xz_dec_lzma2 {
/* size: 28496, cachelines: 446, members: 5 */
struct lzma_dec {
/* size: 28304, cachelines: 443, members: 21 */
struct rcu_state {
/* size: 17344, cachelines: 271, members: 34 */
struct pglist_data {
/* size: 18304, cachelines: 286, members: 34 */
struct tss_struct {
/* size: 12288, cachelines: 192, members: 2 */
struct bts_ctx {
/* size: 12288, cachelines: 192, members: 3 */

Those are just the ones above 10kB. Sure, I can see some of them are
boot time use only, or we allocate one per node, or whatever. But people
do create arrays of these things. The biggest object we have in the
slab_cache today is 23488 bytes (kvm_vcpu) -- at least on my laptop. Maybe
there's some insane driver out there that's creating larger things.

> And yeah, if somebody has a page size bigger than 64k, then the above can
> still overflow. I'm sorry, that architecture s broken shit.
>
> Are there cases where vmalloc() is ok? Yes. But they should be rare, and
> they should have a good reason for them. And honestly, even then the above
> limits really really sound quite reasonable. There is no excuse for
> million-entry arrays in the kernel. You are doing something seriously wrong
> if you do those.

Normally, yes. But then you get people like Google who want to have
a million file descriptors open in a single process. So I'm leery of
putting hard limits on, like the ones you suggest above, because I'm not
going to be the one who Google come to when they want to exceed the limit.
If you want to draw that line in the sand, then I'm happy to respin the
patch in that direction.

We really have two reasons for using vmalloc -- one is "fragmentation
currently makes it impossible to allocate enough contiguous memory
to satisfy your needs" and the other is "this request is for too much
memory to satisfy through the buddy allocator". kvmalloc is normally
(not always; see file descriptor example above) for the first kind of
problem, but I wonder if kvmalloc() shouldn't have the same limit as
kmalloc (2048 pages), then add a kvmalloc_large() which will not impose
that limit check.

2018-05-04 15:36:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Fri, May 4, 2018 at 3:14 AM Matthew Wilcox <[email protected]> wrote:

> > In fact, the conversion I saw was buggy. You can *not* convert a
GFP_ATOMIC
> > user of kmalloc() to use kvmalloc.

> Not sure which conversion you're referring to; not one of mine, I hope?

I was thinking of the coccinelle patch in this thread, but just realized
that that actually only did it for GFP_KERNEL, so I guess it would work,
apart from the "oops, now it doesn't enforce the kmalloc limits any more"
issue.

> > - that divide is really really expensive on many architectures.

> 'c' and 'size' are _supposed_ to be constant and get evaluated at
> compile-time. ie you should get something like this on x86:

I guess that willalways be true of the 'kvzalloc_struct() version that
will always use a sizeof(). I was more thinking of any bare kvalloc_ab_c()
cases, but maybe we'd discourage that to ever be used as such?

Because we definitely have things like that, ie a quick grep finds

f = kmalloc (sizeof (*f) + size*num, GFP_KERNEL);

where 'size' is not obviously a constant. There may be others, but I didn't
really try to grep any further.

Maybe they aren't common, and maybe the occasional divide doesn't matter,
but particularly if we use scripting to then catch and convert users, I
really hate the idea of "let's introduce something that is potentially much
more expensive than it needs to be".

(And the automated coccinelle scripting it's also something where we must
very much avoid then subtly lifting allocation size limits)

> > Something like
> >
> > if (size > PAGE_SIZE)
> > return NULL;
> > if (elem > 65535)
> > return NULL;
> > if (offset > PAGE_SIZE)
> > return NULL;
> > return kzalloc(size*elem+offset);
> >
> > and now you (a) guarantee it can't overflow and (b) don't make people
use
> > crazy vmalloc() allocations when they damn well shouldn't.

> I find your faith in the size of structs in the kernel touching ;-)

I *really* hope some of those examples of yours aren't allocated with
kmalloc using this pattern..

But yes, I may be naive on the sizes.

> We really have two reasons for using vmalloc -- one is "fragmentation
> currently makes it impossible to allocate enough contiguous memory
> to satisfy your needs" and the other is "this request is for too much
> memory to satisfy through the buddy allocator". kvmalloc is normally
> (not always; see file descriptor example above) for the first kind of
> problem, but I wonder if kvmalloc() shouldn't have the same limit as
> kmalloc (2048 pages), then add a kvmalloc_large() which will not impose
> that limit check.

That would definitely solve at least one worry.

We do potentially have users which use kmalloc optimistically and do not
want to fall back to vmalloc (they'll fall back to smaller allocations
entirely), but I guess if fwe make sure to not convert any
__GFP_NOWARN/NORETRY users, that should all be ok.

But honestly, I'd rather see just kmalloc users stay as kmalloc users.

If instread of "kzvmalloc_ab_c()" you introduce the "struct size
calculation" part as a "saturating non-overflow calculation", then it
should be fairly easy to just do

#define kzalloc_struct(p, member, n, gfp) \
kzalloc(struct_size(p, member, n, gfp)

and you basically *know* that the only thing you changed was purely the
overflow semantics.

That also would take care of my diide worry, because now there wouldn't be
any ab_c() calculations that might take a non-constant size. The
"struct_size()" thing would always do the sizeof.

So you'd have something like

/* 'b' and 'c' are always constants */
static inline sizeof_t __ab_c_saturating(size_t a, size_t b, size_t c)
{
if (b && n > (SIZE_MAX -c) / size)
return SIZE_MAX;
return a*b+c;
}
#define struct_size(p, member, n) \
__ab_c_saturating(n, \
sizeof(*(p)->member) + __must_be_array((p)->member), \
offsetof(typeof(*(p)), member))

and then that kzalloc_struct() wrapper define above should just work.

The above is entirely untested, but it *looks* sane, and avoids all
semantic changes outside of the overflow protection. And nobody would use
that __ab_c_saturating() by mistake, I hope.

No?

Linus

2018-05-04 16:05:30

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

On Fri, May 4, 2018 at 8:35 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, May 4, 2018 at 3:14 AM Matthew Wilcox <[email protected]> wrote:
>
>> > In fact, the conversion I saw was buggy. You can *not* convert a
> GFP_ATOMIC
>> > user of kmalloc() to use kvmalloc.
>
>> Not sure which conversion you're referring to; not one of mine, I hope?
>
> I was thinking of the coccinelle patch in this thread, but just realized
> that that actually only did it for GFP_KERNEL, so I guess it would work,
> apart from the "oops, now it doesn't enforce the kmalloc limits any more"
> issue.

Just to be clear: the Coccinelle scripts I'm building aren't doing a
kmalloc -> kvmalloc conversion. I'm just removing all the 2-factor
multiplication and replacing it with the appropriate calls to the
allocator family's *calloc or *alloc_array(). This will get us to the
place where we can do all the sanity-checking in the allocator
functions (whatever that checking ends up being). As it turns out,
though, we have kind of a lot of allocator families. Some are
wrappers, like devm_*alloc(), etc.

All that said, the overwhelming majority of *alloc() multiplications
are just "count * sizeof()". It really feels like everything should
just be using a new *alloc_struct() which can do the type checking,
etc, etc, but we can get there. The remaining "count * size" are a
minority and could be dealt with some other way.

>> > - that divide is really really expensive on many architectures.
>
>> 'c' and 'size' are _supposed_ to be constant and get evaluated at
>> compile-time. ie you should get something like this on x86:
>
> I guess that willalways be true of the 'kvzalloc_struct() version that
> will always use a sizeof(). I was more thinking of any bare kvalloc_ab_c()
> cases, but maybe we'd discourage that to ever be used as such?

Yeah, bare *alloc_ab_c() is not great. Perhaps a leading "__" can hint to that?

> Because we definitely have things like that, ie a quick grep finds
>
> f = kmalloc (sizeof (*f) + size*num, GFP_KERNEL);
>
> where 'size' is not obviously a constant. There may be others, but I didn't
> really try to grep any further.
>
> Maybe they aren't common, and maybe the occasional divide doesn't matter,
> but particularly if we use scripting to then catch and convert users, I
> really hate the idea of "let's introduce something that is potentially much
> more expensive than it needs to be".

Yup: I'm not after that either. I just want to be able to get at the
multiplication factors before they're multiplied. :)

> (And the automated coccinelle scripting it's also something where we must
> very much avoid then subtly lifting allocation size limits)

Agreed. I think most cases are already getting lifted to size_t due to
the sizeof(). It's the "two variables" cases I want to double-check.
Another completely insane idea would be to have a macro that did type
size checking and would DTRT, but with all the alloc families, it
looks nasty. This is all RFC stage, as far as I'm concerned.

Fun example: devm_kzalloc(dev, sizeof(...) * num, gfp...)

$ git grep 'devm_kzalloc([^,]*, *sizeof([^,]*,' | egrep '\* *sizeof|\)
*\*' | wc -l
88

some are constants:
drivers/video/fbdev/au1100fb.c: devm_kzalloc(&dev->dev,
sizeof(u32) * 16, GFP_KERNEL);

but many aren't:
sound/soc/generic/audio-graph-card.c: dai_link = devm_kzalloc(dev,
sizeof(*dai_link) * num, GFP_KERNEL);

While type-checking on the non-sizeof factor would let us know if it
was safe, so would the division, and most of those could happen at
compile time. It's the size_t variables that we want to catch.

So, mainly I'm just trying to get the arguments reordered (for a
compile-time division) into the correct helpers so the existing logic
can do the right thing, and only for 2-factor products. After that,
then I'm hoping to tackle the multi-factor products, of which the
*alloc_struct() helper seems to cover the vast majority of the
remaining cases.

-Kees

--
Kees Cook
Pixel Security