2018-05-05 01:08:51

by Kees Cook

[permalink] [raw]
Subject: *alloc API changes

Hi,

After writing up this email, I think I don't like this idea, but I'm
still curious to see what people think of the general observations...


The number of permutations for our various allocation function is
rather huge. Currently, it is:

system or wrapper:
kmem_cache_alloc, kmalloc, vmalloc, kvmalloc, devm_kmalloc,
dma_alloc_coherent, pci_alloc_consistent, kmem_alloc, f2fs_kvalloc,
and probably others I haven't found yet.

allocation method (not all available in all APIs):
regular (kmalloc), zeroed (kzalloc), array (kmalloc_array), zeroed
array (kcalloc)

with or without node argument:
+_node

so, for example, we have things like: kmalloc_array_node() or
pci_zalloc_consistent()

Using some attempts at static analysis with git grep and coccinelle,
nearly all of these take GFP flags, but something near 80% of
functions are using GFP_KERNEL. Roughly half of the callers are
including a sizeof(). Only 20% are using zeroing, 15% are using
products as a size, and 3% are using ..._node, etc.

I wonder if we might be able to rearrange our APIs for the general
case and include a common "kitchen sink" API for the less common
options. I.e. why do we have an entire set of _node() APIs, 2 sets for
zeroing (kzalloc, kcalloc), etc.

kmalloc()-family was meant to be a simplification of
kmem_cache_alloc(). vmalloc() duplicated the kmalloc()-family, and
kvmalloc() does too. Then we have "specialty" allocators (devm, dma,
pci, f2fs, xfs's kmem) that have subsets and want to perform other
actions around the base allocators or have their own entirely (e.g.
dma).

Instead of all the variations, it seems like we just want a per-family
alloc() and alloc_attrs(), where alloc_attrs() could handle the less
common stuff (e.g. gfp, zero, node).

kmalloc(size, GFP_KERNEL)
becomes a nice:
kmalloc(size)

However, then
kzalloc(size, GFP_KERNEL)
becomes the ugly:
kmalloc_attrs(size, GFP_KERNEL, ALLOC_ZERO, NUMA_NO_NODE);

(But I guess we could make macro wrappers for zeroing, or node?)

But this doesn't solve the multiplication overflow case at all, which
is my real goal. Trying to incorporate some of the ideas from other
threads, maybe we could have a multiplication helper that would
saturate and the allocator would see that as a signal to return NULL?
e.g.:

inline size_t mult(size_t a, size_t b)
{
if (b != 0 && a >= SIZE_MAX / b)
return SIZE_MAX;
return a * b;
}
(really, this kind of helper should be based on Rasmus's helpers which
do correct type handling)

void *kmalloc(size_t size)
{
if (size == SIZE_MAX)
return NULL;
kmalloc_attrs(size, GFP_KERNEL, ALLOC_NOZERO, NUMA_NO_NODE);
}

then we get:
var = kmalloc(mult(num, sizeof(*var)));

we could drop the *calloc(), *zalloc(), and *_array(), leaving only
*alloc() and *alloc_attrs() for all the allocator families.

I honestly can't tell if this is worse than doing all the family
conversions to *calloc() and *_array() for the 1000ish instances of
2-factor products used for size arguments in the *alloc() functions.
We could still nest them for the 3-factor ones?
var = kmalloc(multi(row, mult(column, sizeof(*var))));

But now we're just pretending to be LISP.

And really, I'd like to keep the nicer *alloc_struct() with all its
type checking. But then do we do *zalloc_struct(),
*alloc_struct_node(), etc, etc?

Bleh. C sucks for this.

-Kees

--
Kees Cook
Pixel Security


2018-05-05 03:49:36

by Matthew Wilcox

[permalink] [raw]
Subject: Re: *alloc API changes

On Fri, May 04, 2018 at 06:08:23PM -0700, Kees Cook wrote:
> The number of permutations for our various allocation function is
> rather huge. Currently, it is:
>
> system or wrapper:
> kmem_cache_alloc, kmalloc, vmalloc, kvmalloc, devm_kmalloc,
> dma_alloc_coherent, pci_alloc_consistent, kmem_alloc, f2fs_kvalloc,
> and probably others I haven't found yet.

dma_pool_alloc, page_frag_alloc, gen_pool_alloc, __alloc_bootmem_node,
cma_alloc, quicklist_alloc (deprecated), mempool_alloc

and if you're counting f2fs_*alloc, there's a metric tonne of *alloc
wrappers out there.

> allocation method (not all available in all APIs):
> regular (kmalloc), zeroed (kzalloc), array (kmalloc_array), zeroed
> array (kcalloc)

... other initialiser (kmem_cache_alloc)

> I wonder if we might be able to rearrange our APIs for the general
> case and include a common "kitchen sink" API for the less common
> options. I.e. why do we have an entire set of _node() APIs, 2 sets for
> zeroing (kzalloc, kcalloc), etc.

I'd love it if we had a common pattern for these things. A regular API
appeals to me (I blame those RISC people in my formative years).

> kmalloc()-family was meant to be a simplification of
> kmem_cache_alloc().

That's a little revisionist ;-) We had kmalloc before we had the slab
allocator (kernel 1.2, I think?). But I see your point, and that's
certainly how it's implemented these days.

> vmalloc() duplicated the kmalloc()-family, and
> kvmalloc() does too. Then we have "specialty" allocators (devm, dma,
> pci, f2fs, xfs's kmem) that have subsets and want to perform other
> actions around the base allocators or have their own entirely (e.g.
> dma).
>
> Instead of all the variations, it seems like we just want a per-family
> alloc() and alloc_attrs(), where alloc_attrs() could handle the less
> common stuff (e.g. gfp, zero, node).
>
> kmalloc(size, GFP_KERNEL)
> becomes a nice:
> kmalloc(size)

I got shot down for proposing adding
#define malloc(x) kmalloc(x, GFP_KERNEL)
on the grounds that driver writers will then use malloc in interrupt
context. So I think our base version has to be foo_alloc(size, gfp_t).

> But this doesn't solve the multiplication overflow case at all, which
> is my real goal. Trying to incorporate some of the ideas from other
> threads, maybe we could have a multiplication helper that would
> saturate and the allocator would see that as a signal to return NULL?
> e.g.:
>
> inline size_t mult(size_t a, size_t b)
> {
> if (b != 0 && a >= SIZE_MAX / b)
> return SIZE_MAX;
> return a * b;
> }
> (really, this kind of helper should be based on Rasmus's helpers which
> do correct type handling)

Right, I was thinking:

static inline size_t mul_ab(size_t a, size_t b)
{
#if COMPILER_SUPPORTS_OVERFLOW
unsigned long c;
if (__builtin_mul_overflow(a, b, &c))
return SIZE_MAX;
return c;
#else
if (b != 0 && a >= SIZE_MAX / b)
return SIZE_MAX;
return a * b;
#endif
}

> void *kmalloc(size_t size)
> {
> if (size == SIZE_MAX)
> return NULL;
> kmalloc_attrs(size, GFP_KERNEL, ALLOC_NOZERO, NUMA_NO_NODE);
> }

You don't need the size check here. We have the size check buried deep in
alloc_pages (look for MAX_ORDER), so kmalloc and then alloc_pages will try
a bunch of paths all of which fail before returning NULL.

> then we get:
> var = kmalloc(mult(num, sizeof(*var)));
>
> we could drop the *calloc(), *zalloc(), and *_array(), leaving only
> *alloc() and *alloc_attrs() for all the allocator families.
>
> I honestly can't tell if this is worse than doing all the family
> conversions to *calloc() and *_array() for the 1000ish instances of
> 2-factor products used for size arguments in the *alloc() functions.
> We could still nest them for the 3-factor ones?
> var = kmalloc(multi(row, mult(column, sizeof(*var))));
>
> But now we're just pretending to be LISP.

I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest
calls to mult().

> And really, I'd like to keep the nicer *alloc_struct() with all its
> type checking. But then do we do *zalloc_struct(),
> *alloc_struct_node(), etc, etc?

Nono, Linus had the better proposal, struct_size(p, member, n).

> Bleh. C sucks for this.

Ooh, we could instantiate classes and ... yeah, no, not C++. We *could*
abuse the C preprocessor to autogenerate every variant, but I hate that
because you can't grep for it.

One of the problems with having the single-argument foo_alloc be a static
inline for foo_alloc_attrs is that you then have to marshall four arguments
for the call instead of just one. I would have two exported symbols for
each variant.

2018-05-05 04:27:31

by Kees Cook

[permalink] [raw]
Subject: Re: *alloc API changes

On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <[email protected]> wrote:
> and if you're counting f2fs_*alloc, there's a metric tonne of *alloc
> wrappers out there.

Yeah. *sob*

> That's a little revisionist ;-) We had kmalloc before we had the slab
> allocator (kernel 1.2, I think?). But I see your point, and that's
> certainly how it's implemented these days.

Okay, yes, that's true. I did think of that briefly. :)

> I got shot down for proposing adding
> #define malloc(x) kmalloc(x, GFP_KERNEL)
> on the grounds that driver writers will then use malloc in interrupt
> context. So I think our base version has to be foo_alloc(size, gfp_t).

Okay, fair enough.

> Right, I was thinking:
>
> static inline size_t mul_ab(size_t a, size_t b)
> {
> #if COMPILER_SUPPORTS_OVERFLOW
> unsigned long c;
> if (__builtin_mul_overflow(a, b, &c))
> return SIZE_MAX;
> return c;
> #else
> if (b != 0 && a >= SIZE_MAX / b)
> return SIZE_MAX;
> return a * b;
> #endif
> }

Rasmus, what do you think of a saturating version of your helpers?

The only fear I have with the saturating helpers is that we'll end up
using them in places that don't recognize SIZE_MAX. Like, say:

size = mul(a, b) + 1;

then *poof* size == 0. Now, I'd hope that code would use add(mul(a,
b), 1), but still... it makes me nervous.

> You don't need the size check here. We have the size check buried deep in
> alloc_pages (look for MAX_ORDER), so kmalloc and then alloc_pages will try
> a bunch of paths all of which fail before returning NULL.

Good point. Though it does kind of creep me out to let a known-bad
size float around in the allocator until it decides to reject it. I
would think an early:

if (unlikely(size == SIZE_MAX))
return NULL;

would have virtually no cycle count difference...

> I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest
> calls to mult().

Agreed. I think having exactly those would cover almost everything,
and the two places where a 4-factor product is needed could just nest
them. (bikeshed: the very common mul_ab() should just be mul(), IMO.)

> Nono, Linus had the better proposal, struct_size(p, member, n).

Oh, yes! I totally missed that in the threads.

> Ooh, we could instantiate classes and ... yeah, no, not C++. We *could*
> abuse the C preprocessor to autogenerate every variant, but I hate that
> because you can't grep for it.

Right, no. I think if we can ditch *calloc() and _array() by using
saturating helpers, we'll have the API in a much better form:

kmalloc(foo * bar, GFP_KERNEL);
into
kmalloc_array(foo, bar, GFP_KERNEL);
into
kmalloc(mul(foo, bar), GFP_KERNEL);

and

kmalloc(foo * bar, GFP_KERNEL | __GFP_ZERO);
into
kzalloc(foo * bar, GFP_KERNEL);
into
kcalloc(foo, bar, GFP_KERNEL);
into
kzalloc(mul(foo, bar), GFP_KERNEL);

and the fun

kzalloc(sizeof(*header) + count * sizeof(*header->element), GFP_KERNEL);
into
kzalloc(struct_size(header, element, count), GFP_KERNEL);

modulo all *alloc* families...

?

-Kees

--
Kees Cook
Pixel Security

2018-05-05 04:31:51

by Matthew Wilcox

[permalink] [raw]
Subject: Re: *alloc API changes

On Fri, May 04, 2018 at 08:46:46PM -0700, Matthew Wilcox wrote:
> On Fri, May 04, 2018 at 06:08:23PM -0700, Kees Cook wrote:
> > The number of permutations for our various allocation function is
> > rather huge. Currently, it is:
> >
> > system or wrapper:
> > kmem_cache_alloc, kmalloc, vmalloc, kvmalloc, devm_kmalloc,
> > dma_alloc_coherent, pci_alloc_consistent, kmem_alloc, f2fs_kvalloc,
> > and probably others I haven't found yet.
>
> dma_pool_alloc, page_frag_alloc, gen_pool_alloc, __alloc_bootmem_node,
> cma_alloc, quicklist_alloc (deprecated), mempool_alloc
>
> > allocation method (not all available in all APIs):
> > regular (kmalloc), zeroed (kzalloc), array (kmalloc_array), zeroed
> > array (kcalloc)
>
> ... other initialiser (kmem_cache_alloc)

I meant to say that we have a shocking dearth of foo_realloc() functions.
Instead we have drivers and core parts of the kernel implementing their
own stupid slow alloc-a-new-chunk-of-memory-and-memcpy-from-the-old-then-
free when the allocator can probably do a better job (eg vmalloc may
be able to expand the existing are, and if it can't do that, it can
at least remap the underlying pages; the slab allocator may be able to
resize without growing, eg if you krealloc from 1200 bytes to 2000 bytes,
that's going to come out of the same slab).

So, yeah, adding those increases the API permutations even further.
And don't ask about what happens if you allocate with GFP_DMA then
realloc with GFP_HIGHMEM.

2018-05-07 11:39:38

by Matthew Wilcox

[permalink] [raw]
Subject: Re: *alloc API changes

On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote:
> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <[email protected]> wrote:
> The only fear I have with the saturating helpers is that we'll end up
> using them in places that don't recognize SIZE_MAX. Like, say:
>
> size = mul(a, b) + 1;
>
> then *poof* size == 0. Now, I'd hope that code would use add(mul(a,
> b), 1), but still... it makes me nervous.

That's reasonable. So let's add:

#define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER)

(there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some
architectures which allows people to configure MAX_ORDER all the way up
to 64. That config option needs to go away, or at least be limited to
a much lower value).

On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB.
Those values should be relatively immune to further arithmetic causing
an additional overflow.

> Good point. Though it does kind of creep me out to let a known-bad
> size float around in the allocator until it decides to reject it. I
> would think an early:
>
> if (unlikely(size == SIZE_MAX))
> return NULL;
>
> would have virtually no cycle count difference...

I don't think it should go in the callers though ... where it goes in
the allocator is up to the allocator maintainers ;-)

> > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest
> > calls to mult().
>
> Agreed. I think having exactly those would cover almost everything,
> and the two places where a 4-factor product is needed could just nest
> them. (bikeshed: the very common mul_ab() should just be mul(), IMO.)
>
> > Nono, Linus had the better proposal, struct_size(p, member, n).
>
> Oh, yes! I totally missed that in the threads.

so we're agreed on struct_size(). I think rather than the explicit 'mul',
perhaps we should have array_size() and array3_size().

> Right, no. I think if we can ditch *calloc() and _array() by using
> saturating helpers, we'll have the API in a much better form:
>
> kmalloc(foo * bar, GFP_KERNEL);
> into
> kmalloc_array(foo, bar, GFP_KERNEL);
> into
> kmalloc(mul(foo, bar), GFP_KERNEL);

kmalloc(array_size(foo, bar), GFP_KERNEL);

> and the fun
>
> kzalloc(sizeof(*header) + count * sizeof(*header->element), GFP_KERNEL);
> into
> kzalloc(struct_size(header, element, count), GFP_KERNEL);
>
> modulo all *alloc* families...
>
> ?

I think we're broadly in agreement here!

2018-05-07 16:04:37

by Kees Cook

[permalink] [raw]
Subject: Re: *alloc API changes

On Mon, May 7, 2018 at 4:39 AM, Matthew Wilcox <[email protected]> wrote:
> On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote:
>> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <[email protected]> wrote:
>> The only fear I have with the saturating helpers is that we'll end up
>> using them in places that don't recognize SIZE_MAX. Like, say:
>>
>> size = mul(a, b) + 1;
>>
>> then *poof* size == 0. Now, I'd hope that code would use add(mul(a,
>> b), 1), but still... it makes me nervous.
>
> That's reasonable. So let's add:
>
> #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER)
>
> (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some
> architectures which allows people to configure MAX_ORDER all the way up
> to 64. That config option needs to go away, or at least be limited to
> a much lower value).
>
> On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB.
> Those values should be relatively immune to further arithmetic causing
> an additional overflow.

But we can do larger than 8MB allocations with vmalloc, can't we?

> I don't think it should go in the callers though ... where it goes in
> the allocator is up to the allocator maintainers ;-)

We need a self-test regardless, so checking that each allocator
returns NULL with the saturated value can be done.

>> > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest
>> > calls to mult().
>>
>> Agreed. I think having exactly those would cover almost everything,
>> and the two places where a 4-factor product is needed could just nest
>> them. (bikeshed: the very common mul_ab() should just be mul(), IMO.)
>>
>> > Nono, Linus had the better proposal, struct_size(p, member, n).
>>
>> Oh, yes! I totally missed that in the threads.
>
> so we're agreed on struct_size(). I think rather than the explicit 'mul',
> perhaps we should have array_size() and array3_size().

I do like the symmetry there. My earlier "what if someone does +1"
continues to scratch at my brain, though I think it's likely
unimportant: there's no indication (in the name) that these calls
saturate. Will someone ever do something crazy like: array_size(a, b)
/ array_size(c, d) and they can, effectively, a truncated value (if
"a, b" saturated and "c, d" didn't...)?

>> Right, no. I think if we can ditch *calloc() and _array() by using
>> saturating helpers, we'll have the API in a much better form:
>>
>> kmalloc(foo * bar, GFP_KERNEL);
>> into
>> kmalloc_array(foo, bar, GFP_KERNEL);
>> into
>> kmalloc(mul(foo, bar), GFP_KERNEL);
>
> kmalloc(array_size(foo, bar), GFP_KERNEL);

I can't come up with a better name. :P When it was "mul()" I was
thinking "smul()" for "saturating multiply". sarray_size() seems ...
bonkers.

> I think we're broadly in agreement here!

Do we want addition helpers? (And division and subtraction?)

-Kees

--
Kees Cook
Pixel Security

2018-05-07 20:20:54

by Matthew Wilcox

[permalink] [raw]
Subject: Re: *alloc API changes

On Mon, May 07, 2018 at 09:03:54AM -0700, Kees Cook wrote:
> On Mon, May 7, 2018 at 4:39 AM, Matthew Wilcox <[email protected]> wrote:
> > On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote:
> >> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <[email protected]> wrote:
> >> The only fear I have with the saturating helpers is that we'll end up
> >> using them in places that don't recognize SIZE_MAX. Like, say:
> >>
> >> size = mul(a, b) + 1;
> >>
> >> then *poof* size == 0. Now, I'd hope that code would use add(mul(a,
> >> b), 1), but still... it makes me nervous.
> >
> > That's reasonable. So let's add:
> >
> > #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER)
> >
> > (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some
> > architectures which allows people to configure MAX_ORDER all the way up
> > to 64. That config option needs to go away, or at least be limited to
> > a much lower value).
> >
> > On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB.
> > Those values should be relatively immune to further arithmetic causing
> > an additional overflow.
>
> But we can do larger than 8MB allocations with vmalloc, can't we?

Yes. And today with kvmalloc. However, I proposed to Linus that
kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which
would, but kvmalloc wouldn't. He liked that idea, so I'm going with it.

There are very, very few places which should need kvmalloc_large.
That's one million 8-byte pointers. If you need more than that inside
the kernel, you're doing something really damn weird and should do
something that looks obviously different.

> > I don't think it should go in the callers though ... where it goes in
> > the allocator is up to the allocator maintainers ;-)
>
> We need a self-test regardless, so checking that each allocator
> returns NULL with the saturated value can be done.

Yes, absolutely.

> >> > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest
> >> > calls to mult().
> >>
> >> Agreed. I think having exactly those would cover almost everything,
> >> and the two places where a 4-factor product is needed could just nest
> >> them. (bikeshed: the very common mul_ab() should just be mul(), IMO.)
> >>
> >> > Nono, Linus had the better proposal, struct_size(p, member, n).
> >>
> >> Oh, yes! I totally missed that in the threads.
> >
> > so we're agreed on struct_size(). I think rather than the explicit 'mul',
> > perhaps we should have array_size() and array3_size().
>
> I do like the symmetry there. My earlier "what if someone does +1"
> continues to scratch at my brain, though I think it's likely
> unimportant: there's no indication (in the name) that these calls
> saturate. Will someone ever do something crazy like: array_size(a, b)
> / array_size(c, d) and they can, effectively, a truncated value (if
> "a, b" saturated and "c, d" didn't...)?

Without CPU support for a saturated value, there's no "safe" saturated
value. You can always come up with some arithmetic that will bring it
back into the valid range. All we can do is go "large enough" and hope.

> >> Right, no. I think if we can ditch *calloc() and _array() by using
> >> saturating helpers, we'll have the API in a much better form:
> >>
> >> kmalloc(foo * bar, GFP_KERNEL);
> >> into
> >> kmalloc_array(foo, bar, GFP_KERNEL);
> >> into
> >> kmalloc(mul(foo, bar), GFP_KERNEL);
> >
> > kmalloc(array_size(foo, bar), GFP_KERNEL);
>
> I can't come up with a better name. :P When it was "mul()" I was
> thinking "smul()" for "saturating multiply". sarray_size() seems ...
> bonkers.

smul would mean 'signed multiply' to me, having read the GCC manual:
-- Built-in Function: bool __builtin_smul_overflow (int a, int b, int *res)

but I thought of another problem with array_size. We already have
ARRAY_SIZE and it means "the number of elements in the array".

so ... struct_bytes(), array_bytes(), array3_bytes()?

> > I think we're broadly in agreement here!
>
> Do we want addition helpers? (And division and subtraction?)

Keeping our focus on allocations ... do we have plain additions (as
opposed to multiply-and-add?) And subtraction?

2018-05-07 20:28:28

by Kees Cook

[permalink] [raw]
Subject: Re: *alloc API changes

On Mon, May 7, 2018 at 1:19 PM, Matthew Wilcox <[email protected]> wrote:
> On Mon, May 07, 2018 at 09:03:54AM -0700, Kees Cook wrote:
>> On Mon, May 7, 2018 at 4:39 AM, Matthew Wilcox <[email protected]> wrote:
>> > On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote:
>> >> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <[email protected]> wrote:
>> >> The only fear I have with the saturating helpers is that we'll end up
>> >> using them in places that don't recognize SIZE_MAX. Like, say:
>> >>
>> >> size = mul(a, b) + 1;
>> >>
>> >> then *poof* size == 0. Now, I'd hope that code would use add(mul(a,
>> >> b), 1), but still... it makes me nervous.
>> >
>> > That's reasonable. So let's add:
>> >
>> > #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER)
>> >
>> > (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some
>> > architectures which allows people to configure MAX_ORDER all the way up
>> > to 64. That config option needs to go away, or at least be limited to
>> > a much lower value).
>> >
>> > On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB.
>> > Those values should be relatively immune to further arithmetic causing
>> > an additional overflow.
>>
>> But we can do larger than 8MB allocations with vmalloc, can't we?
>
> Yes. And today with kvmalloc. However, I proposed to Linus that
> kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which
> would, but kvmalloc wouldn't. He liked that idea, so I'm going with it.

How would we handle size calculations for _large?

> There are very, very few places which should need kvmalloc_large.
> That's one million 8-byte pointers. If you need more than that inside
> the kernel, you're doing something really damn weird and should do
> something that looks obviously different.

I'm CCing John since I remember long ago running into problems loading
the AppArmor DFA with kmalloc and switching it to kvmalloc. John, how
large can the DFAs for AppArmor get? Would an 8MB limit be a problem?

And do we have any large IO or network buffers >8MB?

>> > I don't think it should go in the callers though ... where it goes in
>> > the allocator is up to the allocator maintainers ;-)
>>
>> We need a self-test regardless, so checking that each allocator
>> returns NULL with the saturated value can be done.
>
> Yes, absolutely.
>
>> >> > I'd rather have a mul_ab(), mul_abc(), mul_ab_add_c(), etc. than nest
>> >> > calls to mult().
>> >>
>> >> Agreed. I think having exactly those would cover almost everything,
>> >> and the two places where a 4-factor product is needed could just nest
>> >> them. (bikeshed: the very common mul_ab() should just be mul(), IMO.)
>> >>
>> >> > Nono, Linus had the better proposal, struct_size(p, member, n).
>> >>
>> >> Oh, yes! I totally missed that in the threads.
>> >
>> > so we're agreed on struct_size(). I think rather than the explicit 'mul',
>> > perhaps we should have array_size() and array3_size().
>>
>> I do like the symmetry there. My earlier "what if someone does +1"
>> continues to scratch at my brain, though I think it's likely
>> unimportant: there's no indication (in the name) that these calls
>> saturate. Will someone ever do something crazy like: array_size(a, b)
>> / array_size(c, d) and they can, effectively, a truncated value (if
>> "a, b" saturated and "c, d" didn't...)?
>
> Without CPU support for a saturated value, there's no "safe" saturated
> value. You can always come up with some arithmetic that will bring it
> back into the valid range. All we can do is go "large enough" and hope.
>
>> >> Right, no. I think if we can ditch *calloc() and _array() by using
>> >> saturating helpers, we'll have the API in a much better form:
>> >>
>> >> kmalloc(foo * bar, GFP_KERNEL);
>> >> into
>> >> kmalloc_array(foo, bar, GFP_KERNEL);
>> >> into
>> >> kmalloc(mul(foo, bar), GFP_KERNEL);
>> >
>> > kmalloc(array_size(foo, bar), GFP_KERNEL);
>>
>> I can't come up with a better name. :P When it was "mul()" I was
>> thinking "smul()" for "saturating multiply". sarray_size() seems ...
>> bonkers.
>
> smul would mean 'signed multiply' to me, having read the GCC manual:
> -- Built-in Function: bool __builtin_smul_overflow (int a, int b, int *res)
>
> but I thought of another problem with array_size. We already have
> ARRAY_SIZE and it means "the number of elements in the array".
>
> so ... struct_bytes(), array_bytes(), array3_bytes()?

Maybe "calc"? struct_calc(), array_calc(), array3_calc()? This has the
benefit of actually saying more about what it is doing, rather than
its return value... In the end, I don't care. :)

>> > I think we're broadly in agreement here!
>>
>> Do we want addition helpers? (And division and subtraction?)
>
> Keeping our focus on allocations ... do we have plain additions (as
> opposed to multiply-and-add?) And subtraction?

All I've seen are just rare "weird" cases of lots of mult/add. Some
are way worse than others:
http://www.ozlabs.org/~akpm/mmotm/broken-out/exofs-avoid-vla-in-structures.patch

Just having the mult/add saturation would be lovely.

-Kees

--
Kees Cook
Pixel Security

2018-05-07 20:50:03

by Matthew Wilcox

[permalink] [raw]
Subject: Re: *alloc API changes

On Mon, May 07, 2018 at 01:27:38PM -0700, Kees Cook wrote:
> On Mon, May 7, 2018 at 1:19 PM, Matthew Wilcox <[email protected]> wrote:
> > Yes. And today with kvmalloc. However, I proposed to Linus that
> > kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which
> > would, but kvmalloc wouldn't. He liked that idea, so I'm going with it.
>
> How would we handle size calculations for _large?

I'm not sure we should, at least initially. The very few places which
need a large kvmalloc really are special and can do their own careful
checking. Because, as Linus pointed out, we shouldn't be letting the
user ask us to allocate a terabyte of RAM. We should just fail that.

let's see how those users pan out, and then see what we can offer in
terms of safety.

> > There are very, very few places which should need kvmalloc_large.
> > That's one million 8-byte pointers. If you need more than that inside
> > the kernel, you're doing something really damn weird and should do
> > something that looks obviously different.
>
> I'm CCing John since I remember long ago running into problems loading
> the AppArmor DFA with kmalloc and switching it to kvmalloc. John, how
> large can the DFAs for AppArmor get? Would an 8MB limit be a problem?

Great! Opinions from people who'll use this interface are exceptionally
useful.

> And do we have any large IO or network buffers >8MB?

Not that get allocated with kvmalloc ... because you can't DMA map vmalloc
(without doing some unusual contortions).

> > but I thought of another problem with array_size. We already have
> > ARRAY_SIZE and it means "the number of elements in the array".
> >
> > so ... struct_bytes(), array_bytes(), array3_bytes()?
>
> Maybe "calc"? struct_calc(), array_calc(), array3_calc()? This has the
> benefit of actually saying more about what it is doing, rather than
> its return value... In the end, I don't care. :)

I don't have a strong feeling on this either.

> > Keeping our focus on allocations ... do we have plain additions (as
> > opposed to multiply-and-add?) And subtraction?
>
> All I've seen are just rare "weird" cases of lots of mult/add. Some
> are way worse than others:
> http://www.ozlabs.org/~akpm/mmotm/broken-out/exofs-avoid-vla-in-structures.patch
>
> Just having the mult/add saturation would be lovely.

Ow. My brain just oozed out of my ears.

2018-05-07 21:16:07

by Kees Cook

[permalink] [raw]
Subject: Re: *alloc API changes

On Mon, May 7, 2018 at 1:49 PM, Matthew Wilcox <[email protected]> wrote:
> On Mon, May 07, 2018 at 01:27:38PM -0700, Kees Cook wrote:
>> On Mon, May 7, 2018 at 1:19 PM, Matthew Wilcox <[email protected]> wrote:
>> > Yes. And today with kvmalloc. However, I proposed to Linus that
>> > kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which
>> > would, but kvmalloc wouldn't. He liked that idea, so I'm going with it.
>>
>> How would we handle size calculations for _large?
>
> I'm not sure we should, at least initially. The very few places which
> need a large kvmalloc really are special and can do their own careful
> checking. Because, as Linus pointed out, we shouldn't be letting the
> user ask us to allocate a terabyte of RAM. We should just fail that.
>
> let's see how those users pan out, and then see what we can offer in
> terms of safety.
>
>> > There are very, very few places which should need kvmalloc_large.
>> > That's one million 8-byte pointers. If you need more than that inside
>> > the kernel, you're doing something really damn weird and should do
>> > something that looks obviously different.
>>
>> I'm CCing John since I remember long ago running into problems loading
>> the AppArmor DFA with kmalloc and switching it to kvmalloc. John, how
>> large can the DFAs for AppArmor get? Would an 8MB limit be a problem?
>
> Great! Opinions from people who'll use this interface are exceptionally
> useful.
>
>> And do we have any large IO or network buffers >8MB?
>
> Not that get allocated with kvmalloc ... because you can't DMA map vmalloc
> (without doing some unusual contortions).

Er, yes, right. I meant for _all_ allocators, though. If 8MB is going
to be the new "saturated" value? Maybe I misunderstood? What are you
proposing for the code of array_size()?

>> > but I thought of another problem with array_size. We already have
>> > ARRAY_SIZE and it means "the number of elements in the array".
>> >
>> > so ... struct_bytes(), array_bytes(), array3_bytes()?
>>
>> Maybe "calc"? struct_calc(), array_calc(), array3_calc()? This has the
>> benefit of actually saying more about what it is doing, rather than
>> its return value... In the end, I don't care. :)
>
> I don't have a strong feeling on this either.

I lean ever so slightly towards *_size(). It'll be hard to mix up
ARRAY_SIZE() and array_size(), given the parameters.

-Kees

--
Kees Cook
Pixel Security

2018-05-07 21:41:55

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: *alloc API changes

On 2018-05-05 06:24, Kees Cook wrote:
>
>> Right, I was thinking:
>>
>> static inline size_t mul_ab(size_t a, size_t b)
>> {
>> #if COMPILER_SUPPORTS_OVERFLOW
>> unsigned long c;
>> if (__builtin_mul_overflow(a, b, &c))
>> return SIZE_MAX;
>> return c;
>> #else
>> if (b != 0 && a >= SIZE_MAX / b)
>> return SIZE_MAX;
>> return a * b;
>> #endif
>> }
>
> Rasmus, what do you think of a saturating version of your helpers?

They'd be trivial to implement (using the __type_max that I had to
introduce anyway), and I'd prefer they'd have sat_ in their name - not
just smult, s would more likely be interpreted as "signed". On that
note, I'd want to enforce the sat_ ones are only used for unsigned
types, because there's no sane value to saturate to for signed operands.

But I don't think we should do the allocation-overflow in terms of
saturate-and-rely-on-allocator-rejecting it. I suppose we'll still have
the computation done in a static inline (to let the compiler see if one
or both operands are constants, and generate code accordingly). If we do

static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags)
{
size_t p;
p = sat_mul(n, size);
return __kmalloc(p, flags);
}

with sat_mul being implemented in terms of __builtin_mul_overflow(), gcc
will probably waste a temp register on loading SIZE_MAX (or whatever
sentinel we use), and do a pipeline-stalling cmov to the register used
to pass p.

If instead we do

static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags)
{
size_t p;
if (check_mul_overflow(n, size, &p))
return NULL;
return __kmalloc(p, flags);
}

we'd not get any extra code in the caller (that is, just a "mul" and
"jo", instead of a load-immediate, mul, cmov), because gcc should be
smart enough to combine the "return NULL" with the test for NULL which
the caller code has, and thus make the jump go directly to the error
handling (that error handling is likely itself a jump, but the "jo" will
just get redirected to the target of that one).

Also, I'd hate to have sat_mul not really saturating to type_max(t), but
some large-enough-that-all-underlying-allocators-reject-it.

> The only fear I have with the saturating helpers is that we'll end up
> using them in places that don't recognize SIZE_MAX. Like, say:
>
> size = mul(a, b) + 1;
>
> then *poof* size == 0. Now, I'd hope that code would use add(mul(a,
> b), 1), but still... it makes me nervous.
>
>> You don't need the size check here. We have the size check buried deep in
>> alloc_pages (look for MAX_ORDER), so kmalloc and then alloc_pages will try
>> a bunch of paths all of which fail before returning NULL.
>
> Good point. Though it does kind of creep me out to let a known-bad
> size float around in the allocator until it decides to reject it. I
> would think an early:
>
> if (unlikely(size == SIZE_MAX))
> return NULL;
>
> would have virtually no cycle count difference...

All allocators still need to reject insane sizes, since those can happen
without coming from a multiplication. So sure, some early size >
MAX_SANE_SIZE check in the out-of-line functions should be rather cheap,
and they most likely already exist in some form. But we don't _have_ to
go out of our way to make the multiplication overflow handling depend on
those.

>
> Right, no. I think if we can ditch *calloc() and _array() by using
> saturating helpers, we'll have the API in a much better form:
>
> kmalloc(foo * bar, GFP_KERNEL);
> into
> kmalloc_array(foo, bar, GFP_KERNEL);
> into
> kmalloc(mul(foo, bar), GFP_KERNEL);

Urgh. Do you want to get completely rid of kmalloc_array() and move the
mul() into the call-sites? That obviously necessitates mul returning a
big-enough sentinel. I'd hate that. Not just because of the code-gen,
but also because of the problem with giving mul() sane semantics that
still make it immune to the extra arithmetic that will inevitably be
done. There's also the problem with foo and bar having different,
possibly signed, types - how should mul() handle those? A nice benefit
from having the static inline wrappers taking size_t is that a negative
value gets converted to a huge positive value, and then the whole thing
overflows. Sure, you can build that into mul() (maybe make that itself a
static inline), but then it doesn't really deserve that generic name
anymore.

> and
>
> kmalloc(foo * bar, GFP_KERNEL | __GFP_ZERO);
> into
> kzalloc(foo * bar, GFP_KERNEL);
> into
> kcalloc(foo, bar, GFP_KERNEL);
> into
> kzalloc(mul(foo, bar), GFP_KERNEL);

Yeah, part of the API mess is just copied from C (malloc vs calloc). We
could make it a bit less messy by calling it kzalloc_array, but we have
1700 callers of kcalloc(), so...

Rasmus

2018-05-07 21:49:32

by John Johansen

[permalink] [raw]
Subject: Re: *alloc API changes

On 05/07/2018 01:27 PM, Kees Cook wrote:
> On Mon, May 7, 2018 at 1:19 PM, Matthew Wilcox <[email protected]> wrote:
>> On Mon, May 07, 2018 at 09:03:54AM -0700, Kees Cook wrote:
>>> On Mon, May 7, 2018 at 4:39 AM, Matthew Wilcox <[email protected]> wrote:
>>>> On Fri, May 04, 2018 at 09:24:56PM -0700, Kees Cook wrote:
>>>>> On Fri, May 4, 2018 at 8:46 PM, Matthew Wilcox <[email protected]> wrote:
>>>>> The only fear I have with the saturating helpers is that we'll end up
>>>>> using them in places that don't recognize SIZE_MAX. Like, say:
>>>>>
>>>>> size = mul(a, b) + 1;
>>>>>
>>>>> then *poof* size == 0. Now, I'd hope that code would use add(mul(a,
>>>>> b), 1), but still... it makes me nervous.
>>>>
>>>> That's reasonable. So let's add:
>>>>
>>>> #define ALLOC_TOO_BIG (PAGE_SIZE << MAX_ORDER)
>>>>
>>>> (there's a presumably somewhat obsolete CONFIG_FORCE_MAX_ZONEORDER on some
>>>> architectures which allows people to configure MAX_ORDER all the way up
>>>> to 64. That config option needs to go away, or at least be limited to
>>>> a much lower value).
>>>>
>>>> On x86, that's 4k << 11 = 8MB. On PPC, that might be 64k << 9 == 32MB.
>>>> Those values should be relatively immune to further arithmetic causing
>>>> an additional overflow.
>>>
>>> But we can do larger than 8MB allocations with vmalloc, can't we?
>>
>> Yes. And today with kvmalloc. However, I proposed to Linus that
>> kvmalloc() shouldn't allow it -- we should have kvmalloc_large() which
>> would, but kvmalloc wouldn't. He liked that idea, so I'm going with it.
>
> How would we handle size calculations for _large?
>
>> There are very, very few places which should need kvmalloc_large.
>> That's one million 8-byte pointers. If you need more than that inside
>> the kernel, you're doing something really damn weird and should do
>> something that looks obviously different.
>
> I'm CCing John since I remember long ago running into problems loading
> the AppArmor DFA with kmalloc and switching it to kvmalloc. John, how
> large can the DFAs for AppArmor get? Would an 8MB limit be a problem?
>

theoretically yes, and I have done tests with policy larger than that,
but in practice I have never seen it. The largest I have seen in
practice is about 1.5MB. The policy container that wraps the dfa,
could be larger if if its wrapping multiple policy sets (think
pre-loading policy for multiple containers in one go), but we don't do
that currently and there is no requirement for that to be handled with
a single allocation.

We have some improvements coming that will reduce our policy size, and
enable it so that we can split some of the larger dfas into multiple
allocations so I really don't expect this will be a problem.

If it becomes an issue we know the size of the allocation needed and
can just have a condition that calls vmalloc_large when needed.


2018-05-07 22:57:59

by Kees Cook

[permalink] [raw]
Subject: Re: *alloc API changes

On Mon, May 7, 2018 at 2:41 PM, Rasmus Villemoes
<[email protected]> wrote:
> If instead we do
>
> static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags)
> {
> size_t p;
> if (check_mul_overflow(n, size, &p))
> return NULL;
> return __kmalloc(p, flags);
> }
>
> we'd not get any extra code in the caller (that is, just a "mul" and
> "jo", instead of a load-immediate, mul, cmov), because gcc should be
> smart enough to combine the "return NULL" with the test for NULL which
> the caller code has, and thus make the jump go directly to the error
> handling (that error handling is likely itself a jump, but the "jo" will
> just get redirected to the target of that one).
>
> Also, I'd hate to have sat_mul not really saturating to type_max(t), but
> some large-enough-that-all-underlying-allocators-reject-it.

Yeah, this continues to worry me too.

> All allocators still need to reject insane sizes, since those can happen
> without coming from a multiplication. So sure, some early size >
> MAX_SANE_SIZE check in the out-of-line functions should be rather cheap,
> and they most likely already exist in some form. But we don't _have_ to
> go out of our way to make the multiplication overflow handling depend on
> those.
>
>>
>> Right, no. I think if we can ditch *calloc() and _array() by using
>> saturating helpers, we'll have the API in a much better form:
>>
>> kmalloc(foo * bar, GFP_KERNEL);
>> into
>> kmalloc_array(foo, bar, GFP_KERNEL);
>> into
>> kmalloc(mul(foo, bar), GFP_KERNEL);
>
> Urgh. Do you want to get completely rid of kmalloc_array() and move the
> mul() into the call-sites? That obviously necessitates mul returning a
> big-enough sentinel. I'd hate that. Not just because of the code-gen,
> but also because of the problem with giving mul() sane semantics that
> still make it immune to the extra arithmetic that will inevitably be
> done. There's also the problem with foo and bar having different,
> possibly signed, types - how should mul() handle those? A nice benefit
> from having the static inline wrappers taking size_t is that a negative
> value gets converted to a huge positive value, and then the whole thing
> overflows. Sure, you can build that into mul() (maybe make that itself a
> static inline), but then it doesn't really deserve that generic name
> anymore.

If the struct_size(), array_size(), and array3_size() all take size_t,
then I think we gain the same protection, yes? i.e. they are built out
of your overflow checking routines. Even though I'm nervous about
SIZE_MAX wrapping on other uses, I do agree that doing early rejection
in the allocators makes a lot more sense. I think we can have bot the
SIZE_MAX early-abort of "something went very wrong", and the internal
"I refuse to store that much" that is per-allocator-tuned.

>> kmalloc(foo * bar, GFP_KERNEL | __GFP_ZERO);
>> into
>> kzalloc(foo * bar, GFP_KERNEL);
>> into
>> kcalloc(foo, bar, GFP_KERNEL);
>> into
>> kzalloc(mul(foo, bar), GFP_KERNEL);
>
> Yeah, part of the API mess is just copied from C (malloc vs calloc). We
> could make it a bit less messy by calling it kzalloc_array, but we have
> 1700 callers of kcalloc(), so...

Coccinelle can fix those callers. ;) But we need to make sure we're
still generating sane machine code before we do that...

-Kees

--
Kees Cook
Pixel Security