2022-12-07 00:02:23

by Kees Cook

Subject: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()

When build_skb() is passed a frag_size of 0, it means the buffer came
from kmalloc. In these cases, ksize() is used to find its actual size,
but since the allocation may not have been made to that size, actually
perform the krealloc() call so that all the associated buffer size
checking will be correctly notified. For example, syzkaller reported:

BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295

For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
build_skb().

Reported-by: [email protected]
Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: pepsipu <[email protected]>
Cc: [email protected]
Cc: Vlastimil Babka <[email protected]>
Cc: kasan-dev <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: [email protected]
Cc: bpf <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: [email protected]
Cc: KP Singh <[email protected]>
Cc: [email protected]
Cc: Stanislav Fomichev <[email protected]>
Cc: [email protected]
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: LKML <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
net/core/skbuff.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1d9719e72f9d..b55d061ed8b4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -274,7 +274,23 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
 			       unsigned int frag_size)
 {
 	struct skb_shared_info *shinfo;
-	unsigned int size = frag_size ? : ksize(data);
+	unsigned int size = frag_size;
+
+	/* When frag_size == 0, the buffer came from kmalloc, so we
+	 * must find its true allocation size (and grow it to match).
+	 */
+	if (unlikely(size == 0)) {
+		void *resized;
+
+		size = ksize(data);
+		/* krealloc() will immediate return "data" when
+		 * "ksize(data)" is requested: it is the existing upper
+		 * bounds. As a result, GFP_ATOMIC will be ignored.
+		 */
+		resized = krealloc(data, size, GFP_ATOMIC);
+		if (WARN_ON(resized != data))
+			data = resized;
+	}
 
 	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

--
2.34.1
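
For readers unfamiliar with the failure mode described in the commit message, here is a minimal userspace sketch of it. It is not kernel code: the power-of-two size classes, the struct, and every model_* name are assumptions made up for illustration. It only shows why a write near the end of the bucket is flagged when the tracked size is the originally requested one, and why resizing to the full bucket size (which stays in place) makes the same write valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes: powers of two starting at 8 bytes.
 * Real kmalloc caches differ; this is only a stand-in.
 */
static size_t model_bucket_size(size_t requested)
{
	size_t bucket = 8;

	while (bucket < requested)
		bucket <<= 1;
	return bucket;
}

struct model_alloc {
	size_t requested;	/* bytes the caller asked for (what a checker tracks) */
	size_t bucket;		/* bytes the allocator actually reserved */
};

static struct model_alloc model_kmalloc(size_t requested)
{
	struct model_alloc a = {
		.requested = requested,
		.bucket = model_bucket_size(requested),
	};

	return a;
}

/* Returns 1 when [offset, offset + len) lies inside the tracked size. */
static int model_write_ok(const struct model_alloc *a, size_t offset, size_t len)
{
	assert(a != NULL);
	return offset <= a->requested && len <= a->requested - offset;
}

/* Models a resize that fits in the existing bucket: the buffer does not
 * move, only the tracked size grows. Returns 0 if it would not fit.
 */
static int model_krealloc_in_place(struct model_alloc *a, size_t new_size)
{
	assert(a != NULL);
	if (new_size > a->bucket)
		return 0;
	a->requested = new_size;
	return 1;
}
```

In this model a 700-byte request lands in a 1024-byte bucket, so a 32-byte write at offset 992 is rejected until the allocation is resized to the bucket size.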


2022-12-07 02:20:34

by Jakub Kicinski

Subject: Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()

On Tue, 6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
> - unsigned int size = frag_size ? : ksize(data);
> + unsigned int size = frag_size;
> +
> + /* When frag_size == 0, the buffer came from kmalloc, so we
> + * must find its true allocation size (and grow it to match).
> + */
> + if (unlikely(size == 0)) {
> + void *resized;
> +
> + size = ksize(data);
> + /* krealloc() will immediate return "data" when
> + * "ksize(data)" is requested: it is the existing upper
> + * bounds. As a result, GFP_ATOMIC will be ignored.
> + */
> + resized = krealloc(data, size, GFP_ATOMIC);
> + if (WARN_ON(resized != data))
> + data = resized;
> + }
>

Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
using kmalloc()'ed heads is large because GRO can't free the metadata.
So we end up carrying per-MTU skbs across to the application and then
freeing them one by one. With pages we just aggregate up to 64k of data
in a single skb.

I can only grep out 3 cases of build_skb(.. 0), could we instead
convert them into a new build_skb_slab(), and handle all the silliness
in such a new helper? That'd be a win both for the memory safety and one
fewer branch for the fast path.

I think it's worth doing, so LMK if you're okay to do this extra work,
otherwise I can help (unless e.g. Eric tells me I'm wrong..).

2022-12-07 04:00:12

by Kees Cook

Subject: Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()

On December 6, 2022 5:55:57 PM PST, Jakub Kicinski <[email protected]> wrote:
>On Tue, 6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
>> - unsigned int size = frag_size ? : ksize(data);
>> + unsigned int size = frag_size;
>> +
>> + /* When frag_size == 0, the buffer came from kmalloc, so we
>> + * must find its true allocation size (and grow it to match).
>> + */
>> + if (unlikely(size == 0)) {
>> + void *resized;
>> +
>> + size = ksize(data);
>> + /* krealloc() will immediate return "data" when
>> + * "ksize(data)" is requested: it is the existing upper
>> + * bounds. As a result, GFP_ATOMIC will be ignored.
>> + */
>> + resized = krealloc(data, size, GFP_ATOMIC);
>> + if (WARN_ON(resized != data))
>> + data = resized;
>> + }
>>
>
>Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
>using kmalloc()'ed heads is large because GRO can't free the metadata.
>So we end up carrying per-MTU skbs across to the application and then
>freeing them one by one. With pages we just aggregate up to 64k of data
>in a single skb.

This isn't changed by this patch, though? The users of kmalloc+build_skb are pre-existing.

>I can only grep out 3 cases of build_skb(.. 0), could we instead
>convert them into a new build_skb_slab(), and handle all the silliness
>in such a new helper? That'd be a win both for the memory safety and one
>fewer branch for the fast path.

When I went through callers, it was many more than 3. Regardless, I don't see the point: my patch has no more branches than the original code. In fact, it may actually be faster, because I made the initial assignment unconditional and zero-test-after-assign is almost free, whereas before it tested before the assign. And now it's marked as unlikely to keep it out-of-line.

>I think it's worth doing, so LMK if you're okay to do this extra work,
>otherwise I can help (unless e.g. Eric tells me I'm wrong..).

I had been changing callers to round up (e.g. bnx2), but it seemed like centralizing this makes more sense. I don't think a different helper will clean this up.

-Kees


--
Kees Cook
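
The caller-side rounding mentioned above (the approach taken with kmalloc_size_roundup() elsewhere in the tree) can be sketched in userspace as follows. This is an illustrative model only: the power-of-two buckets and all model_* names are assumptions, not the kernel's implementation. The point is that a caller who asks for the rounded-up size leaves no gap between what was requested and what the bucket holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for kmalloc_size_roundup(): rounds a request
 * up to the next power-of-two bucket, starting at 8 bytes.
 */
static size_t model_size_roundup(size_t size)
{
	size_t bucket = 8;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

struct model_buf {
	size_t requested;	/* size passed to the allocator */
	size_t bucket;		/* size of the bucket that backs it */
};

/* Caller asks for exactly what it needs; the rest of the bucket is slack. */
static struct model_buf model_alloc_exact(size_t want)
{
	struct model_buf b = {
		.requested = want,
		.bucket = model_size_roundup(want),
	};

	return b;
}

/* Caller rounds up first, then asks for the whole bucket. */
static struct model_buf model_alloc_rounded(size_t want)
{
	size_t full = model_size_roundup(want);
	struct model_buf b = { .requested = full, .bucket = full };

	return b;
}

/* Bytes that exist in the bucket but were never requested. */
static size_t model_slack(const struct model_buf *b)
{
	assert(b != NULL && b->bucket >= b->requested);
	return b->bucket - b->requested;
}
```

With a 1500-byte request, the exact allocation leaves 548 untracked bytes in a 2048-byte bucket, while the rounded allocation leaves none. The trade-off raised in the thread is that this has to be repeated in every caller.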

2022-12-07 05:12:19

by Jakub Kicinski

Subject: Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()

On Tue, 06 Dec 2022 19:47:13 -0800 Kees Cook wrote:
> >Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
> >using kmalloc()'ed heads is large because GRO can't free the metadata.
> >So we end up carrying per-MTU skbs across to the application and then
> >freeing them one by one. With pages we just aggregate up to 64k of data
> >in a single skb.
>
> This isn't changed by this patch, though? The users of
> kmalloc+build_skb are pre-existing.

Yes.

> >I can only grep out 3 cases of build_skb(.. 0), could we instead
> >convert them into a new build_skb_slab(), and handle all the silliness
> >in such a new helper? That'd be a win both for the memory safety and one
> >fewer branch for the fast path.
>
> When I went through callers, it was many more than 3. Regardless, I
> don't see the point: my patch has no more branches than the original
> code (in fact, it may actually be faster because I made the initial
> assignment unconditional, and zero-test-after-assign is almost free,
> where as before it tested before the assign. And now it's marked as
> unlikely to keep it out-of-line.

Maybe.

> >I think it's worth doing, so LMK if you're okay to do this extra
> >work, otherwise I can help (unless e.g. Eric tells me I'm wrong..).
>
> I had been changing callers to round up (e.g. bnx2), but it seemed
> like centralizing this makes more sense. I don't think a different
> helper will clean this up.

It's a combination of the fact that I think "0 is magic" falls in
the "garbage" category of APIs, and the fact that driver developers
have many things to worry about, so they often don't know that using
slab is a bad idea. So I want a helper out of the normal path, where
I can put a kdoc warning that says "if you're doing this - GRO will
suck, use page frags".

2022-12-07 09:50:16

by Vlastimil Babka

Subject: Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()

On 12/7/22 00:17, Kees Cook wrote:
> When build_skb() is passed a frag_size of 0, it means the buffer came
> from kmalloc. In these cases, ksize() is used to find its actual size,
> but since the allocation may not have been made to that size, actually
> perform the krealloc() call so that all the associated buffer size
> checking will be correctly notified. For example, syzkaller reported:
>
> BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
> Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295
>
> For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
> build_skb().

Weren't all such kmalloc() users converted to kmalloc_size_roundup() to
prevent this?

> Reported-by: [email protected]
> Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
> Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
> Cc: "David S. Miller" <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: Paolo Abeni <[email protected]>
> Cc: Pavel Begunkov <[email protected]>
> Cc: pepsipu <[email protected]>
> Cc: [email protected]
> Cc: Vlastimil Babka <[email protected]>
> Cc: kasan-dev <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: [email protected]
> Cc: bpf <[email protected]>
> Cc: Daniel Borkmann <[email protected]>
> Cc: Hao Luo <[email protected]>
> Cc: Jesper Dangaard Brouer <[email protected]>
> Cc: John Fastabend <[email protected]>
> Cc: [email protected]
> Cc: KP Singh <[email protected]>
> Cc: [email protected]
> Cc: Stanislav Fomichev <[email protected]>
> Cc: [email protected]
> Cc: Yonghong Song <[email protected]>
> Cc: [email protected]
> Cc: LKML <[email protected]>
> Signed-off-by: Kees Cook <[email protected]>
> ---
> net/core/skbuff.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1d9719e72f9d..b55d061ed8b4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -274,7 +274,23 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
> unsigned int frag_size)
> {
> struct skb_shared_info *shinfo;
> - unsigned int size = frag_size ? : ksize(data);
> + unsigned int size = frag_size;
> +
> + /* When frag_size == 0, the buffer came from kmalloc, so we
> + * must find its true allocation size (and grow it to match).
> + */
> + if (unlikely(size == 0)) {
> + void *resized;
> +
> + size = ksize(data);
> + /* krealloc() will immediate return "data" when
> + * "ksize(data)" is requested: it is the existing upper
> + * bounds. As a result, GFP_ATOMIC will be ignored.
> + */
> + resized = krealloc(data, size, GFP_ATOMIC);
> + if (WARN_ON(resized != data))

WARN_ON_ONCE() could be sufficient as either this is impossible to hit by
definition, or something went very wrong (a patch screwed ksize/krealloc?)
and it can be hit many times?

> + data = resized;

In that "impossible" case, this could also end up as NULL due to GFP_ATOMIC
allocation failure, but maybe it's really impractical to do anything about it...

> + }
>
> size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>

2022-12-07 10:51:19

by Eric Dumazet

Subject: Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()

On Wed, Dec 7, 2022 at 2:56 AM Jakub Kicinski <[email protected]> wrote:
>
> On Tue, 6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
> > - unsigned int size = frag_size ? : ksize(data);
> > + unsigned int size = frag_size;
> > +
> > + /* When frag_size == 0, the buffer came from kmalloc, so we
> > + * must find its true allocation size (and grow it to match).
> > + */
> > + if (unlikely(size == 0)) {
> > + void *resized;
> > +
> > + size = ksize(data);
> > + /* krealloc() will immediate return "data" when
> > + * "ksize(data)" is requested: it is the existing upper
> > + * bounds. As a result, GFP_ATOMIC will be ignored.
> > + */
> > + resized = krealloc(data, size, GFP_ATOMIC);
> > + if (WARN_ON(resized != data))
> > + data = resized;
> > + }
> >
>
> Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
> using kmalloc()'ed heads is large because GRO can't free the metadata.
> So we end up carrying per-MTU skbs across to the application and then
> freeing them one by one. With pages we just aggregate up to 64k of data
> in a single skb.
>
> I can only grep out 3 cases of build_skb(.. 0), could we instead
> convert them into a new build_skb_slab(), and handle all the silliness
> in such a new helper? That'd be a win both for the memory safety and one
> fewer branch for the fast path.
>
> I think it's worth doing, so LMK if you're okay to do this extra work,
> otherwise I can help (unless e.g. Eric tells me I'm wrong..).

I totally agree, I would indeed remove ksize() use completely,
let callers give us the size, and the head_frag boolean,
instead of inferring from size==0
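
As a rough illustration of the interface shape suggested here, the following userspace sketch has the caller pass both the buffer size and the head_frag flag, so that zero is no longer a magic value. Nothing in it is an existing kernel API: model_build_skb(), struct model_skb, and MODEL_SHINFO_SIZE (a made-up stand-in for SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up size for the trailing metadata area; the real value depends
 * on the kernel configuration.
 */
#define MODEL_SHINFO_SIZE ((size_t)320)

struct model_skb {
	void *head;		/* start of the caller-provided buffer */
	size_t data_size;	/* usable bytes before the trailing metadata */
	bool head_frag;		/* true when the head is a page fragment */
};

/* The caller states the buffer size and its origin explicitly.
 * Returns 0 on success, -1 when the buffer cannot hold the metadata.
 * A size of 0 is simply an error here, not a request to guess.
 */
static int model_build_skb(struct model_skb *skb, void *buf,
			   size_t buf_size, bool head_frag)
{
	assert(skb != NULL);
	if (buf == NULL || buf_size <= MODEL_SHINFO_SIZE)
		return -1;

	skb->head = buf;
	skb->data_size = buf_size - MODEL_SHINFO_SIZE;
	skb->head_frag = head_frag;
	return 0;
}
```

A slab-backed caller would pass its known allocation size with head_frag set to false, and a page-fragment caller would pass true; neither path needs a size lookup inside the builder.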