2023-11-30 20:01:13

by Kees Cook

Subject: [PATCH] netlink: Return unsigned value for nla_len()

The return value from nla_len() is never expected to be negative, and can
never be more than struct nlattr::nla_len (a u16). Adjust the prototype
on the function, and explicitly bounds check the subtraction. This will
let GCC's value range optimization passes know that the return can never
be negative, and can never be larger than u16. As recently discussed[1],
this silences the following warning in GCC 12+:

net/wireless/nl80211.c: In function 'nl80211_set_cqm_rssi.isra':
net/wireless/nl80211.c:12892:17: warning: 'memcpy' specified bound 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
12892 |                 memcpy(cqm_config->rssi_thresholds, thresholds,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12893 |                        flex_array_size(cqm_config, rssi_thresholds,
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12894 |                                        n_thresholds));
      |                                        ~~~~~~~~~~~~~~

This has the additional benefit of being defensive in the face of nlattr
corruption or logic errors (i.e. nla_len being set smaller than
NLA_HDRLEN).

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ [1]
Cc: Jakub Kicinski <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Jeff Johnson <[email protected]>
Cc: Michael Walle <[email protected]>
Cc: Max Schulze <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
include/net/netlink.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 167b91348e57..c59679524705 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1214,9 +1214,9 @@ static inline void *nla_data(const struct nlattr *nla)
  * nla_len - length of payload
  * @nla: netlink attribute
  */
-static inline int nla_len(const struct nlattr *nla)
+static inline u16 nla_len(const struct nlattr *nla)
 {
-	return nla->nla_len - NLA_HDRLEN;
+	return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
 }

 /**
--
2.34.1



2023-12-01 01:25:28

by Jakub Kicinski

Subject: Re: [PATCH] netlink: Return unsigned value for nla_len()

On Thu, 30 Nov 2023 12:01:01 -0800 Kees Cook wrote:
> This has the additional benefit of being defensive in the face of nlattr
> corruption or logic errors (i.e. nla_len being set smaller than
> NLA_HDRLEN).

As Johannes predicted I'd rather not :(

The callers should put the nlattr thru nla_ok() during validation
(nla_validate()), or walking (nla_for_each_* call nla_ok()).

> -static inline int nla_len(const struct nlattr *nla)
> +static inline u16 nla_len(const struct nlattr *nla)
> {
> - return nla->nla_len - NLA_HDRLEN;
> + return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
> }

Note that NLA_HDRLEN is the length of struct nlattr.
I mean of the @nla object that gets passed in as argument here.
So accepting that nla->nla_len may be < NLA_HDRLEN means
that we are okay with dereferencing a truncated object...

We can consider making the return unsigned without the condition maybe?
--
pw-bot: cr

2023-12-01 07:45:49

by Johannes Berg

Subject: Re: [PATCH] netlink: Return unsigned value for nla_len()

On Thu, 2023-11-30 at 17:25 -0800, Jakub Kicinski wrote:
> On Thu, 30 Nov 2023 12:01:01 -0800 Kees Cook wrote:
> > This has the additional benefit of being defensive in the face of nlattr
> > corruption or logic errors (i.e. nla_len being set smaller than
> > NLA_HDRLEN).
>
> As Johannes predicted I'd rather not :(

:)

> The callers should put the nlattr thru nla_ok() during validation
> (nla_validate()), or walking (nla_for_each_* call nla_ok()).

Which we do, since we have just normal input validation on generic
netlink. Actually nla_validate() only does it via walking, too ;-)

The thing is that's something the compiler can't really see, it happens
out-of-line in completely different code (generic netlink) before you
even get into nl80211.

> > -static inline int nla_len(const struct nlattr *nla)
> > +static inline u16 nla_len(const struct nlattr *nla)
> > {
> > - return nla->nla_len - NLA_HDRLEN;
> > + return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
> > }
>
> Note that NLA_HDRLEN is the length of struct nlattr.
> I mean of the @nla object that gets passed in as argument here.
> So accepting that nla->nla_len may be < NLA_HDRLEN means
> that we are okay with dereferencing a truncated object...
>
> We can consider making the return unsigned without the condition maybe?

That seems problematic too though - better for an (unvalidated)
attribute with a bad size to actually show up with a negative payload
length rather than an underflow to a really big size.

Anyway I really don't mind the workaround in nl80211 (which was to make
the variables holding this unsigned), since we *do* know that we
validated there, that's not an issue wrt. the length.

johannes

2023-12-01 18:17:19

by Kees Cook

Subject: Re: [PATCH] netlink: Return unsigned value for nla_len()

On Thu, Nov 30, 2023 at 05:25:20PM -0800, Jakub Kicinski wrote:
> On Thu, 30 Nov 2023 12:01:01 -0800 Kees Cook wrote:
> > This has the additional benefit of being defensive in the face of nlattr
> > corruption or logic errors (i.e. nla_len being set smaller than
> > NLA_HDRLEN).
>
> As Johannes predicted I'd rather not :(
>
> The callers should put the nlattr thru nla_ok() during validation
> (nla_validate()), or walking (nla_for_each_* call nla_ok()).
>
> > -static inline int nla_len(const struct nlattr *nla)
> > +static inline u16 nla_len(const struct nlattr *nla)
> > {
> > - return nla->nla_len - NLA_HDRLEN;
> > + return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
> > }
>
> Note that NLA_HDRLEN is the length of struct nlattr.
> I mean of the @nla object that gets passed in as argument here.
> So accepting that nla->nla_len may be < NLA_HDRLEN means
> that we are okay with dereferencing a truncated object...
>
> We can consider making the return unsigned without the condition maybe?

Yes, if we did it without the check, it'd do "less" damage on
wrap-around (i.e. off by U16_MAX instead of off by INT_MAX).

But I'd like to understand: what's the harm in adding the clamp? The
changes to the assembly are tiny:
https://godbolt.org/z/Ecvbzn1a1

i.e. a likely dropped-from-the-pipeline xor and a "free" cmov (checking
the bit from the subtraction). I don't think it could even get measured
in real-world cycle counts. This is much like the refcount_t work:
checking for the overflow condition has almost 0 overhead.

(It looks like I should use __builtin_sub_overflow() to correctly hint
GCC, but Clang gets it right without such hinting. Also I changed
NLA_HDRLEN to u16 to get the best result, which suggests there might be
larger savings throughout the code base just from that change...)

--
Kees Cook
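
For illustration, the return-type options debated in the message above can be compared in a small userspace sketch. This is not kernel code: struct demo_nlattr, DEMO_HDRLEN and the demo_len_*() helpers are hypothetical names introduced here, and the __builtin_sub_overflow() variant assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's struct nlattr and NLA_HDRLEN. */
struct demo_nlattr {
	uint16_t nla_len;	/* attribute length, including this header */
	uint16_t nla_type;
};

#define DEMO_HDRLEN 4	/* sizeof(struct demo_nlattr) */

/* Signed return: a too-small nla_len yields a negative length. */
static inline int demo_len_int(const struct demo_nlattr *nla)
{
	return nla->nla_len - DEMO_HDRLEN;
}

/* u16 return without a check: the result wraps modulo 65536. */
static inline uint16_t demo_len_u16(const struct demo_nlattr *nla)
{
	return (uint16_t)(nla->nla_len - DEMO_HDRLEN);
}

/* u16 return with the clamp from the proposed patch. */
static inline uint16_t demo_len_clamped(const struct demo_nlattr *nla)
{
	return nla->nla_len > DEMO_HDRLEN ? nla->nla_len - DEMO_HDRLEN : 0;
}

/* Same clamp expressed with the compiler's overflow-checking builtin. */
static inline uint16_t demo_len_builtin(const struct demo_nlattr *nla)
{
	uint16_t len;

	/* Returns true when the exact result does not fit in uint16_t. */
	if (__builtin_sub_overflow(nla->nla_len, (uint16_t)DEMO_HDRLEN, &len))
		return 0;
	return len;
}
```

With nla_len set to 2, the helpers return -2, 65534, 0 and 0 respectively. Converting the -2 to size_t gives SIZE_MAX - 1, which is the sort of huge bound the GCC warning is worried about.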

2023-12-01 18:45:10

by Jakub Kicinski

Subject: Re: [PATCH] netlink: Return unsigned value for nla_len()

On Fri, 1 Dec 2023 10:17:02 -0800 Kees Cook wrote:
> > > -static inline int nla_len(const struct nlattr *nla)
> > > +static inline u16 nla_len(const struct nlattr *nla)
> > > {
> > > - return nla->nla_len - NLA_HDRLEN;
> > > + return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
> > > }
> >
> > Note that NLA_HDRLEN is the length of struct nlattr.
> > I mean of the @nla object that gets passed in as argument here.
> > So accepting that nla->nla_len may be < NLA_HDRLEN means
> > that we are okay with dereferencing a truncated object...
> >
> > We can consider making the return unsigned without the condition maybe?
>
> Yes, if we did it without the check, it'd do "less" damage on
> wrap-around (i.e. off by U16_MAX instead of off by INT_MAX).
>
> But I'd like to understand: what's the harm in adding the clamp? The
> changes to the assembly are tiny:
> https://godbolt.org/z/Ecvbzn1a1

Hm, I wonder if my explanation was unclear or you disagree..

This is the structure:

struct nlattr {
	__u16 nla_len;	// attr len, incl. this header
	__u16 nla_type;
};

and (removing no-op wrappers):

#define NLA_HDRLEN sizeof(struct nlattr)

So going back to the code:

return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN...

We are reading nla->nla_len, which is the first 2 bytes of the structure.
And then we check if the structure is... there?

If we don't trust that struct nlattr which gets passed here is at least
NLA_HDRLEN (4B) then why do we think it's safe to read nla_len (the
first 2B of it)?

That's why I was pointing at nla_ok(). nla_ok() takes the size of the
buffer / message as an arg, so that it can also check if looking at
nla_len itself is not going to be an OOB access. 99% of netlink buffers
we parse come from user space. So it's not like someone could have
mis-initialized the nla_len in the kernel and being graceful is helpful.

The extra conditional is just a minor thing. The major thing is that
unless I'm missing something the check makes me go ????️


2023-12-02 04:39:57

by Kees Cook

Subject: Re: [PATCH] netlink: Return unsigned value for nla_len()

On Fri, Dec 01, 2023 at 10:45:05AM -0800, Jakub Kicinski wrote:
> On Fri, 1 Dec 2023 10:17:02 -0800 Kees Cook wrote:
> > > > -static inline int nla_len(const struct nlattr *nla)
> > > > +static inline u16 nla_len(const struct nlattr *nla)
> > > > {
> > > > - return nla->nla_len - NLA_HDRLEN;
> > > > + return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
> > > > }
> > >
> > > Note that NLA_HDRLEN is the length of struct nlattr.
> > > I mean of the @nla object that gets passed in as argument here.
> > > So accepting that nla->nla_len may be < NLA_HDRLEN means
> > > that we are okay with dereferencing a truncated object...
> > >
> > > We can consider making the return unsigned without the condition maybe?
> >
> > Yes, if we did it without the check, it'd do "less" damage on
> > wrap-around (i.e. off by U16_MAX instead of off by INT_MAX).
> >
> > But I'd like to understand: what's the harm in adding the clamp? The
> > changes to the assembly are tiny:
> > https://godbolt.org/z/Ecvbzn1a1
>
> Hm, I wonder if my explanation was unclear or you disagree..
>
> This is the structure:
>
> struct nlattr {
> 	__u16 nla_len;	// attr len, incl. this header
> 	__u16 nla_type;
> };
>
> and (removing no-op wrappers):
>
> #define NLA_HDRLEN sizeof(struct nlattr)
>
> So going back to the code:
>
> return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN...
>
> We are reading nla->nla_len, which is the first 2 bytes of the structure.
> And then we check if the structure is... there?

I'm not debating whether it's there or not -- I'm saying the _contents_ of
"nlattr::nla_len", in the face of corruption or lack of initialization,
may be less than NLA_HDRLEN. (There's a lot of "but that can't happen"
that _does_ happen in the kernel, so I'm extra paranoid.)

> If we don't trust that struct nlattr which gets passed here is at least
> NLA_HDRLEN (4B) then why do we think it's safe to read nla_len (the
> first 2B of it)?

Type confusion (usually due to Use-after-Free flaws) means that a memory
region is valid (i.e. good pointer), but that the contents might have
gotten changed through other means. (To see examples of this with
struct msg_msg, see: https://syst3mfailure.io/wall-of-perdition/)

(On a related note, why does nla_len start at 4 instead of 0? i.e. why
does it include the size of nlattr? That seems redundant based on the
same logic you're using here.)

> That's why I was pointing at nla_ok(). nla_ok() takes the size of the
> buffer / message as an arg, so that it can also check if looking at
> nla_len itself is not going to be an OOB access. 99% of netlink buffers
> we parse come from user space. So it's not like someone could have
> mis-initialized the nla_len in the kernel and being graceful is helpful.
>
> The extra conditional is just a minor thing. The major thing is that
> unless I'm missing something the check makes me go "huh?"

My concern is that there are 562 callers of nla_len():

$ git grep '\bnla_len(\b' | wc -l
562

We have no way to be certain that all callers follow a successful
nla_ok() call.

Regardless, just moving from "int" to "u16" solves a bunch of value
range tracking pain that GCC appears to get upset about, so if you
really don't want the (tiny) sanity check, I can just send the u16
change.

-Kees

--
Kees Cook

2023-12-02 05:16:25

by Jakub Kicinski

Subject: Re: [PATCH] netlink: Return unsigned value for nla_len()

On Fri, 1 Dec 2023 20:39:44 -0800 Kees Cook wrote:
> > We are reading nla->nla_len, which is the first 2 bytes of the structure.
> > And then we check if the structure is... there?
>
> I'm not debating whether it's there or not -- I'm saying the _contents_ of
> "nlattr::nla_len", in the face of corruption or lack of initialization,
> may be less than NLA_HDRLEN. (There's a lot of "but that can't happen"
> that _does_ happen in the kernel, so I'm extra paranoid.)

nlattr is not an object someone has allocated. It's a header of a TLV
in a byte stream of nested TLVs which comes from user space.
If the attr did not go thru nla_ok() or some other careful validation
we're toast regardless.
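
As a rough illustration of what "a header of a TLV in a byte stream" implies for a parser, the sketch below walks a flat buffer of attributes in userspace and validates each header against the bytes remaining before using its length. All names (struct demo_nlattr, DEMO_ALIGN, demo_total_payload()) are hypothetical, the 4-byte alignment is an assumption of the sketch, and rejecting trailing bytes is a choice made here rather than a statement about the kernel's walkers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct nlattr. */
struct demo_nlattr {
	uint16_t nla_len;	/* attribute length, including this header */
	uint16_t nla_type;
};

#define DEMO_ALIGNTO	((size_t)4)
#define DEMO_ALIGN(len)	(((len) + DEMO_ALIGNTO - 1) & ~(DEMO_ALIGNTO - 1))

/*
 * Sum the payload bytes of every attribute in buf. Returns -1 as soon as
 * a header is truncated or declares a length that is shorter than the
 * header or longer than the bytes that remain.
 */
static inline long demo_total_payload(const unsigned char *buf, size_t len)
{
	long total = 0;

	while (len > 0) {
		struct demo_nlattr hdr;
		size_t step;

		if (len < sizeof(hdr))
			return -1;	/* truncated header */
		memcpy(&hdr, buf, sizeof(hdr));
		if (hdr.nla_len < sizeof(hdr) || hdr.nla_len > len)
			return -1;	/* bogus declared length */

		/* Safe: nla_len >= header size was checked just above. */
		total += hdr.nla_len - (long)sizeof(hdr);

		step = DEMO_ALIGN((size_t)hdr.nla_len);
		if (step > len)
			step = len;	/* last attribute may lack padding */
		buf += step;
		len -= step;
	}
	return total;
}
```

Because the length is validated against the remaining buffer before the subtraction, the payload computation never sees a value smaller than the header size.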

> > If we don't trust that struct nlattr which gets passed here is at least
> > NLA_HDRLEN (4B) then why do we think it's safe to read nla_len (the
> > first 2B of it)?
>
> Type confusion (usually due to Use-after-Free flaws) means that a memory
> region is valid (i.e. good pointer), but that the contents might have
> gotten changed through other means. (To see examples of this with
> struct msg_msg, see: https://syst3mfailure.io/wall-of-perdition/)

A bit of a long read.

> (On a related note, why does nla_len start at 4 instead of 0? i.e. why
> does it include the size of nlattr? That seems redundant based on the
> same logic you're using here.)

Beats me.