2018-08-26 05:59:25

by Kees Cook

[permalink] [raw]
Subject: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
policy, so max length isn't enforced, only minimum. This means nkeys
(from userspace) was being trusted without checking the actual size of
nla_len(), which could lead to a memory over-read, and ultimately an
exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within
a namespace.

Reported-by: Al Viro <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
This should go through -stable please, but I have left off the "Cc:
stable" as per netdev patch policy. Note that use of struct_size()
will need manual expansion in backports, such as:
sel_size = sizeof(*s) + sizeof(*s->keys) * s->nkeys;
---
net/sched/cls_u32.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index d5d2a6dc3921..f218ccf1e2d9 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -914,6 +914,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
struct nlattr *opt = tca[TCA_OPTIONS];
struct nlattr *tb[TCA_U32_MAX + 1];
u32 htid, flags = 0;
+ size_t sel_size;
int err;
#ifdef CONFIG_CLS_U32_PERF
size_t size;
@@ -1076,8 +1077,13 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
}

s = nla_data(tb[TCA_U32_SEL]);
+ sel_size = struct_size(s, keys, s->nkeys);
+ if (nla_len(tb[TCA_U32_SEL]) < sel_size) {
+ err = -EINVAL;
+ goto erridr;
+ }

- n = kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL);
+ n = kzalloc(offsetof(typeof(*n), sel) + sel_size, GFP_KERNEL);
if (n == NULL) {
err = -ENOBUFS;
goto erridr;
@@ -1092,7 +1098,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
}
#endif

- memcpy(&n->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
+ memcpy(&n->sel, s, sel_size);
RCU_INIT_POINTER(n->ht_up, ht);
n->handle = handle;
n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0;
--
2.17.1


--
Kees Cook
Pixel Security


2018-08-26 06:17:29

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sat, Aug 25, 2018 at 10:58:01PM -0700, Kees Cook wrote:
> Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
> policy, so max length isn't enforced, only minimum. This means nkeys
> (from userspace) was being trusted without checking the actual size of
> nla_len(), which could lead to a memory over-read, and ultimately an
> exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within
> a namespace.
>
> Reported-by: Al Viro <[email protected]>
> Cc: Jamal Hadi Salim <[email protected]>
> Cc: Cong Wang <[email protected]>
> Cc: Jiri Pirko <[email protected]>
> Cc: "David S. Miller" <[email protected]>
> Cc: [email protected]
> Signed-off-by: Kees Cook <[email protected]>
> ---
> This should go through -stable please, but I have left off the "Cc:
> stable" as per netdev patch policy. Note that use of struct_size()
> will need manual expansion in backports, such as:
> sel_size = sizeof(*s) + sizeof(*s->keys) * s->nkeys;

Saner approach would be sel_size = offsetof(struct tc_u32_sel, keys[s->nkeys])...

> + sel_size = struct_size(s, keys, s->nkeys);
> + if (nla_len(tb[TCA_U32_SEL]) < sel_size) {
> + err = -EINVAL;
> + goto erridr;
> + }
>
> - n = kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL);
> + n = kzalloc(offsetof(typeof(*n), sel) + sel_size, GFP_KERNEL);

ITYM
n = kzalloc(offsetof(struct tc_u_common, sel.keys[s->nkeys]), GFP_KERNEL);

2018-08-26 06:20:56

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sat, Aug 25, 2018 at 11:15 PM, Al Viro <[email protected]> wrote:
> On Sat, Aug 25, 2018 at 10:58:01PM -0700, Kees Cook wrote:
>> Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
>> policy, so max length isn't enforced, only minimum. This means nkeys
>> (from userspace) was being trusted without checking the actual size of
>> nla_len(), which could lead to a memory over-read, and ultimately an
>> exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within
>> a namespace.
>>
>> Reported-by: Al Viro <[email protected]>
>> Cc: Jamal Hadi Salim <[email protected]>
>> Cc: Cong Wang <[email protected]>
>> Cc: Jiri Pirko <[email protected]>
>> Cc: "David S. Miller" <[email protected]>
>> Cc: [email protected]
>> Signed-off-by: Kees Cook <[email protected]>
>> ---
>> This should go through -stable please, but I have left off the "Cc:
>> stable" as per netdev patch policy. Note that use of struct_size()
>> will need manual expansion in backports, such as:
>> sel_size = sizeof(*s) + sizeof(*s->keys) * s->nkeys;
>
> Saner approach would be sel_size = offsetof(struct tc_u32_sel, keys[s->nkeys])...

Either is fine by me.

>> + sel_size = struct_size(s, keys, s->nkeys);
>> + if (nla_len(tb[TCA_U32_SEL]) < sel_size) {
>> + err = -EINVAL;
>> + goto erridr;
>> + }
>>
>> - n = kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL);
>> + n = kzalloc(offsetof(typeof(*n), sel) + sel_size, GFP_KERNEL);
>
> ITYM
> n = kzalloc(offsetof(struct tc_u_common, sel.keys[s->nkeys]), GFP_KERNEL);

I prefer to reuse sel_size and keep typeof() to keep things tied to
"n" more directly. *shrug*

-Kees

--
Kees Cook
Pixel Security

2018-08-26 17:32:27

by Jamal Hadi Salim

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On 2018-08-26 2:19 a.m., Kees Cook wrote:
> On Sat, Aug 25, 2018 at 11:15 PM, Al Viro <[email protected]> wrote:
>> On Sat, Aug 25, 2018 at 10:58:01PM -0700, Kees Cook wrote:
>> Saner approach would be sel_size = offsetof(struct tc_u32_sel, keys[s->nkeys])...
>
> Either is fine by me.
>
>>> + sel_size = struct_size(s, keys, s->nkeys);
>>> + if (nla_len(tb[TCA_U32_SEL]) < sel_size) {
>>> + err = -EINVAL;
>>> + goto erridr;
>>> + }
>>>
>>> - n = kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL);
>>> + n = kzalloc(offsetof(typeof(*n), sel) + sel_size, GFP_KERNEL);
>>
>> ITYM
>> n = kzalloc(offsetof(struct tc_u_common, sel.keys[s->nkeys]), GFP_KERNEL);
>
> I prefer to reuse sel_size and keep typeof() to keep things tied to
> "n" more directly. *shrug*

Looks good to me.
We should add an nla_policy later.

Acked-by: Jamal Hadi Salim <[email protected]>

cheers,
jamal

2018-08-26 17:34:00

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sat, Aug 25, 2018 at 11:19:30PM -0700, Kees Cook wrote:
> >> - n = kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL);
> >> + n = kzalloc(offsetof(typeof(*n), sel) + sel_size, GFP_KERNEL);
> >
> > ITYM
> > n = kzalloc(offsetof(struct tc_u_common, sel.keys[s->nkeys]), GFP_KERNEL);
>
> I prefer to reuse sel_size and keep typeof() to keep things tied to
> "n" more directly. *shrug*

This is rather search-hostile, though. Fresh example from the same
area: where are struct tcf_proto instances created? Is it true that
each is followed by ->ops->init()? Is it true that ->ops->init()
is never called twice for the same instance? Is it true that
->ops->destroy() is called exactly once between successful ->ops->init()
and freeing the object?

That's precisely the kind of questions you end up asking when learning
a new area. Your variant makes those harder to answer; it does make
it easier to catch local problems on casual grep, but it's hell both
on the newbies trying to make sense of an area and on the old hands
from different areas.

That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
arguments. typeof is even worse in that respect.

As for the questions above... Do try to grep for ->init calls. Good
luck getting through the damn pile. And "it must see the definition
of tcf_proto_ops" doesn't narrow it - it's defined in net/sch_generic.h,
which gets pulled by linux/filter.h, which gets pulled by net/sock.h,
which gets pulled by arseloads of code.

As far as I can tell, the solution is
* outside of net/sched/*.c, tcf_proto_ops is mentioned only
in
include/net/pkt_cls.h:23:int register_tcf_proto_ops(struct tcf_proto_ops *ops);
include/net/pkt_cls.h:24:int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
and
include/net/sch_generic.h:304: const struct tcf_proto_ops *ops;
include/net/sch_generic.h:327: const struct tcf_proto_ops *tmplt_ops;
* the first two are irrelevant - externs don't get you any
access to the data structure
* tmplt_ops is only used in net/sched/*.c; for everything else
it could've been opaque pointer - it's not even looked at
* tcf_proto ->ops is, of course, ungreppable. However,
tcf_proto itself, outside of net/sched/*.c, is only mentioned in
include/net/act_api.h:175:int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
include/net/act_api.h:179:struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
include/net/pkt_cls.h:20: int (*fn)(struct tcf_proto *, void *node, struct tcf_walker *);
include/net/pkt_cls.h:48: struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
include/net/pkt_cls.h:90:int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
include/net/pkt_cls.h:96: struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
include/net/pkt_cls.h:196:static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
include/net/pkt_cls.h:221:tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
include/net/pkt_cls.h:238:tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
include/net/pkt_cls.h:385:int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
include/net/pkt_cls.h:497:int tcf_em_tree_validate(struct tcf_proto *, struct nlattr *,
include/net/pkt_cls.h:713: const struct tcf_proto *tp, u32 flags,
include/net/sch_generic.h:237: const struct tcf_proto *goto_tp;
include/net/sch_generic.h:254: const struct tcf_proto *,
include/net/sch_generic.h:256: int (*init)(struct tcf_proto*);
include/net/sch_generic.h:257: void (*destroy)(struct tcf_proto *tp,
include/net/sch_generic.h:260: void* (*get)(struct tcf_proto*, u32 handle);
include/net/sch_generic.h:262: struct tcf_proto*, unsigned long,
include/net/sch_generic.h:266: int (*delete)(struct tcf_proto *tp, void *arg,
include/net/sch_generic.h:269: void (*walk)(struct tcf_proto*, struct tcf_walker *arg);
include/net/sch_generic.h:270: int (*reoffload)(struct tcf_proto *tp, bool add,
include/net/sch_generic.h:281: int (*dump)(struct net*, struct tcf_proto*, void *,
include/net/sch_generic.h:290:struct tcf_proto {
include/net/sch_generic.h:292: struct tcf_proto __rcu *next;
include/net/sch_generic.h:297: const struct tcf_proto *,
include/net/sch_generic.h:317:typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv);
include/net/sch_generic.h:320: struct tcf_proto __rcu *filter_chain;
include/net/sch_generic.h:1098: struct tcf_proto *filter_list;
include/net/sch_generic.h:1122: struct tcf_proto *tp_head);
* excluding externs, arguments in function pointers or a typedef for
such (neither would give an access to thus typed pointer), we are left with
include/net/pkt_cls.h:96: struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
include/net/pkt_cls.h:196:static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
include/net/pkt_cls.h:221:tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
include/net/pkt_cls.h:238:tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
include/net/pkt_cls.h:713: const struct tcf_proto *tp, u32 flags,
include/net/sch_generic.h:237: const struct tcf_proto *goto_tp;
include/net/sch_generic.h:290:struct tcf_proto {
include/net/sch_generic.h:292: struct tcf_proto __rcu *next;
include/net/sch_generic.h:320: struct tcf_proto __rcu *filter_chain;
include/net/sch_generic.h:1098: struct tcf_proto *filter_list;
* the first two are in arguments of static inlines which do not
use the arguments in question.
* the next three (tcf_bind_filter, tcf_unbind_filter and
tc_cls_common_offload_init) do not use ->ops or pass tcf_proto * to
anyone. Incidentally, they are only used in net/sched/*.c
* goto_tp is also used only in net/sched/*.c. Moreover,
all its uses anywhere could as well have been an opaque pointer.
*?grepping for filter_chain catches an unrelated local field
of the same name in mellanox, a function with the same name in
uprobes.c and a bunch of uses in net/sched/*.c.
* search for filter_list gets false positives in trace_events_filter.c,
a bunch of uses in net/sched/*.c and
net/core/dev.c:3533: switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
net/core/dev.c:4593: switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
both of which are opaque.
* the last remaining source of such pointers is
include/net/sch_generic.h:292: struct tcf_proto __rcu *next;
Of course, that's ungreppable. However, it's within the struct tcf_proto itself, so
anyone accessing it as something non-opaque would already have to access tcf_proto
pointers, and not in provably opaque fashion. Theoretically, the three tcf_...
inlines above need another look; fortunately, they don't use ->next at all, not to
mention not being used anywhere outside of net/sched/*.c

The 80 lines above prove that we only need to grep net/sched/*.c for
tcf_proto_ops method calls. And only because we don't have (thank $DEITY)
anything that could deconstruct types - as soon as some bastard grows means
to say "type of the second argument of the function pointed to by p", this
kind of analysis, painful as it is, goes out of window. Even as it is,
do you really like the idea of newbies trying to get through the exercises
like the one above?

Incidentally, that's not the end -
git grep -n '[-]>[ ]*init\>' net/sched/
git grep -n '\.[ ]*init\>' net/sched/
does catch 93 hits. Excluding comparisons, assignments and initializers,
we are down to
net/sched/act_api.c:878: err = a_o->init(net, tb[TCA_ACT_OPTIONS], est, &a, ovr, bind,
net/sched/act_api.c:881: err = a_o->init(net, nla, est, &a, ovr, bind, rtnl_held,
net/sched/cls_api.c:173: err = tp->ops->init(tp);
net/sched/sch_api.c:1155: err = ops->init(sch, tca[TCA_OPTIONS], extack);
net/sched/sch_generic.c:901: if (!ops->init || ops->init(sch, NULL, extack) == 0)
Note that we have no less than 3 different methods of the same name
here, going just by the number of arguments. Fortunately, only one candidate
for tcf_proto one, that in tcf_proto_create(). And there it's obviously done
on new object, with nothing else seeing it until the call.

Only that's not quite all I wanted to know - are there any other places
where tcf_proto instances get created? They are not members of any structs,
unions or arrays, thankfully (that we can see from grep) and there's no
variables of that type, be it auto or static duration. So it should all be
dynamically allocated. Moreover, from the above it looks like no twit could've
done such allocation in his/her/its misbegotten driver (they would have to
find the size somehow, and the search above would've spotted that). It has to
be somewhere in net/sched, if anywhere at all. And no, you can't rely upon
k.*alloc being used to allocate them - not apriori in unfamiliar code, not when
looking for a bug somewhere, etc. net/sched/*.c is 49KLoC; "read through and
see if it's done anywhere" is neither feasible nor supportable (what, do it
again each cycle?)

*IF* nobody plays games with sizeof(expression) (or typeof()), one
could look for mentionings of struct tcf_proto in there and exclude the
ones that actually mention pointers. That would've shown both the allocations
and places where container_of() gets used, etc. No such luck, thanks to
misguided souls preaching the "robust" uses of sizeof...

Sure, I can find all places in net/sched/*.c where we are declaring
pointers to tcf_proto or get those out of mentionings of fields of that
type. 256 hits total, and of course a lot of those are declarations of
function arguments, which means that the function needs to be read through,
thanks to the possibility of wonders like
tp->next = kmalloc(sizeof(*tp), GFP_KERNEL);
AFAICS, nothing exaclty like that exists there, but...

Most of such arguments are thankfully called 'tp', so grepping for
that in there allows to drop such declarations from the list. The list,
of course, grows, but it no longer contains that number of "need to look
through the entire" function hits. That, and some judicious use of search
and replace reduces it to something I'd been able to get through in about
an hour. All instances *are* created by tcf_proto_create(). The same fun-filled
activity has proven (modulo misreadings in that fun, of course) that all
instances are either freed before getting returned by tcf_proto_create() (in
case of ->init() failure) or go out in tcf_proto_destroy() via kfree_rcu(),
after ->ops->destroy() call done to them. And apparently that's the only
caller of ->destroy(), so modulo locking questions (I hadn't even started
to look into that) the answers to all questions in the beginning are
"yes".

Now, I'm fairly used to that kind of digging (and have a bunch of
useful vi macros, search patterns, etc.), so I'd managed to get through all
that. Took me about an hour and a half total. Do you really expect the
newbies to get through that joy? Sure, they (and I) can ask the maintainers,
who would've answered those questions instantly (well, modulo the email
latency, etc.) And for newbies asking that kind of questions is certainly
the right thing to do (or noting the suspected answer down and moving along,
to verify it later). But then the same maintainers have to verify that
this answer doesn't rot - that changes there (and elsewhere - never underestimate
the amount of weirdness cropping up as one-off hack in the bowels of drivers/*)
do not invalidate it? Same search, more or less...

VFS-side I'm trying to enforce "no sizeof(expression), unless it's
sizeof(local_variable)". Not religiously so, but any new instances of
sizeof in there are checked for that (once I get around to that). typeof
is rare as hens teeth in there, and should bloody remain so, TYVM.
It belongs inside very low-level macros and (almost) nowhere else.

There is a conflict of interests between "I don't give a damn what's
being allocated here, it does get sufficient size for resulting pointer
type and that's all I'm interested in" and "I'm looking for the places where
>this< is allocated". Your variant is firmly on the former side...

2018-08-26 18:59:28

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sun, 2018-08-26 at 18:32 +0100, Al Viro wrote:
> On Sat, Aug 25, 2018 at 11:19:30PM -0700, Kees Cook wrote:
> > > > - n = kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL);
> > > > + n = kzalloc(offsetof(typeof(*n), sel) + sel_size, GFP_KERNEL);
> > >
> > > ITYM
> > > n = kzalloc(offsetof(struct tc_u_common, sel.keys[s->nkeys]), GFP_KERNEL);
> >
> > I prefer to reuse sel_size and keep typeof() to keep things tied to
> > "n" more directly. *shrug*
>
> This is rather search-hostile, though. Fresh example from the same
> area: where are struct tcf_proto instances created? Is it true that
> each is followed by ->ops->init()? Is it true that ->ops->init()
> is never called twice for the same instance? Is it true that
> ->ops->destroy() is called exactly once between successful ->ops->init()
> and freeing the object?
>
> That's precisely the kind of questions you end up asking when learning
> a new area. Your variant makes those harder to answer; it does make
> it easier to catch local problems on casual grep, but it's hell both
> on the newbies trying to make sense of an area and on the old hands
> from different areas.
>
> That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
> arguments. typeof is even worse in that respect.

True. Semantic searches via tools like coccinelle could help here
but those searches are quite a bit slower than straightforward greps.


2018-08-26 21:24:27

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

From: Kees Cook <[email protected]>
Date: Sat, 25 Aug 2018 22:58:01 -0700

> Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
> policy, so max length isn't enforced, only minimum. This means nkeys
> (from userspace) was being trusted without checking the actual size of
> nla_len(), which could lead to a memory over-read, and ultimately an
> exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within
> a namespace.
>
> Reported-by: Al Viro <[email protected]>
> Signed-off-by: Kees Cook <[email protected]>

I'll apply this as-is and queued it up for -stable.

If we want to avoid sizeof(*p) type stuff, it can be done as a follow-up.

Thanks.

2018-08-26 21:25:54

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sun, Aug 26, 2018 at 11:57:57AM -0700, Joe Perches wrote:

> > That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
> > arguments. typeof is even worse in that respect.
>
> True. Semantic searches via tools like coccinelle could help here
> but those searches are quite a bit slower than straightforward greps.

Those searches are .config-sensitive as well, which can be much more
unpleasant than being slow...

2018-08-26 21:58:23

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sun, Aug 26, 2018 at 10:30 AM, Jamal Hadi Salim <[email protected]> wrote:
> We should add an nla_policy later.

What's the right way to do that for cases like this?

-Kees

--
Kees Cook
Pixel Security

2018-08-26 22:29:17

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sun, 2018-08-26 at 22:24 +0100, Al Viro wrote:
> On Sun, Aug 26, 2018 at 11:57:57AM -0700, Joe Perches wrote:
>
> > > That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
> > > arguments. typeof is even worse in that respect.
> >
> > True. Semantic searches via tools like coccinelle could help here
> > but those searches are quite a bit slower than straightforward greps.
>
> Those searches are .config-sensitive as well, which can be much more
> unpleasant than being slow...

Are they? Julia?

2018-08-26 22:45:47

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sun, Aug 26, 2018 at 03:26:54PM -0700, Joe Perches wrote:
> On Sun, 2018-08-26 at 22:24 +0100, Al Viro wrote:
> > On Sun, Aug 26, 2018 at 11:57:57AM -0700, Joe Perches wrote:
> >
> > > > That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
> > > > arguments. typeof is even worse in that respect.
> > >
> > > True. Semantic searches via tools like coccinelle could help here
> > > but those searches are quite a bit slower than straightforward greps.
> >
> > Those searches are .config-sensitive as well, which can be much more
> > unpleasant than being slow...
>
> Are they? Julia?

They work pretty much on preprocessor output level; if something it ifdef'ed
out on given config, it won't be seen...

2018-08-26 22:59:25

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sun, Aug 26, 2018 at 06:32:37PM +0100, Al Viro wrote:

> As far as I can tell, the solution is
[snip long and painful reasoning]
> pointers, and not in provably opaque fashion. Theoretically, the three tcf_...
> inlines above need another look; fortunately, they don't use ->next at all, not to
> mention not being used anywhere outside of net/sched/*.c
>
> The 80 lines above prove that we only need to grep net/sched/*.c for
> tcf_proto_ops method calls. And only because we don't have (thank $DEITY)
> anything that could deconstruct types - as soon as some bastard grows means
> to say "type of the second argument of the function pointed to by p", this
> kind of analysis, painful as it is, goes out of window. Even as it is,
> do you really like the idea of newbies trying to get through the exercises
> like the one above?

BTW, would there be any problem if we took the definitions of tcf_proto and
tcf_proto_ops to e.g. net/sched/tcf_proto.h (along with the three inlines in
in pkt_cls.h), left forwards in sch_generic.h and added includes of "tcf_proto.h"
where needed in net/sched/*.c?

That would make tcf_proto/tcf_proto_ops opaque outside of net/sched, reducing
the exposure of internals. Something like a diff below (against net/master,
builds clean, ought to result in identical binary):

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index ef727f71336e..35f8eec3f7c0 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -217,35 +217,6 @@ cls_set_class(struct Qdisc *q, unsigned long *clp, unsigned long cl)
return old_cl;
}

-static inline void
-tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
-{
- struct Qdisc *q = tp->chain->block->q;
- unsigned long cl;
-
- /* Check q as it is not set for shared blocks. In that case,
- * setting class is not supported.
- */
- if (!q)
- return;
- cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
- cl = cls_set_class(q, &r->class, cl);
- if (cl)
- q->ops->cl_ops->unbind_tcf(q, cl);
-}
-
-static inline void
-tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
-{
- struct Qdisc *q = tp->chain->block->q;
- unsigned long cl;
-
- if (!q)
- return;
- if ((cl = __cls_set_class(&r->class, 0)) != 0)
- q->ops->cl_ops->unbind_tcf(q, cl);
-}
-
struct tcf_exts {
#ifdef CONFIG_NET_CLS_ACT
__u32 type; /* for backward compat(TCA_OLD_COMPAT) */
@@ -708,18 +679,6 @@ static inline bool tc_in_hw(u32 flags)
return (flags & TCA_CLS_FLAGS_IN_HW) ? true : false;
}

-static inline void
-tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common,
- const struct tcf_proto *tp, u32 flags,
- struct netlink_ext_ack *extack)
-{
- cls_common->chain_index = tp->chain->index;
- cls_common->protocol = tp->protocol;
- cls_common->prio = tp->prio;
- if (tc_skip_sw(flags) || flags & TCA_CLS_FLAGS_VERBOSE)
- cls_common->extack = extack;
-}
-
enum tc_fl_command {
TC_CLSFLOWER_REPLACE,
TC_CLSFLOWER_DESTROY,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6d00093f35e..72dbb96fc549 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -246,65 +246,7 @@ struct tcf_result {

struct tcf_chain;

-struct tcf_proto_ops {
- struct list_head head;
- char kind[IFNAMSIZ];
-
- int (*classify)(struct sk_buff *,
- const struct tcf_proto *,
- struct tcf_result *);
- int (*init)(struct tcf_proto*);
- void (*destroy)(struct tcf_proto *tp,
- struct netlink_ext_ack *extack);
-
- void* (*get)(struct tcf_proto*, u32 handle);
- int (*change)(struct net *net, struct sk_buff *,
- struct tcf_proto*, unsigned long,
- u32 handle, struct nlattr **,
- void **, bool,
- struct netlink_ext_ack *);
- int (*delete)(struct tcf_proto *tp, void *arg,
- bool *last,
- struct netlink_ext_ack *);
- void (*walk)(struct tcf_proto*, struct tcf_walker *arg);
- int (*reoffload)(struct tcf_proto *tp, bool add,
- tc_setup_cb_t *cb, void *cb_priv,
- struct netlink_ext_ack *extack);
- void (*bind_class)(void *, u32, unsigned long);
- void * (*tmplt_create)(struct net *net,
- struct tcf_chain *chain,
- struct nlattr **tca,
- struct netlink_ext_ack *extack);
- void (*tmplt_destroy)(void *tmplt_priv);
-
- /* rtnetlink specific */
- int (*dump)(struct net*, struct tcf_proto*, void *,
- struct sk_buff *skb, struct tcmsg*);
- int (*tmplt_dump)(struct sk_buff *skb,
- struct net *net,
- void *tmplt_priv);
-
- struct module *owner;
-};
-
-struct tcf_proto {
- /* Fast access part */
- struct tcf_proto __rcu *next;
- void __rcu *root;
-
- /* called under RCU BH lock*/
- int (*classify)(struct sk_buff *,
- const struct tcf_proto *,
- struct tcf_result *);
- __be16 protocol;
-
- /* All the rest */
- u32 prio;
- void *data;
- const struct tcf_proto_ops *ops;
- struct tcf_chain *chain;
- struct rcu_head rcu;
-};
+struct tcf_proto_ops;

struct qdisc_skb_cb {
unsigned int pkt_len;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 229d63c99be2..e946ada18299 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -25,11 +25,12 @@
#include <linux/list.h>
#include <net/net_namespace.h>
#include <net/sock.h>
-#include <net/sch_generic.h>
#include <net/pkt_cls.h>
#include <net/act_api.h>
#include <net/netlink.h>

+#include "tcf_proto.h"
+
static int tcf_action_goto_chain_init(struct tc_action *a, struct tcf_proto *tp)
{
u32 chain_index = a->tcfa_action & TC_ACT_EXT_VAL_MASK;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 31bd1439cf60..be5fba6355c5 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -31,6 +31,8 @@
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
/* The list of all installed classifier types */
static LIST_HEAD(tcf_proto_base);

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 6a5dce8baf19..3772432889f2 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -22,6 +22,8 @@
#include <net/act_api.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
struct basic_head {
struct list_head flist;
struct idr handle_idr;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index fa6fe2fe0f32..fb2478e357cd 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -23,6 +23,8 @@
#include <net/pkt_cls.h>
#include <net/sock.h>

+#include "tcf_proto.h"
+
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Daniel Borkmann <[email protected]>");
MODULE_DESCRIPTION("TC BPF based classifier");
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 3bc01bdde165..5638c711e53c 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -18,6 +18,8 @@
#include <net/sock.h>
#include <net/cls_cgroup.h>

+#include "tcf_proto.h"
+
struct cls_cgroup_head {
u32 handle;
struct tcf_exts exts;
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 2bb043cd436b..7e60e432e3a8 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -33,6 +33,8 @@
#include <net/netfilter/nf_conntrack.h>
#endif

+#include "tcf_proto.h"
+
struct flow_head {
struct list_head filters;
struct rcu_head rcu;
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 6fd9bdd93796..b36c61f7ee44 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -20,7 +20,6 @@
#include <linux/ip.h>
#include <linux/mpls.h>

-#include <net/sch_generic.h>
#include <net/pkt_cls.h>
#include <net/ip.h>
#include <net/flow_dissector.h>
@@ -29,6 +28,8 @@
#include <net/dst.h>
#include <net/dst_metadata.h>

+#include "tcf_proto.h"
+
struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 29eeeaf3ea44..be872b1653f5 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -28,7 +28,8 @@
#include <net/netlink.h>
#include <net/act_api.h>
#include <net/pkt_cls.h>
-#include <net/sch_generic.h>
+
+#include "tcf_proto.h"

#define HTSIZE 256

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 856fa79d4ffd..708faf62ecab 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -13,9 +13,10 @@
#include <linux/init.h>
#include <linux/module.h>

-#include <net/sch_generic.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
struct cls_mall_head {
struct tcf_exts exts;
struct tcf_result res;
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 0404aa5fa7cb..d40ae6d14b2d 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -22,6 +22,8 @@
#include <net/act_api.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
/*
* 1. For now we assume that route tags < 256.
* It allows to use direct table lookups, instead of hash tables.
diff --git a/net/sched/cls_rsvp.c b/net/sched/cls_rsvp.c
index cbb5e0d600f3..131a81aeaa4e 100644
--- a/net/sched/cls_rsvp.c
+++ b/net/sched/cls_rsvp.c
@@ -20,6 +20,8 @@
#include <net/act_api.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
#define RSVP_DST_LEN 1
#define RSVP_ID "rsvp"
#define RSVP_OPS cls_rsvp_ops
diff --git a/net/sched/cls_rsvp6.c b/net/sched/cls_rsvp6.c
index dd08aea2aee5..159dc01cf251 100644
--- a/net/sched/cls_rsvp6.c
+++ b/net/sched/cls_rsvp6.c
@@ -20,6 +20,8 @@
#include <net/pkt_cls.h>
#include <net/netlink.h>

+#include "tcf_proto.h"
+
#define RSVP_DST_LEN 4
#define RSVP_ID "rsvp6"
#define RSVP_OPS cls_rsvp6_ops
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 9ccc93f257db..e7d06c3d40a3 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -13,7 +13,8 @@
#include <net/act_api.h>
#include <net/netlink.h>
#include <net/pkt_cls.h>
-#include <net/sch_generic.h>
+
+#include "tcf_proto.h"

/*
* Passing parameters to the root seems to be done more awkwardly than really
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index d5d2a6dc3921..7b3bdfd80001 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -47,6 +47,8 @@
#include <net/pkt_cls.h>
#include <linux/idr.h>

+#include "tcf_proto.h"
+
struct tc_u_knode {
struct tc_u_knode __rcu *next;
u32 handle;
diff --git a/net/sched/ematch.c b/net/sched/ematch.c
index 1331a4c2d8ff..b123880fbe07 100644
--- a/net/sched/ematch.c
+++ b/net/sched/ematch.c
@@ -90,6 +90,8 @@
#include <linux/skbuff.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
static LIST_HEAD(ematch_ops);
static DEFINE_RWLOCK(ematch_mod_lock);

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 98541c6399db..d6ac218811d0 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -37,6 +37,8 @@
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
/*

Short review.
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index cd49afca9617..6bf259e55319 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -17,6 +17,8 @@
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
/*
* The ATM queuing discipline provides a framework for invoking classifiers
* (aka "filters"), which in turn select classes of this queuing discipline.
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 35fc7252187c..fcfd5f321447 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -75,6 +75,8 @@
#include <net/netfilter/nf_conntrack_core.h>
#endif

+#include "tcf_proto.h"
+
#define CAKE_SET_WAYS (8)
#define CAKE_MAX_TINS (8)
#define CAKE_QUEUES (1024)
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f42025d53cfe..8021ba377dfd 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -21,6 +21,8 @@
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+

/* Class-Based Queueing (CBQ) algorithm.
=======================================
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index e0b0cf8a9939..19a48fa95b9b 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -18,6 +18,8 @@
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
struct drr_class {
struct Qdisc_class_common common;
unsigned int filter_cnt;
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 049714c57075..b3a4537afbcb 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -18,6 +18,8 @@
#include <net/inet_ecn.h>
#include <asm/byteorder.h>

+#include "tcf_proto.h"
+
/*
* classid class marking
* ------- ----- -------
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 6c0a9d5dbf94..8868a8e1a81f 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -28,6 +28,8 @@
#include <net/codel_impl.h>
#include <net/codel_qdisc.h>

+#include "tcf_proto.h"
+
/* Fair Queue CoDel.
*
* Principles :
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 3278a76f6861..9c75b77da56e 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -68,6 +68,8 @@
#include <net/pkt_cls.h>
#include <asm/div64.h>

+#include "tcf_proto.h"
+
/*
* kernel internal service curve representation:
* coordinates are given by 64 bit unsigned integers.
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 43c4bfe625a9..c206b3cfdfb2 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -38,10 +38,10 @@
#include <linux/workqueue.h>
#include <linux/slab.h>
#include <net/netlink.h>
-#include <net/sch_generic.h>
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
/* HTB algorithm.
Author: [email protected]
========================================================================
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 1da7ea8de0ad..107563c14e24 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -27,6 +27,8 @@
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
struct multiq_sched_data {
u16 bands;
u16 max_bands;
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 222e53d3d27a..4fed3fd38dd3 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -22,6 +22,8 @@
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+
struct prio_sched_data {
int bands;
struct tcf_proto __rcu *filter_list;
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index bb1a9c11fc54..32f68e639037 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -19,6 +19,8 @@
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>

+#include "tcf_proto.h"
+

/* Quick Fair Queueing Plus
========================
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 7cbdad8419b7..5465249c600f 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -28,6 +28,8 @@
#include <net/pkt_cls.h>
#include <net/inet_ecn.h>

+#include "tcf_proto.h"
+
/*
* SFB uses two B[l][n] : L x N arrays of bins (L levels, N bins per level)
* This implementation uses L = 8 and N = 16
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 2f2678197760..abc1598e87e7 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -26,6 +26,8 @@
#include <net/pkt_cls.h>
#include <net/red.h>

+#include "tcf_proto.h"
+

/* Stochastic Fairness Queuing algorithm.
=======================================
diff --git a/net/sched/tcf_proto.h b/net/sched/tcf_proto.h
new file mode 100644
index 000000000000..b8d0e15e7f26
--- /dev/null
+++ b/net/sched/tcf_proto.h
@@ -0,0 +1,104 @@
+/* struct tcf_proto internal details - outside of net/sched it's opaque */
+
+#include <net/sch_generic.h>
+
+struct tcf_proto {
+ /* Fast access part */
+ struct tcf_proto __rcu *next;
+ void __rcu *root;
+
+ /* called under RCU BH lock*/
+ int (*classify)(struct sk_buff *,
+ const struct tcf_proto *,
+ struct tcf_result *);
+ __be16 protocol;
+
+ /* All the rest */
+ u32 prio;
+ void *data;
+ const struct tcf_proto_ops *ops;
+ struct tcf_chain *chain;
+ struct rcu_head rcu;
+};
+
+struct tcf_proto_ops {
+ struct list_head head;
+ char kind[IFNAMSIZ];
+
+ int (*classify)(struct sk_buff *,
+ const struct tcf_proto *,
+ struct tcf_result *);
+ int (*init)(struct tcf_proto*);
+ void (*destroy)(struct tcf_proto *tp,
+ struct netlink_ext_ack *extack);
+
+ void* (*get)(struct tcf_proto*, u32 handle);
+ int (*change)(struct net *net, struct sk_buff *,
+ struct tcf_proto*, unsigned long,
+ u32 handle, struct nlattr **,
+ void **, bool,
+ struct netlink_ext_ack *);
+ int (*delete)(struct tcf_proto *tp, void *arg,
+ bool *last,
+ struct netlink_ext_ack *);
+ void (*walk)(struct tcf_proto*, struct tcf_walker *arg);
+ int (*reoffload)(struct tcf_proto *tp, bool add,
+ tc_setup_cb_t *cb, void *cb_priv,
+ struct netlink_ext_ack *extack);
+ void (*bind_class)(void *, u32, unsigned long);
+ void * (*tmplt_create)(struct net *net,
+ struct tcf_chain *chain,
+ struct nlattr **tca,
+ struct netlink_ext_ack *extack);
+ void (*tmplt_destroy)(void *tmplt_priv);
+
+ /* rtnetlink specific */
+ int (*dump)(struct net*, struct tcf_proto*, void *,
+ struct sk_buff *skb, struct tcmsg*);
+ int (*tmplt_dump)(struct sk_buff *skb,
+ struct net *net,
+ void *tmplt_priv);
+
+ struct module *owner;
+};
+
+static inline void
+tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
+{
+ struct Qdisc *q = tp->chain->block->q;
+ unsigned long cl;
+
+ /* Check q as it is not set for shared blocks. In that case,
+ * setting class is not supported.
+ */
+ if (!q)
+ return;
+ cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
+ cl = cls_set_class(q, &r->class, cl);
+ if (cl)
+ q->ops->cl_ops->unbind_tcf(q, cl);
+}
+
+static inline void
+tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
+{
+ struct Qdisc *q = tp->chain->block->q;
+ unsigned long cl;
+
+ if (!q)
+ return;
+ if ((cl = __cls_set_class(&r->class, 0)) != 0)
+ q->ops->cl_ops->unbind_tcf(q, cl);
+}
+
+static inline void
+tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common,
+ const struct tcf_proto *tp, u32 flags,
+ struct netlink_ext_ack *extack)
+{
+ cls_common->chain_index = tp->chain->index;
+ cls_common->protocol = tp->protocol;
+ cls_common->prio = tp->prio;
+ if (tc_skip_sw(flags) || flags & TCA_CLS_FLAGS_VERBOSE)
+ cls_common->extack = extack;
+}

2018-08-27 02:00:59

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL



On Sun, 26 Aug 2018, Joe Perches wrote:

> On Sun, 2018-08-26 at 22:24 +0100, Al Viro wrote:
> > On Sun, Aug 26, 2018 at 11:57:57AM -0700, Joe Perches wrote:
> >
> > > > That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
> > > > arguments. typeof is even worse in that respect.
> > >
> > > True. Semantic searches via tools like coccinelle could help here
> > > but those searches are quite a bit slower than straightforward greps.
> >
> > Those searches are .config-sensitive as well, which can be much more
> > unpleasant than being slow...
>
> Are they? Julia?

I don't completely understand the question. Coccinelle doens't know
anything about the configuration.

julia

2018-08-27 02:02:38

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL



On Sun, 26 Aug 2018, Al Viro wrote:

> On Sun, Aug 26, 2018 at 03:26:54PM -0700, Joe Perches wrote:
> > On Sun, 2018-08-26 at 22:24 +0100, Al Viro wrote:
> > > On Sun, Aug 26, 2018 at 11:57:57AM -0700, Joe Perches wrote:
> > >
> > > > > That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
> > > > > arguments. typeof is even worse in that respect.
> > > >
> > > > True. Semantic searches via tools like coccinelle could help here
> > > > but those searches are quite a bit slower than straightforward greps.
> > >
> > > Those searches are .config-sensitive as well, which can be much more
> > > unpleasant than being slow...
> >
> > Are they? Julia?
>
> They work pretty much on preprocessor output level; if something it ifdef'ed
> out on given config, it won't be seen...

Coccinelle doesn't care what is ifdef'd out. It only misses the things it
can't parse. Very strange ifdefs could indeed cause that, but it should
be a minor problem.

julia

2018-08-27 02:36:51

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sun, Aug 26, 2018 at 10:00:46PM -0400, Julia Lawall wrote:
>
>
> On Sun, 26 Aug 2018, Al Viro wrote:
>
> > On Sun, Aug 26, 2018 at 03:26:54PM -0700, Joe Perches wrote:
> > > On Sun, 2018-08-26 at 22:24 +0100, Al Viro wrote:
> > > > On Sun, Aug 26, 2018 at 11:57:57AM -0700, Joe Perches wrote:
> > > >
> > > > > > That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
> > > > > > arguments. typeof is even worse in that respect.
> > > > >
> > > > > True. Semantic searches via tools like coccinelle could help here
> > > > > but those searches are quite a bit slower than straightforward greps.
> > > >
> > > > Those searches are .config-sensitive as well, which can be much more
> > > > unpleasant than being slow...
> > >
> > > Are they? Julia?
> >
> > They work pretty much on preprocessor output level; if something it ifdef'ed
> > out on given config, it won't be seen...
>
> Coccinelle doesn't care what is ifdef'd out. It only misses the things it
> can't parse. Very strange ifdefs could indeed cause that, but it should
> be a minor problem.

OK, but... what does it do when it sees two definitions of a structure
in different branches of #if/#else/#endif? I think I'm confused about
what it can and cannot do; to restate the original problem:
* we need to find all places where instances of given type
are created. Assume it never is a member of struct/union/array and
no static or auto duration instances exist - everything is dynamically
allocated somewhere.

Can coccinelle do that and if it can, what are the limitations?

2018-08-27 03:36:57

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL



On Mon, 27 Aug 2018, Al Viro wrote:

> On Sun, Aug 26, 2018 at 10:00:46PM -0400, Julia Lawall wrote:
> >
> >
> > On Sun, 26 Aug 2018, Al Viro wrote:
> >
> > > On Sun, Aug 26, 2018 at 03:26:54PM -0700, Joe Perches wrote:
> > > > On Sun, 2018-08-26 at 22:24 +0100, Al Viro wrote:
> > > > > On Sun, Aug 26, 2018 at 11:57:57AM -0700, Joe Perches wrote:
> > > > >
> > > > > > > That, BTW, is why I hate the use of sizeof(*p) in kmalloc, etc.
> > > > > > > arguments. typeof is even worse in that respect.
> > > > > >
> > > > > > True. Semantic searches via tools like coccinelle could help here
> > > > > > but those searches are quite a bit slower than straightforward greps.
> > > > >
> > > > > Those searches are .config-sensitive as well, which can be much more
> > > > > unpleasant than being slow...
> > > >
> > > > Are they? Julia?
> > >
> > > They work pretty much on preprocessor output level; if something it ifdef'ed
> > > out on given config, it won't be seen...
> >
> > Coccinelle doesn't care what is ifdef'd out. It only misses the things it
> > can't parse. Very strange ifdefs could indeed cause that, but it should
> > be a minor problem.
>
> OK, but... what does it do when it sees two definitions of a structure
> in different branches of #if/#else/#endif? I think I'm confused about
> what it can and cannot do; to restate the original problem:
> * we need to find all places where instances of given type
> are created. Assume it never is a member of struct/union/array and
> no static or auto duration instances exist - everything is dynamically
> allocated somewhere.
>
> Can coccinelle do that and if it can, what are the limitations?

Looking at the thread, it seems that what you are asking for is something
like:

@@
struct tcf_proto *x;
@@

* x = \(kmalloc\|kzalloc\|devm_kmalloc\|devm_kzalloc\)(...)

The * at the beginning of the line means to highlight what you are looking
for, which is done by making a diff in which the highlighted line
appears to be removed.

The limitation is the ability to figure out the type of x. If it is a
local variable, Coccinelle should have no problem. If it is a structure
field, it may be necessary to provide command line arguments like

--all-includes --include-headers-for-types

--all-includes means to try to find all include files that are mentioned
in the .c file. The next stronger option is --recursive includes, which
means include what all of the mentioned files include as well,
recursively. This tends to cause a major performance hit, because a lot
of code is being parsed. --include-headers-for-types heals a bit with
that, as it only considers the header files when computing type
information, and now when applying the rules.

With respect to ifdefs around variable declarations and structure field
declaration, in these cases Coccinelle considers that it cannot make the
ifdef have an if-like control flow, and so if considers the #ifdef, #else
and #endif to be comments. Thus it takes into account only the last type
provided for a given variable. For example, in the following:

int main() {
#ifdef X
int x;
#else
char xx;
#endif

return x;
}

int main2() {
#ifdef X
char x;
#else
int x;
#endif

return x;
}

x is considered to have type int in both returns. But if xx is replaced
by x in the definition of main, then x at the point of the return will
have type char.

Around a statement, such as x = kmalloc(...), it should be able to
consider that both branches of an ifdef are possible.

But there are no absolute guarantees. If there is any problem in parsing
a line of some function, the whole function will be ignores.

julia

2018-08-27 04:06:02

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Sun, Aug 26, 2018 at 11:35:17PM -0400, Julia Lawall wrote:

> * x = \(kmalloc\|kzalloc\|devm_kmalloc\|devm_kzalloc\)(...)

I can name several you've missed right off the top of my head -
vmalloc, kvmalloc, kmem_cache_alloc, kmem_cache_zalloc, variants
with _trace slapped on, and that is not to mention the things like
get_free_page or

void *my_k3wl_alloc(u64 n) // 'cause all artificial limits suck, that's why
{
lots and lots of home-grown stats collection
some tracepoints thrown in just for fun
return kmalloc(n);
}

(and no, I'm not implying that net/sched folks had done anything of that
sort; I have seen that and worse in drivers, though)

> The * at the beginning of the line means to highlight what you are looking
> for, which is done by making a diff in which the highlighted line
> appears to be removed.

Umm... Does that cover return, BTW? Or something like
T *barf;
extern void foo(T *p);
foo(kmalloc(sizeof(*barf)));


> The limitation is the ability to figure out the type of x. If it is a
> local variable, Coccinelle should have no problem. If it is a structure
> field, it may be necessary to provide command line arguments like
>
> --all-includes --include-headers-for-types
>
> --all-includes means to try to find all include files that are mentioned
> in the .c file. The next stronger option is --recursive includes, which
> means include what all of the mentioned files include as well,
> recursively. This tends to cause a major performance hit, because a lot
> of code is being parsed. --include-headers-for-types heals a bit with
> that, as it only considers the header files when computing type
> information, and now when applying the rules.
>
> With respect to ifdefs around variable declarations and structure field
> declaration, in these cases Coccinelle considers that it cannot make the
> ifdef have an if-like control flow, and so if considers the #ifdef, #else
> and #endif to be comments. Thus it takes into account only the last type
> provided for a given variable.

[snip]

What about several variants of structure definition? Because ifdefs around
includes do occur in the wild...

2018-08-27 04:42:45

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL



On Mon, 27 Aug 2018, Al Viro wrote:

> On Sun, Aug 26, 2018 at 11:35:17PM -0400, Julia Lawall wrote:
>
> > * x = \(kmalloc\|kzalloc\|devm_kmalloc\|devm_kzalloc\)(...)
>
> I can name several you've missed right off the top of my head -
> vmalloc, kvmalloc, kmem_cache_alloc, kmem_cache_zalloc, variants
> with _trace slapped on, and that is not to mention the things like
> get_free_page or

OK, maybe for a given type the set of functions would be smaller.

>
> void *my_k3wl_alloc(u64 n) // 'cause all artificial limits suck, that's why
> {
> lots and lots of home-grown stats collection
> some tracepoints thrown in just for fun
> return kmalloc(n);
> }
>
> (and no, I'm not implying that net/sched folks had done anything of that
> sort; I have seen that and worse in drivers, though)
>
> > The * at the beginning of the line means to highlight what you are looking
> > for, which is done by making a diff in which the highlighted line
> > appears to be removed.
>
> Umm... Does that cover return, BTW? Or something like
> T *barf;
> extern void foo(T *p);
> foo(kmalloc(sizeof(*barf)));

It only covers the pattern that is shown, ie an assignment. For this,
another pattern would be needed. It would be necessary to match first the
call that one is concerned with and then go find the function definition
or prototype to find the type of the associated parameter. It is possible
to count the offset of the kmalloc call in the argument list and then get
the type at the corresponding offset in the parameter list of the function
declaration or prototype.

>
>
> > The limitation is the ability to figure out the type of x. If it is a
> > local variable, Coccinelle should have no problem. If it is a structure
> > field, it may be necessary to provide command line arguments like
> >
> > --all-includes --include-headers-for-types
> >
> > --all-includes means to try to find all include files that are mentioned
> > in the .c file. The next stronger option is --recursive includes, which
> > means include what all of the mentioned files include as well,
> > recursively. This tends to cause a major performance hit, because a lot
> > of code is being parsed. --include-headers-for-types heals a bit with
> > that, as it only considers the header files when computing type
> > information, and now when applying the rules.
> >
> > With respect to ifdefs around variable declarations and structure field
> > declaration, in these cases Coccinelle considers that it cannot make the
> > ifdef have an if-like control flow, and so if considers the #ifdef, #else
> > and #endif to be comments. Thus it takes into account only the last type
> > provided for a given variable.
>
> [snip]
>
> What about several variants of structure definition? Because ifdefs around
> includes do occur in the wild...

Such ifdefs would be ignored completely. I suspect that only the last
definition of the structure would be taken into account.

julia

2018-08-27 11:49:18

by Jamal Hadi Salim

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On 2018-08-26 5:56 p.m., Kees Cook wrote:
> On Sun, Aug 26, 2018 at 10:30 AM, Jamal Hadi Salim <[email protected]> wrote:
>> We should add an nla_policy later.
>
> What's the right way to do that for cases like this?

Meant something like attached which you alluded-to in your comments
would give an upper bound (Max allowed keys is 128).

cheers,
jamal


Attachments:
patchlet (706.00 B)

2018-08-27 11:59:25

by Jamal Hadi Salim

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On 2018-08-26 6:57 p.m., Al Viro wrote:
> On Sun, Aug 26, 2018 at 06:32:37PM +0100, Al Viro wrote:
>
>> As far as I can tell, the solution is
> [snip long and painful reasoning]
>> pointers, and not in provably opaque fashion. Theoretically, the three tcf_...
>> inlines above need another look; fortunately, they don't use ->next at all, not to
>> mention not being used anywhere outside of net/sched/*.c
>>
>> The 80 lines above prove that we only need to grep net/sched/*.c for
>> tcf_proto_ops method calls. And only because we don't have (thank $DEITY)
>> anything that could deconstruct types - as soon as some bastard grows means
>> to say "type of the second argument of the function pointed to by p", this
>> kind of analysis, painful as it is, goes out of window. Even as it is,
>> do you really like the idea of newbies trying to get through the exercises
>> like the one above?
>
> BTW, would there be any problem if we took the definitions of tcf_proto and
> tcf_proto_ops to e.g. net/sched/tcf_proto.h (along with the three inlines in
> in pkt_cls.h), left forwards in sch_generic.h and added includes of "tcf_proto.h"
> where needed in net/sched/*.c?
>

I cant think of any challenges. Cong/Jiri? Would it require development
time classifiers/actions/qdiscs to sit in that directory (I suspect you
dont want them in include/net).
BTW, the idea of improving grep-ability of the code by prefixing the
ops appropriately makes sense. i.e we should have ops->cls_init,
ops->act_init etc.

cheers,
jamal

> That would make tcf_proto/tcf_proto_ops opaque outside of net/sched, reducing
> the exposure of internals. Something like a diff below (against net/master,
> builds clean, ought to result in identical binary):
>

2018-08-27 14:09:57

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Mon, Aug 27, 2018 at 4:46 AM, Jamal Hadi Salim <[email protected]> wrote:
> On 2018-08-26 5:56 p.m., Kees Cook wrote:
>>
>> On Sun, Aug 26, 2018 at 10:30 AM, Jamal Hadi Salim <[email protected]>
>> wrote:
>>>
>>> We should add an nla_policy later.
>>
>>
>> What's the right way to do that for cases like this?
>
>
> Meant something like attached which you alluded-to in your comments
> would give an upper bound (Max allowed keys is 128).

The problem is that policy doesn't parse the contents: "nkeys"
determines the size, so we have to both validate minimum size (to be
sure the location of "nkeys" is valid) and check that the size is at
least nkeys * struct long. I don't think there is a way to do this
with the existing policy language.

-Kees

--
Kees Cook
Pixel Security

2018-08-27 14:28:13

by Roman Mashak

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

Kees Cook <[email protected]> writes:

> On Mon, Aug 27, 2018 at 4:46 AM, Jamal Hadi Salim <[email protected]> wrote:
>> On 2018-08-26 5:56 p.m., Kees Cook wrote:
>>>
>>> On Sun, Aug 26, 2018 at 10:30 AM, Jamal Hadi Salim <[email protected]>
>>> wrote:
>>>>
>>>> We should add an nla_policy later.
>>>
>>>
>>> What's the right way to do that for cases like this?
>>
>>
>> Meant something like attached which you alluded-to in your comments
>> would give an upper bound (Max allowed keys is 128).
>
> The problem is that policy doesn't parse the contents: "nkeys"
> determines the size, so we have to both validate minimum size (to be
> sure the location of "nkeys" is valid) and check that the size is at
> least nkeys * struct long. I don't think there is a way to do this
> with the existing policy language.

While at these changes, could you also add and export in UAPI max
allowed keys count, which is currently 128? For example,
TCA_U32_NKEYS_MAX in pkt_cls.h

2018-08-27 21:33:16

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Mon, Aug 27, 2018 at 4:58 AM Jamal Hadi Salim <[email protected]> wrote:
>
> On 2018-08-26 6:57 p.m., Al Viro wrote:
> > On Sun, Aug 26, 2018 at 06:32:37PM +0100, Al Viro wrote:
> >
> >> As far as I can tell, the solution is
> > [snip long and painful reasoning]
> >> pointers, and not in provably opaque fashion. Theoretically, the three tcf_...
> >> inlines above need another look; fortunately, they don't use ->next at all, not to
> >> mention not being used anywhere outside of net/sched/*.c
> >>
> >> The 80 lines above prove that we only need to grep net/sched/*.c for
> >> tcf_proto_ops method calls. And only because we don't have (thank $DEITY)
> >> anything that could deconstruct types - as soon as some bastard grows means
> >> to say "type of the second argument of the function pointed to by p", this
> >> kind of analysis, painful as it is, goes out of window. Even as it is,
> >> do you really like the idea of newbies trying to get through the exercises
> >> like the one above?
> >
> > BTW, would there be any problem if we took the definitions of tcf_proto and
> > tcf_proto_ops to e.g. net/sched/tcf_proto.h (along with the three inlines in
> > in pkt_cls.h), left forwards in sch_generic.h and added includes of "tcf_proto.h"
> > where needed in net/sched/*.c?
> >
>
> I cant think of any challenges. Cong/Jiri? Would it require development
> time classifiers/actions/qdiscs to sit in that directory (I suspect you
> dont want them in include/net).
> BTW, the idea of improving grep-ability of the code by prefixing the
> ops appropriately makes sense. i.e we should have ops->cls_init,
> ops->act_init etc.

Hmm? Isn't struct tcf_proto_ops used and must be provided
by each tc filter module? How does it work if you move it into
net/sched/* for out-of-tree modules? Are they supposed to
include "..../net/sched/tcf_proto.h"?? Or something else?

BTW, we need some grep tool that really understands C syntax,
not making each variable friendly to plain grep.

2018-08-28 00:06:51

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote:
> > I cant think of any challenges. Cong/Jiri? Would it require development
> > time classifiers/actions/qdiscs to sit in that directory (I suspect you
> > dont want them in include/net).
> > BTW, the idea of improving grep-ability of the code by prefixing the
> > ops appropriately makes sense. i.e we should have ops->cls_init,
> > ops->act_init etc.
>
> Hmm? Isn't struct tcf_proto_ops used and must be provided
> by each tc filter module? How does it work if you move it into
> net/sched/* for out-of-tree modules? Are they supposed to
> include "..../net/sched/tcf_proto.h"?? Or something else?

If you care about out-of-tree modules, that could easily live in
include/net/tcf_proto.h, provided that it's not pulled by indirect
includes into hell knows how many places. Try
make allmodconfig
make >/dev/null 2>&1
find -name '.*.cmd'|xargs grep sch_generic.h

That finds 2977 files here, most of them having nothing to do with
net/sched.

> BTW, we need some grep tool that really understands C syntax,
> not making each variable friendly to plain grep.

This isn't the matter of C syntax; it needs to handle C typization,
and you really can't do that anywhere near reliably without looking
at preprocessor output. Which very much depends upon .config...

BTW, something odd in cls_u32.c: what happens if we have the following
graph:
tcf_proto <tp>, it's ->data being <c0> and ->root - <ht0>
tc_u_common <c0>, in its ->hlist
<ht1>, in its ->ht[0]
<knode>
<ht0>
and set ->ht_down in <knode> to the <ht0>? AFAICS,
there's nothing to prevent that - TCA_U32_LINK being
0x80000000 will do just that. What happens upon u32_destroy()
in that case? Unless I'm misreading that code, refcounts will be
<c0>: 1
<ht0>: 2
<ht1>: 1
and in u32_destroy() we'll get this:
root_ht = <ht0>
tp_c = <c0>
if (root_ht && --root_ht->refcnt == 0)
u32_destroy_hnode(tp, root_ht, extack);
decrements refcnt to 1 and does nothing else.
if (--tp_c->refcnt == 0) {
is satisfied
hlist_del(&tp_c->hnode);
<c0> unhashed
while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
we take ht = <ht1>
u32_clear_hnode(tp, ht, extack);
which does
for (h = 0; h <= ht->divisor; h++) {
while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
n = <knode>
RCU_INIT_POINTER(ht->ht[h],
rtnl_dereference(n->next));
remove <knode> from <ht1>->ht[0]
tcf_unbind_filter(tp, &n->res);
u32_remove_hw_knode(tp, n, extack);
idr_remove(&ht->handle_idr, n->handle);
if (tcf_exts_get_net(&n->exts))
tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
else
u32_destroy_key(n->tp, n, true);
... and we hit u32_destroy_key(<tp>, <knode>, true), which does
struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
ht = <ht0>
tcf_exts_destroy(&n->exts);
tcf_exts_put_net(&n->exts);
if (ht && --ht->refcnt == 0)
kfree(ht);
*NOW* <ht0>->refcnt is 0, and we free the damn thing.
....
kfree(n);
<knode> is freed and we return to u32_destroy_hnode() where we
see that there's nothing else left in <ht1>->ht[...] and return
to u32_destroy(). Where
RCU_INIT_POINTER(tp_c->hlist, ht->next);
sets <c0>->hlist to <ht1>->next, aka <h0>. Which is already freed.

/* u32_destroy_key() will later free ht for us, if it's
* still referenced by some knode
*/
if (--ht->refcnt == 0)
kfree_rcu(ht, rcu);
<ht1>->refcnt reaches 0 and we free it (RCU-delayed)
}
... and we go for the next iteration, this time with ht = <ht0>.
Doing all kinds of unsanitary things to the memory it used to occupy...

Incidentally, if we hit
tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
instead of u32_destroy_key(), the things don't seem to be any better - we
won't do anything to <knode> until rtnl is dropped, so u32_destroy() won't
break on the second pass through the loop - it'll free <ht0> there and
return. Setting us up for trouble, since when u32_delete_key_freepf_work()
finally gets to u32_destroy_key() we'll have <knode>->ht_down pointing
to freed memory and decrementing its contents...

What am I missing in there? Is it just "we should never have ->ht_down
pointing to anyone's ->root"? If so, I'm not sure how to detect that;
if not... what should happen to the orphaned root_ht? Should it
remain on the list? We might have two tcf_proto sharing tp->data,
so tp_c and its list might very well survive the u32_destroy()...

Note, BTW, that if we do leave the orphan on the list and later
change the tc_u_knode so that ->ht_down doesn't point to that
thing anymore, we'll get its refcount incremented to 2 in
u32_init_knode(), then decremented to 1 by u32_set_parms() and
then arrange for u32_delete_key_work() to be run. Which will
drive the refcount to 0 and free the damn thing. While it's
still in the middle of ->hlist...

2018-08-28 16:01:10

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Tue, Aug 28, 2018 at 01:03:10AM +0100, Al Viro wrote:
> if (tcf_exts_get_net(&n->exts))
> tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
> else
> u32_destroy_key(n->tp, n, true);
> ... and we hit u32_destroy_key(<tp>, <knode>, true), which does

Speaking of which, we'd better never hit that branch for other reasons - there's
no RCU delay between removal of knode from the hash chain and its kfree().
tcf_queue_work() does guarantee such delay (by use of queue_rcu_work()), direct
call doesn't...

Anyway, whichever branch is taken, the memory corruption problem remains - the
comments below are accurate, AFAICS.

> Incidentally, if we hit
> tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
> instead of u32_destroy_key(), the things don't seem to be any better - we
> won't do anything to <knode> until rtnl is dropped, so u32_destroy() won't
> break on the second pass through the loop - it'll free <ht0> there and
> return. Setting us up for trouble, since when u32_delete_key_freepf_work()
> finally gets to u32_destroy_key() we'll have <knode>->ht_down pointing
> to freed memory and decrementing its contents...

2018-08-29 19:09:36

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Mon, Aug 27, 2018 at 5:03 PM Al Viro <[email protected]> wrote:
>
> On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote:
> > > I cant think of any challenges. Cong/Jiri? Would it require development
> > > time classifiers/actions/qdiscs to sit in that directory (I suspect you
> > > dont want them in include/net).
> > > BTW, the idea of improving grep-ability of the code by prefixing the
> > > ops appropriately makes sense. i.e we should have ops->cls_init,
> > > ops->act_init etc.
> >
> > Hmm? Isn't struct tcf_proto_ops used and must be provided
> > by each tc filter module? How does it work if you move it into
> > net/sched/* for out-of-tree modules? Are they supposed to
> > include "..../net/sched/tcf_proto.h"?? Or something else?
>
> If you care about out-of-tree modules, that could easily live in
> include/net/tcf_proto.h, provided that it's not pulled by indirect
> includes into hell knows how many places. Try
> make allmodconfig
> make >/dev/null 2>&1
> find -name '.*.cmd'|xargs grep sch_generic.h
>
> That finds 2977 files here, most of them having nothing to do with
> net/sched.


Moving it to include/net/tcf_proto.h is fine, as out-of-tree modules
can still compile by modifying the included header path.

include/net/pkt_cls.h might be a choice here too.

2018-08-29 21:34:50

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Wed, Aug 29, 2018 at 12:07:09PM -0700, Cong Wang wrote:
> On Mon, Aug 27, 2018 at 5:03 PM Al Viro <[email protected]> wrote:
> >
> > On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote:
> > > > I cant think of any challenges. Cong/Jiri? Would it require development
> > > > time classifiers/actions/qdiscs to sit in that directory (I suspect you
> > > > dont want them in include/net).
> > > > BTW, the idea of improving grep-ability of the code by prefixing the
> > > > ops appropriately makes sense. i.e we should have ops->cls_init,
> > > > ops->act_init etc.
> > >
> > > Hmm? Isn't struct tcf_proto_ops used and must be provided
> > > by each tc filter module? How does it work if you move it into
> > > net/sched/* for out-of-tree modules? Are they supposed to
> > > include "..../net/sched/tcf_proto.h"?? Or something else?
> >
> > If you care about out-of-tree modules, that could easily live in
> > include/net/tcf_proto.h, provided that it's not pulled by indirect
> > includes into hell knows how many places. Try
> > make allmodconfig
> > make >/dev/null 2>&1
> > find -name '.*.cmd'|xargs grep sch_generic.h
> >
> > That finds 2977 files here, most of them having nothing to do with
> > net/sched.
>
>
> Moving it to include/net/tcf_proto.h is fine, as out-of-tree modules
> can still compile by modifying the included header path.
>
> include/net/pkt_cls.h might be a choice here too.

Nowhere near as massive exposure (123 files vs. 2977), but still - do
ethernet drivers need that? Because that's 77 out of 123...

2018-08-31 04:05:24

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL

On Tue, Aug 28, 2018 at 04:59:38PM +0100, Al Viro wrote:
> On Tue, Aug 28, 2018 at 01:03:10AM +0100, Al Viro wrote:
> > if (tcf_exts_get_net(&n->exts))
> > tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
> > else
> > u32_destroy_key(n->tp, n, true);
> > ... and we hit u32_destroy_key(<tp>, <knode>, true), which does
>
> Speaking of which, we'd better never hit that branch for other reasons - there's
> no RCU delay between removal of knode from the hash chain and its kfree().
> tcf_queue_work() does guarantee such delay (by use of queue_rcu_work()), direct
> call doesn't...
>
> Anyway, whichever branch is taken, the memory corruption problem remains - the
> comments below are accurate, AFAICS.
>
> > Incidentally, if we hit
> > tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
> > instead of u32_destroy_key(), the things don't seem to be any better - we
> > won't do anything to <knode> until rtnl is dropped, so u32_destroy() won't
> > break on the second pass through the loop - it'll free <ht0> there and
> > return. Setting us up for trouble, since when u32_delete_key_freepf_work()
> > finally gets to u32_destroy_key() we'll have <knode>->ht_down pointing
> > to freed memory and decrementing its contents...

Build the kernel with slab poisoning and try this:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: u32 divisor 1
tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: u32 divisor 1
tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1:0:11 u32 ht 1: link 801: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
tc filter delete dev eth0 parent ffff: protocol ip prio 200
tc filter change dev eth0 parent ffff: protocol ip prio 100 handle 1:0:11 u32 ht 1: link 0: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
tc filter delete dev eth0 parent ffff: protocol ip prio 100

Then watch it oops in u32_lookup_ht() from u32_get() from tc_del_tfilter()
Oopsing insn: cmp %ebp,0x8(%rbx). RBX: 6b6b6b6b6b6b6b6b, i.e. slab
poison...

What happens is that ht 801: (created when we'd added tcf_proto for prio 200)
gets pinned down by link 801: in the third tc filter add. Then removal of
prio 200 triggers u32_destroy(), dropping refcount on 801: and doing nothing
else to it. Then filter change drops the last reference to 801:, freeing
it. And we have a freed struct tc_u_hnode stuck in the middle of the list...