2016-10-13 06:27:17

by Markus Trippelsdorf

[permalink] [raw]
Subject: Re: slab corruption with current -git

On 2016.10.12 at 23:18 -0700, Linus Torvalds wrote:
> On Oct 12, 2016 23:07, "Markus Trippelsdorf" <[email protected]> wrote:
> >
> > This is nf_register_net_hook at net/netfilter/core.c:106
>
> The "*regs" access?

Yeah.

105 entry->orig_ops = reg;
106 entry->ops = *reg;
107 entry->next = NULL;

--
Markus


2016-10-13 19:52:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: slab corruption with current -git

On Wed, Oct 12, 2016 at 11:27 PM, Markus Trippelsdorf
<[email protected]> wrote:
>
> Yeah.
>
> 105 entry->orig_ops = reg;
> 106 entry->ops = *reg;
> 107 entry->next = NULL;

So ipt_register_table() does:

ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));

and then nf_register_net_hooks() just does

for (i = 0; i < n; i++) {
err = nf_register_net_hook(net, &reg[i]);

so if the *reg is uninitialized, it means that it's the 'ops[]' array
that isn't actually really valid in "valid_hooks". Odd. They should
all be initialized by xt_hook_ops_alloc(), no?

That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
loop that initializes things:

for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
hook_mask >>= 1, ++hooknum) {

and it makes no sense to me how that tests *both* "i < num_hools" and
"hook_mask != 0".

Why? Because

num_hooks = hweight32(hook_mask);

so it's entirely redundant. num_hooks is already how many bits are on
in hook_mask, so that test is just duplicating the same thing twice
("have we done less than that number of bits" and "do we have any bits
less").

I don't know. There's something odd going on. Regardless, thsi is a
different problem from the nf_register_net_hook() list handling, so
I'll leave it to the networking people. David?

Linus

2016-10-13 20:45:49

by Florian Westphal

[permalink] [raw]
Subject: Re: slab corruption with current -git

Linus Torvalds <[email protected]> wrote:
> On Wed, Oct 12, 2016 at 11:27 PM, Markus Trippelsdorf
> <[email protected]> wrote:
> >
> > Yeah.
> >
> > 105 entry->orig_ops = reg;
> > 106 entry->ops = *reg;
> > 107 entry->next = NULL;
>
> So ipt_register_table() does:
>
> ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
>
> and then nf_register_net_hooks() just does
>
> for (i = 0; i < n; i++) {
> err = nf_register_net_hook(net, &reg[i]);
>
> so if the *reg is uninitialized, it means that it's the 'ops[]' array
> that isn't actually really valid in "valid_hooks". Odd. They should
> all be initialized by xt_hook_ops_alloc(), no?

Its only partially initialized. Looking at Markus' splat
its complaining about first 16 bytes (list_head), whose contents are indeed
undefined when it gets copied to entry->ops.

For the time being this seems like the most simple "fix", until we
disentangle the hook description (which should be const) from run-time
allocated data.

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index e0aa7c1d0224..fc4977456c30 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1513,7 +1513,7 @@ xt_hook_ops_alloc(const struct xt_table *table,
nf_hookfn *fn)
if (!num_hooks)
return ERR_PTR(-EINVAL);

- ops = kmalloc(sizeof(*ops) * num_hooks, GFP_KERNEL);
+ ops = kcalloc(num_hooks, sizeof(*ops), GFP_KERNEL);
if (ops == NULL)
return ERR_PTR(-ENOMEM);

I'll pass such a patch to Pablo.

> That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
> loop that initializes things:
>
> for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
> hook_mask >>= 1, ++hooknum) {
>
> and it makes no sense to me how that tests *both* "i < num_hools" and
> "hook_mask != 0".

Right, one of these is enough.

2016-10-13 21:33:49

by Al Viro

[permalink] [raw]
Subject: Re: slab corruption with current -git

On Thu, Oct 13, 2016 at 12:49:33PM -0700, Linus Torvalds wrote:

> That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
> loop that initializes things:
>
> for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
> hook_mask >>= 1, ++hooknum) {
>
> and it makes no sense to me how that tests *both* "i < num_hools" and
> "hook_mask != 0".
>
> Why? Because
>
> num_hooks = hweight32(hook_mask);
>
> so it's entirely redundant. num_hooks is already how many bits are on
> in hook_mask, so that test is just duplicating the same thing twice
> ("have we done less than that number of bits" and "do we have any bits
> less").
>
> I don't know. There's something odd going on. Regardless, thsi is a
> different problem from the nf_register_net_hook() list handling, so
> I'll leave it to the networking people. David?

Hey, I remember looking through that stuff. <checks> There it is, in
a thread started by Krause Randomness(tm)... Short version: nf_hook_ops
is a mess - it's embedded into different objects, with different subsets
of fields used depending on the containing object and I would seriously
suggest moving some of those into those containing objects.

--------------------------------------------------------------------------
On Thu, Sep 01, 2016 at 08:10:44AM -0500, Eric Sandeen wrote:
> On 8/4/16 8:57 AM, Al Viro wrote:
>
> > Don't feed the troll. On all paths leading to that place we have
> > result->name = kname;
> > len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
> > or
> > result->name = kname;
> > len = strncpy_from_user(kname, filename, PATH_MAX);
> > with failure exits taken if strncpy_from_user() returns an error, which means
> > that the damn thing has already been copied into.
> >
> > FWIW, it looks a lot like buggered kmemcheck; as usual, he can't be bothered
> > to mention which kernel version would it be (let alone how to reproduce it
> > on the kernel in question), but IIRC davej had run into some instrumentation
> > breakage lately.
>
> The original report is in https://bugzilla.kernel.org/show_bug.cgi?id=120651
> if anyone is interested in it.

What the hell does that one have to getname_flags(), other than having
attracted the same... something on the edge of failing the Turing Test?

FWIW, looking at the netfilter one... That's nf_register_net_hook()
hitting
entry->ops = *reg;
with reg pointing to something uninitialized (according to kmemcheck, that is,
and presuming that it's not an instrumentation bug). With the callchain
in report, it came (all in the same assumptions) from
nf_register_net_hooks(net, ops, hweight32(table->valid_hooks))
with hweight32(table->valid_hooks) being greater than the amount of
initialized entries in ops[] (call site in ipt_register_table()).

This "ops" ought to be net/ipv4/netfilter/iptable_filter.c:filter_ops,
allocated by
filter_ops = xt_hook_ops_alloc(&packet_filter, iptable_filter_hook);
in iptable_filter_init(). "table" is &packet_filter and its contents ought
to be unchanged, so ->valid_hooks in there is FILTER_VALID_HOOKS, i.e.
((1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD) | (1 << NF_INET_LOCAL_OUT)).

Which is to say, filter_ops[] had fewer than 3 initialized elements
when it got to the call of iptable_filter_table_init()... Since filter_ops
hadn't been NULL, the xt_hook_ops_alloc() call above must've already been
done. Said xt_hook_ops_alloc() should've allocated a 3-element array and
hooked through all of it, so it's not a wholesale uninitialized element, it's
uninitialized parts of one...

What gets initialized is ->hook, ->pf, ->hooknum and ->priority.
Let's figure out the offsets:
0: list (two pointers, i.e. 16 bytes)
0x10: hook (8)
0x18: dev (8)
0x20: priv (8)
0x28: pf (1)
0x29: padding (3)
0x2c: hooknum (4)
0x30: priority (4)
0x34: padding (8)

OK... The address of the damn thing is apparently ffff880037b4bd80 and
we see complaint about the accesses at offsets 0, 0x18, 8, 0x20 and then
the same pattern with 0x38 and 0x70 added (i.e. the same fields in the next
two elements of the same array). Then there are similar complaints, but
with a different call chain (iptable_mangle instead of iptable_filter).

These offsets are ->list, ->dev and ->priv, and those are exactly the ones
not initialized by xt_hook_ops_alloc(). Looking at the nf_register_net_hook(),
we have
list_add_rcu(&entry->ops.list, elem->list.prev);
a bit further down the road. ->dev and ->priv are left uninitialized (and
very likely - unused).

I would say it's a false positive. struct nf_hook_ops is embedded into a
bunch of different objects, with different subsets of fields getting used.
IMO it's a bad idea (in particular, I really wonder if ->list would've
been better off moved into (some of) the containing suckers), but it's
not a bug per se, just a design choice asking for trouble. One way of
getting kmemcheck off your back would be to switch xt_hook_ops_alloc() from
ops = kmalloc(sizeof(*ops) * num_hooks, GFP_KERNEL);
to
ops = kcalloc(num_hooks, sizeof(*ops), GFP_KERNEL);
which might have some merits beyond making kmemcheck STFU...
--------------------------------------------------------------------------