2013-06-19 01:46:15

by Linus Torvalds

[permalink] [raw]
Subject: nl80211 NULL pointer dereference

Hmm. Maybe this is old, but I don't think I've seen it before (who
knows, maybe it has killed the machine before, I had a hard hang the
other day).

It's a NULL pointer dereference in nl80211_set_reg() on my Pixel. The
machine kind of stayed up afterwards, although with no working
wireless, and it would not shut down cleanly presumably due to locks
held etc.

Any ideas? I'm including the few wireless-related messages that
happened justr before the oops. Being a pixel, this is with the ath9k
driver.

Linus

---
wlp1s0: authenticate with 00:c0:23:ba:27:40
wlp1s0: send auth to 00:c0:23:ba:27:40 (try 1/3)
wlp1s0: authenticated
ath9k 0000:01:00.0 wlp1s0: disabling HT as WMM/QoS is not supported by the AP
ath9k 0000:01:00.0 wlp1s0: disabling VHT as WMM/QoS is not supported by the AP
wlp1s0: associate with 00:c0:23:ba:27:40 (try 1/3)
wlp1s0: RX AssocResp from 00:c0:23:ba:27:40 (capab=0x501 status=0 aid=4)
wlp1s0: associated
cfg80211: Calling CRDA for country: US

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa02a77d3>] nl80211_set_reg+0x113/0x2c0 [cfg80211]
PGD 1459c3067 PUD 10f6fa067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: ftdi_sio tpm_tis tpm tpm_bios usb_storage fuse
ebtable_nat nf_conntrack_netbios_ns nf_conntrack_broadcast
ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT
nf_conntra
media chromeos_laptop snd_timer snd microcode lpc_ich rfkill
soundcore mfd_core i2c_i801 uinput binfmt_misc dm_crypt i915
i2c_algo_bit drm_kms_helper drm crc32_pclmul crc32c_intel
ghash_clmulni_intel i2
CPU: 1 PID: 4859 Comm: crda Not tainted 3.10.0-rc6 #2
Hardware name: GOOGLE Link, BIOS 12/10/2012
RIP: 0010:[<ffffffffa02a77d3>] [<ffffffffa02a77d3>]
nl80211_set_reg+0x113/0x2c0 [cfg80211]
RSP: 0018:ffff8801277779f0 EFLAGS: 00010202
RAX: ffff8801456b0000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000000000c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880127777a58 R08: 0000000000015d40 R09: ffff880141c8ecc0
R10: ffffffffa02a779a R11: 0000000000000004 R12: 0000000000000000
R13: ffff880141c8ecc0 R14: ffff88013af8d414 R15: ffff880127777a80
FS: 00007f2c82fb5740(0000) GS:ffff88014f280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001459b2000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff81531b44>] genl_family_rcv_msg+0x1f4/0x2e0
[<ffffffff81531cc1>] genl_rcv_msg+0x91/0xd0
[<ffffffff81531339>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff81531758>] genl_rcv+0x28/0x40
[<ffffffff81530d62>] netlink_unicast+0x142/0x1f0
[<ffffffff815310ad>] netlink_sendmsg+0x29d/0x370
[<ffffffff814f22e9>] sock_sendmsg+0x99/0xd0
[<ffffffff814f270e>] ___sys_sendmsg+0x39e/0x3b0
[<ffffffff814f34f2>] __sys_sendmsg+0x42/0x80
[<ffffffff814f3542>] SyS_sendmsg+0x12/0x20
[<ffffffff81615e42>] system_call_fastpath+0x16/0x1b
Code: 60 10 41 0f b6 46 04 0f b6 fb 41 88 45 14 41 0f b6 46 05 41 88
45 15 e8 8c c5 fe ff 84 c0 75 68 49 8b 47 20 4c 8b a0 10 01 00 00 <45>
0f b7 34 24 41 83 ee 04 41 83 fe 03 7e 0e 41 0f b7 44 24 04
RIP [<ffffffffa02a77d3>] nl80211_set_reg+0x113/0x2c0 [cfg80211]
RSP <ffff8801277779f0>
CR2: 0000000000000000


2013-06-19 17:00:36

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] nl80211: fix attrbuf access race by allocating a separate one

On Wed, 2013-06-19 at 09:57 -0700, Ben Greear wrote:
> On 06/19/2013 01:23 AM, Johannes Berg wrote:
> > From: Johannes Berg <[email protected]>
> >
> > Since my commit 3713b4e364, nl80211_dump_wiphy() uses the global

> The commit mentioned above (3713b4e364) is in 3.9.6,

It isn't according to my git. :-)

johannes


2013-06-19 08:24:05

by Johannes Berg

[permalink] [raw]
Subject: [PATCH] nl80211: fix attrbuf access race by allocating a separate one

From: Johannes Berg <[email protected]>

Since my commit 3713b4e364, nl80211_dump_wiphy() uses the global
nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
be a problem if it only did so on the first dump iteration which
is locked against other commands in generic netlink, but due to
space constraints in cb->args (the needed state doesn't fit) I
decided to always parse the original message. That's racy though
since nl80211_fam.attrbuf could be used by some other parsing in
generic netlink concurrently.

For now, fix this by allocating a separate parse buffer (it's a
bit too big for the stack, currently 1448 bytes on 64-bit). For
-next, I'll change the code to parse into the global buffer in
the first round only and then allocate a smaller buffer to keep
the state in cb->args.

Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
---
Let me know if you want to apply this directly, otherwise I'll send it
on its way to John.

net/wireless/nl80211.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d5aed3b..b14b7e3 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -1564,12 +1564,17 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct netlink_callback *cb)
struct cfg80211_registered_device *dev;
s64 filter_wiphy = -1;
bool split = false;
- struct nlattr **tb = nl80211_fam.attrbuf;
+ struct nlattr **tb;
int res;

+ /* will be zeroed in nlmsg_parse() */
+ tb = kmalloc(sizeof(*tb) * (NL80211_ATTR_MAX + 1), GFP_KERNEL);
+ if (!tb)
+ return -ENOMEM;
+
mutex_lock(&cfg80211_mutex);
res = nlmsg_parse(cb->nlh, GENL_HDRLEN + nl80211_fam.hdrsize,
- tb, nl80211_fam.maxattr, nl80211_policy);
+ tb, NL80211_ATTR_MAX, nl80211_policy);
if (res == 0) {
split = tb[NL80211_ATTR_SPLIT_WIPHY_DUMP];
if (tb[NL80211_ATTR_WIPHY])
@@ -1583,6 +1588,7 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct netlink_callback *cb)
netdev = dev_get_by_index(sock_net(skb->sk), ifidx);
if (!netdev) {
mutex_unlock(&cfg80211_mutex);
+ kfree(tb);
return -ENODEV;
}
if (netdev->ieee80211_ptr) {
@@ -1593,6 +1599,7 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct netlink_callback *cb)
dev_put(netdev);
}
}
+ kfree(tb);

list_for_each_entry(dev, &cfg80211_rdev_list, list) {
if (!net_eq(wiphy_net(&dev->wiphy), sock_net(skb->sk)))
--
1.8.0




2013-06-19 16:57:40

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] nl80211: fix attrbuf access race by allocating a separate one

On 06/19/2013 01:23 AM, Johannes Berg wrote:
> From: Johannes Berg <[email protected]>
>
> Since my commit 3713b4e364, nl80211_dump_wiphy() uses the global
> nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
> be a problem if it only did so on the first dump iteration which
> is locked against other commands in generic netlink, but due to
> space constraints in cb->args (the needed state doesn't fit) I
> decided to always parse the original message. That's racy though
> since nl80211_fam.attrbuf could be used by some other parsing in
> generic netlink concurrently.
>
> For now, fix this by allocating a separate parse buffer (it's a
> bit too big for the stack, currently 1448 bytes on 64-bit). For
> -next, I'll change the code to parse into the global buffer in
> the first round only and then allocate a smaller buffer to keep
> the state in cb->args.

The commit mentioned above (3713b4e364) is in 3.9.6, but this patch
doesn't come close to applying on my 3.9.6.

Do you happen to know if this should be backported to 3.9 stable or not?

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2013-06-19 02:06:35

by David Miller

[permalink] [raw]
Subject: Re: nl80211 NULL pointer dereference

From: Linus Torvalds <[email protected]>
Date: Tue, 18 Jun 2013 15:46:13 -1000

> Hmm. Maybe this is old, but I don't think I've seen it before (who
> knows, maybe it has killed the machine before, I had a hard hang the
> other day).
>
> It's a NULL pointer dereference in nl80211_set_reg() on my Pixel. The
> machine kind of stayed up afterwards, although with no working
> wireless, and it would not shut down cleanly presumably due to locks
> held etc.
>
> Any ideas? I'm including the few wireless-related messages that
> happened justr before the oops. Being a pixel, this is with the ath9k
> driver.

nl80211_set_reg() is really careful about validating which netlink
attributes the user has specified, and either not dereferencing or
signalling an error when NULL is seen.

Hmmm...

2013-06-19 13:44:34

by Sergei Shtylyov

[permalink] [raw]
Subject: Re: [PATCH] nl80211: fix attrbuf access race by allocating a separate one

Hello.

On 19-06-2013 12:23, Johannes Berg wrote:

> From: Johannes Berg <[email protected]>

> Since my commit 3713b4e364, nl80211_dump_wiphy() uses the global

Please also specify that commit's summary line in parens.

> nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
> be a problem if it only did so on the first dump iteration which
> is locked against other commands in generic netlink, but due to
> space constraints in cb->args (the needed state doesn't fit) I
> decided to always parse the original message. That's racy though
> since nl80211_fam.attrbuf could be used by some other parsing in
> generic netlink concurrently.

> For now, fix this by allocating a separate parse buffer (it's a
> bit too big for the stack, currently 1448 bytes on 64-bit). For
> -next, I'll change the code to parse into the global buffer in
> the first round only and then allocate a smaller buffer to keep
> the state in cb->args.

> Reported-by: Linus Torvalds <[email protected]>
> Signed-off-by: Johannes Berg <[email protected]>

WBR, Sergei


2013-06-19 08:39:01

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] nl80211: fix attrbuf access race by allocating a separate one

From: Johannes Berg <[email protected]>
Date: Wed, 19 Jun 2013 10:23:58 +0200

> From: Johannes Berg <[email protected]>
>
> Since my commit 3713b4e364, nl80211_dump_wiphy() uses the global
> nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
> be a problem if it only did so on the first dump iteration which
> is locked against other commands in generic netlink, but due to
> space constraints in cb->args (the needed state doesn't fit) I
> decided to always parse the original message. That's racy though
> since nl80211_fam.attrbuf could be used by some other parsing in
> generic netlink concurrently.
>
> For now, fix this by allocating a separate parse buffer (it's a
> bit too big for the stack, currently 1448 bytes on 64-bit). For
> -next, I'll change the code to parse into the global buffer in
> the first round only and then allocate a smaller buffer to keep
> the state in cb->args.
>
> Reported-by: Linus Torvalds <[email protected]>
> Signed-off-by: Johannes Berg <[email protected]>

Acked-by: David S. Miller <[email protected]>

2013-06-19 07:54:59

by Johannes Berg

[permalink] [raw]
Subject: Re: nl80211 NULL pointer dereference

On Tue, 2013-06-18 at 16:24 -1000, Linus Torvalds wrote:

> So it would seem that it's that
>
> info->attrs[NL80211_ATTR_REG_RULES]
>
> thing that is NULL.
>
> And yes, the code checks that for being non-NULL in at the top of the
> function, but maybe there is a race with something else setting it to
> NULL? There is a kzalloc(GFP_KERNEL) in between, so it doesn't even
> have to be a very small race...

Yes. I looked at it, and reproduced it (after making the window larger
by putting some sleeps in there and WARN_ON()). It's really just a
stupid mistake I made: in nl80211_dump_wiphy() I parse attributes into
the global nl80211_fam.attrbuf, without making sure that it has proper
locking. Normally we do something like that only on the first iteration
of a dump which is OK because it's locked, but here I did it always,
which is clearly a bug.

I'll have a patch in a minute.

johannes




2013-06-19 14:00:11

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] nl80211: fix attrbuf access race by allocating a separate one

On Wed, Jun 19, 2013 at 01:39:00AM -0700, David Miller wrote:
> From: Johannes Berg <[email protected]>
> Date: Wed, 19 Jun 2013 10:23:58 +0200
>
> > From: Johannes Berg <[email protected]>
> >
> > Since my commit 3713b4e364, nl80211_dump_wiphy() uses the global
> > nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
> > be a problem if it only did so on the first dump iteration which
> > is locked against other commands in generic netlink, but due to
> > space constraints in cb->args (the needed state doesn't fit) I
> > decided to always parse the original message. That's racy though
> > since nl80211_fam.attrbuf could be used by some other parsing in
> > generic netlink concurrently.
> >
> > For now, fix this by allocating a separate parse buffer (it's a
> > bit too big for the stack, currently 1448 bytes on 64-bit). For
> > -next, I'll change the code to parse into the global buffer in
> > the first round only and then allocate a smaller buffer to keep
> > the state in cb->args.
> >
> > Reported-by: Linus Torvalds <[email protected]>
> > Signed-off-by: Johannes Berg <[email protected]>
>
> Acked-by: David S. Miller <[email protected]>

Acked-by: John W. Linville <[email protected]>

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2013-06-19 16:26:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] nl80211: fix attrbuf access race by allocating a separate one

On Tue, Jun 18, 2013 at 10:23 PM, Johannes Berg
<[email protected]> wrote:
>
> Let me know if you want to apply this directly, otherwise I'll send it
> on its way to John.

The problem doesn't happen often enough to make this an acute issue
for me, so this patch might as well go through the normal channels.

Thanks,

Linus

2013-06-19 02:24:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: nl80211 NULL pointer dereference

On Tue, Jun 18, 2013 at 4:06 PM, David Miller <[email protected]> wrote:
>
> nl80211_set_reg() is really careful about validating which netlink
> attributes the user has specified, and either not dereferencing or
> signalling an error when NULL is seen.
>
> Hmmm...

The code disassembles to

0: 41 0f b6 46 04 movzbl 0x4(%r14),%eax
5: 0f b6 fb movzbl %bl,%edi
8: 41 88 45 14 mov %al,0x14(%r13)
c: 41 0f b6 46 05 movzbl 0x5(%r14),%eax
11: 41 88 45 15 mov %al,0x15(%r13)
15: e8 8c c5 fe ff callq 0xfffffffffffec5a6
1a: 84 c0 test %al,%al
1c: 75 68 jne 0x86
1e: 49 8b 47 20 mov 0x20(%r15),%rax
22: 4c 8b a0 10 01 00 00 mov 0x110(%rax),%r12
29:* 45 0f b7 34 24 movzwl (%r12),%r14d <-- trapping instruction
2e: 41 83 ee 04 sub $0x4,%r14d
32: 41 83 fe 03 cmp $0x3,%r14d
36: 7e 0e jle 0x46
38: 41 0f b7 44 24 04 movzwl 0x4(%r12),%eax

(I deleted the two first bytes. they were part of an incomplete
preceding instruction). The "call/test/jne" seems to be this paert:

/*
* Disable DFS master mode if the DFS region was
* not supported or known on this kernel.
*/
if (reg_supported_dfs_region(dfs_region))
rd->dfs_region = dfs_region;

and then the two moves into %rax an %r12 seem to be setting up for

nla_for_each_nested(nl_reg_rule, info->attrs[NL80211_ATTR_REG_RULES],

and then the movzwl (that traps) and the subsequent subtract seems to
be the inlined nla_len(nla), which is the "len" to the
nla_for_each_nested() -> nla_for_each_attr() macro expansion.

So it would seem that it's that

info->attrs[NL80211_ATTR_REG_RULES]

thing that is NULL.

And yes, the code checks that for being non-NULL in at the top of the
function, but maybe there is a race with something else setting it to
NULL? There is a kzalloc(GFP_KERNEL) in between, so it doesn't even
have to be a very small race...

Hmm? I really don't know this code at all, so I'm just looking at the
source and lining up the oops...

Linus

2013-06-19 07:47:14

by David Miller

[permalink] [raw]
Subject: Re: nl80211 NULL pointer dereference

From: Linus Torvalds <[email protected]>
Date: Tue, 18 Jun 2013 16:24:57 -1000

> And yes, the code checks that for being non-NULL in at the top of the
> function, but maybe there is a race with something else setting it to
> NULL? There is a kzalloc(GFP_KERNEL) in between, so it doesn't even
> have to be a very small race...

The nl80211 code uses a flag for each netlink command to determine
whether the RTNL mutex should be held across the operation.

This is handled in the pre_doit and post_doit methods implemented
in nl80211.c.

And this operation, in fact, just so happens to be one that doesn't
have the "take the RTNL mutex" flag set.

But for internal consistency of the netlink message itself, the RTNL
mutex should not matter. It's in a private SKB buffer which is in use
only by the ->doit() method.


2013-06-19 17:04:33

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] nl80211: fix attrbuf access race by allocating a separate one

On 06/19/2013 10:00 AM, Johannes Berg wrote:
> On Wed, 2013-06-19 at 09:57 -0700, Ben Greear wrote:
>> On 06/19/2013 01:23 AM, Johannes Berg wrote:
>>> From: Johannes Berg <[email protected]>
>>>
>>> Since my commit 3713b4e364, nl80211_dump_wiphy() uses the global
>
>> The commit mentioned above (3713b4e364) is in 3.9.6,
>
> It isn't according to my git. :-)

Ahh, I was thinking that if 'git show foo' showed it, it was in that branch,
but I suppose that is not actually the case.

Sorry about that.

Thanks,
Ben

>
> johannes
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com