V2: just check napi->dev->netdev_ops instead of getting clever with the
netdev registration state.
Original cover letter:
Hi Dave,
I stumbled across a reproducible kernel panic while playing around with
busy_poll on a Linux 4.9.86 kernel. There's an unfortunate interaction
between init_dummy_netdev, which doesn't bother to fill in netdev_ops, and
sk_busy_loop, which assumes netdev_ops is a valid pointer.
To reproduce on the device under test (DUT), I did:
$ ip addr show dev wlan0
8: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq [...]
inet 172.16.122.6/23 brd 172.16.123.255 scope global wlan0
$ sysctl -w net.core.busy_read=50
$ nc -l 172.16.122.6 5001
Then transmitted some data to this socket from a second host:
$ echo "foo" | nc 172.16.122.6 5001
The DUT immediately hits a kernel panic.
I've attached a patch that applies cleanly to the 4.9.87 stable release.
This fix isn't necessary for net/net-next (ndo_busy_poll was removed in
linux-4.11), but a further backport of this commit is likely required for
any stable releases older than linux-4.5.
I hope this is the right way to raise something like this. I couldn't find
a clear answer in the -stable and netdev documentation on how to handle bugs
in features that no longer exist in mainline.
Thanks,
Josh
init_dummy_netdev() leaves its netdev_ops pointer zeroed. This leads
to a NULL pointer dereference when sk_busy_loop fires against an iwlwifi
wireless adapter and checks napi->dev->netdev_ops->ndo_busy_poll.
Avoid this by ensuring napi->dev->netdev_ops is valid before following
the pointer, avoiding the following panic when busy polling on a dummy
netdev:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
IP: [<ffffffff817b4b72>] sk_busy_loop+0x92/0x2f0
Call Trace:
[<ffffffff815a3134>] ? uart_write_room+0x74/0xf0
[<ffffffff817964a9>] sock_poll+0x99/0xa0
[<ffffffff81223142>] do_sys_poll+0x2e2/0x520
[<ffffffff8118d3fc>] ? get_page_from_freelist+0x3bc/0xa30
[<ffffffff810ada22>] ? update_curr+0x62/0x140
[<ffffffff811ea671>] ? __slab_free+0xa1/0x2a0
[<ffffffff811ea671>] ? __slab_free+0xa1/0x2a0
[<ffffffff8179dbb1>] ? skb_free_head+0x21/0x30
[<ffffffff81221bd0>] ? poll_initwait+0x50/0x50
[<ffffffff811eaa36>] ? kmem_cache_free+0x1c6/0x1e0
[<ffffffff815a4884>] ? uart_write+0x124/0x1d0
[<ffffffff810bd1cd>] ? remove_wait_queue+0x4d/0x60
[<ffffffff810bd224>] ? __wake_up+0x44/0x50
[<ffffffff81582731>] ? tty_write_unlock+0x31/0x40
[<ffffffff8158c5c6>] ? tty_ldisc_deref+0x16/0x20
[<ffffffff81584820>] ? tty_write+0x1e0/0x2f0
[<ffffffff81587e50>] ? process_echoes+0x80/0x80
[<ffffffff8120c17b>] ? __vfs_write+0x2b/0x130
[<ffffffff8120d09a>] ? vfs_write+0x15a/0x1a0
[<ffffffff81223455>] SyS_poll+0x75/0x100
[<ffffffff819a6524>] entry_SYSCALL_64_fastpath+0x24/0xcf
Commit 79e7fff47b7b ("net: remove support for per driver ndo_busy_poll()")
indirectly fixed this upstream in linux-4.11 by removing the offending
pointer usage. No other users of napi->dev touch its netdev_ops.
Fixes: 060212928670 ("net: add low latency socket poll")
Fixes: ce6aea93f751 ("net: network drivers no longer need to implement ndo_busy_poll()") - 4.9.y
Signed-off-by: Josh Elsasser <[email protected]>
---
net/core/dev.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 8898618bf341..1f50c131ed15 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5042,7 +5042,10 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
 		goto out;
 
 	/* Note: ndo_busy_poll method is optional in linux-4.5 */
-	busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
+	if (napi->dev->netdev_ops)
+		busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
+	else
+		busy_poll = NULL;
 
 	do {
 		rc = 0;
--
2.11.0
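
For readers without a 4.9 tree at hand, the guarded lookup in the patch can
be modeled in userspace. This is only a sketch: the mock_* types,
lookup_busy_poll() and example_busy_poll() are hypothetical stand-ins that
model just the fields the patch touches, not the kernel's real definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumption: only the
 * fields relevant to the patch are modeled here). */
struct mock_napi;
typedef int (*busy_poll_fn)(struct mock_napi *napi);

struct mock_netdev_ops {
	busy_poll_fn ndo_busy_poll;	/* optional, may be NULL */
};

struct mock_net_device {
	/* Left NULL on a dummy netdev, as described in the commit message. */
	const struct mock_netdev_ops *netdev_ops;
};

struct mock_napi {
	struct mock_net_device *dev;
};

/* Hypothetical driver callback, used only to exercise the non-NULL path. */
int example_busy_poll(struct mock_napi *napi)
{
	(void)napi;
	return 0;
}

/* Mirrors the patched logic: follow netdev_ops only when it is set,
 * otherwise report that no per-driver busy poll method exists. */
busy_poll_fn lookup_busy_poll(const struct mock_napi *napi)
{
	if (napi->dev->netdev_ops)
		return napi->dev->netdev_ops->ndo_busy_poll;
	return NULL;
}
```

With netdev_ops left NULL the helper returns NULL instead of dereferencing
the pointer, which is the case the unpatched line cannot handle.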
On 03/12/2018 10:32 PM, Josh Elsasser wrote:
> init_dummy_netdev() leaves its netdev_ops pointer zeroed. This leads
> to a NULL pointer dereference when sk_busy_loop fires against an iwlwifi
> wireless adapter and checks napi->dev->netdev_ops->ndo_busy_poll.
>
> Avoid this by ensuring napi->dev->netdev_ops is valid before following
> the pointer, avoiding the following panic when busy polling on a dummy
> netdev:
>
>
> Fixes: 060212928670 ("net: add low latency socket poll")
> Fixes: ce6aea93f751 ("net: network drivers no longer need to implement ndo_busy_poll()") - 4.9.y
> Signed-off-by: Josh Elsasser <[email protected]>
> ---
> net/core/dev.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8898618bf341..1f50c131ed15 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5042,7 +5042,10 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
>  		goto out;
>
>  	/* Note: ndo_busy_poll method is optional in linux-4.5 */
> -	busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
> +	if (napi->dev->netdev_ops)
> +		busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
> +	else
> +		busy_poll = NULL;
>
>  	do {
>  		rc = 0;
>
We could instead set up a non-NULL netdev_ops pointer on these 'dummy'
devices to avoid adding a check in the fast path, but I presume we do
not really care, since this fix is only for old kernels and considering
how long it took to discover this bug.
Reviewed-by: Eric Dumazet <[email protected]>
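
The alternative mentioned in the review above (give dummy devices a non-NULL
netdev_ops so the fast path needs no check) could look roughly like the
following userspace sketch. It was not part of the posted patch. The mock_*
types, mock_dummy_ops, mock_init_dummy_netdev() and
lookup_busy_poll_unguarded() are hypothetical names chosen for illustration
and do not reflect the kernel's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures (assumption: only the
 * fields relevant to the discussion are modeled here). */
struct mock_napi;
typedef int (*busy_poll_fn)(struct mock_napi *napi);

struct mock_netdev_ops {
	busy_poll_fn ndo_busy_poll;	/* optional, may be NULL */
};

struct mock_net_device {
	const struct mock_netdev_ops *netdev_ops;
};

struct mock_napi {
	struct mock_net_device *dev;
};

/* Shared ops table with every method left unset. */
static const struct mock_netdev_ops mock_dummy_ops = {
	.ndo_busy_poll = NULL,
};

/* Sketch of a dummy-device init that never leaves netdev_ops NULL. */
void mock_init_dummy_netdev(struct mock_net_device *dev)
{
	memset(dev, 0, sizeof(*dev));
	dev->netdev_ops = &mock_dummy_ops;
}

/* The original, unguarded lookup is safe once netdev_ops is always set. */
busy_poll_fn lookup_busy_poll_unguarded(const struct mock_napi *napi)
{
	return napi->dev->netdev_ops->ndo_busy_poll;
}
```

The trade-off is the one raised in the review: this keeps the branch out of
the busy-poll path at the cost of touching device setup, which may be more
change than a stable-only fix warrants.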
From: Josh Elsasser <[email protected]>
Date: Mon, 12 Mar 2018 22:32:00 -0700
> init_dummy_netdev() leaves its netdev_ops pointer zeroed. This leads
> to a NULL pointer dereference when sk_busy_loop fires against an iwlwifi
> wireless adapter and checks napi->dev->netdev_ops->ndo_busy_poll.
>
> Avoid this by ensuring napi->dev->netdev_ops is valid before following
> the pointer, avoiding the following panic when busy polling on a dummy
> netdev:
...
> Commit 79e7fff47b7b ("net: remove support for per driver ndo_busy_poll()")
> indirectly fixed this upstream in linux-4.11 by removing the offending
> pointer usage. No other users of napi->dev touch its netdev_ops.
>
> Fixes: 060212928670 ("net: add low latency socket poll")
> Fixes: ce6aea93f751 ("net: network drivers no longer need to implement ndo_busy_poll()") - 4.9.y
> Signed-off-by: Josh Elsasser <[email protected]>
Ok, queued up for -stable, thanks.