2019-07-11 22:44:56

by Chris Packham

[permalink] [raw]
Subject: [PATCH v2] tipc: ensure head->lock is initialised

tipc_named_node_up() creates a skb list. It passes the list to
tipc_node_xmit() which has some code paths that can call
skb_queue_purge() which relies on the list->lock being initialised.

The spin_lock is only needed if the messages end up on the receive path
but when the list is created in tipc_named_node_up() we don't
necessarily know if it is going to end up there.

Once all the skb list users are updated in tipc it will then be possible
to update them to use the unlocked variants of the skb list functions
and initialise the lock when we know the message will follow the receive
path.

Signed-off-by: Chris Packham <[email protected]>
---

I'm updating our products to use the latest kernel. One change that we have that
doesn't appear to have been upstreamed is related to the following soft lockup.

NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/3:0]
Modules linked in: tipc jitterentropy_rng echainiv drbg platform_driver(O) ipifwd(PO)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: P O 4.4.6-at1 #1
task: a3054e00 ti: ac6b4000 task.ti: a307a000
NIP: 806891c4 LR: 804f5060 CTR: 804f50d0
REGS: ac6b59b0 TRAP: 0901 Tainted: P O (4.4.6-at1)
MSR: 00029002 <CE,EE,ME> CR: 84002088 XER: 20000000

GPR00: 804f50fc ac6b5a60 a3054e00 00029002 00000101 01001011 00000000 00000001
GPR08: 00021002 c1502d1c ac6b5ae4 00000000 804f50d0
NIP [806891c4] _raw_spin_lock_irqsave+0x44/0x80
LR [804f5060] skb_dequeue+0x20/0x90
Call Trace:
[ac6b5a80] [804f50fc] skb_queue_purge+0x2c/0x50
[ac6b5a90] [c1511058] tipc_node_xmit+0x138/0x170 [tipc]
[ac6b5ad0] [c1509e58] tipc_named_node_up+0x88/0xa0 [tipc]
[ac6b5b00] [c150fc1c] tipc_netlink_compat_stop+0x9bc/0xf50 [tipc]
[ac6b5b20] [c1511638] tipc_rcv+0x418/0x9b0 [tipc]
[ac6b5bc0] [c150218c] tipc_bcast_stop+0xfc/0x7b0 [tipc]
[ac6b5bd0] [80504e38] __netif_receive_skb_core+0x468/0xa10
[ac6b5c70] [805082fc] netif_receive_skb_internal+0x3c/0xe0
[ac6b5ca0] [80642a48] br_handle_frame_finish+0x1d8/0x4d0
[ac6b5d10] [80642f30] br_handle_frame+0x1f0/0x330
[ac6b5d60] [80504ec8] __netif_receive_skb_core+0x4f8/0xa10
[ac6b5e00] [805082fc] netif_receive_skb_internal+0x3c/0xe0
[ac6b5e30] [8044c868] _dpa_rx+0x148/0x5c0
[ac6b5ea0] [8044b0c8] priv_rx_default_dqrr+0x98/0x170
[ac6b5ed0] [804d1338] qman_p_poll_dqrr+0x1b8/0x240
[ac6b5f00] [8044b1c0] dpaa_eth_poll+0x20/0x60
[ac6b5f20] [805087cc] net_rx_action+0x15c/0x320
[ac6b5f80] [8002594c] __do_softirq+0x13c/0x250
[ac6b5fe0] [80025c34] irq_exit+0xb4/0xf0
[ac6b5ff0] [8000d81c] call_do_irq+0x24/0x3c
[a307be60] [80004acc] do_IRQ+0x8c/0x120
[a307be80] [8000f450] ret_from_except+0x0/0x18
--- interrupt: 501 at arch_cpu_idle+0x24/0x70

Eyeballing the code I think it can still happen since tipc_named_node_up
allocates struct sk_buff_head head on the stack so it could have arbitrary
content.

Changes in v2:
- fixup commit subject
- add more information to commit message from mailing list discussion

net/tipc/name_distr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index 61219f0b9677..44abc8e9c990 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -190,7 +190,7 @@ void tipc_named_node_up(struct net *net, u32 dnode)
struct name_table *nt = tipc_name_table(net);
struct sk_buff_head head;

- __skb_queue_head_init(&head);
+ skb_queue_head_init(&head);

read_lock_bh(&nt->cluster_scope_lock);
named_distribute(net, &head, dnode, &nt->cluster_scope);
--
2.22.0


2019-07-12 13:37:34

by Jon Maloy

[permalink] [raw]
Subject: RE: [PATCH v2] tipc: ensure head->lock is initialised

Acked-by: Jon Maloy <[email protected]>

Actually, this was not what I meant, but it solves our problem in a simple and safe way, for now.
Later, when net-next opens, I will revert this and do it the right way, which is to change from skb_queue_purge() to __skb_queue_purge as Eric correctly noted.
That change has to be done at four locations, at least, and is too intrusive to post to 'net' now.
I'll send a cleanup patch when net-next re-opens.

BR
///jon


> -----Original Message-----
> From: Chris Packham <[email protected]>
> Sent: 11-Jul-19 18:41
> To: Jon Maloy <[email protected]>; [email protected];
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; tipc-
> [email protected]; Chris Packham
> <[email protected]>
> Subject: [PATCH v2] tipc: ensure head->lock is initialised
>
> tipc_named_node_up() creates a skb list. It passes the list to
> tipc_node_xmit() which has some code paths that can call
> skb_queue_purge() which relies on the list->lock being initialised.
>
> The spin_lock is only needed if the messages end up on the receive path but
> when the list is created in tipc_named_node_up() we don't necessarily know if
> it is going to end up there.
>
> Once all the skb list users are updated in tipc it will then be possible to update
> them to use the unlocked variants of the skb list functions and initialise the
> lock when we know the message will follow the receive path.
>
> Signed-off-by: Chris Packham <[email protected]>
> ---
>
> I'm updating our products to use the latest kernel. One change that we have
> that doesn't appear to have been upstreamed is related to the following soft
> lockup.
>
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/3:0]
> Modules linked in: tipc jitterentropy_rng echainiv drbg platform_driver(O)
> ipifwd(PO)
> CPU: 3 PID: 0 Comm: swapper/3 Tainted: P O 4.4.6-at1 #1
> task: a3054e00 ti: ac6b4000 task.ti: a307a000
> NIP: 806891c4 LR: 804f5060 CTR: 804f50d0
> REGS: ac6b59b0 TRAP: 0901 Tainted: P O (4.4.6-at1)
> MSR: 00029002 <CE,EE,ME> CR: 84002088 XER: 20000000
>
> GPR00: 804f50fc ac6b5a60 a3054e00 00029002 00000101 01001011
> 00000000 00000001
> GPR08: 00021002 c1502d1c ac6b5ae4 00000000 804f50d0 NIP [806891c4]
> _raw_spin_lock_irqsave+0x44/0x80 LR [804f5060] skb_dequeue+0x20/0x90
> Call Trace:
> [ac6b5a80] [804f50fc] skb_queue_purge+0x2c/0x50 [ac6b5a90] [c1511058]
> tipc_node_xmit+0x138/0x170 [tipc] [ac6b5ad0] [c1509e58]
> tipc_named_node_up+0x88/0xa0 [tipc] [ac6b5b00] [c150fc1c]
> tipc_netlink_compat_stop+0x9bc/0xf50 [tipc] [ac6b5b20] [c1511638]
> tipc_rcv+0x418/0x9b0 [tipc] [ac6b5bc0] [c150218c]
> tipc_bcast_stop+0xfc/0x7b0 [tipc] [ac6b5bd0] [80504e38]
> __netif_receive_skb_core+0x468/0xa10
> [ac6b5c70] [805082fc] netif_receive_skb_internal+0x3c/0xe0
> [ac6b5ca0] [80642a48] br_handle_frame_finish+0x1d8/0x4d0
> [ac6b5d10] [80642f30] br_handle_frame+0x1f0/0x330 [ac6b5d60]
> [80504ec8] __netif_receive_skb_core+0x4f8/0xa10
> [ac6b5e00] [805082fc] netif_receive_skb_internal+0x3c/0xe0
> [ac6b5e30] [8044c868] _dpa_rx+0x148/0x5c0 [ac6b5ea0] [8044b0c8]
> priv_rx_default_dqrr+0x98/0x170 [ac6b5ed0] [804d1338]
> qman_p_poll_dqrr+0x1b8/0x240 [ac6b5f00] [8044b1c0]
> dpaa_eth_poll+0x20/0x60 [ac6b5f20] [805087cc]
> net_rx_action+0x15c/0x320 [ac6b5f80] [8002594c]
> __do_softirq+0x13c/0x250 [ac6b5fe0] [80025c34] irq_exit+0xb4/0xf0
> [ac6b5ff0] [8000d81c] call_do_irq+0x24/0x3c [a307be60] [80004acc]
> do_IRQ+0x8c/0x120 [a307be80] [8000f450] ret_from_except+0x0/0x18
> --- interrupt: 501 at arch_cpu_idle+0x24/0x70
>
> Eyeballing the code I think it can still happen since tipc_named_node_up
> allocates struct sk_buff_head head on the stack so it could have arbitrary
> content.
>
> Changes in v2:
> - fixup commit subject
> - add more information to commit message from mailing list discussion
>
> net/tipc/name_distr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c index
> 61219f0b9677..44abc8e9c990 100644
> --- a/net/tipc/name_distr.c
> +++ b/net/tipc/name_distr.c
> @@ -190,7 +190,7 @@ void tipc_named_node_up(struct net *net, u32
> dnode)
> struct name_table *nt = tipc_name_table(net);
> struct sk_buff_head head;
>
> - __skb_queue_head_init(&head);
> + skb_queue_head_init(&head);
>
> read_lock_bh(&nt->cluster_scope_lock);
> named_distribute(net, &head, dnode, &nt->cluster_scope);
> --
> 2.22.0

2019-07-12 22:35:26

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2] tipc: ensure head->lock is initialised

From: Chris Packham <[email protected]>
Date: Fri, 12 Jul 2019 10:41:15 +1200

> tipc_named_node_up() creates a skb list. It passes the list to
> tipc_node_xmit() which has some code paths that can call
> skb_queue_purge() which relies on the list->lock being initialised.
>
> The spin_lock is only needed if the messages end up on the receive path
> but when the list is created in tipc_named_node_up() we don't
> necessarily know if it is going to end up there.
>
> Once all the skb list users are updated in tipc it will then be possible
> to update them to use the unlocked variants of the skb list functions
> and initialise the lock when we know the message will follow the receive
> path.
>
> Signed-off-by: Chris Packham <[email protected]>

Applied.