Hi Edward,
Seems like I've found another poisoned skb->next crash with
netif_receive_skb_list().
This is similar to the one than has been already fixed in 22f6bbb7bcfc
("net: use skb_list_del_init() to remove from RX sublists"). This one
however
applies only to non-standard ptypes (in my case -- ETH_P_XDSA).
I use simple VLAN NAT setup through nft. After switching my in-dev
driver to
netif_receive_skb_list(), system started to crash on forwarding:
[ 88.606777] CPU 0 Unable to handle kernel paging request at virtual
address 0000000e, epc == 80687078, ra == 8052cc7c
[ 88.618666] Oops[#1]:
[ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted
5.1.0-rc2-dlink-00206-g4192a172-dirty #1473
[ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850
[ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000
[ 88.642526] $ 8 : 00000000 49600000 00000001 00000001
[ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41
[ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd
[ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800
[ 88.665793] $24 : 00000000 8011e658
[ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c
[ 88.677427] Hi : 00000003
[ 88.680622] Lo : 7b5b4220
[ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0
[ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188
[ 88.696734] Status: 10000404 IEp
[ 88.700422] Cause : 50000008 (ExcCode 02)
[ 88.704874] BadVA : 0000000e
[ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi))
[ 88.713005] Modules linked in:
[ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval),
task=(ptrval), tls=00000000)
[ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0
85cff080 80790000 8052cc7c
[ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850
87cabf00 807d0000
[ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a
00000000 8052d96c
[ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558
87cabf00 85f58c50
[ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001
85f58c00 864d7800
[ 88.771770] ...
[ 88.774483] Call Trace:
[ 88.777199] [<80687078>] vlan_dev_hard_start_xmit+0x1c/0x1a0
[ 88.783504] [<8052cc7c>] dev_hard_start_xmit+0xac/0x188
[ 88.789326] [<8052d96c>] __dev_queue_xmit+0x6e8/0x7d4
[ 88.794955] [<805a8640>] ip_finish_output2+0x238/0x4d0
[ 88.800677] [<805ab6a0>] ip_output+0xc8/0x140
[ 88.805526] [<805a68f4>] ip_forward+0x364/0x560
[ 88.810567] [<805a4ff8>] ip_rcv+0x48/0xe4
[ 88.815030] [<80528d44>] __netif_receive_skb_one_core+0x44/0x58
[ 88.821635] [<8067f220>] dsa_switch_rcv+0x108/0x1ac
[ 88.827067] [<80528f80>] __netif_receive_skb_list_core+0x228/0x26c
[ 88.833951] [<8052ed84>] netif_receive_skb_list+0x1d4/0x394
[ 88.840160] [<80355a88>] lunar_rx_poll+0x38c/0x828
[ 88.845496] [<8052fa78>] net_rx_action+0x14c/0x3cc
[ 88.850835] [<806ad300>] __do_softirq+0x178/0x338
[ 88.856077] [<8012a2d4>] irq_exit+0xbc/0x100
[ 88.860846] [<802f8b70>] plat_irq_dispatch+0xc0/0x144
[ 88.866477] [<80105974>] handle_int+0x14c/0x158
[ 88.871516] [<806acfb0>] r4k_wait+0x30/0x40
[ 88.876462] Code: afb10014 8c8200a0 00803025 <9443000c> 94a20468
00000000 10620042 00a08025 9605046a
[ 88.887332]
[ 88.888982] ---[ end trace eb863d007da11cf1 ]---
[ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt
[ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---
Some additional debug have showed that skb->next is poisoned on
dsa_switch_rcv()
-- ETH_P_XDSA ptype .func() callback. So when skb enters
dev_hard_start_xmit(),
function tries to "schedule" backpointer to list_head for transmitting.
Here's a working possible fix for that, not sure if it can break
anything
though.
diff --git a/net/core/dev.c b/net/core/dev.c
index 2b67f2aa59dd..fdcff29df915 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5014,8 +5014,10 @@ static inline void
__netif_receive_skb_list_ptype(struct list_head *head,
if (pt_prev->list_func != NULL)
pt_prev->list_func(head, pt_prev, orig_dev);
else
- list_for_each_entry_safe(skb, next, head, list)
+ list_for_each_entry_safe(skb, next, head, list) {
+ skb_list_del_init(skb);
pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+ }
}
static void __netif_receive_skb_list_core(struct list_head *head, bool
pfmemalloc)
Maybe you could look into this and find another/better solution (or I
could
submit this one if that's pretty enough).
BTW, great work with netif_receive_skb_list() -- I've got 70 Mbps gain
(~15%)
on my setup in comparsion to napi_gro_receive().
Thanks,
Alexander.
Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ
On 28/03/2019 10:03, Alexander Lobakin wrote:
> Here's a working possible fix for that, not sure if it can break anything
> though.
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 2b67f2aa59dd..fdcff29df915 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5014,8 +5014,10 @@ static inline void __netif_receive_skb_list_ptype(struct list_head *head,
> if (pt_prev->list_func != NULL)
> pt_prev->list_func(head, pt_prev, orig_dev);
> else
> - list_for_each_entry_safe(skb, next, head, list)
> + list_for_each_entry_safe(skb, next, head, list) {
> + skb_list_del_init(skb);
> pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> + }
> }
>
> static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
>
> Maybe you could look into this and find another/better solution (or I could
> submit this one if that's pretty enough).
That looks like the correct fix to me, go ahead and have a
Reviewed-by: Edward Cree <[email protected]>
>
> BTW, great work with netif_receive_skb_list() -- I've got 70 Mbps gain (~15%)
> on my setup in comparsion to napi_gro_receive().
Nice!
-Ed
Edward Cree wrote 28.03.2019 14:37:
> On 28/03/2019 10:03, Alexander Lobakin wrote:
>> Here's a working possible fix for that, not sure if it can break
>> anything
>> though.
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 2b67f2aa59dd..fdcff29df915 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5014,8 +5014,10 @@ static inline void
>> __netif_receive_skb_list_ptype(struct list_head *head,
>> if (pt_prev->list_func != NULL)
>> pt_prev->list_func(head, pt_prev, orig_dev);
>> else
>> - list_for_each_entry_safe(skb, next, head, list)
>> + list_for_each_entry_safe(skb, next, head, list) {
>> + skb_list_del_init(skb);
>> pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>> + }
>> }
>>
>> static void __netif_receive_skb_list_core(struct list_head *head, bool
>> pfmemalloc)
>>
>> Maybe you could look into this and find another/better solution (or I
>> could
>> submit this one if that's pretty enough).
> That looks like the correct fix to me, go ahead and have a
> Reviewed-by: Edward Cree <[email protected]>
Cool!
Do we need a "stable" mark in this case to propose it to LTS backports
(4.20+, I suppose)?
>> BTW, great work with netif_receive_skb_list() -- I've got 70 Mbps gain
>> (~15%)
>> on my setup in comparsion to napi_gro_receive().
> Nice!
>
> -Ed
Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ
On 28/03/2019 11:56, Alexander Lobakin wrote:
> Edward Cree wrote 28.03.2019 14:37:
>> On 28/03/2019 10:03, Alexander Lobakin wrote:
>>> Here's a working possible fix for that, not sure if it can break anything
>>> though.
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 2b67f2aa59dd..fdcff29df915 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -5014,8 +5014,10 @@ static inline void __netif_receive_skb_list_ptype(struct list_head *head,
>>> if (pt_prev->list_func != NULL)
>>> pt_prev->list_func(head, pt_prev, orig_dev);
>>> else
>>> - list_for_each_entry_safe(skb, next, head, list)
>>> + list_for_each_entry_safe(skb, next, head, list) {
>>> + skb_list_del_init(skb);
>>> pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>>> + }
>>> }
>>>
>>> static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
>>>
>>> Maybe you could look into this and find another/better solution (or I could
>>> submit this one if that's pretty enough).
>> That looks like the correct fix to me, go ahead and have a
>> Reviewed-by: Edward Cree <[email protected]>
>
> Cool!
> Do we need a "stable" mark in this case to propose it to LTS backports (4.20+, I suppose)?
Probably add a
Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup")
(then Dave will take care of sending it to -stable.)
-Ed