2018-03-29 03:53:11

by Dongli Zhang

[permalink] [raw]
Subject: [PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
the received sk_buff is malformed, that is, when the sk_buff has pattern
(skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
stack:

[ 438.652658] ------------[ cut here ]------------
[ 438.652660] kernel BUG at drivers/net/xen-netback/rx.c:325!
[ 438.652714] invalid opcode: 0000 [#1] SMP NOPTI
[ 438.652813] CPU: 0 PID: 2492 Comm: vif1.0-q0-guest Tainted: G O 4.16.0-rc6+ #1
[ 438.652896] RIP: e030:xenvif_rx_skb+0x3c2/0x5e0 [xen_netback]
[ 438.652926] RSP: e02b:ffffc90040877dc8 EFLAGS: 00010246
[ 438.652956] RAX: 0000000000000160 RBX: 0000000000000022 RCX: 0000000000000001
[ 438.652993] RDX: ffffc900402890d0 RSI: 0000000000000000 RDI: ffffc90040889000
[ 438.653029] RBP: ffff88002b460040 R08: ffffc90040877de0 R09: 0100000000000000
[ 438.653065] R10: 0000000000007ff0 R11: 0000000000000002 R12: ffffc90040889000
[ 438.653100] R13: ffffffff80000000 R14: 0000000000000022 R15: 0000000080000000
[ 438.653149] FS: 00007f15603778c0(0000) GS:ffff880030400000(0000) knlGS:0000000000000000
[ 438.653188] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 438.653219] CR2: 0000000001832a08 CR3: 0000000029c12000 CR4: 0000000000042660
[ 438.653262] Call Trace:
[ 438.653284] ? xen_hypercall_event_channel_op+0xa/0x20
[ 438.653313] xenvif_rx_action+0x41/0x80 [xen_netback]
[ 438.653341] xenvif_kthread_guest_rx+0xb2/0x2a8 [xen_netback]
[ 438.653374] ? __schedule+0x352/0x700
[ 438.653398] ? wait_woken+0x80/0x80
[ 438.653421] kthread+0xf3/0x130
[ 438.653442] ? xenvif_rx_action+0x80/0x80 [xen_netback]
[ 438.653470] ? kthread_destroy_worker+0x40/0x40
[ 438.653497] ret_from_fork+0x35/0x40

The issue is hit by xen-netback when there is bug with other networking
interface (e.g., dom0 physical NIC), who has generated and forwarded
malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
purpose with below sample code in a kernel module:

skb->dev = dev; // dev of vifX.Y
skb->len = 386;
skb->data_len = 352;
skb->tail = 98;
skb->end = 384;
skb_shinfo(skb)->nr_frags = 0;
dev->netdev_ops->ndo_start_xmit(skb, dev);

This patch stops processing sk_buff immediately if it is detected as
malformed, that is, pkt->frag_iter is NULL but there is still remaining
pkt->remaining_len.

Signed-off-by: Dongli Zhang <[email protected]>

---
Changed since v1:
* return XEN_NETIF_RSP_ERROR in response to netfront

drivers/net/xen-netback/rx.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c
index b1cf7c6..a3d8ee9 100644
--- a/drivers/net/xen-netback/rx.c
+++ b/drivers/net/xen-netback/rx.c
@@ -358,6 +358,7 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
{
unsigned int offset = 0;
unsigned int flags;
+ bool err = false;

do {
size_t len;
@@ -369,6 +370,15 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
offset += len;
pkt->remaining_len -= len;

+ if (unlikely(!pkt->frag_iter && pkt->remaining_len)) {
+ pkt->remaining_len = 0;
+ pkt->extra_count = 0;
+ err = true;
+ pr_err_ratelimited("malformed sk_buff at %s\n",
+ queue->name);
+ break;
+ }
+
} while (offset < XEN_PAGE_SIZE && pkt->remaining_len > 0);

if (pkt->remaining_len > 0)
@@ -392,7 +402,7 @@ static void xenvif_rx_data_slot(struct xenvif_queue *queue,
rsp->offset = 0;
rsp->flags = flags;
rsp->id = req->id;
- rsp->status = (s16)offset;
+ rsp->status = likely(!err) ? (s16)offset : XEN_NETIF_RSP_ERROR;
}

static void xenvif_rx_extra_slot(struct xenvif_queue *queue,
--
2.7.4



2018-03-29 04:04:53

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()



On 03/28/2018 08:51 PM, Dongli Zhang wrote:
> The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
> the received sk_buff is malformed, that is, when the sk_buff has pattern
> (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
> stack:
>
>...


>
> The issue is hit by xen-netback when there is bug with other networking
> interface (e.g., dom0 physical NIC), who has generated and forwarded
> malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
> purpose with below sample code in a kernel module:
>
> skb->dev = dev; // dev of vifX.Y
> skb->len = 386;
> skb->data_len = 352;
> skb->tail = 98;
> skb->end = 384;
> skb_shinfo(skb)->nr_frags = 0;
> dev->netdev_ops->ndo_start_xmit(skb, dev);
>

This would be a serious bug in the provider of such skb.

Are you sure you do not have instead an skb with a chain of skbs ?

(skb_shinfo(skb)->frag_list would be not NULL)

Maybe your driver is wrongly advertising NETIF_F_FRAGLIST

commit 2167ca029c244901831 would be the bug origin then...




2018-03-29 04:25:06

by Dongli Zhang

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

Hi Eric,

On 03/29/2018 12:03 PM, Eric Dumazet wrote:
>
>
> On 03/28/2018 08:51 PM, Dongli Zhang wrote:
>> The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
>> the received sk_buff is malformed, that is, when the sk_buff has pattern
>> (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
>> stack:
>>
>> ...
>
>
>>
>> The issue is hit by xen-netback when there is bug with other networking
>> interface (e.g., dom0 physical NIC), who has generated and forwarded
>> malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
>> purpose with below sample code in a kernel module:
>>
>> skb->dev = dev; // dev of vifX.Y
>> skb->len = 386;
>> skb->data_len = 352;
>> skb->tail = 98;
>> skb->end = 384;
>> skb_shinfo(skb)->nr_frags = 0;
>> dev->netdev_ops->ndo_start_xmit(skb, dev);
>>
>
> This would be a serious bug in the provider of such skb.

/nods

>
> Are you sure you do not have instead an skb with a chain of skbs ?
>
> (skb_shinfo(skb)->frag_list would be not NULL)

I am sure the skb_shinfo(skb)->frag_list is NULL.

>
> Maybe your driver is wrongly advertising NETIF_F_FRAGLIST
>
> commit 2167ca029c244901831 would be the bug origin then...

Unlike the new linux version (whose BUG_ON() does not panic the server), the
BUG_ON() in prior old kernel version would panic xen dom0 server and then people
would always blame xen paravirtual driver.

Indeed, xen-netback did not process the malformed sk_buff appropriately on rx
path. The issue is not hit with old dom0 kernel, when I am running the debug
module (as shown in below link) to generate a malformed sk_buff on purpose.

https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg03176.html

Dongli Zhang