2021-05-19 17:41:46

by Xianting Tian

[permalink] [raw]
Subject: [PATCH] virtio_net: Remove BUG() to aviod machine dead

When met error, we output a print to avoid a BUG().

Signed-off-by: Xianting Tian <[email protected]>
---
drivers/net/virtio_net.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c921ebf3ae82..a66174d13e81 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1647,9 +1647,8 @@ static int xmit_skb(struct send_queue *sq, struct
sk_buff *skb)
hdr = skb_vnet_hdr(skb);

if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
- virtio_is_little_endian(vi->vdev), false,
- 0))
- BUG();
+ virtio_is_little_endian(vi->vdev), false, 0))
+ return -EPROTO;

if (vi->mergeable_rx_bufs)
hdr->num_buffers = 0;
--
2.17.1


2021-05-19 17:44:34

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] virtio_net: Remove BUG() to aviod machine dead

typo in subject

On Tue, May 18, 2021 at 05:46:56PM +0800, Xianting Tian wrote:
> When met error, we output a print to avoid a BUG().
>
> Signed-off-by: Xianting Tian <[email protected]>
> ---
> drivers/net/virtio_net.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c921ebf3ae82..a66174d13e81 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1647,9 +1647,8 @@ static int xmit_skb(struct send_queue *sq, struct
> sk_buff *skb)
> hdr = skb_vnet_hdr(skb);
>
> if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
> - virtio_is_little_endian(vi->vdev), false,
> - 0))
> - BUG();
> + virtio_is_little_endian(vi->vdev), false, 0))
> + return -EPROTO;
>

why EPROTO? can you add some comments to explain what is going on pls?

is this related to a malicious hypervisor thing?

don't we want at least a WARN_ON? Or _ONCE?

> if (vi->mergeable_rx_bufs)
> hdr->num_buffers = 0;
> --
> 2.17.1


2021-05-19 20:05:41

by Xianting Tian

[permalink] [raw]
Subject: Re: [PATCH] virtio_net: Remove BUG() to aviod machine dead

thanks, I submit the patch as commented by Andrew
https://lkml.org/lkml/2021/5/18/256

Actually, if xmit_skb() returns error, below code will give a warning
with error code.

/* Try to transmit */
err = xmit_skb(sq, skb);

/* This should not happen! */
if (unlikely(err)) {
dev->stats.tx_fifo_errors++;
if (net_ratelimit())
dev_warn(&dev->dev,
"Unexpected TXQ (%d) queue failure: %d\n",
qnum, err);
dev->stats.tx_dropped++;
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}





?? 2021/5/18 ????5:54, Michael S. Tsirkin д??:
> typo in subject
>
> On Tue, May 18, 2021 at 05:46:56PM +0800, Xianting Tian wrote:
>> When met error, we output a print to avoid a BUG().
>>
>> Signed-off-by: Xianting Tian <[email protected]>
>> ---
>> drivers/net/virtio_net.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index c921ebf3ae82..a66174d13e81 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1647,9 +1647,8 @@ static int xmit_skb(struct send_queue *sq, struct
>> sk_buff *skb)
>> hdr = skb_vnet_hdr(skb);
>>
>> if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
>> - virtio_is_little_endian(vi->vdev), false,
>> - 0))
>> - BUG();
>> + virtio_is_little_endian(vi->vdev), false, 0))
>> + return -EPROTO;
>>
>
> why EPROTO? can you add some comments to explain what is going on pls?
>
> is this related to a malicious hypervisor thing?
>
> don't we want at least a WARN_ON? Or _ONCE?
>
>> if (vi->mergeable_rx_bufs)
>> hdr->num_buffers = 0;
>> --
>> 2.17.1

2021-05-20 07:37:35

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [PATCH] virtio_net: Remove BUG() to aviod machine dead

If you need to respin, there is a typo in the title s/aviod/avoid/

On Tue, May 18, 2021 at 05:46:56PM +0800, Xianting Tian wrote:
>When met error, we output a print to avoid a BUG().
>
>Signed-off-by: Xianting Tian <[email protected]>
>---
> drivers/net/virtio_net.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>index c921ebf3ae82..a66174d13e81 100644
>--- a/drivers/net/virtio_net.c
>+++ b/drivers/net/virtio_net.c
>@@ -1647,9 +1647,8 @@ static int xmit_skb(struct send_queue *sq,
>struct sk_buff *skb)
> hdr = skb_vnet_hdr(skb);
>
> if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
>- virtio_is_little_endian(vi->vdev), false,
>- 0))
>- BUG();
>+ virtio_is_little_endian(vi->vdev), false, 0))
^
This change is not related.

>+ return -EPROTO;
>
> if (vi->mergeable_rx_bufs)
> hdr->num_buffers = 0;
>--
>2.17.1
>

2021-05-25 06:20:57

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] virtio_net: Remove BUG() to aviod machine dead


在 2021/5/19 下午10:18, Xianting Tian 写道:
> thanks, I submit the patch as commented by Andrew
> https://lkml.org/lkml/2021/5/18/256
>
> Actually, if xmit_skb() returns error, below code will give a warning
> with error code.
>
>     /* Try to transmit */
>     err = xmit_skb(sq, skb);
>
>     /* This should not happen! */
>     if (unlikely(err)) {
>         dev->stats.tx_fifo_errors++;
>         if (net_ratelimit())
>             dev_warn(&dev->dev,
>                  "Unexpected TXQ (%d) queue failure: %d\n",
>                  qnum, err);
>         dev->stats.tx_dropped++;
>         dev_kfree_skb_any(skb);
>         return NETDEV_TX_OK;
>     }
>
>
>
>
>
> 在 2021/5/18 下午5:54, Michael S. Tsirkin 写道:
>> typo in subject
>>
>> On Tue, May 18, 2021 at 05:46:56PM +0800, Xianting Tian wrote:
>>> When met error, we output a print to avoid a BUG().


So you don't explain why you need to remove BUG(). I think it deserve a
BUG().


>>>
>>> Signed-off-by: Xianting Tian <[email protected]>
>>> ---
>>>   drivers/net/virtio_net.c | 5 ++---
>>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index c921ebf3ae82..a66174d13e81 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -1647,9 +1647,8 @@ static int xmit_skb(struct send_queue *sq, struct
>>> sk_buff *skb)
>>>           hdr = skb_vnet_hdr(skb);
>>>
>>>       if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
>>> -                    virtio_is_little_endian(vi->vdev), false,
>>> -                    0))
>>> -        BUG();
>>> +                virtio_is_little_endian(vi->vdev), false, 0))
>>> +        return -EPROTO;
>>>
>>
>> why EPROTO? can you add some comments to explain what is going on pls?
>>
>> is this related to a malicious hypervisor thing?


I think not if I was not wrong.

Each sources (either userspace or device), the skb should be built
through virtio_net_hdr_to_skb() which means the validation has already
been done.

If we it fails here, it's a real bug.

Thanks


>>
>> don't we want at least a WARN_ON? Or _ONCE?
>>
>>>       if (vi->mergeable_rx_bufs)
>>>           hdr->num_buffers = 0;
>>> --
>>> 2.17.1
>

2021-06-02 08:12:44

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH] virtio_net: Remove BUG() to aviod machine dead

On Tue, May 25, 2021 at 02:19:03PM +0800, Jason Wang wrote:
>
> 在 2021/5/19 下午10:18, Xianting Tian 写道:
> > thanks, I submit the patch as commented by Andrew
> > https://lkml.org/lkml/2021/5/18/256
> >
> > Actually, if xmit_skb() returns error, below code will give a warning
> > with error code.
> >
> >     /* Try to transmit */
> >     err = xmit_skb(sq, skb);
> >
> >     /* This should not happen! */
> >     if (unlikely(err)) {
> >         dev->stats.tx_fifo_errors++;
> >         if (net_ratelimit())
> >             dev_warn(&dev->dev,
> >                  "Unexpected TXQ (%d) queue failure: %d\n",
> >                  qnum, err);
> >         dev->stats.tx_dropped++;
> >         dev_kfree_skb_any(skb);
> >         return NETDEV_TX_OK;
> >     }
> >
> >
> >
> >
> >
> > 在 2021/5/18 下午5:54, Michael S. Tsirkin 写道:
> > > typo in subject
> > >
> > > On Tue, May 18, 2021 at 05:46:56PM +0800, Xianting Tian wrote:
> > > > When met error, we output a print to avoid a BUG().
>
>
> So you don't explain why you need to remove BUG(). I think it deserve a
> BUG().

BUG() will crash the machine and virtio_net is not kernel core
functionality that must stop the machine to prevent anything truly
harmful and basic.

I would argue that code in drivers/* shouldn't call BUG() macros at all.

If it is impossible, don't check for that or add WARN_ON() and recover,
but don't crash whole system.

Thanks

2021-06-02 11:44:52

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] virtio_net: Remove BUG() to aviod machine dead


在 2021/6/2 下午1:59, Leon Romanovsky 写道:
> On Tue, May 25, 2021 at 02:19:03PM +0800, Jason Wang wrote:
>> 在 2021/5/19 下午10:18, Xianting Tian 写道:
>>> thanks, I submit the patch as commented by Andrew
>>> https://lkml.org/lkml/2021/5/18/256
>>>
>>> Actually, if xmit_skb() returns error, below code will give a warning
>>> with error code.
>>>
>>>     /* Try to transmit */
>>>     err = xmit_skb(sq, skb);
>>>
>>>     /* This should not happen! */
>>>     if (unlikely(err)) {
>>>         dev->stats.tx_fifo_errors++;
>>>         if (net_ratelimit())
>>>             dev_warn(&dev->dev,
>>>                  "Unexpected TXQ (%d) queue failure: %d\n",
>>>                  qnum, err);
>>>         dev->stats.tx_dropped++;
>>>         dev_kfree_skb_any(skb);
>>>         return NETDEV_TX_OK;
>>>     }
>>>
>>>
>>>
>>>
>>>
>>> 在 2021/5/18 下午5:54, Michael S. Tsirkin 写道:
>>>> typo in subject
>>>>
>>>> On Tue, May 18, 2021 at 05:46:56PM +0800, Xianting Tian wrote:
>>>>> When met error, we output a print to avoid a BUG().
>>
>> So you don't explain why you need to remove BUG(). I think it deserve a
>> BUG().
> BUG() will crash the machine and virtio_net is not kernel core
> functionality that must stop the machine to prevent anything truly
> harmful and basic.


Note that the BUG() here is not for virtio-net itself. It tells us that
a bug was found by virtio-net.

That is, the one that produces the skb has a bug, usually it's the
network core.

There could also be the issue of the packet from untrusted source
(userspace like TAP or packet socket) but they should be validated there.

Thanks


>
> I would argue that code in drivers/* shouldn't call BUG() macros at all.
>
> If it is impossible, don't check for that or add WARN_ON() and recover,
> but don't crash whole system.
>
> Thanks
>

2021-06-02 12:57:30

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH] virtio_net: Remove BUG() to aviod machine dead

On Wed, Jun 02, 2021 at 03:14:50PM +0800, Jason Wang wrote:
>
> 在 2021/6/2 下午1:59, Leon Romanovsky 写道:
> > On Tue, May 25, 2021 at 02:19:03PM +0800, Jason Wang wrote:
> > > 在 2021/5/19 下午10:18, Xianting Tian 写道:
> > > > thanks, I submit the patch as commented by Andrew
> > > > https://lkml.org/lkml/2021/5/18/256
> > > >
> > > > Actually, if xmit_skb() returns error, below code will give a warning
> > > > with error code.
> > > >
> > > >     /* Try to transmit */
> > > >     err = xmit_skb(sq, skb);
> > > >
> > > >     /* This should not happen! */
> > > >     if (unlikely(err)) {
> > > >         dev->stats.tx_fifo_errors++;
> > > >         if (net_ratelimit())
> > > >             dev_warn(&dev->dev,
> > > >                  "Unexpected TXQ (%d) queue failure: %d\n",
> > > >                  qnum, err);
> > > >         dev->stats.tx_dropped++;
> > > >         dev_kfree_skb_any(skb);
> > > >         return NETDEV_TX_OK;
> > > >     }
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > 在 2021/5/18 下午5:54, Michael S. Tsirkin 写道:
> > > > > typo in subject
> > > > >
> > > > > On Tue, May 18, 2021 at 05:46:56PM +0800, Xianting Tian wrote:
> > > > > > When met error, we output a print to avoid a BUG().
> > >
> > > So you don't explain why you need to remove BUG(). I think it deserve a
> > > BUG().
> > BUG() will crash the machine and virtio_net is not kernel core
> > functionality that must stop the machine to prevent anything truly
> > harmful and basic.
>
>
> Note that the BUG() here is not for virtio-net itself. It tells us that a
> bug was found by virtio-net.
>
> That is, the one that produces the skb has a bug, usually it's the network
> core.
>
> There could also be the issue of the packet from untrusted source (userspace
> like TAP or packet socket) but they should be validated there.

So it is even worse than I thought. You are saying that in theory untrusted
remote host can crash system. IMHO, It is definitely not the place to put BUG().

I remind you that in-kernel API is build on the promise that data passed
between and calls are safe and already checked. You don't need to set a
protection from the net/core.

Thanks

>
> Thanks
>
>
> >
> > I would argue that code in drivers/* shouldn't call BUG() macros at all.
> >
> > If it is impossible, don't check for that or add WARN_ON() and recover,
> > but don't crash whole system.
> >
> > Thanks
> >
>