2018-09-26 16:35:18

by Mohammed Gamal

[permalink] [raw]
Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

Dring high network traffic changes to network interface parameters
such as number of channels or MTU can cause a kernel panic with a NULL
pointer dereference. This is due to netvsc_device_remove() being
called and deallocating the channel ring buffers, which can then be
accessed by netvsc_send_pkt() before they're allocated on calling
netvsc_device_add()

The patch fixes this problem by checking the channel state and returning
ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent()
which may access the uninitialized ring buffer.

Signed-off-by: Mohammed Gamal <[email protected]>
---
drivers/net/hyperv/netvsc.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index fe01e14..75f1b31 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
u64 req_id;
int ret;
- u32 ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound);
+ u32 ring_avail;
+
+ if (out_channel->state != CHANNEL_OPENED_STATE)
+ return -ENODEV;
+
+ ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound);

nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
if (skb)
--
1.8.3.1



2018-09-26 17:14:51

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [PATCH] hv_netvsc: Make sure out channel is fully opened on send



> -----Original Message-----
> From: Mohammed Gamal <[email protected]>
> Sent: Wednesday, September 26, 2018 12:34 PM
> To: Stephen Hemminger <[email protected]>; [email protected]
> Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
> <[email protected]>; vkuznets <[email protected]>;
> [email protected]; cavery <[email protected]>; linux-
> [email protected]; [email protected]; Mohammed Gamal
> <[email protected]>
> Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
>
> Dring high network traffic changes to network interface parameters such as
> number of channels or MTU can cause a kernel panic with a NULL pointer
> dereference. This is due to netvsc_device_remove() being called and
> deallocating the channel ring buffers, which can then be accessed by
> netvsc_send_pkt() before they're allocated on calling
> netvsc_device_add()
>
> The patch fixes this problem by checking the channel state and returning
> ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent()
> which may access the uninitialized ring buffer.
>
> Signed-off-by: Mohammed Gamal <[email protected]>
> ---
> drivers/net/hyperv/netvsc.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index
> fe01e14..75f1b31 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
> u64 req_id;
> int ret;
> - u32 ring_avail = hv_get_avail_to_write_percent(&out_channel-
> >outbound);
> + u32 ring_avail;
> +
> + if (out_channel->state != CHANNEL_OPENED_STATE)
> + return -ENODEV;
> +
> + ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound);

When you reproducing the NULL ptr panic, does your kernel include the following patch?
hv_netvsc: common detach logic
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea

We call netif_tx_disable(ndev) and netif_device_detach(ndev) before doing the changes
on MTU or #channels. So there should be no call to start_xmit() when channel is not ready.

If you see the check for CHANNEL_OPENED_STATE is still necessary on upstream kernel (including
the patch " common detach logic "), we should debug further on the code and find out the
root cause.

Thanks,
- Haiyang


2018-09-27 08:57:35

by Mohammed Gamal

[permalink] [raw]
Subject: Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

On Wed, 2018-09-26 at 17:13 +0000, Haiyang Zhang wrote:
> > -----Original Message-----
> > From: Mohammed Gamal <[email protected]>
> > Sent: Wednesday, September 26, 2018 12:34 PM
> > To: Stephen Hemminger <[email protected]>; [email protected].
> > org
> > Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
> > <[email protected]>; vkuznets <[email protected]>;
> > [email protected]; cavery <[email protected]>; linux-
> > [email protected]; [email protected]; Mohammed
> > Gamal
> > <[email protected]>
> > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened
> > on send
> >
> > Dring high network traffic changes to network interface parameters
> > such as
> > number of channels or MTU can cause a kernel panic with a NULL
> > pointer
> > dereference. This is due to netvsc_device_remove() being called and
> > deallocating the channel ring buffers, which can then be accessed
> > by
> > netvsc_send_pkt() before they're allocated on calling
> > netvsc_device_add()
> >
> > The patch fixes this problem by checking the channel state and
> > returning
> > ENODEV if not yet opened. We also move the call to
> > hv_ringbuf_avail_percent()
> > which may access the uninitialized ring buffer.
> >
> > Signed-off-by: Mohammed Gamal <[email protected]>
> > ---
> >  drivers/net/hyperv/netvsc.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/hyperv/netvsc.c
> > b/drivers/net/hyperv/netvsc.c index
> > fe01e14..75f1b31 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> >   struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > packet->q_idx);
> >   u64 req_id;
> >   int ret;
> > - u32 ring_avail =
> > hv_get_avail_to_write_percent(&out_channel-
> > > outbound);
> >
> > + u32 ring_avail;
> > +
> > + if (out_channel->state != CHANNEL_OPENED_STATE)
> > + return -ENODEV;
> > +
> > + ring_avail = hv_get_avail_to_write_percent(&out_channel-
> > >outbound);
>
> When you reproducing the NULL ptr panic, does your kernel include the
> following patch?
> hv_netvsc: common detach logic
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/c
> ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea
>
Yes it is included. And the commit did reduce the occurrence of this
race condition, but it still nevertheless occurs albeit rarely.

> We call netif_tx_disable(ndev) and netif_device_detach(ndev) before
> doing the changes 
> on MTU or #channels. So there should be no call to start_xmit() when
> channel is not ready.
>
> If you see the check for CHANNEL_OPENED_STATE is still necessary on
> upstream kernel (including 
> the patch " common detach logic "), we should debug further on the
> code and find out the 
> root cause.
>
> Thanks,
> - Haiyang
>

2018-09-27 10:24:47

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

On Thu, 27 Sep 2018 10:57:05 +0200
Mohammed Gamal <[email protected]> wrote:

> On Wed, 2018-09-26 at 17:13 +0000, Haiyang Zhang wrote:
> > > -----Original Message-----
> > > From: Mohammed Gamal <[email protected]>
> > > Sent: Wednesday, September 26, 2018 12:34 PM
> > > To: Stephen Hemminger <[email protected]>; [email protected].
> > > org
> > > Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
> > > <[email protected]>; vkuznets <[email protected]>;
> > > [email protected]; cavery <[email protected]>; linux-
> > > [email protected]; [email protected]; Mohammed
> > > Gamal
> > > <[email protected]>
> > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened
> > > on send
> > >
> > > Dring high network traffic changes to network interface parameters
> > > such as
> > > number of channels or MTU can cause a kernel panic with a NULL
> > > pointer
> > > dereference. This is due to netvsc_device_remove() being called and
> > > deallocating the channel ring buffers, which can then be accessed
> > > by
> > > netvsc_send_pkt() before they're allocated on calling
> > > netvsc_device_add()
> > >
> > > The patch fixes this problem by checking the channel state and
> > > returning
> > > ENODEV if not yet opened. We also move the call to
> > > hv_ringbuf_avail_percent()
> > > which may access the uninitialized ring buffer.
> > >
> > > Signed-off-by: Mohammed Gamal <[email protected]>
> > > ---
> > >  drivers/net/hyperv/netvsc.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/hyperv/netvsc.c
> > > b/drivers/net/hyperv/netvsc.c index
> > > fe01e14..75f1b31 100644
> > > --- a/drivers/net/hyperv/netvsc.c
> > > +++ b/drivers/net/hyperv/netvsc.c
> > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> > >   struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > > packet->q_idx);
> > >   u64 req_id;
> > >   int ret;
> > > - u32 ring_avail =
> > > hv_get_avail_to_write_percent(&out_channel-
> > > > outbound);
> > >
> > > + u32 ring_avail;
> > > +
> > > + if (out_channel->state != CHANNEL_OPENED_STATE)
> > > + return -ENODEV;
> > > +
> > > + ring_avail = hv_get_avail_to_write_percent(&out_channel-
> > > >outbound);
> >
> > When you reproducing the NULL ptr panic, does your kernel include the
> > following patch?
> > hv_netvsc: common detach logic
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/c
> > ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea
> >
> Yes it is included. And the commit did reduce the occurrence of this
> race condition, but it still nevertheless occurs albeit rarely.
>
> > We call netif_tx_disable(ndev) and netif_device_detach(ndev) before
> > doing the changes 
> > on MTU or #channels. So there should be no call to start_xmit() when
> > channel is not ready.
> >
> > If you see the check for CHANNEL_OPENED_STATE is still necessary on
> > upstream kernel (including 
> > the patch " common detach logic "), we should debug further on the
> > code and find out the 
> > root cause.
> >
> > Thanks,
> > - Haiyang
> >
> _______________________________________________
> devel mailing list
> [email protected]
> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

Is there some workload, that can be used to reproduce this?
The stress test from Vitaly with changing parameters while running network traffic
passes now.

Can you reproduce this with the upstream current kernel?

Adding the check in start xmit is still racy, and won't cure the problem.

Another solution would be to add a grace period in the netvsc detach logic.


2018-09-27 10:32:29

by Mohammed Gamal

[permalink] [raw]
Subject: Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

On Thu, 2018-09-27 at 12:23 +0200, Stephen Hemminger wrote:
> On Thu, 27 Sep 2018 10:57:05 +0200
> Mohammed Gamal <[email protected]> wrote:
>
> > On Wed, 2018-09-26 at 17:13 +0000, Haiyang Zhang wrote:
> > > > -----Original Message-----
> > > > From: Mohammed Gamal <[email protected]>
> > > > Sent: Wednesday, September 26, 2018 12:34 PM
> > > > To: Stephen Hemminger <[email protected]>; [email protected]
> > > > nel.
> > > > org
> > > > Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
> > > > <[email protected]>; vkuznets <[email protected]>;
> > > > [email protected]; cavery <[email protected]>; linux-
> > > > [email protected]; [email protected]; Mohammed
> > > > Gamal
> > > > <[email protected]>
> > > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully
> > > > opened
> > > > on send
> > > >
> > > > Dring high network traffic changes to network interface
> > > > parameters
> > > > such as
> > > > number of channels or MTU can cause a kernel panic with a NULL
> > > > pointer
> > > > dereference. This is due to netvsc_device_remove() being called
> > > > and
> > > > deallocating the channel ring buffers, which can then be
> > > > accessed
> > > > by
> > > > netvsc_send_pkt() before they're allocated on calling
> > > > netvsc_device_add()
> > > >
> > > > The patch fixes this problem by checking the channel state and
> > > > returning
> > > > ENODEV if not yet opened. We also move the call to
> > > > hv_ringbuf_avail_percent()
> > > > which may access the uninitialized ring buffer.
> > > >
> > > > Signed-off-by: Mohammed Gamal <[email protected]>
> > > > ---
> > > >  drivers/net/hyperv/netvsc.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/hyperv/netvsc.c
> > > > b/drivers/net/hyperv/netvsc.c index
> > > > fe01e14..75f1b31 100644
> > > > --- a/drivers/net/hyperv/netvsc.c
> > > > +++ b/drivers/net/hyperv/netvsc.c
> > > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> > > >   struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > > > packet->q_idx);
> > > >   u64 req_id;
> > > >   int ret;
> > > > - u32 ring_avail =
> > > > hv_get_avail_to_write_percent(&out_channel-  
> > > > > outbound);  
> > > >
> > > > + u32 ring_avail;
> > > > +
> > > > + if (out_channel->state != CHANNEL_OPENED_STATE)
> > > > + return -ENODEV;
> > > > +
> > > > + ring_avail =
> > > > hv_get_avail_to_write_percent(&out_channel-  
> > > > > outbound);  
> > >
> > > When you reproducing the NULL ptr panic, does your kernel include
> > > the
> > > following patch?
> > > hv_netvsc: common detach logic
> > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.g
> > > it/c
> > > ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea
> > >   
> >
> > Yes it is included. And the commit did reduce the occurrence of
> > this
> > race condition, but it still nevertheless occurs albeit rarely.
> >
> > > We call netif_tx_disable(ndev) and netif_device_detach(ndev)
> > > before
> > > doing the changes 
> > > on MTU or #channels. So there should be no call to start_xmit()
> > > when
> > > channel is not ready.
> > >
> > > If you see the check for CHANNEL_OPENED_STATE is still necessary
> > > on
> > > upstream kernel (including 
> > > the patch " common detach logic "), we should debug further on
> > > the
> > > code and find out the 
> > > root cause.
> > >
> > > Thanks,
> > > - Haiyang
> > >   
> >
> > _______________________________________________
> > devel mailing list
> > [email protected]
> > http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-
> > devel
>
> Is there some workload, that can be used to reproduce this?
> The stress test from Vitaly with changing parameters while running
> network traffic
> passes now.
>
> Can you reproduce this with the upstream current kernel?
>
> Adding the check in start xmit is still racy, and won't cure the
> problem.
>
> Another solution would be to add a grace period in the netvsc detach
> logic.
>

Steps to reproduce are listed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1632653

We've also managed to reproduce the same issue upstream. It's more
likely to be reproduced on Windows 2012R2 than 2016.

Regards,
Mohammed