2018-01-23 09:35:22

by Mohammed Gamal

[permalink] [raw]
Subject: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts

Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") introduced
a regression that caused VMs not to shutdown after netvsc_device_remove() is
called. This is caused by GPADL teardown sequence change, and while that was
necessary to fix issues with Win2016 hosts, it did introduce a regression for
earlier versions.

Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was as
follows (as implemented in netvsc_destroy_buf()):
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Teardown receive buffer GPADL
3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
4- Teardown send buffer GPADL
5- Close vmbus

This didn't work for WS2016 hosts. Commit 0cf737808 split netvsc_destroy_buf()
into two functions and rearranged the order as follows
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
3- Close vmbus
4- Teardown receive buffer GPADL
5- Teardown send buffer GPADL

That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs from
shutting down.

This patch series works around this problem. The first patch splits
netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer grained
functions for tearing down send and receive buffers individally. The second patch
uses the finer grained functions to implement the teardown sequence according to
the host's version. We keep the behavior introduced in 0cf737808ae7 for Windows
2016 hosts, while we re-introduce the old sequence for earlier verions.

Mohammed Gamal (2):
hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
hv_netvsc: Change GPADL teardown order according to Hyper-V version

drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 42 insertions(+), 8 deletions(-)

--
1.8.3.1



2018-01-23 09:35:27

by Mohammed Gamal

[permalink] [raw]
Subject: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

Split each of the functions into two for each of send/recv buffers

Signed-off-by: Mohammed Gamal <[email protected]>
---
drivers/net/hyperv/netvsc.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index bfc7969..3982f76 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -100,8 +100,8 @@ static void free_netvsc_device_rcu(struct netvsc_device *nvdev)
call_rcu(&nvdev->rcu, free_netvsc_device);
}

-static void netvsc_revoke_buf(struct hv_device *device,
- struct netvsc_device *net_device)
+static void netvsc_revoke_recv_buf(struct hv_device *device,
+ struct netvsc_device *net_device)
{
struct nvsp_message *revoke_packet;
struct net_device *ndev = hv_get_drvdata(device);
@@ -146,6 +146,14 @@ static void netvsc_revoke_buf(struct hv_device *device,
}
net_device->recv_section_cnt = 0;
}
+}
+
+static void netvsc_revoke_send_buf(struct hv_device *device,
+ struct netvsc_device *net_device)
+{
+ struct nvsp_message *revoke_packet;
+ struct net_device *ndev = hv_get_drvdata(device);
+ int ret;

/* Deal with the send buffer we may have setup.
* If we got a send section size, it means we received a
@@ -189,8 +197,8 @@ static void netvsc_revoke_buf(struct hv_device *device,
}
}

-static void netvsc_teardown_gpadl(struct hv_device *device,
- struct netvsc_device *net_device)
+static void netvsc_teardown_recv_buf_gpadl(struct hv_device *device,
+ struct netvsc_device *net_device)
{
struct net_device *ndev = hv_get_drvdata(device);
int ret;
@@ -215,6 +223,13 @@ static void netvsc_teardown_gpadl(struct hv_device *device,
vfree(net_device->recv_buf);
net_device->recv_buf = NULL;
}
+}
+
+static void netvsc_teardown_send_buf_gpadl(struct hv_device *device,
+ struct netvsc_device *net_device)
+{
+ struct net_device *ndev = hv_get_drvdata(device);
+ int ret;

if (net_device->send_buf_gpadl_handle) {
ret = vmbus_teardown_gpadl(device->channel,
@@ -425,8 +440,10 @@ static int netvsc_init_buf(struct hv_device *device,
goto exit;

cleanup:
- netvsc_revoke_buf(device, net_device);
- netvsc_teardown_gpadl(device, net_device);
+ netvsc_revoke_recv_buf(device, net_device);
+ netvsc_revoke_send_buf(device, net_device);
+ netvsc_teardown_recv_buf_gpadl(device, net_device);
+ netvsc_teardown_send_buf_gpadl(device, net_device);

exit:
return ret;
@@ -558,7 +575,8 @@ void netvsc_device_remove(struct hv_device *device)

cancel_work_sync(&net_device->subchan_work);

- netvsc_revoke_buf(device, net_device);
+ netvsc_revoke_recv_buf(device, net_device);
+ netvsc_revoke_send_buf(device, net_device);

RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);

@@ -571,7 +589,8 @@ void netvsc_device_remove(struct hv_device *device)
/* Now, we can close the channel safely */
vmbus_close(device->channel);

- netvsc_teardown_gpadl(device, net_device);
+ netvsc_teardown_recv_buf_gpadl(device, net_device);
+ netvsc_teardown_send_buf_gpadl(device, net_device);

/* And dissassociate NAPI context from device */
for (i = 0; i < net_device->num_chn; i++)
--
1.8.3.1


2018-01-23 09:36:27

by Mohammed Gamal

[permalink] [raw]
Subject: [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version

Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
introduced a regression causing VMs not to shutdown on pre-Wind2016
hosts after netvsc_remove_device() is called. This was caused as the
GPADL teardown sequence was changed.

This patch restores the old behavior for pre-Win2016 hosts, while
keeping the changes from 0cf7378 for Win2016 and higher hosts.

Signed-off-by: Mohammed Gamal <[email protected]>
---
drivers/net/hyperv/netvsc.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 3982f76..d09bb3b 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -575,8 +575,17 @@ void netvsc_device_remove(struct hv_device *device)

cancel_work_sync(&net_device->subchan_work);

+ /*
+ * Revoke receive buffer. If host is pre-Win2016 then tear down
+ * receive buffer GPADL. Do the same for send buffer.
+ */
netvsc_revoke_recv_buf(device, net_device);
+ if (vmbus_proto_version < VERSION_WIN10)
+ netvsc_teardown_recv_buf_gpadl(device, net_device);
+
netvsc_revoke_send_buf(device, net_device);
+ if (vmbus_proto_version < VERSION_WIN10)
+ netvsc_teardown_send_buf_gpadl(device, net_device);

RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);

@@ -589,8 +598,14 @@ void netvsc_device_remove(struct hv_device *device)
/* Now, we can close the channel safely */
vmbus_close(device->channel);

- netvsc_teardown_recv_buf_gpadl(device, net_device);
- netvsc_teardown_send_buf_gpadl(device, net_device);
+ /*
+ * If host is Win2016 or higher then we do the GPADL tear down
+ * here after VMBus is closed, instead of doing it earlier.
+ */
+ if (vmbus_proto_version >= VERSION_WIN10) {
+ netvsc_teardown_recv_buf_gpadl(device, net_device);
+ netvsc_teardown_send_buf_gpadl(device, net_device);
+ }

/* And dissassociate NAPI context from device */
for (i = 0; i < net_device->num_chn; i++)
--
1.8.3.1


2018-01-23 15:44:27

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts



> -----Original Message-----
> From: Mohammed Gamal [mailto:[email protected]]
> Sent: Tuesday, January 23, 2018 4:34 AM
> To: [email protected]
> Cc: [email protected]; [email protected]; KY
> Srinivasan <[email protected]>; Haiyang Zhang <[email protected]>;
> Stephen Hemminger <[email protected]>; [email protected];
> [email protected]; [email protected]; Mohammed Gamal
> <[email protected]>
> Subject: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012
> hosts
>
> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> introduced a regression that caused VMs not to shutdown after
> netvsc_device_remove() is called. This is caused by GPADL teardown
> sequence change, and while that was necessary to fix issues with Win2016
> hosts, it did introduce a regression for earlier versions.
>
> Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was
> as follows (as implemented in netvsc_destroy_buf()):
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Teardown receive buffer GPADL
> 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 4- Teardown send buffer GPADL
> 5- Close vmbus
>
> This didn't work for WS2016 hosts. Commit 0cf737808 split
> netvsc_destroy_buf() into two functions and rearranged the order as follows
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 3- Close vmbus
> 4- Teardown receive buffer GPADL
> 5- Teardown send buffer GPADL
>
> That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs
> from shutting down.
>
> This patch series works around this problem. The first patch splits
> netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer grained
> functions for tearing down send and receive buffers individally. The second
> patch uses the finer grained functions to implement the teardown sequence
> according to the host's version. We keep the behavior introduced in
> 0cf737808ae7 for Windows
> 2016 hosts, while we re-introduce the old sequence for earlier verions.
>
> Mohammed Gamal (2):
> hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
> hv_netvsc: Change GPADL teardown order according to Hyper-V version
>
> drivers/net/hyperv/netvsc.c | 50
> +++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 42 insertions(+), 8 deletions(-)

Thank you for the patches. We are testing them internally.

- Haiyang

2018-01-23 16:34:53

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts

On Tue, 23 Jan 2018 10:34:03 +0100
Mohammed Gamal <[email protected]> wrote:

> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") introduced
> a regression that caused VMs not to shutdown after netvsc_device_remove() is
> called. This is caused by GPADL teardown sequence change, and while that was
> necessary to fix issues with Win2016 hosts, it did introduce a regression for
> earlier versions.
>
> Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was as
> follows (as implemented in netvsc_destroy_buf()):
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Teardown receive buffer GPADL
> 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 4- Teardown send buffer GPADL
> 5- Close vmbus
>
> This didn't work for WS2016 hosts. Commit 0cf737808 split netvsc_destroy_buf()
> into two functions and rearranged the order as follows
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 3- Close vmbus
> 4- Teardown receive buffer GPADL
> 5- Teardown send buffer GPADL
>
> That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs from
> shutting down.
>
> This patch series works around this problem. The first patch splits
> netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer grained
> functions for tearing down send and receive buffers individally. The second patch
> uses the finer grained functions to implement the teardown sequence according to
> the host's version. We keep the behavior introduced in 0cf737808ae7 for Windows
> 2016 hosts, while we re-introduce the old sequence for earlier verions.
>
> Mohammed Gamal (2):
> hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
> hv_netvsc: Change GPADL teardown order according to Hyper-V version
>
> drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 42 insertions(+), 8 deletions(-)
>

The problem the original commit was trying to solve was actions in flight
in the receive buffer on shutdown. Having different ordering for each version of Hyper-V
seems unnecessary. There should be a way to get a stable sequence here.

Let me see if I can shake more information out of the Windows team to see what
the handshake on the other side is. Let's not apply this until then.

2018-01-26 18:11:35

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] hv_netvsc: Fix shutdown regression on Win2012 hosts

On Tue, 23 Jan 2018 10:34:03 +0100
Mohammed Gamal <[email protected]> wrote:

> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") introduced
> a regression that caused VMs not to shutdown after netvsc_device_remove() is
> called. This is caused by GPADL teardown sequence change, and while that was
> necessary to fix issues with Win2016 hosts, it did introduce a regression for
> earlier versions.
>
> Prior to commit 0cf737808 the call sequence in netvsc_device_remove() was as
> follows (as implemented in netvsc_destroy_buf()):
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Teardown receive buffer GPADL
> 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 4- Teardown send buffer GPADL
> 5- Close vmbus
>
> This didn't work for WS2016 hosts. Commit 0cf737808 split netvsc_destroy_buf()
> into two functions and rearranged the order as follows
> 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
> 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
> 3- Close vmbus
> 4- Teardown receive buffer GPADL
> 5- Teardown send buffer GPADL
>
> That worked well for WS2016 hosts, but for WS2012 hosts it prevented VMs from
> shutting down.
>
> This patch series works around this problem. The first patch splits
> netvsc_revoke_buf() and netvsc_teardown_gpadl() into two finer grained
> functions for tearing down send and receive buffers individally. The second patch
> uses the finer grained functions to implement the teardown sequence according to
> the host's version. We keep the behavior introduced in 0cf737808ae7 for Windows
> 2016 hosts, while we re-introduce the old sequence for earlier verions.
>
> Mohammed Gamal (2):
> hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
> hv_netvsc: Change GPADL teardown order according to Hyper-V version
>
> drivers/net/hyperv/netvsc.c | 50 +++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 42 insertions(+), 8 deletions(-)
>

What I am experimenting with is sending an NDIS_RESET (instead of setting packet filter)
as part of the close processing. This seems more like what the description of what Windows
driver does and matches my reading of the public RNDIS specification.

2018-01-30 19:31:19

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

On Tue, 23 Jan 2018 10:34:04 +0100
Mohammed Gamal <[email protected]> wrote:

> Split each of the functions into two for each of send/recv buffers
>
> Signed-off-by: Mohammed Gamal <[email protected]>

Splitting these functions is not necessary

2018-01-30 19:32:26

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] hv_netvsc: Change GPADL teardown order according to Hyper-V version

On Tue, 23 Jan 2018 10:34:05 +0100
Mohammed Gamal <[email protected]> wrote:

> Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> introduced a regression causing VMs not to shutdown on pre-Wind2016
> hosts after netvsc_remove_device() is called. This was caused as the
> GPADL teardown sequence was changed.
>
> This patch restores the old behavior for pre-Win2016 hosts, while
> keeping the changes from 0cf7378 for Win2016 and higher hosts.
>
> Signed-off-by: Mohammed Gamal <[email protected]>

Investigated the Windows driver to see how it handled this.
It uses NVSP version < 4 to check for older hosts. So that patch
should use that.

Currently testing a version with that change.

2018-01-31 11:18:07

by Mohammed Gamal

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> On Tue, 23 Jan 2018 10:34:04 +0100
> Mohammed Gamal <[email protected]> wrote:
>
> > Split each of the functions into two for each of send/recv buffers
> >
> > Signed-off-by: Mohammed Gamal <[email protected]>
>
> Splitting these functions is not necessary

How so? We need to send each message independently, and hence the split
(see cover letter). Is there another way?

2018-01-31 23:24:34

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

On Wed, 31 Jan 2018 12:16:49 +0100
Mohammed Gamal <[email protected]> wrote:

> On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> > On Tue, 23 Jan 2018 10:34:04 +0100
> > Mohammed Gamal <[email protected]> wrote:
> >
> > > Split each of the functions into two for each of send/recv buffers
> > >
> > > Signed-off-by: Mohammed Gamal <[email protected]>
> >
> > Splitting these functions is not necessary
>
> How so? We need to send each message independently, and hence the split
> (see cover letter). Is there another way?

This is all that is needed.


Subject: [PATCH] hv_netvsc: work around for gpadl teardown on older windows
server

On WS2012 the host ignores messages after vmbus channel is closed.
Workaround this by doing what Windows does and send the teardown
before close on older versions of NVSP protocol.

Reported-by: Mohammed Gamal <[email protected]>
Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
Signed-off-by: Stephen Hemminger <[email protected]>
---
drivers/net/hyperv/netvsc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 17e529af79dc..1a3df0eff42f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -574,10 +574,17 @@ void netvsc_device_remove(struct hv_device *device)
*/
netdev_dbg(ndev, "net device safe to remove\n");

+ /* Workaround for older versions of Windows require that
+ * buffer be revoked before channel is disabled
+ */
+ if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
+ netvsc_teardown_gpadl(device, net_device);
+
/* Now, we can close the channel safely */
vmbus_close(device->channel);

- netvsc_teardown_gpadl(device, net_device);
+ if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
+ netvsc_teardown_gpadl(device, net_device);

/* And dissassociate NAPI context from device */
for (i = 0; i < net_device->num_chn; i++)
--
2.15.1


2018-02-01 08:39:08

by Mohammed Gamal

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

On Wed, 2018-01-31 at 15:01 -0800, Stephen Hemminger wrote:
> On Wed, 31 Jan 2018 12:16:49 +0100
> Mohammed Gamal <[email protected]> wrote:
>
> > On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> > > On Tue, 23 Jan 2018 10:34:04 +0100
> > > Mohammed Gamal <[email protected]> wrote:
> > >   
> > > > Split each of the functions into two for each of send/recv
> > > > buffers
> > > >
> > > > Signed-off-by: Mohammed Gamal <[email protected]>  
> > >
> > > Splitting these functions is not necessary  
> >
> > How so? We need to send each message independently, and hence the
> > split
> > (see cover letter). Is there another way?
>
> This is all that is needed.
>
>
> Subject: [PATCH] hv_netvsc: work around for gpadl teardown on older
> windows
>  server
>
> On WS2012 the host ignores messages after vmbus channel is closed.
> Workaround this by doing what Windows does and send the teardown
> before close on older versions of NVSP protocol.
>
> Reported-by: Mohammed Gamal <[email protected]>
> Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> Signed-off-by: Stephen Hemminger <[email protected]>
> ---
>  drivers/net/hyperv/netvsc.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/hyperv/netvsc.c
> b/drivers/net/hyperv/netvsc.c
> index 17e529af79dc..1a3df0eff42f 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -574,10 +574,17 @@ void netvsc_device_remove(struct hv_device
> *device)
>    */
>   netdev_dbg(ndev, "net device safe to remove\n");
>  
> + /* Workaround for older versions of Windows require that
> +  * buffer be revoked before channel is disabled
> +  */
> + if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
> + netvsc_teardown_gpadl(device, net_device);
> +
>   /* Now, we can close the channel safely */
>   vmbus_close(device->channel);
>  
> - netvsc_teardown_gpadl(device, net_device);
> + if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
> + netvsc_teardown_gpadl(device, net_device);
>  
>   /* And dissassociate NAPI context from device */
>   for (i = 0; i < net_device->num_chn; i++)

I've tried a similar workaround before by calling
netvsc_teardown_gpadl() after netvsc_revoke_buf(), but before setting
net_device_ctx->nvdev to NULL and it caused the guest to hang when
trying to change MTU. 

Let me try that change and see if it behaves differently.

2018-02-01 22:35:48

by Mohammed Gamal

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

On Thu, 2018-02-01 at 09:37 +0100, Mohammed Gamal wrote:
> On Wed, 2018-01-31 at 15:01 -0800, Stephen Hemminger wrote:
> > On Wed, 31 Jan 2018 12:16:49 +0100
> > Mohammed Gamal <[email protected]> wrote:
> >
> > > On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> > > > On Tue, 23 Jan 2018 10:34:04 +0100
> > > > Mohammed Gamal <[email protected]> wrote:
> > > >   
> > > > > Split each of the functions into two for each of send/recv
> > > > > buffers
> > > > >
> > > > > Signed-off-by: Mohammed Gamal <[email protected]>  
> > > >
> > > > Splitting these functions is not necessary  
> > >
> > > How so? We need to send each message independently, and hence the
> > > split
> > > (see cover letter). Is there another way?
> >
> > This is all that is needed.
> >
> >
> > Subject: [PATCH] hv_netvsc: work around for gpadl teardown on older
> > windows
> >  server
> >
> > On WS2012 the host ignores messages after vmbus channel is closed.
> > Workaround this by doing what Windows does and send the teardown
> > before close on older versions of NVSP protocol.
> >
> > Reported-by: Mohammed Gamal <[email protected]>
> > Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> > Signed-off-by: Stephen Hemminger <[email protected]>
> > ---
> >  drivers/net/hyperv/netvsc.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/hyperv/netvsc.c
> > b/drivers/net/hyperv/netvsc.c
> > index 17e529af79dc..1a3df0eff42f 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -574,10 +574,17 @@ void netvsc_device_remove(struct hv_device
> > *device)
> >    */
> >   netdev_dbg(ndev, "net device safe to remove\n");
> >  
> > + /* Workaround for older versions of Windows require that
> > +  * buffer be revoked before channel is disabled
> > +  */
> > + if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
> > + netvsc_teardown_gpadl(device, net_device);
> > +
> >   /* Now, we can close the channel safely */
> >   vmbus_close(device->channel);
> >  
> > - netvsc_teardown_gpadl(device, net_device);
> > + if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
> > + netvsc_teardown_gpadl(device, net_device);
> >  
> >   /* And dissassociate NAPI context from device */
> >   for (i = 0; i < net_device->num_chn; i++)
>
> I've tried a similar workaround before by calling
> netvsc_teardown_gpadl() after netvsc_revoke_buf(), but before setting
> net_device_ctx->nvdev to NULL and it caused the guest to hang when
> trying to change MTU. 
>
> Let me try that change and see if it behaves differently.

I tested the patch, but I've actually seen some unexpected behavior.

First, net_device->nvsp_version is actually NVSP_PROTOCOL_VERSION_5 on
both my Win2012 and Win2016 hosts that I tested on, so the condition is
never executed.

Second, when doing the check instead as if (vmbus_proto_version <
VERSION_WIN10), I get the same behavior I described above where the
guest hangs as the kernel waits indefinitely in vmbus_teardown_gpadl()
for a completion to be signaled. This is actually what lead me to
propose splitting netvsc_revoke_buf() and netvsc_teardown_gpadl() in my
initial patchset so that we keep the same order of messages and avoid
that indefinite wait.

2018-02-01 22:40:02

by Stephen Hemminger

[permalink] [raw]
Subject: RE: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

There are multiple issues with some of the parameter change paths.
Still working on getting something stable. Both upstream, and net-next do have crash issues under concurrent changes.

I don't want Linux doing different workaround than Windows if at all possible; because it means that it would require much wider testing against many different versions.
Ps: WS2008r2 still needs to be supported.

-----Original Message-----
From: Mohammed Gamal [mailto:[email protected]]
Sent: Thursday, February 1, 2018 2:34 PM
To: Stephen Hemminger <[email protected]>
Cc: [email protected]; [email protected]; Stephen Hemminger <[email protected]>; Haiyang Zhang <[email protected]>; [email protected]; [email protected]; [email protected]
Subject: Re: [RFC PATCH 1/2] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()

On Thu, 2018-02-01 at 09:37 +0100, Mohammed Gamal wrote:
> On Wed, 2018-01-31 at 15:01 -0800, Stephen Hemminger wrote:
> > On Wed, 31 Jan 2018 12:16:49 +0100
> > Mohammed Gamal <[email protected]> wrote:
> >
> > > On Tue, 2018-01-30 at 11:29 -0800, Stephen Hemminger wrote:
> > > > On Tue, 23 Jan 2018 10:34:04 +0100
> > > > Mohammed Gamal <[email protected]> wrote:
> > > >   
> > > > > Split each of the functions into two for each of send/recv
> > > > > buffers
> > > > >
> > > > > Signed-off-by: Mohammed Gamal <[email protected]>  
> > > >
> > > > Splitting these functions is not necessary  
> > >
> > > How so? We need to send each message independently, and hence the
> > > split
> > > (see cover letter). Is there another way?
> >
> > This is all that is needed.
> >
> >
> > Subject: [PATCH] hv_netvsc: work around for gpadl teardown on older
> > windows
> >  server
> >
> > On WS2012 the host ignores messages after vmbus channel is closed.
> > Workaround this by doing what Windows does and send the teardown
> > before close on older versions of NVSP protocol.
> >
> > Reported-by: Mohammed Gamal <[email protected]>
> > Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
> > Signed-off-by: Stephen Hemminger <[email protected]>
> > ---
> >  drivers/net/hyperv/netvsc.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/hyperv/netvsc.c
> > b/drivers/net/hyperv/netvsc.c
> > index 17e529af79dc..1a3df0eff42f 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -574,10 +574,17 @@ void netvsc_device_remove(struct hv_device
> > *device)
> >    */
> >   netdev_dbg(ndev, "net device safe to remove\n");
> >  
> > + /* Workaround for older versions of Windows require that
> > +  * buffer be revoked before channel is disabled
> > +  */
> > + if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
> > + netvsc_teardown_gpadl(device, net_device);
> > +
> >   /* Now, we can close the channel safely */
> >   vmbus_close(device->channel);
> >  
> > - netvsc_teardown_gpadl(device, net_device);
> > + if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
> > + netvsc_teardown_gpadl(device, net_device);
> >  
> >   /* And dissassociate NAPI context from device */
> >   for (i = 0; i < net_device->num_chn; i++)
>
> I've tried a similar workaround before by calling
> netvsc_teardown_gpadl() after netvsc_revoke_buf(), but before setting
> net_device_ctx->nvdev to NULL and it caused the guest to hang when
> trying to change MTU. 
>
> Let me try that change and see if it behaves differently.

I tested the patch, but I've actually seen some unexpected behavior.

First, net_device->nvsp_version is actually NVSP_PROTOCOL_VERSION_5 on
both my Win2012 and Win2016 hosts that I tested on, so the condition is
never executed.

Second, when doing the check instead as if (vmbus_proto_version <
VERSION_WIN10), I get the same behavior I described above where the
guest hangs as the kernel waits indefinitely in vmbus_teardown_gpadl()
for a completion to be signaled. This is actually what lead me to
propose splitting netvsc_revoke_buf() and netvsc_teardown_gpadl() in my
initial patchset so that we keep the same order of messages and avoid
that indefinite wait.