2015-07-16 17:53:05

by KY Srinivasan

[permalink] [raw]
Subject: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe

The current code returns from probe without waiting for the proper handling
of subchannels that may be requested. If the netvsc driver were to be rapidly
loaded/unloaded, we can trigger a panic as the unload will be tearing
down state that may not have been fully setup yet. We fix this issue by making
sure that we return from the probe call only after ensuring that the
sub-channel offers in flight are properly handled.

Signed-off-by: K. Y. Srinivasan <[email protected]>
Reviewed-and-tested-by: Haiyang Zhang <[email protected]
---
drivers/net/hyperv/hyperv_net.h | 2 ++
drivers/net/hyperv/rndis_filter.c | 25 +++++++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 26cd14c..925b75d 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -671,6 +671,8 @@ struct netvsc_device {
u32 send_table[VRSS_SEND_TAB_SIZE];
u32 max_chn;
u32 num_chn;
+ spinlock_t sc_lock; /* Protects num_sc_offered variable */
+ u32 num_sc_offered;
atomic_t queue_sends[NR_CPUS];

/* Holds rndis device info */
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 2e40417..2e09f3f 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -984,9 +984,16 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
struct netvsc_device *nvscdev;
u16 chn_index = new_sc->offermsg.offer.sub_channel_index;
int ret;
+ unsigned long flags;

nvscdev = hv_get_drvdata(new_sc->primary_channel->device_obj);

+ spin_lock_irqsave(&nvscdev->sc_lock, flags);
+ nvscdev->num_sc_offered--;
+ spin_unlock_irqrestore(&nvscdev->sc_lock, flags);
+ if (nvscdev->num_sc_offered == 0)
+ complete(&nvscdev->channel_init_wait);
+
if (chn_index >= nvscdev->num_chn)
return;

@@ -1015,8 +1022,10 @@ int rndis_filter_device_add(struct hv_device *dev,
u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
u32 mtu, size;
u32 num_rss_qs;
+ u32 sc_delta;
const struct cpumask *node_cpu_mask;
u32 num_possible_rss_qs;
+ unsigned long flags;

rndis_device = get_rndis_device();
if (!rndis_device)
@@ -1039,6 +1048,8 @@ int rndis_filter_device_add(struct hv_device *dev,
net_device->max_chn = 1;
net_device->num_chn = 1;

+ spin_lock_init(&net_device->sc_lock);
+
net_device->extension = rndis_device;
rndis_device->net_dev = net_device;

@@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device *dev,
num_possible_rss_qs = cpumask_weight(node_cpu_mask);
net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);

+ num_rss_qs = net_device->num_chn - 1;
+ net_device->num_sc_offered = num_rss_qs;
+
if (net_device->num_chn == 1)
goto out;

@@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device *dev,

ret = rndis_filter_set_rss_param(rndis_device, net_device->num_chn);

+ /*
+ * Wait for the host to send us the sub-channel offers.
+ */
+ spin_lock_irqsave(&net_device->sc_lock, flags);
+ sc_delta = net_device->num_chn - 1 - num_rss_qs;
+ net_device->num_sc_offered -= sc_delta;
+ spin_unlock_irqrestore(&net_device->sc_lock, flags);
+
+ if (net_device->num_sc_offered != 0)
+ wait_for_completion(&net_device->channel_init_wait);
out:
if (ret) {
net_device->max_chn = 1;
net_device->num_chn = 1;
}
+
return 0; /* return 0 because primary channel can be used alone */

err_dev_remv:
--
1.7.4.1


2015-07-17 10:00:59

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe

> From: K. Y. Srinivasan
> Sent: Friday, July 17, 2015 3:17
> Subject: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed
> during probe
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> ...
> @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device *dev,
> num_possible_rss_qs = cpumask_weight(node_cpu_mask);
> net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
>
> + num_rss_qs = net_device->num_chn - 1;
> + net_device->num_sc_offered = num_rss_qs;
> +
> if (net_device->num_chn == 1)
> goto out;
>
> @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device *dev,
>
> ret = rndis_filter_set_rss_param(rndis_device, net_device->num_chn);
>
> + /*
> + * Wait for the host to send us the sub-channel offers.
> + */
> + spin_lock_irqsave(&net_device->sc_lock, flags);
> + sc_delta = net_device->num_chn - 1 - num_rss_qs;
> + net_device->num_sc_offered -= sc_delta;

Hi KY,
IMO here the "-= " should be "+="?

I think sc_delta is usually <= 0, meaning the host may allocate less subchannels than
we expect.
With "-=", net_device->num_sc_offered can become bigger -- this doesn't seem correct.

Why not use
"net_device->num_sc_offered = net_device->num_chn - 1;" directly?
At this point, net_device->num_chn has been the number of the actual channels.


> + spin_unlock_irqrestore(&net_device->sc_lock, flags);
> +
> + if (net_device->num_sc_offered != 0)
> + wait_for_completion(&net_device->channel_init_wait);

BTW, I also tested the patch and I can confirm the panic I saw disappeared with the patch.

-- Dexuan

2015-07-17 10:07:15

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe

> From: K. Y. Srinivasan
> Sent: Friday, July 17, 2015 3:17
> Subject: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed
> during probe
>
> The current code returns from probe without waiting for the proper handling
> of subchannels that may be requested. If the netvsc driver were to be rapidly
> loaded/unloaded, we can trigger a panic as the unload will be tearing
> down state that may not have been fully setup yet. We fix this issue by making
> sure that we return from the probe call only after ensuring that the
> sub-channel offers in flight are properly handled.
>
> ---
> drivers/net/hyperv/hyperv_net.h | 2 ++
> drivers/net/hyperv/rndis_filter.c | 25 +++++++++++++++++++++++++
> 2 files changed, 27 insertions(+), 0 deletions(-)

BTW, not sure if we should make the same fix to storvsc.

IMO storvsc should have the same issue, at least in theory, though usually it's
unlikely to unload storvsc. :-)

Thanks,
-- Dexuan

2015-07-17 14:12:56

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe

"K. Y. Srinivasan" <[email protected]> writes:

> The current code returns from probe without waiting for the proper handling
> of subchannels that may be requested. If the netvsc driver were to be rapidly
> loaded/unloaded, we can trigger a panic as the unload will be tearing
> down state that may not have been fully setup yet. We fix this issue by making
> sure that we return from the probe call only after ensuring that the
> sub-channel offers in flight are properly handled.
>
> Signed-off-by: K. Y. Srinivasan <[email protected]>
> Reviewed-and-tested-by: Haiyang Zhang <[email protected]
> ---
> drivers/net/hyperv/hyperv_net.h | 2 ++
> drivers/net/hyperv/rndis_filter.c | 25 +++++++++++++++++++++++++
> 2 files changed, 27 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 26cd14c..925b75d 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -671,6 +671,8 @@ struct netvsc_device {
> u32 send_table[VRSS_SEND_TAB_SIZE];
> u32 max_chn;
> u32 num_chn;
> + spinlock_t sc_lock; /* Protects num_sc_offered variable */
> + u32 num_sc_offered;
> atomic_t queue_sends[NR_CPUS];
>
> /* Holds rndis device info */
> diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
> index 2e40417..2e09f3f 100644
> --- a/drivers/net/hyperv/rndis_filter.c
> +++ b/drivers/net/hyperv/rndis_filter.c
> @@ -984,9 +984,16 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
> struct netvsc_device *nvscdev;
> u16 chn_index = new_sc->offermsg.offer.sub_channel_index;
> int ret;
> + unsigned long flags;
>
> nvscdev = hv_get_drvdata(new_sc->primary_channel->device_obj);
>
> + spin_lock_irqsave(&nvscdev->sc_lock, flags);
> + nvscdev->num_sc_offered--;
> + spin_unlock_irqrestore(&nvscdev->sc_lock, flags);
> + if (nvscdev->num_sc_offered == 0)
> + complete(&nvscdev->channel_init_wait);
> +
> if (chn_index >= nvscdev->num_chn)
> return;
>
> @@ -1015,8 +1022,10 @@ int rndis_filter_device_add(struct hv_device *dev,
> u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
> u32 mtu, size;
> u32 num_rss_qs;
> + u32 sc_delta;
> const struct cpumask *node_cpu_mask;
> u32 num_possible_rss_qs;
> + unsigned long flags;
>
> rndis_device = get_rndis_device();
> if (!rndis_device)
> @@ -1039,6 +1048,8 @@ int rndis_filter_device_add(struct hv_device *dev,
> net_device->max_chn = 1;
> net_device->num_chn = 1;
>
> + spin_lock_init(&net_device->sc_lock);
> +
> net_device->extension = rndis_device;
> rndis_device->net_dev = net_device;
>
> @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device *dev,
> num_possible_rss_qs = cpumask_weight(node_cpu_mask);
> net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
>
> + num_rss_qs = net_device->num_chn - 1;
> + net_device->num_sc_offered = num_rss_qs;
> +
> if (net_device->num_chn == 1)
> goto out;
>
> @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device *dev,
>
> ret = rndis_filter_set_rss_param(rndis_device, net_device->num_chn);
>
> + /*
> + * Wait for the host to send us the sub-channel offers.
> + */
> + spin_lock_irqsave(&net_device->sc_lock, flags);
> + sc_delta = net_device->num_chn - 1 - num_rss_qs;
> + net_device->num_sc_offered -= sc_delta;
> + spin_unlock_irqrestore(&net_device->sc_lock, flags);
> +
> + if (net_device->num_sc_offered != 0)
> + wait_for_completion(&net_device->channel_init_wait);

I'd suggest we add an essentian timeout (big, let's say 30 sec.)
here. In case something goes wrong we don't really want to hang the
whole kernel for forever. Such bugs are hard to debug as if a 'kernel
hangs' is reported we can't be sure which wait caused it. We can even
have something like:

t = wait_for_completion_timeout(&net_device->channel_init_wait, 30*HZ);
BUG_ON(t == 0);

This is much better as we'll be sure what went wrong. (I know other
pieces of hyper-v code use wait_for_completion() without a timeout, this
is rather a general suggestion for all of them).

> out:
> if (ret) {
> net_device->max_chn = 1;
> net_device->num_chn = 1;
> }
> +
> return 0; /* return 0 because primary channel can be used alone */
>
> err_dev_remv:

--
Vitaly

2015-07-17 15:33:08

by KY Srinivasan

[permalink] [raw]
Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe



> -----Original Message-----
> From: Dexuan Cui
> Sent: Friday, July 17, 2015 3:01 AM
> To: KY Srinivasan; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Cc: KY Srinivasan
> Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> processed during probe
>
> > From: K. Y. Srinivasan
> > Sent: Friday, July 17, 2015 3:17
> > Subject: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> processed
> > during probe
> > diff --git a/drivers/net/hyperv/hyperv_net.h
> b/drivers/net/hyperv/hyperv_net.h
> > ...
> > @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device
> *dev,
> > num_possible_rss_qs = cpumask_weight(node_cpu_mask);
> > net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
> >
> > + num_rss_qs = net_device->num_chn - 1;
> > + net_device->num_sc_offered = num_rss_qs;
> > +
> > if (net_device->num_chn == 1)
> > goto out;
> >
> > @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device
> *dev,
> >
> > ret = rndis_filter_set_rss_param(rndis_device, net_device-
> >num_chn);
> >
> > + /*
> > + * Wait for the host to send us the sub-channel offers.
> > + */
> > + spin_lock_irqsave(&net_device->sc_lock, flags);
> > + sc_delta = net_device->num_chn - 1 - num_rss_qs;
> > + net_device->num_sc_offered -= sc_delta;
>
> Hi KY,
> IMO here the "-= " should be "+="?
>
> I think sc_delta is usually <= 0, meaning the host may allocate less
> subchannels than
> we expect.
> With "-=", net_device->num_sc_offered can become bigger -- this doesn't
> seem correct.
We control how many sub-channels we want the host to offer (say sc_requested). Based on this
number we begin to track how many have actually been processed - we decrement sc_requested
each time a sub-channel offer is processed. If the host were to actually offer all that we have requested,
then checking for sc_requested to be zero is sufficient to ensure that we have processed all the
potentially in-flight sub-channels. However, the host may choose to offer less than what we had asked for
and the variable "delta" is tracking this difference. Since we are counting down from what we had asked for
we have to subtract "delta" for proper accounting.

>
> Why not use
> "net_device->num_sc_offered = net_device->num_chn - 1;" directly?
> At this point, net_device->num_chn has been the number of the actual
> channels.

I am not sure what the question here is. num_sc_offered is initialized to the number we
are going to ask and this is the number that will be decremented each time a sub-channel
is processed. Since the host may decide to offer us less than what we had asked and some
sub-channels may have already been processed (num_sc_offerred decremented accordingly)
by the time we discover that the host has offered us less than what we asked for, we adjust
num_sc_offered accordingly.

>
>
> > + spin_unlock_irqrestore(&net_device->sc_lock, flags);
> > +
> > + if (net_device->num_sc_offered != 0)
> > + wait_for_completion(&net_device->channel_init_wait);
>
> BTW, I also tested the patch and I can confirm the panic I saw disappeared
> with the patch.

Thank you.

K. Y
>
> -- Dexuan

2015-07-17 15:34:24

by KY Srinivasan

[permalink] [raw]
Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe



> -----Original Message-----
> From: Dexuan Cui
> Sent: Friday, July 17, 2015 3:07 AM
> To: KY Srinivasan; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Cc: KY Srinivasan
> Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> processed during probe
>
> > From: K. Y. Srinivasan
> > Sent: Friday, July 17, 2015 3:17
> > Subject: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> processed
> > during probe
> >
> > The current code returns from probe without waiting for the proper
> handling
> > of subchannels that may be requested. If the netvsc driver were to be
> rapidly
> > loaded/unloaded, we can trigger a panic as the unload will be tearing
> > down state that may not have been fully setup yet. We fix this issue by
> making
> > sure that we return from the probe call only after ensuring that the
> > sub-channel offers in flight are properly handled.
> >
> > ---
> > drivers/net/hyperv/hyperv_net.h | 2 ++
> > drivers/net/hyperv/rndis_filter.c | 25 +++++++++++++++++++++++++
> > 2 files changed, 27 insertions(+), 0 deletions(-)
>
> BTW, not sure if we should make the same fix to storvsc.
>
> IMO storvsc should have the same issue, at least in theory, though usually it's
> unlikely to unload storvsc. :-)

You are right; I am planning to submit a similar patch for storvsc. As you note,
this scenario is unlikely for sorvsc.

K. Y
>
> Thanks,
> -- Dexuan

2015-07-17 15:53:59

by KY Srinivasan

[permalink] [raw]
Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:[email protected]]
> Sent: Friday, July 17, 2015 7:13 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Dexuan Cui
> Subject: Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> processed during probe
>
> "K. Y. Srinivasan" <[email protected]> writes:
>
> > The current code returns from probe without waiting for the proper
> handling
> > of subchannels that may be requested. If the netvsc driver were to be
> rapidly
> > loaded/unloaded, we can trigger a panic as the unload will be tearing
> > down state that may not have been fully setup yet. We fix this issue by
> making
> > sure that we return from the probe call only after ensuring that the
> > sub-channel offers in flight are properly handled.
> >
> > Signed-off-by: K. Y. Srinivasan <[email protected]>
> > Reviewed-and-tested-by: Haiyang Zhang <[email protected]
> > ---
> > drivers/net/hyperv/hyperv_net.h | 2 ++
> > drivers/net/hyperv/rndis_filter.c | 25 +++++++++++++++++++++++++
> > 2 files changed, 27 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/hyperv/hyperv_net.h
> b/drivers/net/hyperv/hyperv_net.h
> > index 26cd14c..925b75d 100644
> > --- a/drivers/net/hyperv/hyperv_net.h
> > +++ b/drivers/net/hyperv/hyperv_net.h
> > @@ -671,6 +671,8 @@ struct netvsc_device {
> > u32 send_table[VRSS_SEND_TAB_SIZE];
> > u32 max_chn;
> > u32 num_chn;
> > + spinlock_t sc_lock; /* Protects num_sc_offered variable */
> > + u32 num_sc_offered;
> > atomic_t queue_sends[NR_CPUS];
> >
> > /* Holds rndis device info */
> > diff --git a/drivers/net/hyperv/rndis_filter.c
> b/drivers/net/hyperv/rndis_filter.c
> > index 2e40417..2e09f3f 100644
> > --- a/drivers/net/hyperv/rndis_filter.c
> > +++ b/drivers/net/hyperv/rndis_filter.c
> > @@ -984,9 +984,16 @@ static void netvsc_sc_open(struct vmbus_channel
> *new_sc)
> > struct netvsc_device *nvscdev;
> > u16 chn_index = new_sc->offermsg.offer.sub_channel_index;
> > int ret;
> > + unsigned long flags;
> >
> > nvscdev = hv_get_drvdata(new_sc->primary_channel->device_obj);
> >
> > + spin_lock_irqsave(&nvscdev->sc_lock, flags);
> > + nvscdev->num_sc_offered--;
> > + spin_unlock_irqrestore(&nvscdev->sc_lock, flags);
> > + if (nvscdev->num_sc_offered == 0)
> > + complete(&nvscdev->channel_init_wait);
> > +
> > if (chn_index >= nvscdev->num_chn)
> > return;
> >
> > @@ -1015,8 +1022,10 @@ int rndis_filter_device_add(struct hv_device
> *dev,
> > u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
> > u32 mtu, size;
> > u32 num_rss_qs;
> > + u32 sc_delta;
> > const struct cpumask *node_cpu_mask;
> > u32 num_possible_rss_qs;
> > + unsigned long flags;
> >
> > rndis_device = get_rndis_device();
> > if (!rndis_device)
> > @@ -1039,6 +1048,8 @@ int rndis_filter_device_add(struct hv_device
> *dev,
> > net_device->max_chn = 1;
> > net_device->num_chn = 1;
> >
> > + spin_lock_init(&net_device->sc_lock);
> > +
> > net_device->extension = rndis_device;
> > rndis_device->net_dev = net_device;
> >
> > @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device
> *dev,
> > num_possible_rss_qs = cpumask_weight(node_cpu_mask);
> > net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
> >
> > + num_rss_qs = net_device->num_chn - 1;
> > + net_device->num_sc_offered = num_rss_qs;
> > +
> > if (net_device->num_chn == 1)
> > goto out;
> >
> > @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device
> *dev,
> >
> > ret = rndis_filter_set_rss_param(rndis_device, net_device-
> >num_chn);
> >
> > + /*
> > + * Wait for the host to send us the sub-channel offers.
> > + */
> > + spin_lock_irqsave(&net_device->sc_lock, flags);
> > + sc_delta = net_device->num_chn - 1 - num_rss_qs;
> > + net_device->num_sc_offered -= sc_delta;
> > + spin_unlock_irqrestore(&net_device->sc_lock, flags);
> > +
> > + if (net_device->num_sc_offered != 0)
> > + wait_for_completion(&net_device->channel_init_wait);
>
> I'd suggest we add an essentian timeout (big, let's say 30 sec.)
> here. In case something goes wrong we don't really want to hang the
> whole kernel for forever. Such bugs are hard to debug as if a 'kernel
> hangs' is reported we can't be sure which wait caused it. We can even
> have something like:
>
> t = wait_for_completion_timeout(&net_device->channel_init_wait, 30*HZ);
> BUG_ON(t == 0);
>
> This is much better as we'll be sure what went wrong. (I know other
> pieces of hyper-v code use wait_for_completion() without a timeout, this
> is rather a general suggestion for all of them).

There is some history here. Initially, I had timeout for calls where we could reasonably
rollback state if we timed out. Some calls were subsequently changed to unconditional
wait because under some load conditions, these timeouts would trigger (granted I did not
have 30 second timeout; it was a 5 second timeout).

Greg was opposed to calls to BUG_ON() in general for drivers. Consequently, I have chosen to
wait unconditionally in cases where there is no sensible rollback if we were to timeout (rather
thank bug checking).

In this specific case though, we can have a timeout and we don't need to bug check if we timeout.

I will make this change and resubmit.

Thanks,

K. Y
>
> > out:
> > if (ret) {
> > net_device->max_chn = 1;
> > net_device->num_chn = 1;
> > }
> > +
> > return 0; /* return 0 because primary channel can be used alone */
> >
> > err_dev_remv:
>
> --
> Vitaly

2015-07-17 16:09:51

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe

KY Srinivasan <[email protected]> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:[email protected]]
>> Sent: Friday, July 17, 2015 7:13 AM
>> To: KY Srinivasan
>> Cc: [email protected]; [email protected]; linux-
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; Dexuan Cui
>> Subject: Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
>> processed during probe
>>
>> "K. Y. Srinivasan" <[email protected]> writes:
>>
>> > The current code returns from probe without waiting for the proper
>> handling
>> > of subchannels that may be requested. If the netvsc driver were to be
>> rapidly
>> > loaded/unloaded, we can trigger a panic as the unload will be tearing
>> > down state that may not have been fully setup yet. We fix this issue by
>> making
>> > sure that we return from the probe call only after ensuring that the
>> > sub-channel offers in flight are properly handled.
>> >
>> > Signed-off-by: K. Y. Srinivasan <[email protected]>
>> > Reviewed-and-tested-by: Haiyang Zhang <[email protected]
>> > ---
>> > drivers/net/hyperv/hyperv_net.h | 2 ++
>> > drivers/net/hyperv/rndis_filter.c | 25 +++++++++++++++++++++++++
>> > 2 files changed, 27 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/drivers/net/hyperv/hyperv_net.h
>> b/drivers/net/hyperv/hyperv_net.h
>> > index 26cd14c..925b75d 100644
>> > --- a/drivers/net/hyperv/hyperv_net.h
>> > +++ b/drivers/net/hyperv/hyperv_net.h
>> > @@ -671,6 +671,8 @@ struct netvsc_device {
>> > u32 send_table[VRSS_SEND_TAB_SIZE];
>> > u32 max_chn;
>> > u32 num_chn;
>> > + spinlock_t sc_lock; /* Protects num_sc_offered variable */
>> > + u32 num_sc_offered;
>> > atomic_t queue_sends[NR_CPUS];
>> >
>> > /* Holds rndis device info */
>> > diff --git a/drivers/net/hyperv/rndis_filter.c
>> b/drivers/net/hyperv/rndis_filter.c
>> > index 2e40417..2e09f3f 100644
>> > --- a/drivers/net/hyperv/rndis_filter.c
>> > +++ b/drivers/net/hyperv/rndis_filter.c
>> > @@ -984,9 +984,16 @@ static void netvsc_sc_open(struct vmbus_channel
>> *new_sc)
>> > struct netvsc_device *nvscdev;
>> > u16 chn_index = new_sc->offermsg.offer.sub_channel_index;
>> > int ret;
>> > + unsigned long flags;
>> >
>> > nvscdev = hv_get_drvdata(new_sc->primary_channel->device_obj);
>> >
>> > + spin_lock_irqsave(&nvscdev->sc_lock, flags);
>> > + nvscdev->num_sc_offered--;
>> > + spin_unlock_irqrestore(&nvscdev->sc_lock, flags);
>> > + if (nvscdev->num_sc_offered == 0)
>> > + complete(&nvscdev->channel_init_wait);
>> > +
>> > if (chn_index >= nvscdev->num_chn)
>> > return;
>> >
>> > @@ -1015,8 +1022,10 @@ int rndis_filter_device_add(struct hv_device
>> *dev,
>> > u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
>> > u32 mtu, size;
>> > u32 num_rss_qs;
>> > + u32 sc_delta;
>> > const struct cpumask *node_cpu_mask;
>> > u32 num_possible_rss_qs;
>> > + unsigned long flags;
>> >
>> > rndis_device = get_rndis_device();
>> > if (!rndis_device)
>> > @@ -1039,6 +1048,8 @@ int rndis_filter_device_add(struct hv_device
>> *dev,
>> > net_device->max_chn = 1;
>> > net_device->num_chn = 1;
>> >
>> > + spin_lock_init(&net_device->sc_lock);
>> > +
>> > net_device->extension = rndis_device;
>> > rndis_device->net_dev = net_device;
>> >
>> > @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device
>> *dev,
>> > num_possible_rss_qs = cpumask_weight(node_cpu_mask);
>> > net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
>> >
>> > + num_rss_qs = net_device->num_chn - 1;
>> > + net_device->num_sc_offered = num_rss_qs;
>> > +
>> > if (net_device->num_chn == 1)
>> > goto out;
>> >
>> > @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device
>> *dev,
>> >
>> > ret = rndis_filter_set_rss_param(rndis_device, net_device-
>> >num_chn);
>> >
>> > + /*
>> > + * Wait for the host to send us the sub-channel offers.
>> > + */
>> > + spin_lock_irqsave(&net_device->sc_lock, flags);
>> > + sc_delta = net_device->num_chn - 1 - num_rss_qs;
>> > + net_device->num_sc_offered -= sc_delta;
>> > + spin_unlock_irqrestore(&net_device->sc_lock, flags);
>> > +
>> > + if (net_device->num_sc_offered != 0)
>> > + wait_for_completion(&net_device->channel_init_wait);
>>
>> I'd suggest we add an essentian timeout (big, let's say 30 sec.)
>> here. In case something goes wrong we don't really want to hang the
>> whole kernel for forever. Such bugs are hard to debug as if a 'kernel
>> hangs' is reported we can't be sure which wait caused it. We can even
>> have something like:
>>
>> t = wait_for_completion_timeout(&net_device->channel_init_wait, 30*HZ);
>> BUG_ON(t == 0);
>>
>> This is much better as we'll be sure what went wrong. (I know other
>> pieces of hyper-v code use wait_for_completion() without a timeout, this
>> is rather a general suggestion for all of them).
>
> There is some history here. Initially, I had timeout for calls where we could reasonably
> rollback state if we timed out. Some calls were subsequently changed to unconditional
> wait because under some load conditions, these timeouts would trigger (granted I did not
> have 30 second timeout; it was a 5 second timeout).
>
> Greg was opposed to calls to BUG_ON() in general for drivers.

Even WARN() in my opinion should do the job: we'll at least have
something in the log.

> Consequently, I have chosen to
> wait unconditionally in cases where there is no sensible rollback if we were to timeout (rather
> thank bug checking).

(irrelevant to this patch but) in that case I'd suggest we have
something like:

while (1) {
t = wait_for_completion_timeout(&..., 10*HZ);
if (t)
break;
WARN(1, "Timeout while waiting for hypervisor to reply! Keep waiting...");
}

The sole purpose of that is to have something in dmesg.

>
> In this specific case though, we can have a timeout and we don't need to bug check if we timeout.
>
> I will make this change and resubmit.

Thanks!

>
> Thanks,
>
> K. Y
>>
>> > out:
>> > if (ret) {
>> > net_device->max_chn = 1;
>> > net_device->num_chn = 1;
>> > }
>> > +
>> > return 0; /* return 0 because primary channel can be used alone */
>> >
>> > err_dev_remv:
>>
>> --
>> Vitaly

--
Vitaly

2015-07-17 16:13:09

by KY Srinivasan

[permalink] [raw]
Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:[email protected]]
> Sent: Friday, July 17, 2015 9:10 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Dexuan Cui
> Subject: Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> processed during probe
>
> KY Srinivasan <[email protected]> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:[email protected]]
> >> Sent: Friday, July 17, 2015 7:13 AM
> >> To: KY Srinivasan
> >> Cc: [email protected]; [email protected]; linux-
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected]; Dexuan Cui
> >> Subject: Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> >> processed during probe
> >>
> >> "K. Y. Srinivasan" <[email protected]> writes:
> >>
> >> > The current code returns from probe without waiting for the proper
> >> handling
> >> > of subchannels that may be requested. If the netvsc driver were to be
> >> rapidly
> >> > loaded/unloaded, we can trigger a panic as the unload will be tearing
> >> > down state that may not have been fully setup yet. We fix this issue by
> >> making
> >> > sure that we return from the probe call only after ensuring that the
> >> > sub-channel offers in flight are properly handled.
> >> >
> >> > Signed-off-by: K. Y. Srinivasan <[email protected]>
> >> > Reviewed-and-tested-by: Haiyang Zhang <[email protected]
> >> > ---
> >> > drivers/net/hyperv/hyperv_net.h | 2 ++
> >> > drivers/net/hyperv/rndis_filter.c | 25 +++++++++++++++++++++++++
> >> > 2 files changed, 27 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/drivers/net/hyperv/hyperv_net.h
> >> b/drivers/net/hyperv/hyperv_net.h
> >> > index 26cd14c..925b75d 100644
> >> > --- a/drivers/net/hyperv/hyperv_net.h
> >> > +++ b/drivers/net/hyperv/hyperv_net.h
> >> > @@ -671,6 +671,8 @@ struct netvsc_device {
> >> > u32 send_table[VRSS_SEND_TAB_SIZE];
> >> > u32 max_chn;
> >> > u32 num_chn;
> >> > + spinlock_t sc_lock; /* Protects num_sc_offered variable */
> >> > + u32 num_sc_offered;
> >> > atomic_t queue_sends[NR_CPUS];
> >> >
> >> > /* Holds rndis device info */
> >> > diff --git a/drivers/net/hyperv/rndis_filter.c
> >> b/drivers/net/hyperv/rndis_filter.c
> >> > index 2e40417..2e09f3f 100644
> >> > --- a/drivers/net/hyperv/rndis_filter.c
> >> > +++ b/drivers/net/hyperv/rndis_filter.c
> >> > @@ -984,9 +984,16 @@ static void netvsc_sc_open(struct
> vmbus_channel
> >> *new_sc)
> >> > struct netvsc_device *nvscdev;
> >> > u16 chn_index = new_sc->offermsg.offer.sub_channel_index;
> >> > int ret;
> >> > + unsigned long flags;
> >> >
> >> > nvscdev = hv_get_drvdata(new_sc->primary_channel->device_obj);
> >> >
> >> > + spin_lock_irqsave(&nvscdev->sc_lock, flags);
> >> > + nvscdev->num_sc_offered--;
> >> > + spin_unlock_irqrestore(&nvscdev->sc_lock, flags);
> >> > + if (nvscdev->num_sc_offered == 0)
> >> > + complete(&nvscdev->channel_init_wait);
> >> > +
> >> > if (chn_index >= nvscdev->num_chn)
> >> > return;
> >> >
> >> > @@ -1015,8 +1022,10 @@ int rndis_filter_device_add(struct hv_device
> >> *dev,
> >> > u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
> >> > u32 mtu, size;
> >> > u32 num_rss_qs;
> >> > + u32 sc_delta;
> >> > const struct cpumask *node_cpu_mask;
> >> > u32 num_possible_rss_qs;
> >> > + unsigned long flags;
> >> >
> >> > rndis_device = get_rndis_device();
> >> > if (!rndis_device)
> >> > @@ -1039,6 +1048,8 @@ int rndis_filter_device_add(struct hv_device
> >> *dev,
> >> > net_device->max_chn = 1;
> >> > net_device->num_chn = 1;
> >> >
> >> > + spin_lock_init(&net_device->sc_lock);
> >> > +
> >> > net_device->extension = rndis_device;
> >> > rndis_device->net_dev = net_device;
> >> >
> >> > @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device
> >> *dev,
> >> > num_possible_rss_qs = cpumask_weight(node_cpu_mask);
> >> > net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
> >> >
> >> > + num_rss_qs = net_device->num_chn - 1;
> >> > + net_device->num_sc_offered = num_rss_qs;
> >> > +
> >> > if (net_device->num_chn == 1)
> >> > goto out;
> >> >
> >> > @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device
> >> *dev,
> >> >
> >> > ret = rndis_filter_set_rss_param(rndis_device, net_device-
> >> >num_chn);
> >> >
> >> > + /*
> >> > + * Wait for the host to send us the sub-channel offers.
> >> > + */
> >> > + spin_lock_irqsave(&net_device->sc_lock, flags);
> >> > + sc_delta = net_device->num_chn - 1 - num_rss_qs;
> >> > + net_device->num_sc_offered -= sc_delta;
> >> > + spin_unlock_irqrestore(&net_device->sc_lock, flags);
> >> > +
> >> > + if (net_device->num_sc_offered != 0)
> >> > + wait_for_completion(&net_device->channel_init_wait);
> >>
> >> I'd suggest we add an essentian timeout (big, let's say 30 sec.)
> >> here. In case something goes wrong we don't really want to hang the
> >> whole kernel for forever. Such bugs are hard to debug as if a 'kernel
> >> hangs' is reported we can't be sure which wait caused it. We can even
> >> have something like:
> >>
> >> t = wait_for_completion_timeout(&net_device->channel_init_wait,
> 30*HZ);
> >> BUG_ON(t == 0);
> >>
> >> This is much better as we'll be sure what went wrong. (I know other
> >> pieces of hyper-v code use wait_for_completion() without a timeout, this
> >> is rather a general suggestion for all of them).
> >
> > There is some history here. Initially, I had timeout for calls where we could
> reasonably
> > rollback state if we timed out. Some calls were subsequently changed to
> unconditional
> > wait because under some load conditions, these timeouts would trigger
> (granted I did not
> > have 30 second timeout; it was a 5 second timeout).
> >
> > Greg was opposed to calls to BUG_ON() in general for drivers.
>
> Even WARN() in my opinion should do the job: we'll at least have
> something in the log.
>
> > Consequently, I have chosen to
> > wait unconditionally in cases where there is no sensible rollback if we were
> to timeout (rather
> > thank bug checking).
>
> (irrelevant to this patch but) in that case I'd suggest we have
> something like:
>
> while (1) {
> t = wait_for_completion_timeout(&..., 10*HZ);
> if (t)
> break;
> WARN(1, "Timeout while waiting for hypervisor to reply! Keep waiting...");
> }
>
> The sole purpose of that is to have something in dmesg.

Agreed; I will resubmit with these changes.

Thanks,

K. Y
>
> >
> > In this specific case though, we can have a timeout and we don't need to
> bug check if we timeout.
> >
> > I will make this change and resubmit.
>
> Thanks!
>
> >
> > Thanks,
> >
> > K. Y
> >>
> >> > out:
> >> > if (ret) {
> >> > net_device->max_chn = 1;
> >> > net_device->num_chn = 1;
> >> > }
> >> > +
> >> > return 0; /* return 0 because primary channel can be used alone */
> >> >
> >> > err_dev_remv:
> >>
> >> --
> >> Vitaly
>
> --
> Vitaly

2015-07-20 09:39:25

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe

> From: KY Srinivasan
> Sent: Friday, July 17, 2015 23:33
> > From: Dexuan Cui
> > Sent: Friday, July 17, 2015 3:01 AM
> > > From: K. Y. Srinivasan
> > > Sent: Friday, July 17, 2015 3:17
> > > Subject: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> > processed
> > > during probe
> > > diff --git a/drivers/net/hyperv/hyperv_net.h
> > b/drivers/net/hyperv/hyperv_net.h
> > > ...
> > > @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device
> > *dev,
> > > num_possible_rss_qs = cpumask_weight(node_cpu_mask);
> > > net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
> > >
> > > +num_rss_qs = net_device->num_chn - 1;
> > > +net_device->num_sc_offered = num_rss_qs;
> > > +
> > > if (net_device->num_chn == 1)
> > > goto out;
> > >
> > > @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device
> > *dev,
> > >
> > > ret = rndis_filter_set_rss_param(rndis_device, net_device-
> > >num_chn);
> > >
> > > +/*
> > > + * Wait for the host to send us the sub-channel offers.
> > > + */
> > > +spin_lock_irqsave(&net_device->sc_lock, flags);
> > > +sc_delta = net_device->num_chn - 1 - num_rss_qs;
> > > +net_device->num_sc_offered -= sc_delta;
> >
> > Hi KY,
> > IMO here the "-= " should be "+="?
> >
> > I think sc_delta is usually <= 0, meaning the host may allocate less
> > subchannels than
> > we expect.
> > With "-=", net_device->num_sc_offered can become bigger -- this doesn't
> > seem correct.
> We control how many sub-channels we want the host to offer (say
> sc_requested). Based on this
> number we begin to track how many have actually been processed - we
> decrement sc_requested
> each time a sub-channel offer is processed. If the host were to actually offer all
> that we have requested,
> then checking for sc_requested to be zero is sufficient to ensure that we have
> processed all the
> potentially in-flight sub-channels. However, the host may choose to offer less
> than what we had asked for
> and the variable "delta" is tracking this difference. Since we are counting down
> from what we had asked for
> we have to subtract "delta" for proper accounting.

Yes, I understand the rationale.
Let me show the issue by example:

Let's assume sc_requested is 7 and the host actually only offers 3 sub-channels:
1. Just before sending the NVSP_MSG5_TYPE_SUBCHANNEL message, we have
net_device->num_chn == 8,
num_rss_qs == 7
net_device->num_sc_offered == 7

2. Just after we get the reply of the message,
net_device->num_chn == 4
sc_delta = net_device->num_chn - 1 - num_rss_qs, so sc_delta == 4 - 1 - 7 = -4
net_device->num_sc_offered -= sc_delta, so
net_device->num_sc_offered == 7 - (-4) = 11. It's not zero, so we sleep on the
wait_for_completion().

3. Now we process the 3 sub-channel and net_device->num_sc_offered will
become 11 -1 -1 -1 == 8 and no complete() will be invoked!

That's why I think the "-=" in the line
net_device->num_sc_offered -= sc_delta
should be "+=".

> > Why not use
> > "net_device->num_sc_offered = net_device->num_chn - 1;" directly?
> > At this point, net_device->num_chn has been the number of the actual
> > channels.
>
> I am not sure what the question here is. num_sc_offered is initialized to the
> number we
> are going to ask and this is the number that will be decremented each time a
> sub-channel
> is processed. Since the host may decide to offer us less than what we had asked
> and some
> sub-channels may have already been processed (num_sc_offerred decremented
> accordingly)
> by the time we discover that the host has offered us less than what we asked for,
> we adjust
> num_sc_offered accordingly.

Sorry, I had a misunderstanding here.
Please just ignore this question.

Thanks,
-- Dexuan

2015-07-20 19:00:14

by KY Srinivasan

[permalink] [raw]
Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe



> -----Original Message-----
> From: Dexuan Cui
> Sent: Monday, July 20, 2015 2:39 AM
> To: KY Srinivasan; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: RE: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> processed during probe
>
> > From: KY Srinivasan
> > Sent: Friday, July 17, 2015 23:33
> > > From: Dexuan Cui
> > > Sent: Friday, July 17, 2015 3:01 AM
> > > > From: K. Y. Srinivasan
> > > > Sent: Friday, July 17, 2015 3:17
> > > > Subject: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
> > > processed
> > > > during probe
> > > > diff --git a/drivers/net/hyperv/hyperv_net.h
> > > b/drivers/net/hyperv/hyperv_net.h
> > > > ...
> > > > @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device
> > > *dev,
> > > > num_possible_rss_qs = cpumask_weight(node_cpu_mask);
> > > > net_device->num_chn = min(num_possible_rss_qs, num_rss_qs);
> > > >
> > > > +num_rss_qs = net_device->num_chn - 1;
> > > > +net_device->num_sc_offered = num_rss_qs;
> > > > +
> > > > if (net_device->num_chn == 1)
> > > > goto out;
> > > >
> > > > @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct
> hv_device
> > > *dev,
> > > >
> > > > ret = rndis_filter_set_rss_param(rndis_device, net_device-
> > > >num_chn);
> > > >
> > > > +/*
> > > > + * Wait for the host to send us the sub-channel offers.
> > > > + */
> > > > +spin_lock_irqsave(&net_device->sc_lock, flags);
> > > > +sc_delta = net_device->num_chn - 1 - num_rss_qs;
> > > > +net_device->num_sc_offered -= sc_delta;
> > >
> > > Hi KY,
> > > IMO here the "-= " should be "+="?
> > >
> > > I think sc_delta is usually <= 0, meaning the host may allocate less
> > > subchannels than
> > > we expect.
> > > With "-=", net_device->num_sc_offered can become bigger -- this
> doesn't
> > > seem correct.
> > We control how many sub-channels we want the host to offer (say
> > sc_requested). Based on this
> > number we begin to track how many have actually been processed - we
> > decrement sc_requested
> > each time a sub-channel offer is processed. If the host were to actually
> offer all
> > that we have requested,
> > then checking for sc_requested to be zero is sufficient to ensure that we
> have
> > processed all the
> > potentially in-flight sub-channels. However, the host may choose to offer
> less
> > than what we had asked for
> > and the variable "delta" is tracking this difference. Since we are counting
> down
> > from what we had asked for
> > we have to subtract "delta" for proper accounting.
>
> Yes, I understand the rationale.
> Let me show the issue by example:
>
> Let's assume sc_requested is 7 and the host actually only offers 3 sub-
> channels:
> 1. Just before sending the NVSP_MSG5_TYPE_SUBCHANNEL message, we
> have
> net_device->num_chn == 8,
> num_rss_qs == 7
> net_device->num_sc_offered == 7
>
> 2. Just after we get the reply of the message,
> net_device->num_chn == 4
> sc_delta = net_device->num_chn - 1 - num_rss_qs, so sc_delta == 4 - 1 - 7 = -
> 4
> net_device->num_sc_offered -= sc_delta, so
> net_device->num_sc_offered == 7 - (-4) = 11. It's not zero, so we sleep on
> the
> wait_for_completion().
>
> 3. Now we process the 3 sub-channel and net_device->num_sc_offered will
> become 11 -1 -1 -1 == 8 and no complete() will be invoked!
>
> That's why I think the "-=" in the line
> net_device->num_sc_offered -= sc_delta
> should be "+=".

Thanks Dexuan; I will fix the issue and resend.

Regards,

K. Y