2016-10-19 14:29:33

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating
chn_table") turns out to be incomplete. A crash in
netvsc_get_next_send_section() is observed on mtu change when the device
is under load. The race I identified is: if we get to netvsc_send() after
we set net_device_ctx->nvdev link in netvsc_device_add() but before we
finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not
allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev
link after the netvsc_init_buf() call as during the negotiation we need
to receive packets and on the receive path we check for it. It would
probably be possible to split nvdev into a pair of nvdev_in and nvdev_out
links and check them accordingly in get_outbound_net_device()/
get_inbound_net_device() but this looks like an overkill.

Check that send_section_map is allocated in netvsc_send().

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
drivers/net/hyperv/netvsc.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 720b5fa..e2bfaac 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -888,6 +888,13 @@ int netvsc_send(struct hv_device *device,
if (!net_device)
return -ENODEV;

+ /* We may race with netvsc_connect_vsp()/netvsc_init_buf() and get
+ * here before the negotiation with the host is finished and
+ * send_section_map may not be allocated yet.
+ */
+ if (!net_device->send_section_map)
+ return -EAGAIN;
+
out_channel = net_device->chn_table[q_idx];

packet->send_buf_index = NETVSC_INVALID_INDEX;
--
2.7.4


2016-10-20 08:51:11

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

Stephen Hemminger <[email protected]> writes:

> Do we need ACCESS_ONCE() here to avoid check/use issues?
>

I think we don't: this is the only place in the function where we read
the variable so we'll get normal read. We're not trying to syncronize
with netvsc_init_buf() as that would require locking, if we read stale
NULL value after it was already updated on a different CPU we're fine,
we'll just return -EAGAIN.

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:[email protected]]
> Sent: Wednesday, October 19, 2016 2:53 PM
> To: [email protected]
> Cc: Stephen Hemminger <[email protected]>; [email protected]; [email protected]; KY Srinivasan <[email protected]>; Haiyang Zhang <[email protected]>
> Subject: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()
>
> Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating
> chn_table") turns out to be incomplete. A crash in
> netvsc_get_next_send_section() is observed on mtu change when the device is under load. The race I identified is: if we get to netvsc_send() after we set net_device_ctx->nvdev link in netvsc_device_add() but before we finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev link after the netvsc_init_buf() call as during the negotiation we need to receive packets and on the receive path we check for it. It would probably be possible to split nvdev into a pair of nvdev_in and nvdev_out links and check them accordingly in get_outbound_net_device()/
> get_inbound_net_device() but this looks like an overkill.
>
> Check that send_section_map is allocated in netvsc_send().
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> drivers/net/hyperv/netvsc.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 720b5fa..e2bfaac 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -888,6 +888,13 @@ int netvsc_send(struct hv_device *device,
> if (!net_device)
> return -ENODEV;
>
> + /* We may race with netvsc_connect_vsp()/netvsc_init_buf() and get
> + * here before the negotiation with the host is finished and
> + * send_section_map may not be allocated yet.
> + */
> + if (!net_device->send_section_map)
> + return -EAGAIN;
> +
> out_channel = net_device->chn_table[q_idx];
>
> packet->send_buf_index = NETVSC_INVALID_INDEX;
> --
> 2.7.4

--
Vitaly

2016-10-20 09:09:42

by Stephen Hemminger

[permalink] [raw]
Subject: RE: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

Do we need ACCESS_ONCE() here to avoid check/use issues?

-----Original Message-----
From: Vitaly Kuznetsov [mailto:[email protected]]
Sent: Wednesday, October 19, 2016 2:53 PM
To: [email protected]
Cc: Stephen Hemminger <[email protected]>; [email protected]; [email protected]; KY Srinivasan <[email protected]>; Haiyang Zhang <[email protected]>
Subject: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating
chn_table") turns out to be incomplete. A crash in
netvsc_get_next_send_section() is observed on mtu change when the device is under load. The race I identified is: if we get to netvsc_send() after we set net_device_ctx->nvdev link in netvsc_device_add() but before we finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev link after the netvsc_init_buf() call as during the negotiation we need to receive packets and on the receive path we check for it. It would probably be possible to split nvdev into a pair of nvdev_in and nvdev_out links and check them accordingly in get_outbound_net_device()/
get_inbound_net_device() but this looks like an overkill.

Check that send_section_map is allocated in netvsc_send().

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
drivers/net/hyperv/netvsc.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 720b5fa..e2bfaac 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -888,6 +888,13 @@ int netvsc_send(struct hv_device *device,
if (!net_device)
return -ENODEV;

+ /* We may race with netvsc_connect_vsp()/netvsc_init_buf() and get
+ * here before the negotiation with the host is finished and
+ * send_section_map may not be allocated yet.
+ */
+ if (!net_device->send_section_map)
+ return -EAGAIN;
+
out_channel = net_device->chn_table[q_idx];

packet->send_buf_index = NETVSC_INVALID_INDEX;
--
2.7.4


2016-10-20 18:03:06

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

From: Vitaly Kuznetsov <[email protected]>
Date: Thu, 20 Oct 2016 10:51:04 +0200

> Stephen Hemminger <[email protected]> writes:
>
>> Do we need ACCESS_ONCE() here to avoid check/use issues?
>>
>
> I think we don't: this is the only place in the function where we read
> the variable so we'll get normal read. We're not trying to syncronize
> with netvsc_init_buf() as that would require locking, if we read stale
> NULL value after it was already updated on a different CPU we're fine,
> we'll just return -EAGAIN.

The concern is if we race with netvsc_destroy_buf() and this pointer
becomes NULL after the test you are adding.

2016-10-21 11:16:05

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

David Miller <[email protected]> writes:

> From: Vitaly Kuznetsov <[email protected]>
> Date: Thu, 20 Oct 2016 10:51:04 +0200
>
>> Stephen Hemminger <[email protected]> writes:
>>
>>> Do we need ACCESS_ONCE() here to avoid check/use issues?
>>>
>>
>> I think we don't: this is the only place in the function where we read
>> the variable so we'll get normal read. We're not trying to syncronize
>> with netvsc_init_buf() as that would require locking, if we read stale
>> NULL value after it was already updated on a different CPU we're fine,
>> we'll just return -EAGAIN.
>
> The concern is if we race with netvsc_destroy_buf() and this pointer
> becomes NULL after the test you are adding.

Thanks, this is interesting.

We may reach to netvsc_destroy_buf() by 3 pathes:

1) cleanup path in netvsc_init_buf(). It is never called with
send_section_map being not NULL so it seems we're safe.

2) from netvsc_remove() when the device is being removed. As far as I
understand we can't be on the transmit path after we call
unregister_netdev() so we're safe.

3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are
specific to netvsc driver as basically we need to remove the device and
add it back to change mtu/number of channels. The underligning 'struct
net_device' stays but the rest is being removed and added back. On both
pathes we first call netvsc_close() which does netif_tx_disable() and as
far as I understand (I may be wrong) this guarantees that we won't be in
netvsc_send().

So *I think* that we can't ever free send_section_map while in
netvsc_send() and we can't even get to netvsc_send() after it is freed
but I may have missed something.

--
Vitaly

2016-10-21 14:53:03

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

From: Vitaly Kuznetsov <[email protected]>
Date: Fri, 21 Oct 2016 13:15:53 +0200

> David Miller <[email protected]> writes:
>
>> From: Vitaly Kuznetsov <[email protected]>
>> Date: Thu, 20 Oct 2016 10:51:04 +0200
>>
>>> Stephen Hemminger <[email protected]> writes:
>>>
>>>> Do we need ACCESS_ONCE() here to avoid check/use issues?
>>>>
>>>
>>> I think we don't: this is the only place in the function where we read
>>> the variable so we'll get normal read. We're not trying to syncronize
>>> with netvsc_init_buf() as that would require locking, if we read stale
>>> NULL value after it was already updated on a different CPU we're fine,
>>> we'll just return -EAGAIN.
>>
>> The concern is if we race with netvsc_destroy_buf() and this pointer
>> becomes NULL after the test you are adding.
>
> Thanks, this is interesting.
>
> We may reach to netvsc_destroy_buf() by 3 pathes:
>
> 1) cleanup path in netvsc_init_buf(). It is never called with
> send_section_map being not NULL so it seems we're safe.
>
> 2) from netvsc_remove() when the device is being removed. As far as I
> understand we can't be on the transmit path after we call
> unregister_netdev() so we're safe.
>
> 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are
> specific to netvsc driver as basically we need to remove the device and
> add it back to change mtu/number of channels. The underligning 'struct
> net_device' stays but the rest is being removed and added back. On both
> pathes we first call netvsc_close() which does netif_tx_disable() and as
> far as I understand (I may be wrong) this guarantees that we won't be in
> netvsc_send().
>
> So *I think* that we can't ever free send_section_map while in
> netvsc_send() and we can't even get to netvsc_send() after it is freed
> but I may have missed something.

Ok you may be right.

Can't the device be taken down by asynchronous events as well? For example
if the peer end of the interface in the other guest disappears.

2016-10-21 15:17:24

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

David Miller <[email protected]> writes:

> From: Vitaly Kuznetsov <[email protected]>
> Date: Fri, 21 Oct 2016 13:15:53 +0200
>
>> David Miller <[email protected]> writes:
>>
>>> From: Vitaly Kuznetsov <[email protected]>
>>> Date: Thu, 20 Oct 2016 10:51:04 +0200
>>>
>>>> Stephen Hemminger <[email protected]> writes:
>>>>
>>>>> Do we need ACCESS_ONCE() here to avoid check/use issues?
>>>>>
>>>>
>>>> I think we don't: this is the only place in the function where we read
>>>> the variable so we'll get normal read. We're not trying to syncronize
>>>> with netvsc_init_buf() as that would require locking, if we read stale
>>>> NULL value after it was already updated on a different CPU we're fine,
>>>> we'll just return -EAGAIN.
>>>
>>> The concern is if we race with netvsc_destroy_buf() and this pointer
>>> becomes NULL after the test you are adding.
>>
>> Thanks, this is interesting.
>>
>> We may reach to netvsc_destroy_buf() by 3 pathes:
>>
>> 1) cleanup path in netvsc_init_buf(). It is never called with
>> send_section_map being not NULL so it seems we're safe.
>>
>> 2) from netvsc_remove() when the device is being removed. As far as I
>> understand we can't be on the transmit path after we call
>> unregister_netdev() so we're safe.
>>
>> 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are
>> specific to netvsc driver as basically we need to remove the device and
>> add it back to change mtu/number of channels. The underligning 'struct
>> net_device' stays but the rest is being removed and added back. On both
>> pathes we first call netvsc_close() which does netif_tx_disable() and as
>> far as I understand (I may be wrong) this guarantees that we won't be in
>> netvsc_send().
>>
>> So *I think* that we can't ever free send_section_map while in
>> netvsc_send() and we can't even get to netvsc_send() after it is freed
>> but I may have missed something.
>
> Ok you may be right.
>
> Can't the device be taken down by asynchronous events as well? For example
> if the peer end of the interface in the other guest disappears.

The device may be hot removed from host's side. In this case we follow
the following call chain:

... -> vmbus_device_unregister() -> ... -> vmbus_remove() -> netvsc_remove()

and netvsc_remove() does netif_tx_disable(); unregister_netdev();
before calling rndis_filter_device_remove() leading to netvsc_destroy_buf().

So it seems we can't be in netvsc_send() when the device is
disappearing.

--
Vitaly

2016-10-21 15:27:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

From: Vitaly Kuznetsov <[email protected]>
Date: Fri, 21 Oct 2016 17:17:18 +0200

> David Miller <[email protected]> writes:
>
>> From: Vitaly Kuznetsov <[email protected]>
>> Date: Fri, 21 Oct 2016 13:15:53 +0200
>>
>>> David Miller <[email protected]> writes:
>>>
>>>> From: Vitaly Kuznetsov <[email protected]>
>>>> Date: Thu, 20 Oct 2016 10:51:04 +0200
>>>>
>>>>> Stephen Hemminger <[email protected]> writes:
>>>>>
>>>>>> Do we need ACCESS_ONCE() here to avoid check/use issues?
>>>>>>
>>>>>
>>>>> I think we don't: this is the only place in the function where we read
>>>>> the variable so we'll get normal read. We're not trying to syncronize
>>>>> with netvsc_init_buf() as that would require locking, if we read stale
>>>>> NULL value after it was already updated on a different CPU we're fine,
>>>>> we'll just return -EAGAIN.
>>>>
>>>> The concern is if we race with netvsc_destroy_buf() and this pointer
>>>> becomes NULL after the test you are adding.
>>>
>>> Thanks, this is interesting.
>>>
>>> We may reach to netvsc_destroy_buf() by 3 pathes:
>>>
>>> 1) cleanup path in netvsc_init_buf(). It is never called with
>>> send_section_map being not NULL so it seems we're safe.
>>>
>>> 2) from netvsc_remove() when the device is being removed. As far as I
>>> understand we can't be on the transmit path after we call
>>> unregister_netdev() so we're safe.
>>>
>>> 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are
>>> specific to netvsc driver as basically we need to remove the device and
>>> add it back to change mtu/number of channels. The underligning 'struct
>>> net_device' stays but the rest is being removed and added back. On both
>>> pathes we first call netvsc_close() which does netif_tx_disable() and as
>>> far as I understand (I may be wrong) this guarantees that we won't be in
>>> netvsc_send().
>>>
>>> So *I think* that we can't ever free send_section_map while in
>>> netvsc_send() and we can't even get to netvsc_send() after it is freed
>>> but I may have missed something.
>>
>> Ok you may be right.
>>
>> Can't the device be taken down by asynchronous events as well? For example
>> if the peer end of the interface in the other guest disappears.
>
> The device may be hot removed from host's side. In this case we follow
> the following call chain:
>
> ... -> vmbus_device_unregister() -> ... -> vmbus_remove() -> netvsc_remove()
>
> and netvsc_remove() does netif_tx_disable(); unregister_netdev();
> before calling rndis_filter_device_remove() leading to netvsc_destroy_buf().
>
> So it seems we can't be in netvsc_send() when the device is
> disappearing.

Ok, it all looks good then.

2016-10-21 15:28:06

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

From: Vitaly Kuznetsov <[email protected]>
Date: Wed, 19 Oct 2016 15:53:01 +0200

> Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating
> chn_table") turns out to be incomplete. A crash in
> netvsc_get_next_send_section() is observed on mtu change when the device
> is under load. The race I identified is: if we get to netvsc_send() after
> we set net_device_ctx->nvdev link in netvsc_device_add() but before we
> finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not
> allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev
> link after the netvsc_init_buf() call as during the negotiation we need
> to receive packets and on the receive path we check for it. It would
> probably be possible to split nvdev into a pair of nvdev_in and nvdev_out
> links and check them accordingly in get_outbound_net_device()/
> get_inbound_net_device() but this looks like an overkill.
>
> Check that send_section_map is allocated in netvsc_send().
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>

Applied, thanks.