2018-02-12 01:15:45

by KY Srinivasan

Subject: RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Fix ring buffer signaling



> -----Original Message-----
> From: Michael Kelley [mailto:[email protected]]
> Sent: Saturday, February 10, 2018 12:49 PM
> To: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; Stephen
> Hemminger <[email protected]>; KY Srinivasan
> <[email protected]>
> Cc: Michael Kelley (EOSG) <[email protected]>
> Subject: [PATCH char-misc 1/1] Drivers: hv: vmbus: Fix ring buffer signaling
>
> Fix bugs in signaling the Hyper-V host when freeing space in the
> host->guest ring buffer:
>
> 1. The interrupt_mask must not be used to determine whether to signal
> on the host->guest ring buffer
> 2. The ring buffer write_index must be read (via hv_get_bytes_to_write)
> *after* pending_send_sz is read in order to avoid a race condition
> 3. Comparisons with pending_send_sz must treat the "equals" case as
> not-enough-space
> 4. Don't signal if the pending_send_sz feature is not present. Older
> versions of Hyper-V that don't implement this feature will poll.
>
> Fixes: 03bad714a161 ("vmbus: more host signalling avoidance")
> Signed-off-by: Michael Kelley <[email protected]>
> ---
> drivers/hv/ring_buffer.c | 24 ++++++++++++++++--------
> 1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 50e0714..b64be18 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -423,7 +423,11 @@ struct vmpacket_descriptor *
> void hv_pkt_iter_close(struct vmbus_channel *channel)
> {
> struct hv_ring_buffer_info *rbi = &channel->inbound;
> - u32 orig_write_sz = hv_get_bytes_to_write(rbi);
> + u32 curr_write_sz;
> + u32 delta = rbi->ring_buffer->read_index < rbi->priv_read_index ?
> + (rbi->priv_read_index - rbi->ring_buffer->read_index) :
> + (rbi->ring_datasize - rbi->ring_buffer->read_index +
> + rbi->priv_read_index);
>
> /*
> * Make sure all reads are done before we update the read index since
> @@ -446,27 +450,31 @@ void hv_pkt_iter_close(struct vmbus_channel *channel)
> */
> virt_mb();
>
> - /* If host has disabled notifications then skip */
> - if (rbi->ring_buffer->interrupt_mask)
> - return;
> -
> if (rbi->ring_buffer->feature_bits.feat_pending_send_sz) {
> u32 pending_sz = READ_ONCE(rbi->ring_buffer->pending_send_sz);
>
> /*
> + * Ensure the read of write_index in hv_get_bytes_to_write()
> + * happens after the read of pending_send_sz.
> + */
> + virt_rmb();

We can avoid the read barrier by making the initialization of curr_write_sz conditional
on pending_send_sz being non-zero. Indeed, you can make all the signaling code conditional
on pending_send_sz being non-zero.



> + curr_write_sz = hv_get_bytes_to_write(rbi);
> +
> + /*
> * If there was space before we began iteration,
> * then host was not blocked. Also handles case where
> * pending_sz is zero then host has nothing pending
> * and does not need to be signaled.
> */
> - if (orig_write_sz > pending_sz)
> + if (curr_write_sz - delta > pending_sz)
> return;
>
> /* If pending write will not fit, don't give false hope. */
> - if (hv_get_bytes_to_write(rbi) < pending_sz)
> + if (curr_write_sz <= pending_sz)
> return;
> +
> + vmbus_setevent(channel);
> }
>
> - vmbus_setevent(channel);
> }
> EXPORT_SYMBOL_GPL(hv_pkt_iter_close);
> --
> 1.8.3.1



2018-02-12 06:30:04

by Michael Kelley (EOSG)

Subject: RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Fix ring buffer signaling

> -----Original Message-----
> From: KY Srinivasan
> Sent: Sunday, February 11, 2018 5:14 PM

--- snip ---

> > if (rbi->ring_buffer->feature_bits.feat_pending_send_sz) {
> > u32 pending_sz = READ_ONCE(rbi->ring_buffer->pending_send_sz);
> >
> > /*
> > + * Ensure the read of write_index in hv_get_bytes_to_write()
> > + * happens after the read of pending_send_sz.
> > + */
> > + virt_rmb();
> We can avoid the read barrier by making the initialization of curr_write_sz conditional
> on pending_send_sz being non-zero. Indeed you can make all the signaling code conditional
> on pending_send_sz being non-zero.
>
>

I agree that we can immediately test pending_send_sz for zero, and exit
if zero. A zero value will be by far the most common path, and would
save executing the read barrier and hv_get_bytes_to_write(). Can also
move the calculation of "delta" to after the test for zero.

But I believe the read barrier is still needed on the path where
pending_send_sz is non-zero. Just testing the value of pending_send_sz
doesn't guarantee that write_index won't be read speculatively,
and potentially out of order. And we have to consider the out-of-order
behaviors on ARM64 as well.
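
For illustration only, the ordering described above could be modeled in
userspace C roughly as follows. This is a sketch under stated assumptions,
not the patch itself: struct ring_model, bytes_freed(), bytes_to_write() and
close_and_should_signal() are hypothetical names, C11 atomics and fences
stand in for READ_ONCE(), virt_mb() and virt_rmb(), and treating equal read
indices as "zero bytes freed" is an assumption made for the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the shared host->guest ring header. */
struct ring_model {
	_Atomic uint32_t write_index;     /* advanced by the host */
	_Atomic uint32_t read_index;      /* published by the guest */
	_Atomic uint32_t pending_send_sz; /* 0 means the host is not blocked */
	uint32_t datasize;                /* size of the data area in bytes */
	bool feat_pending_send_sz;        /* host implements pending_send_sz */
};

/*
 * Bytes consumed between two read positions, allowing for wrap-around.
 * Assumption for this model: equal indices mean nothing was consumed.
 */
static uint32_t bytes_freed(uint32_t old_read, uint32_t new_read,
			    uint32_t datasize)
{
	assert(old_read < datasize && new_read < datasize);
	return new_read >= old_read ? new_read - old_read
				    : datasize - old_read + new_read;
}

/* Free space available to the writer, in the style of hv_get_bytes_to_write(). */
static uint32_t bytes_to_write(struct ring_model *rb)
{
	uint32_t r = atomic_load_explicit(&rb->read_index, memory_order_relaxed);
	uint32_t w = atomic_load_explicit(&rb->write_index, memory_order_relaxed);

	return w >= r ? rb->datasize - (w - r) : r - w;
}

/* Publish new_read, then report whether the host would need a signal. */
static bool close_and_should_signal(struct ring_model *rb, uint32_t new_read)
{
	uint32_t old_read = atomic_load_explicit(&rb->read_index,
						 memory_order_relaxed);
	uint32_t pending, curr, delta;

	/* Release store: earlier reads of packet data complete first. */
	atomic_store_explicit(&rb->read_index, new_read, memory_order_release);

	/* Stand-in for virt_mb(): publish read_index before reading pending_send_sz. */
	atomic_thread_fence(memory_order_seq_cst);

	if (!rb->feat_pending_send_sz)
		return false;	/* older hosts poll, so never signal */

	pending = atomic_load_explicit(&rb->pending_send_sz,
				       memory_order_relaxed);
	if (pending == 0)
		return false;	/* common path: no barrier, no index reads */

	/* Stand-in for virt_rmb(): write_index is read after pending_send_sz. */
	atomic_thread_fence(memory_order_acquire);

	curr = bytes_to_write(rb);
	delta = bytes_freed(old_read, new_read, rb->datasize);

	if (curr - delta > pending)
		return false;	/* there was already room, host was not blocked */
	if (curr <= pending)
		return false;	/* still not enough room, "equals" included */
	return true;
}
```

In this model the zero test comes first, so the common path skips both the
read barrier and the index reads, while the non-zero path still orders the
read of write_index after the read of pending_send_sz.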

>
> > + curr_write_sz = hv_get_bytes_to_write(rbi);
> > +
> > + /*
> > * If there was space before we began iteration,
> > * then host was not blocked. Also handles case where
> > * pending_sz is zero then host has nothing pending
> > * and does not need to be signaled.
> > */
> > - if (orig_write_sz > pending_sz)
> > + if (curr_write_sz - delta > pending_sz)
> > return;
> >
> > /* If pending write will not fit, don't give false hope. */
> > - if (hv_get_bytes_to_write(rbi) < pending_sz)
> > + if (curr_write_sz <= pending_sz)
> > return;
> > +
> > + vmbus_setevent(channel);
> > }
> >
> > - vmbus_setevent(channel);
> > }
> > EXPORT_SYMBOL_GPL(hv_pkt_iter_close);
> > --
> > 1.8.3.1