LinuxLists.cc - [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport

2018-07-14 08:50:05

Subject: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport

When a client has multiple threads that issue I/O requests all the
time and the server performs very well, the CPU may keep running in
IRQ context for a long time, because the *while* loop keeps finding
new buffers in the virtqueue.

So we should hold chan->lock across the whole loop.

Signed-off-by: Yiwen Jiang <[email protected]>
---
net/9p/trans_virtio.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 05006cb..9b0f5f2 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -148,20 +148,18 @@ static void req_done(struct virtqueue *vq)

 	p9_debug(P9_DEBUG_TRANS, ": request done\n");

+	spin_lock_irqsave(&chan->lock, flags);
 	while (1) {
-		spin_lock_irqsave(&chan->lock, flags);
 		req = virtqueue_get_buf(chan->vq, &len);
-		if (req == NULL) {
-			spin_unlock_irqrestore(&chan->lock, flags);
+		if (req == NULL)
 			break;
-		}
 		chan->ring_bufs_avail = 1;
-		spin_unlock_irqrestore(&chan->lock, flags);
 		/* Wakeup if anyone waiting for VirtIO ring space. */
 		wake_up(chan->vc_wq);
 		if (len)
 			p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
 	}
+	spin_unlock_irqrestore(&chan->lock, flags);
 }

 /**
--
1.8.3.1

2018-07-14 09:06:07

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport

jiangyiwen wrote on Sat, Jul 14, 2018:
> When a client has multiple threads that issue I/O requests all the
> time and the server performs very well, the CPU may keep running in
> IRQ context for a long time, because the *while* loop keeps finding
> new buffers in the virtqueue.
>
> So we should hold chan->lock across the whole loop.

Hmm, it is generally bad practice to hold a spin lock for long.
In general, spin locks are meant to protect data, not code.

I'd want some numbers to decide on this one, even if I think this
particular case is safe (i.e. this cannot deadlock).

> Signed-off-by: Yiwen Jiang <[email protected]>
> ---
> net/9p/trans_virtio.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 05006cb..9b0f5f2 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -148,20 +148,18 @@ static void req_done(struct virtqueue *vq)
>
>  	p9_debug(P9_DEBUG_TRANS, ": request done\n");
>
> +	spin_lock_irqsave(&chan->lock, flags);
>  	while (1) {
> -		spin_lock_irqsave(&chan->lock, flags);
>  		req = virtqueue_get_buf(chan->vq, &len);
> -		if (req == NULL) {
> -			spin_unlock_irqrestore(&chan->lock, flags);
> +		if (req == NULL)
>  			break;
> -		}
>  		chan->ring_bufs_avail = 1;
> -		spin_unlock_irqrestore(&chan->lock, flags);
>  		/* Wakeup if anyone waiting for VirtIO ring space. */
>  		wake_up(chan->vc_wq);

In particular, the wake-up here wakes waiters that will immediately
try to grab the lock, and they will needlessly spin on it until this
thread is done.
If we do go this way, I'd want chan->ring_bufs_avail to be set just
before unlocking, and the wake-up to be done just after unlocking,
outside the loop, iff we processed at least one iteration here.

That should also save you precious cpu cycles while under lock :)

--
Dominique Martinet
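
As an illustration of the restructuring suggested in the reply above, here
is a minimal userspace sketch; it is not the kernel change itself. It
assumes a pthread mutex in place of chan->lock and a condition variable in
place of the chan->vc_wq wait queue. struct chan_model, model_init(),
model_push(), model_get_buf() and model_req_done() are hypothetical names
invented for this sketch.

```c
/* Userspace model of the suggested locking pattern; NOT kernel code.
 * All names here are hypothetical stand-ins for the 9p virtio transport. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

struct chan_model {
	pthread_mutex_t lock;   /* stands in for chan->lock */
	pthread_cond_t vc_wq;   /* stands in for chan->vc_wq */
	int ring[RING_SIZE];    /* ids of completed requests */
	size_t head, tail;      /* consume at head, produce at tail */
	bool ring_bufs_avail;
	int completed;          /* requests handed to the "callback" */
};

static void model_init(struct chan_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->vc_wq, NULL);
	c->head = c->tail = 0;
	c->ring_bufs_avail = false;
	c->completed = 0;
}

/* Queue one completed request, as the device side would. */
static void model_push(struct chan_model *c, int req)
{
	pthread_mutex_lock(&c->lock);
	assert(c->tail - c->head < RING_SIZE);  /* ring must not overflow */
	c->ring[c->tail++ % RING_SIZE] = req;
	pthread_mutex_unlock(&c->lock);
}

/* Stand-in for virtqueue_get_buf(): false when the ring is empty.
 * Caller must hold c->lock. */
static bool model_get_buf(struct chan_model *c, int *req)
{
	if (c->head == c->tail)
		return false;
	*req = c->ring[c->head++ % RING_SIZE];
	return true;
}

/* Drain the ring under one lock acquisition. Returns true if waiters
 * were woken, i.e. if at least one request was processed. */
static bool model_req_done(struct chan_model *c)
{
	bool need_wakeup = false;
	int req;

	pthread_mutex_lock(&c->lock);
	while (model_get_buf(c, &req)) {
		c->completed++;         /* stand-in for p9_client_cb() */
		need_wakeup = true;
	}
	if (need_wakeup)
		c->ring_bufs_avail = true;  /* set just before unlocking */
	pthread_mutex_unlock(&c->lock);

	/* Wake waiters only after the lock is released. */
	if (need_wakeup)
		pthread_cond_broadcast(&c->vc_wq);
	return need_wakeup;
}
```

In this model, a call on an empty ring takes and releases the lock without
waking anyone, while a call that drains one or more entries sets
ring_bufs_avail once and issues a single wake-up after unlocking, so woken
waiters do not immediately contend for a lock that is still held.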

2018-07-14 11:13:38

by jiangyiwen

Subject: Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport

On 2018/7/14 17:05, Dominique Martinet wrote:
> jiangyiwen wrote on Sat, Jul 14, 2018:
>> When a client has multiple threads that issue I/O requests all the
>> time and the server performs very well, the CPU may keep running in
>> IRQ context for a long time, because the *while* loop keeps finding
>> new buffers in the virtqueue.
>>
>> So we should hold chan->lock across the whole loop.
>
> Hmm, it is generally bad practice to hold a spin lock for long.
> In general, spin locks are meant to protect data, not code.
>
> I'd want some numbers to decide on this one, even if I think this
> particular case is safe (i.e. this cannot deadlock).
>

Actually, the loop will not hold the spin lock for long, because other
threads will not issue new requests in this case. In addition,
virtio-blk and virtio-scsi also use this approach; I guess they may
have encountered this problem before.

>> Signed-off-by: Yiwen Jiang <[email protected]>
>> ---
>> net/9p/trans_virtio.c | 8 +++-----
>> 1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>> index 05006cb..9b0f5f2 100644
>> --- a/net/9p/trans_virtio.c
>> +++ b/net/9p/trans_virtio.c
>> @@ -148,20 +148,18 @@ static void req_done(struct virtqueue *vq)
>>
>>  	p9_debug(P9_DEBUG_TRANS, ": request done\n");
>>
>> +	spin_lock_irqsave(&chan->lock, flags);
>>  	while (1) {
>> -		spin_lock_irqsave(&chan->lock, flags);
>>  		req = virtqueue_get_buf(chan->vq, &len);
>> -		if (req == NULL) {
>> -			spin_unlock_irqrestore(&chan->lock, flags);
>> +		if (req == NULL)
>>  			break;
>> -		}
>>  		chan->ring_bufs_avail = 1;
>> -		spin_unlock_irqrestore(&chan->lock, flags);
>>  		/* Wakeup if anyone waiting for VirtIO ring space. */
>>  		wake_up(chan->vc_wq);
>
> In particular, the wake-up here wakes waiters that will immediately
> try to grab the lock, and they will needlessly spin on it until this
> thread is done.
> If we do go this way, I'd want chan->ring_bufs_avail to be set just
> before unlocking, and the wake-up to be done just after unlocking,
> outside the loop, iff we processed at least one iteration here.
>

I can move the wake-up operation after the unlocking. As I said
above, I think this loop will not execute for long.

Thanks,
Yiwen.

> That should also save you precious cpu cycles while under lock :)
>

2018-07-14 12:48:59

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport

jiangyiwen wrote on Sat, Jul 14, 2018:
> On 2018/7/14 17:05, Dominique Martinet wrote:
> > jiangyiwen wrote on Sat, Jul 14, 2018:
> >> When a client has multiple threads that issue I/O requests all the
> >> time and the server performs very well, the CPU may keep running in
> >> IRQ context for a long time, because the *while* loop keeps finding
> >> new buffers in the virtqueue.
> >>
> >> So we should hold chan->lock across the whole loop.
> >
> > Hmm, it is generally bad practice to hold a spin lock for long.
> > In general, spin locks are meant to protect data, not code.
> >
> > I'd want some numbers to decide on this one, even if I think this
> > particular case is safe (i.e. this cannot deadlock).
> >
>
> Actually, the loop will not hold the spin lock for long, because other
> threads will not issue new requests in this case. In addition,
> virtio-blk and virtio-scsi also use this approach; I guess they may
> have encountered this problem before.

Fair enough. If you do have some numbers to give though (throughput
and/or iops before/after) I'd still be really curious.

> >>  		chan->ring_bufs_avail = 1;
> >> -		spin_unlock_irqrestore(&chan->lock, flags);
> >>  		/* Wakeup if anyone waiting for VirtIO ring space. */
> >>  		wake_up(chan->vc_wq);
> >
> > In particular, the wake-up here wakes waiters that will immediately
> > try to grab the lock, and they will needlessly spin on it until this
> > thread is done.
> > If we do go this way, I'd want chan->ring_bufs_avail to be set just
> > before unlocking, and the wake-up to be done just after unlocking,
> > outside the loop, iff we processed at least one iteration here.
>
> I can move the wake-up operation after the unlocking. As I said
> above, I think this loop will not execute for long.

Please do. You listed virtio_blk as doing this, and it has the same
kind of pattern, with a req_done bool and only restarting stopped
queues if something was processed.

--
Dominique

2018-07-16 01:56:54

by jiangyiwen

Subject: Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport

On 2018/7/14 20:47, Dominique Martinet wrote:
> jiangyiwen wrote on Sat, Jul 14, 2018:
>> On 2018/7/14 17:05, Dominique Martinet wrote:
>>> jiangyiwen wrote on Sat, Jul 14, 2018:
>>>> When a client has multiple threads that issue I/O requests all the
>>>> time and the server performs very well, the CPU may keep running in
>>>> IRQ context for a long time, because the *while* loop keeps finding
>>>> new buffers in the virtqueue.
>>>>
>>>> So we should hold chan->lock across the whole loop.
>>>
>>> Hmm, it is generally bad practice to hold a spin lock for long.
>>> In general, spin locks are meant to protect data, not code.
>>>
>>> I'd want some numbers to decide on this one, even if I think this
>>> particular case is safe (i.e. this cannot deadlock).
>>>
>>
>> Actually, the loop will not hold the spin lock for long, because other
>> threads will not issue new requests in this case. In addition,
>> virtio-blk and virtio-scsi also use this approach; I guess they may
>> have encountered this problem before.
>
> Fair enough. If you do have some numbers to give though (throughput
> and/or iops before/after) I'd still be really curious.
>
>>>>  		chan->ring_bufs_avail = 1;
>>>> -		spin_unlock_irqrestore(&chan->lock, flags);
>>>>  		/* Wakeup if anyone waiting for VirtIO ring space. */
>>>>  		wake_up(chan->vc_wq);
>>>
>>> In particular, the wake-up here wakes waiters that will immediately
>>> try to grab the lock, and they will needlessly spin on it until this
>>> thread is done.
>>> If we do go this way, I'd want chan->ring_bufs_avail to be set just
>>> before unlocking, and the wake-up to be done just after unlocking,
>>> outside the loop, iff we processed at least one iteration here.
>>
>> I can move the wake-up operation after the unlocking. As I said
>> above, I think this loop will not execute for long.
>
> Please do. You listed virtio_blk as doing this, and it has the same
> kind of pattern, with a req_done bool and only restarting stopped
> queues if something was processed.
>

You're right, this wake-up operation should be put after the unlocking;
I will resend it. In addition, should I resend this patch based
on your 9p-next branch?

Thanks,
Yiwen.

2018-07-16 13:39:59

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport

jiangyiwen wrote on Mon, Jul 16, 2018:
> You're right, this wake-up operation should be put after the unlocking;
> I will resend it. In addition, should I resend this patch based
> on your 9p-next branch?

There is a trivial conflict with Thomas' validate PDU length patch,
but as it is trivial, either works for me - pick whichever is easier
for you to work with.

The main reason I asked for a new version of the other patch is that the
IDR rework changed spin locks, so I'd rather it be clean.

Thanks,
--
Dominique

2018-07-17 01:13:15

by jiangyiwen

Subject: Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport

On 2018/7/16 21:38, Dominique Martinet wrote:
> jiangyiwen wrote on Mon, Jul 16, 2018:
>> You're right, this wake-up operation should be put after the unlocking;
>> I will resend it. In addition, should I resend this patch based
>> on your 9p-next branch?
>
> There is a trivial conflict with Thomas' validate PDU length patch,
> but as it is trivial, either works for me - pick whichever is easier
> for you to work with.
>
> The main reason I asked for a new version of the other patch is that the
> IDR rework changed spin locks, so I'd rather it be clean.
>
>
> Thanks,
>

OK, I will resend the patch later, based on the linux-next branch.

Thanks,
Yiwen.