2013-05-24 03:40:25

by Dave Airlie

Subject: BUG_ON in virtio-ring.c

Hi Rusty,

Current virtio-ring.c has a BUG_ON in virtqueue_add that checks
total_sg > vq->vring.num; however, I'm not sure it's really 100%
correct.

If I have an indirect ring and I'm adding sgs to it and the host is
delayed (say I've got a thread consuming things from the vring and it's
off doing something interesting), I'd really like to get -ENOSPC back
from virtqueue_add. However, if the indirect addition fails due to
free_sg being 0, we hit the BUG_ON before we ever get to the -ENOSPC
check.

The BUG_ON is quite valid in the non-indirect case, but when we have
indirect buffers it doesn't always make sense.

Not sure of the best way to fix it, I'm just a virtio newbie :)

Dave.


2013-05-27 02:46:00

by Rusty Russell

Subject: Re: BUG_ON in virtio-ring.c

Dave Airlie <[email protected]> writes:
> Hi Rusty,
>
> current virtio-ring.c has a BUG_ON in virtqueue_add that checks
> total_sg > vq->vring.num, however I'm not sure it really is 100%
> correct.
>
> If I have an indirect ring and I'm adding sgs to it and the host is
> delayed (say I've got a thread consuming things from the vring and its
> off doing something interesting),
> I'd really like to get ENOSPC back from virtqueue_add. However if the
> indirect addition fails due to free_sg being 0, we hit the BUG_ON
> before we ever get to the ENOSPC check.

It is correct for the moment: drivers can't assume indirect buffer
support in the transport.

BUT for a new device, we could say "this depends on indirect descriptor
support", put the appropriate check in the device init, and then remove
the BUG_ON().

> the BUG_ON is quite valid in the no indirect case, but when we have
> indirect buffers it doesn't seem like it always makes sense.
>
> Not sure best way to fix it, I'm just a virtio newbie :)

Mailing me and the list was the right thing, since this raises questions
about the spec as well as the Linux implementation.

Good luck!
Rusty.

2013-05-27 03:29:53

by Dave Airlie

Subject: Re: BUG_ON in virtio-ring.c

>> correct.
>>
>> If I have an indirect ring and I'm adding sgs to it and the host is
>> delayed (say I've got a thread consuming things from the vring and its
>> off doing something interesting),
>> I'd really like to get ENOSPC back from virtqueue_add. However if the
>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>> before we ever get to the ENOSPC check.
>
> It is correct for the moment: drivers can't assume indirect buffer
> support in the transport.
>
> BUT for a new device, we could say "this depends on indirect descriptor
> support", put the appropriate check in the device init, and then remove
> the BUG_ON().

But if the transport has indirect buffer support, can it change its
mind at runtime?

In this case we have vq->indirect set and the device has run out of
free buffers, but it isn't a case where in+out would overflow the ring
if it had free buffers, since it would use an indirect descriptor and
succeed.

Getting -ENOSPC is definitely what should happen from what I can see,
not a BUG_ON. I should hit the BUG_ON only if the device reports no
indirect support.

Dave.

2013-05-27 05:40:04

by Rusty Russell

Subject: Re: BUG_ON in virtio-ring.c

Dave Airlie <[email protected]> writes:
>>> correct.
>>>
>>> If I have an indirect ring and I'm adding sgs to it and the host is
>>> delayed (say I've got a thread consuming things from the vring and its
>>> off doing something interesting),
>>> I'd really like to get ENOSPC back from virtqueue_add. However if the
>>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>>> before we ever get to the ENOSPC check.
>>
>> It is correct for the moment: drivers can't assume indirect buffer
>> support in the transport.
>>
>> BUT for a new device, we could say "this depends on indirect descriptor
>> support", put the appropriate check in the device init, and then remove
>> the BUG_ON().
>
> But if the transport has indirect buffer support, can it change its
> mind at runtime?

It's a layering violation. The current rule is simple:

A driver should never submit a buffer which can't fit in the
ring.

This has the nice property that OOM (ie. indirect buffer allocation
failure) just slows things down, doesn't break things.

> In this case we have vq->indirect set, but the device has run out of
> free buffers,
> but it isn't a case that in+out would overflow it if it had free
> buffers since it would use
> an indirect and succeed.

OK, but when do you re-xmit? What if the ring is empty, and you
submitted a buffer that needs indirect? You won't get interrupted
again, because the device has consumed all the buffers. You need to
have your own timer or something equally hackish.

> Getting -ENOSPC is definitely what should happen from what I can see,
> not a BUG_ON,
> I should get a BUG_ON only if the device reports no indirect support.

I think we should return -ENOMEM if we can't indirect because of failed
allocation and it doesn't fit in the ring, ie:

This:

	BUG_ON(total_sg > vq->vring.num);
	BUG_ON(total_sg == 0);

	if (vq->vq.num_free < total_sg) {
		pr_debug("Can't add buf len %i - avail = %i\n",
			 total_sg, vq->vq.num_free);
		/* FIXME: for historical reasons, we force a notify here if
		 * there are outgoing parts to the buffer.  Presumably the
		 * host should service the ring ASAP. */
		if (out_sgs)
			vq->notify(&vq->vq);
		END_USE(vq);
		return -ENOSPC;
	}

Becomes (untested):

	BUG_ON(total_sg == 0);

	if (vq->vq.num_free < total_sg) {
		if (total_sg > vq->vring.num) {
			BUG_ON(!vq->indirect);
			/* Return -ENOMEM only if we have nothing else to do */
			if (vq->vq.num_free == vq->vring.num)
				return -ENOMEM;
		}
		pr_debug("Can't add buf len %i - avail = %i\n",
			 total_sg, vq->vq.num_free);
		/* FIXME: for historical reasons, we force a notify here if
		 * there are outgoing parts to the buffer.  Presumably the
		 * host should service the ring ASAP. */
		if (out_sgs)
			vq->notify(&vq->vq);
		END_USE(vq);
		return -ENOSPC;
	}

Cheers,
Rusty.

2014-10-07 15:46:58

by Alexey Lapitsky

Subject: Re: BUG_ON in virtio-ring.c

Hi,

I'm hitting this bug with both ext4 and btrfs.

Here's an example of the backtrace:
https://gist.github.com/vzctl/e888a821333979120932

I tried raising this BUG only for the direct ring, and it solved the problem:

- BUG_ON(total_sg > vq->vring.num);
+ BUG_ON(total_sg > vq->vring.num && !vq->indirect);

Shall I submit the patch or is a more elaborate fix required?

--

Alexey



On Mon, May 27, 2013 at 7:38 AM, Rusty Russell <[email protected]> wrote:
> Dave Airlie <[email protected]> writes:
>>>> correct.
>>>>
>>>> If I have an indirect ring and I'm adding sgs to it and the host is
>>>> delayed (say I've got a thread consuming things from the vring and its
>>>> off doing something interesting),
>>>> I'd really like to get ENOSPC back from virtqueue_add. However if the
>>>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>>>> before we ever get to the ENOSPC check.
>>>
>>> It is correct for the moment: drivers can't assume indirect buffer
>>> support in the transport.
>>>
>>> BUT for a new device, we could say "this depends on indirect descriptor
>>> support", put the appropriate check in the device init, and then remove
>>> the BUG_ON().
>>
>> But if the transport has indirect buffer support, can it change its
>> mind at runtime?
>
> It's a layering violation. The current rule is simple:
>
> A driver should never submit a buffer which can't fit in the
> ring.
>
> This has the nice property that OOM (ie. indirect buffer alloction
> fail) just slows things down, doesn't break things.
>
>> In this case we have vq->indirect set, but the device has run out of
>> free buffers,
>> but it isn't a case that in+out would overflow it if it had free
>> buffers since it would use
>> an indirect and succeed.
>
> OK, but when do you re-xmit? What if the ring is empty, and you
> submitted a buffer that needs indirect? You won't get interrupted
> again, because the device has consumed all the buffers. You need to
> have your own timer or something equally hackish.
>
>> Getting -ENOSPC is definitely what should happen from what I can see,
>> not a BUG_ON,
>> I should get a BUG_ON only if the device reports no indirect support.
>
> I think we should return -ENOMEM if we can't indirect because of failed
> allocation and it doesn't fit in the ring, ie:
>
> This:
>
> 	BUG_ON(total_sg > vq->vring.num);
> 	BUG_ON(total_sg == 0);
>
> 	if (vq->vq.num_free < total_sg) {
> 		pr_debug("Can't add buf len %i - avail = %i\n",
> 			 total_sg, vq->vq.num_free);
> 		/* FIXME: for historical reasons, we force a notify here if
> 		 * there are outgoing parts to the buffer.  Presumably the
> 		 * host should service the ring ASAP. */
> 		if (out_sgs)
> 			vq->notify(&vq->vq);
> 		END_USE(vq);
> 		return -ENOSPC;
> 	}
>
> Becomes (untested):
>
> 	BUG_ON(total_sg == 0);
>
> 	if (vq->vq.num_free < total_sg) {
> 		if (total_sg > vq->vring.num) {
> 			BUG_ON(!vq->indirect);
> 			/* Return -ENOMEM only if we have nothing else to do */
> 			if (vq->vq.num_free == vq->vring.num)
> 				return -ENOMEM;
> 		}
> 		pr_debug("Can't add buf len %i - avail = %i\n",
> 			 total_sg, vq->vq.num_free);
> 		/* FIXME: for historical reasons, we force a notify here if
> 		 * there are outgoing parts to the buffer.  Presumably the
> 		 * host should service the ring ASAP. */
> 		if (out_sgs)
> 			vq->notify(&vq->vq);
> 		END_USE(vq);
> 		return -ENOSPC;
> 	}
>
> Cheers,
> Rusty.

2014-10-13 06:54:48

by Rusty Russell

Subject: Re: BUG_ON in virtio-ring.c

Alexey Lapitsky <[email protected]> writes:
> Hi,
>
> I'm hitting this bug with both ext4 and btrfs.
>
> Here's an example of the backtrace:
> https://gist.github.com/vzctl/e888a821333979120932
>
> I tried raising this BUG only for direct ring and it solved the problem:
>
> - BUG_ON(total_sg > vq->vring.num);
> + BUG_ON(total_sg > vq->vring.num && !vq->indirect);
>
> Shall I submit the patch or is a more elaborate fix required?

This is wrong. It looks like the block layer is sending down
more sg elements than we have ring entries, even though we tell it not to:

	blk_queue_max_segments(q, vblk->sg_elems-2);

If you apply this patch, what happens? Here it prints:

[ 0.616564] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 0.621244] vda: vda1 vda2 < vda5 >
[ 0.632290] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 0.683526] vdb: vdb1 vdb2 < vdb5 >

Cheers,
Rusty.

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0a581400de0f..aa9d4d313587 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -683,6 +683,13 @@ static int virtblk_probe(struct virtio_device *vdev)
 	/* We can handle whatever the host told us to handle. */
 	blk_queue_max_segments(q, vblk->sg_elems-2);
 
+	printk("virtqueue elements = %u, max_segments = %u (%u queues)",
+	       virtqueue_get_vring_size(vblk->vqs[0].vq),
+	       vblk->sg_elems-2,
+	       vblk->num_vqs);
+
+	BUG_ON(vblk->sg_elems-2 > virtqueue_get_vring_size(vblk->vqs[0].vq));
+
 	/* No need to bounce any requests */
 	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);


2014-11-04 14:45:01

by Alexey Lapitsky

Subject: Re: BUG_ON in virtio-ring.c

Hi,

Sorry for the long delay. It prints exactly the same:

[ 3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 3.802191] vda: vda1 vda2 < vda5 >

A little bit more about my setup (if it helps):

It's a qemu-system-x86_64 kvm instance with 16 cores and 10G of RAM.
I can reproduce the bug every time with mkfs.btrfs on a 10GB LVM
volume (right after the reboot).

I have almost no knowledge of vring / virtio.
Is it correct that we need just one sg_elem entry in vq->vring if the
vq->indirect flag is set? That's what I thought when applying the
"BUG_ON(total_sg > vq->vring.num && !vq->indirect)" patch.

2014-12-18 04:23:29

by Rusty Russell

Subject: Re: BUG_ON in virtio-ring.c

Alexey Lapitsky <[email protected]> writes:
> Hi,
>
> Sorry for the long delay. It prints exactly the same:
>
> [ 3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
> [ 3.802191] vda: vda1 vda2 < vda5 >
>
> A little bit more about my setup (if it helps):

OK, I think this is fixed by Ming Lei's recent count updates,
which interestingly did not seem to be CC:stable.

Cheers,
Rusty.