2020-01-06 10:51:47

by Michael S. Tsirkin

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Wed, Dec 18, 2019 at 04:59:02PM +0100, Christian Borntraeger wrote:
> On 18.12.19 16:10, Michael S. Tsirkin wrote:
> > On Wed, Dec 18, 2019 at 03:43:43PM +0100, Christian Borntraeger wrote:
> >> Michael,
> >>
> >> with
> >> commit db7286b100b503ef80612884453bed53d74c9a16 (refs/bisect/skip-db7286b100b503ef80612884453bed53d74c9a16)
> >> vhost: use batched version by default
> >> plus
> >> commit 6bd262d5eafcdf8cdfae491e2e748e4e434dcda6 (HEAD, refs/bisect/bad)
> >> Revert "vhost/net: add an option to test new code"
> >> to make things compile (your next tree is not easily bisectable, can you fix that as well?).
> >
> > I'll try.
> >
> >>
> >> I get random crashes in my s390 KVM guests after reboot.
> >> Reverting both patches together with commit decd9b8 "vhost: use vhost_desc instead of vhost_log" to
> >> make it compile again) on top of linux-next-1218 makes the problem go away.
> >>
> >> Looks like the batched version is not yet ready for prime time. Can you drop these patches until
> >> we have fixed the issues?
> >>
> >> Christian
> >>
> >
> > Will do, thanks for letting me know.
>
> I have confirmed with the initial reporter (internal test team) that <driver name='qemu'/>
> with a known to be broken linux next kernel also fixes the problem, so it is really the
> vhost changes.

OK I'm back and trying to make it more bisectable.

I pushed a new tag "batch-v2".
It's the same code, but with this tag bisect should give more information.


I suspect one of the following:

commit 1414d7ee3d10d2ec2bc4ee652d1d90ec91da1c79
Author: Michael S. Tsirkin <[email protected]>
Date: Mon Oct 7 06:11:18 2019 -0400

vhost: batching fetches

With this patch applied, new and old code perform identically.

Lots of extra optimizations are now possible, e.g.
we can fetch multiple heads with copy_from/to_user now.
We can get rid of maintaining the log array. Etc etc.

Signed-off-by: Michael S. Tsirkin <[email protected]>

commit 50297a8480b439efc5f3f23088cb2d90b799acef
Author: Michael S. Tsirkin <[email protected]>
Date: Wed Dec 11 12:19:26 2019 -0500

vhost: use batched version by default

As testing shows no performance change, switch to that now.

Signed-off-by: Michael S. Tsirkin <[email protected]>


and would like to know which.

Thanks!



2020-01-07 09:00:18

by Christian Borntraeger

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot



On 06.01.20 11:50, Michael S. Tsirkin wrote:
> On Wed, Dec 18, 2019 at 04:59:02PM +0100, Christian Borntraeger wrote:
>> On 18.12.19 16:10, Michael S. Tsirkin wrote:
>>> On Wed, Dec 18, 2019 at 03:43:43PM +0100, Christian Borntraeger wrote:
>>>> Michael,
>>>>
>>>> with
>>>> commit db7286b100b503ef80612884453bed53d74c9a16 (refs/bisect/skip-db7286b100b503ef80612884453bed53d74c9a16)
>>>> vhost: use batched version by default
>>>> plus
>>>> commit 6bd262d5eafcdf8cdfae491e2e748e4e434dcda6 (HEAD, refs/bisect/bad)
>>>> Revert "vhost/net: add an option to test new code"
>>>> to make things compile (your next tree is not easily bisectable, can you fix that as well?).
>>>
>>> I'll try.
>>>
>>>>
>>>> I get random crashes in my s390 KVM guests after reboot.
>>>> Reverting both patches together with commit decd9b8 "vhost: use vhost_desc instead of vhost_log" to
>>>> make it compile again) on top of linux-next-1218 makes the problem go away.
>>>>
>>>> Looks like the batched version is not yet ready for prime time. Can you drop these patches until
>>>> we have fixed the issues?
>>>>
>>>> Christian
>>>>
>>>
>>> Will do, thanks for letting me know.
>>
>> I have confirmed with the initial reporter (internal test team) that <driver name='qemu'/>
>> with a known to be broken linux next kernel also fixes the problem, so it is really the
>> vhost changes.
>
> OK I'm back and trying to make it more bisectable.
>
> I pushed a new tag "batch-v2".
> It's same code but with this bisect should get more information.

I get the following with this tag

drivers/vhost/net.c: In function ‘vhost_net_tx_get_vq_desc’:
drivers/vhost/net.c:574:7: error: implicit declaration of function ‘vhost_get_vq_desc_batch’; did you mean ‘vhost_get_vq_desc’? [-Werror=implicit-function-declaration]
  574 |   r = vhost_get_vq_desc_batch(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
      |       ^~~~~~~~~~~~~~~~~~~~~~~
      |       vhost_get_vq_desc


2020-01-07 09:41:19

by Michael S. Tsirkin

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Tue, Jan 07, 2020 at 09:59:16AM +0100, Christian Borntraeger wrote:
>
>
> On 06.01.20 11:50, Michael S. Tsirkin wrote:
> > On Wed, Dec 18, 2019 at 04:59:02PM +0100, Christian Borntraeger wrote:
> >> On 18.12.19 16:10, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 18, 2019 at 03:43:43PM +0100, Christian Borntraeger wrote:
> >>>> Michael,
> >>>>
> >>>> with
> >>>> commit db7286b100b503ef80612884453bed53d74c9a16 (refs/bisect/skip-db7286b100b503ef80612884453bed53d74c9a16)
> >>>> vhost: use batched version by default
> >>>> plus
> >>>> commit 6bd262d5eafcdf8cdfae491e2e748e4e434dcda6 (HEAD, refs/bisect/bad)
> >>>> Revert "vhost/net: add an option to test new code"
> >>>> to make things compile (your next tree is not easily bisectable, can you fix that as well?).
> >>>
> >>> I'll try.
> >>>
> >>>>
> >>>> I get random crashes in my s390 KVM guests after reboot.
> >>>> Reverting both patches together with commit decd9b8 "vhost: use vhost_desc instead of vhost_log" to
> >>>> make it compile again) on top of linux-next-1218 makes the problem go away.
> >>>>
> >>>> Looks like the batched version is not yet ready for prime time. Can you drop these patches until
> >>>> we have fixed the issues?
> >>>>
> >>>> Christian
> >>>>
> >>>
> >>> Will do, thanks for letting me know.
> >>
> >> I have confirmed with the initial reporter (internal test team) that <driver name='qemu'/>
> >> with a known to be broken linux next kernel also fixes the problem, so it is really the
> >> vhost changes.
> >
> > OK I'm back and trying to make it more bisectable.
> >
> > I pushed a new tag "batch-v2".
> > It's same code but with this bisect should get more information.
>
> I get the following with this tag
>
> drivers/vhost/net.c: In function ‘vhost_net_tx_get_vq_desc’:
> drivers/vhost/net.c:574:7: error: implicit declaration of function ‘vhost_get_vq_desc_batch’; did you mean ‘vhost_get_vq_desc’? [-Werror=implicit-function-declaration]
> 574 | r = vhost_get_vq_desc_batch(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
> | ^~~~~~~~~~~~~~~~~~~~~~~
> | vhost_get_vq_desc
>

Not sure why but I pushed a wrong commit. Sorry. Should be good now.

--
MST

2020-01-07 11:36:03

by Christian Borntraeger

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot



On 07.01.20 10:39, Michael S. Tsirkin wrote:
> On Tue, Jan 07, 2020 at 09:59:16AM +0100, Christian Borntraeger wrote:
>>
>>
>> On 06.01.20 11:50, Michael S. Tsirkin wrote:
>>> On Wed, Dec 18, 2019 at 04:59:02PM +0100, Christian Borntraeger wrote:
>>>> On 18.12.19 16:10, Michael S. Tsirkin wrote:
>>>>> On Wed, Dec 18, 2019 at 03:43:43PM +0100, Christian Borntraeger wrote:
>>>>>> Michael,
>>>>>>
>>>>>> with
>>>>>> commit db7286b100b503ef80612884453bed53d74c9a16 (refs/bisect/skip-db7286b100b503ef80612884453bed53d74c9a16)
>>>>>> vhost: use batched version by default
>>>>>> plus
>>>>>> commit 6bd262d5eafcdf8cdfae491e2e748e4e434dcda6 (HEAD, refs/bisect/bad)
>>>>>> Revert "vhost/net: add an option to test new code"
>>>>>> to make things compile (your next tree is not easily bisectable, can you fix that as well?).
>>>>>
>>>>> I'll try.
>>>>>
>>>>>>
>>>>>> I get random crashes in my s390 KVM guests after reboot.
>>>>>> Reverting both patches together with commit decd9b8 "vhost: use vhost_desc instead of vhost_log" to
>>>>>> make it compile again) on top of linux-next-1218 makes the problem go away.
>>>>>>
>>>>>> Looks like the batched version is not yet ready for prime time. Can you drop these patches until
>>>>>> we have fixed the issues?
>>>>>>
>>>>>> Christian
>>>>>>
>>>>>
>>>>> Will do, thanks for letting me know.
>>>>
>>>> I have confirmed with the initial reporter (internal test team) that <driver name='qemu'/>
>>>> with a known to be broken linux next kernel also fixes the problem, so it is really the
>>>> vhost changes.
>>>
>>> OK I'm back and trying to make it more bisectable.
>>>
>>> I pushed a new tag "batch-v2".
>>> It's same code but with this bisect should get more information.
>>
>> I get the following with this tag
>>
>> drivers/vhost/net.c: In function ‘vhost_net_tx_get_vq_desc’:
>> drivers/vhost/net.c:574:7: error: implicit declaration of function ‘vhost_get_vq_desc_batch’; did you mean ‘vhost_get_vq_desc’? [-Werror=implicit-function-declaration]
>> 574 | r = vhost_get_vq_desc_batch(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
>> | ^~~~~~~~~~~~~~~~~~~~~~~
>> | vhost_get_vq_desc
>>
>
> Not sure why but I pushed a wrong commit. Sorry. Should be good now.
>

during bisect:

drivers/vhost/vhost.c: In function ‘vhost_get_vq_desc_batch’:
drivers/vhost/vhost.c:2634:8: error: ‘id’ undeclared (first use in this function); did you mean ‘i’?
 2634 |   ret = id;
      |         ^~
      |         i

I changed that to i


The last step then gave me (on commit 50297a8480b439efc5f3f23088cb2d90b799acef vhost: use batched version by default)
net enc1: Unexpected TXQ (0) queue failure: -5
in the guest.

bisect log so far:
[cborntra@m83lp52 linux]$ git bisect log
git bisect start
# bad: [3131e79bb9e9892a5a6bd33513de9bc90b20e867] vhost: use vhost_desc instead of vhost_log
git bisect bad 3131e79bb9e9892a5a6bd33513de9bc90b20e867
# good: [d1281e3a562ec6a08f944a876481dd043ba739b9] virtio-blk: remove VIRTIO_BLK_F_SCSI support
git bisect good d1281e3a562ec6a08f944a876481dd043ba739b9
# good: [5b00aab5b6332a67e32dace1dcd3a198ab94ed56] vhost: option to fetch descriptors through an independent struct
git bisect good 5b00aab5b6332a67e32dace1dcd3a198ab94ed56
# good: [5b00aab5b6332a67e32dace1dcd3a198ab94ed56] vhost: option to fetch descriptors through an independent struct
git bisect good 5b00aab5b6332a67e32dace1dcd3a198ab94ed56
# bad: [1414d7ee3d10d2ec2bc4ee652d1d90ec91da1c79] vhost: batching fetches
git bisect bad 1414d7ee3d10d2ec2bc4ee652d1d90ec91da1c79




2020-01-07 11:56:57

by Michael S. Tsirkin

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Tue, Jan 07, 2020 at 12:34:50PM +0100, Christian Borntraeger wrote:
>
>
> On 07.01.20 10:39, Michael S. Tsirkin wrote:
> > On Tue, Jan 07, 2020 at 09:59:16AM +0100, Christian Borntraeger wrote:
> >>
> >>
> >> On 06.01.20 11:50, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 18, 2019 at 04:59:02PM +0100, Christian Borntraeger wrote:
> >>>> On 18.12.19 16:10, Michael S. Tsirkin wrote:
> >>>>> On Wed, Dec 18, 2019 at 03:43:43PM +0100, Christian Borntraeger wrote:
> >>>>>> Michael,
> >>>>>>
> >>>>>> with
> >>>>>> commit db7286b100b503ef80612884453bed53d74c9a16 (refs/bisect/skip-db7286b100b503ef80612884453bed53d74c9a16)
> >>>>>> vhost: use batched version by default
> >>>>>> plus
> >>>>>> commit 6bd262d5eafcdf8cdfae491e2e748e4e434dcda6 (HEAD, refs/bisect/bad)
> >>>>>> Revert "vhost/net: add an option to test new code"
> >>>>>> to make things compile (your next tree is not easily bisectable, can you fix that as well?).
> >>>>>
> >>>>> I'll try.
> >>>>>
> >>>>>>
> >>>>>> I get random crashes in my s390 KVM guests after reboot.
> >>>>>> Reverting both patches together with commit decd9b8 "vhost: use vhost_desc instead of vhost_log" to
> >>>>>> make it compile again) on top of linux-next-1218 makes the problem go away.
> >>>>>>
> >>>>>> Looks like the batched version is not yet ready for prime time. Can you drop these patches until
> >>>>>> we have fixed the issues?
> >>>>>>
> >>>>>> Christian
> >>>>>>
> >>>>>
> >>>>> Will do, thanks for letting me know.
> >>>>
> >>>> I have confirmed with the initial reporter (internal test team) that <driver name='qemu'/>
> >>>> with a known to be broken linux next kernel also fixes the problem, so it is really the
> >>>> vhost changes.
> >>>
> >>> OK I'm back and trying to make it more bisectable.
> >>>
> >>> I pushed a new tag "batch-v2".
> >>> It's same code but with this bisect should get more information.
> >>
> >> I get the following with this tag
> >>
> >> drivers/vhost/net.c: In function ‘vhost_net_tx_get_vq_desc’:
> >> drivers/vhost/net.c:574:7: error: implicit declaration of function ‘vhost_get_vq_desc_batch’; did you mean ‘vhost_get_vq_desc’? [-Werror=implicit-function-declaration]
> >> 574 | r = vhost_get_vq_desc_batch(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
> >> | ^~~~~~~~~~~~~~~~~~~~~~~
> >> | vhost_get_vq_desc
> >>
> >
> > Not sure why but I pushed a wrong commit. Sorry. Should be good now.
> >
>
> during bisect:
>
> drivers/vhost/vhost.c: In function ‘vhost_get_vq_desc_batch’:
> drivers/vhost/vhost.c:2634:8: error: ‘id’ undeclared (first use in this function); did you mean ‘i’?
> 2634 | ret = id;
> | ^~
> | i
>
> I changed that to i
>
>
> The last step then gave me (on commit 50297a8480b439efc5f3f23088cb2d90b799acef vhost: use batched version by default)
> net enc1: Unexpected TXQ (0) queue failure: -5
> in the guest.
>
> bisect log so far:
> [cborntra@m83lp52 linux]$ git bisect log
> git bisect start
> # bad: [3131e79bb9e9892a5a6bd33513de9bc90b20e867] vhost: use vhost_desc instead of vhost_log
> git bisect bad 3131e79bb9e9892a5a6bd33513de9bc90b20e867
> # good: [d1281e3a562ec6a08f944a876481dd043ba739b9] virtio-blk: remove VIRTIO_BLK_F_SCSI support
> git bisect good d1281e3a562ec6a08f944a876481dd043ba739b9
> # good: [5b00aab5b6332a67e32dace1dcd3a198ab94ed56] vhost: option to fetch descriptors through an independent struct
> git bisect good 5b00aab5b6332a67e32dace1dcd3a198ab94ed56
> # good: [5b00aab5b6332a67e32dace1dcd3a198ab94ed56] vhost: option to fetch descriptors through an independent struct
> git bisect good 5b00aab5b6332a67e32dace1dcd3a198ab94ed56
> # bad: [1414d7ee3d10d2ec2bc4ee652d1d90ec91da1c79] vhost: batching fetches
> git bisect bad 1414d7ee3d10d2ec2bc4ee652d1d90ec91da1c79
>
>

I pushed batched-v3 - same head but bisect should work now.

--
MST

2020-01-07 12:10:02

by Michael S. Tsirkin

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Tue, Jan 07, 2020 at 12:34:50PM +0100, Christian Borntraeger wrote:
>
>
> On 07.01.20 10:39, Michael S. Tsirkin wrote:
> > On Tue, Jan 07, 2020 at 09:59:16AM +0100, Christian Borntraeger wrote:
> >>
> >>
> >> On 06.01.20 11:50, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 18, 2019 at 04:59:02PM +0100, Christian Borntraeger wrote:
> >>>> On 18.12.19 16:10, Michael S. Tsirkin wrote:
> >>>>> On Wed, Dec 18, 2019 at 03:43:43PM +0100, Christian Borntraeger wrote:
> >>>>>> Michael,
> >>>>>>
> >>>>>> with
> >>>>>> commit db7286b100b503ef80612884453bed53d74c9a16 (refs/bisect/skip-db7286b100b503ef80612884453bed53d74c9a16)
> >>>>>> vhost: use batched version by default
> >>>>>> plus
> >>>>>> commit 6bd262d5eafcdf8cdfae491e2e748e4e434dcda6 (HEAD, refs/bisect/bad)
> >>>>>> Revert "vhost/net: add an option to test new code"
> >>>>>> to make things compile (your next tree is not easily bisectable, can you fix that as well?).
> >>>>>
> >>>>> I'll try.
> >>>>>
> >>>>>>
> >>>>>> I get random crashes in my s390 KVM guests after reboot.
> >>>>>> Reverting both patches together with commit decd9b8 "vhost: use vhost_desc instead of vhost_log" to
> >>>>>> make it compile again) on top of linux-next-1218 makes the problem go away.
> >>>>>>
> >>>>>> Looks like the batched version is not yet ready for prime time. Can you drop these patches until
> >>>>>> we have fixed the issues?
> >>>>>>
> >>>>>> Christian
> >>>>>>
> >>>>>
> >>>>> Will do, thanks for letting me know.
> >>>>
> >>>> I have confirmed with the initial reporter (internal test team) that <driver name='qemu'/>
> >>>> with a known to be broken linux next kernel also fixes the problem, so it is really the
> >>>> vhost changes.
> >>>
> >>> OK I'm back and trying to make it more bisectable.
> >>>
> >>> I pushed a new tag "batch-v2".
> >>> It's same code but with this bisect should get more information.
> >>
> >> I get the following with this tag
> >>
> >> drivers/vhost/net.c: In function ‘vhost_net_tx_get_vq_desc’:
> >> drivers/vhost/net.c:574:7: error: implicit declaration of function ‘vhost_get_vq_desc_batch’; did you mean ‘vhost_get_vq_desc’? [-Werror=implicit-function-declaration]
> >> 574 | r = vhost_get_vq_desc_batch(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
> >> | ^~~~~~~~~~~~~~~~~~~~~~~
> >> | vhost_get_vq_desc
> >>
> >
> > Not sure why but I pushed a wrong commit. Sorry. Should be good now.
> >
>
> during bisect:
>
> drivers/vhost/vhost.c: In function ‘vhost_get_vq_desc_batch’:
> drivers/vhost/vhost.c:2634:8: error: ‘id’ undeclared (first use in this function); did you mean ‘i’?
> 2634 | ret = id;
> | ^~
> | i
>
> I changed that to i

Hmm no that's wrong I think. Sorry about all the errors. Let me push a
fixed v3.

>
> The last step then gave me (on commit 50297a8480b439efc5f3f23088cb2d90b799acef vhost: use batched version by default)
> net enc1: Unexpected TXQ (0) queue failure: -5
> in the guest.
>
> bisect log so far:
> [cborntra@m83lp52 linux]$ git bisect log
> git bisect start
> # bad: [3131e79bb9e9892a5a6bd33513de9bc90b20e867] vhost: use vhost_desc instead of vhost_log
> git bisect bad 3131e79bb9e9892a5a6bd33513de9bc90b20e867
> # good: [d1281e3a562ec6a08f944a876481dd043ba739b9] virtio-blk: remove VIRTIO_BLK_F_SCSI support
> git bisect good d1281e3a562ec6a08f944a876481dd043ba739b9
> # good: [5b00aab5b6332a67e32dace1dcd3a198ab94ed56] vhost: option to fetch descriptors through an independent struct
> git bisect good 5b00aab5b6332a67e32dace1dcd3a198ab94ed56
> # good: [5b00aab5b6332a67e32dace1dcd3a198ab94ed56] vhost: option to fetch descriptors through an independent struct
> git bisect good 5b00aab5b6332a67e32dace1dcd3a198ab94ed56
> # bad: [1414d7ee3d10d2ec2bc4ee652d1d90ec91da1c79] vhost: batching fetches
> git bisect bad 1414d7ee3d10d2ec2bc4ee652d1d90ec91da1c79
>
>
>

2020-01-07 12:18:01

by Christian Borntraeger

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On 07.01.20 12:55, Michael S. Tsirkin wrote:

>
> I pushed batched-v3 - same head but bisect should work now.
>

With
commit 38ced0208491103b50f1056f0d1c8f28e2e13d08 (HEAD)
Author: Michael S. Tsirkin <[email protected]>
AuthorDate: Wed Dec 11 12:19:26 2019 -0500
Commit: Michael S. Tsirkin <[email protected]>
CommitDate: Tue Jan 7 06:52:42 2020 -0500

vhost: use batched version by default


I have exactly one successful ping and then the network inside the guest is broken (no packet
anymore).

So you could consider this commit broken (but in a different way and also without any
guest reboot necessary).


bisect log:
git bisect start
# bad: [d2f6175f52062ee51ee69754a6925608213475d2] vhost: use vhost_desc instead of vhost_log
git bisect bad d2f6175f52062ee51ee69754a6925608213475d2
# good: [d1281e3a562ec6a08f944a876481dd043ba739b9] virtio-blk: remove VIRTIO_BLK_F_SCSI support
git bisect good d1281e3a562ec6a08f944a876481dd043ba739b9
# good: [fac7c0f46996e32d996f5c46121df24a6b95ec3b] vhost: option to fetch descriptors through an independent struct
git bisect good fac7c0f46996e32d996f5c46121df24a6b95ec3b
# bad: [539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc] vhost: batching fetches
git bisect bad 539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc

2020-01-20 06:29:06

by Michael S. Tsirkin

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Tue, Jan 07, 2020 at 01:16:50PM +0100, Christian Borntraeger wrote:
> On 07.01.20 12:55, Michael S. Tsirkin wrote:
>
> >
> > I pushed batched-v3 - same head but bisect should work now.
> >
>
> With
> commit 38ced0208491103b50f1056f0d1c8f28e2e13d08 (HEAD)
> Author: Michael S. Tsirkin <[email protected]>
> AuthorDate: Wed Dec 11 12:19:26 2019 -0500
> Commit: Michael S. Tsirkin <[email protected]>
> CommitDate: Tue Jan 7 06:52:42 2020 -0500
>
> vhost: use batched version by default
>
>
> I have exactly one successful ping and then the network inside the guest is broken (no packet
> anymore).

Does anything appear in host's dmesg when this happens?


> So you could consider this commit broken (but in a different way and also without any
> guest reboot necessary).
>
>
> bisect log:
> git bisect start
> # bad: [d2f6175f52062ee51ee69754a6925608213475d2] vhost: use vhost_desc instead of vhost_log
> git bisect bad d2f6175f52062ee51ee69754a6925608213475d2
> # good: [d1281e3a562ec6a08f944a876481dd043ba739b9] virtio-blk: remove VIRTIO_BLK_F_SCSI support
> git bisect good d1281e3a562ec6a08f944a876481dd043ba739b9
> # good: [fac7c0f46996e32d996f5c46121df24a6b95ec3b] vhost: option to fetch descriptors through an independent struct
> git bisect good fac7c0f46996e32d996f5c46121df24a6b95ec3b
> # bad: [539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc] vhost: batching fetches
> git bisect bad 539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc

2020-01-22 19:33:40

by Christian Borntraeger

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot



On 20.01.20 07:27, Michael S. Tsirkin wrote:
> On Tue, Jan 07, 2020 at 01:16:50PM +0100, Christian Borntraeger wrote:
>> On 07.01.20 12:55, Michael S. Tsirkin wrote:
>>
>>>
>>> I pushed batched-v3 - same head but bisect should work now.
>>>
>>
>> With
>> commit 38ced0208491103b50f1056f0d1c8f28e2e13d08 (HEAD)
>> Author: Michael S. Tsirkin <[email protected]>
>> AuthorDate: Wed Dec 11 12:19:26 2019 -0500
>> Commit: Michael S. Tsirkin <[email protected]>
>> CommitDate: Tue Jan 7 06:52:42 2020 -0500
>>
>> vhost: use batched version by default
>>
>>
>> I have exactly one successful ping and then the network inside the guest is broken (no packet
>> anymore).
>
> Does anything appear in host's dmesg when this happens?

I think there was nothing, but I am not sure. I would need to redo the test if this is important to know.

>
>
>> So you could consider this commit broken (but in a different way and also without any
>> guest reboot necessary).
>>
>>
>> bisect log:
>> git bisect start
>> # bad: [d2f6175f52062ee51ee69754a6925608213475d2] vhost: use vhost_desc instead of vhost_log
>> git bisect bad d2f6175f52062ee51ee69754a6925608213475d2
>> # good: [d1281e3a562ec6a08f944a876481dd043ba739b9] virtio-blk: remove VIRTIO_BLK_F_SCSI support
>> git bisect good d1281e3a562ec6a08f944a876481dd043ba739b9
>> # good: [fac7c0f46996e32d996f5c46121df24a6b95ec3b] vhost: option to fetch descriptors through an independent struct
>> git bisect good fac7c0f46996e32d996f5c46121df24a6b95ec3b
>> # bad: [539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc] vhost: batching fetches
>> git bisect bad 539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc
>

2020-02-06 14:24:24

by Eugenio Perez Martin

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

Hi Christian.

Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?

It will not solve your first random crash, but it should help with the loss of network connectivity.

Please let me know how it goes.

Thanks!

From 99f0f543f3939dbe803988c9153a95616ccccacd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= <[email protected]>
Date: Thu, 6 Feb 2020 15:13:42 +0100
Subject: [PATCH] vhost: filter valid vhost descriptors flags
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous commit copies the _NEXT flag, and then complains if a copied
descriptor contains it.

Signed-off-by: Eugenio Pérez <[email protected]>
---
 drivers/vhost/vhost.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 27ae5b4872a0..56c5253056ee 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2125,6 +2125,8 @@ static void pop_split_desc(struct vhost_virtqueue *vq)
 	--vq->ndescs;
 }

+#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \
+			  VRING_DESC_F_NEXT)
 static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, u16 id)
 {
 	struct vhost_desc *h;
@@ -2134,7 +2136,7 @@ static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc,
 	h = &vq->descs[vq->ndescs++];
 	h->addr = vhost64_to_cpu(vq, desc->addr);
 	h->len = vhost32_to_cpu(vq, desc->len);
-	h->flags = vhost16_to_cpu(vq, desc->flags);
+	h->flags = vhost16_to_cpu(vq, desc->flags) & VHOST_DESC_FLAGS;
 	h->id = id;

 	return 0;
@@ -2343,7 +2345,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 		struct vhost_desc *desc = &vq->descs[i];
 		int access;

-		if (desc->flags & ~(VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE)) {
+		if (desc->flags & ~VHOST_DESC_FLAGS) {
 			vq_err(vq, "Unexpected flags: 0x%x at descriptor id 0x%x\n",
 			       desc->flags, desc->id);
 			ret = -EINVAL;
--
2.18.1
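
Editorial illustration (not part of the original mail): the flag check touched by the patch above can be reproduced in a few lines of standalone C. This is only a sketch. The three flag values are the ones defined for split-ring descriptors in include/uapi/linux/virtio_ring.h; OLD_ALLOWED and has_unexpected_flags() are hypothetical names introduced here and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split-ring descriptor flag bits, as in include/uapi/linux/virtio_ring.h. */
#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_WRITE    2
#define VRING_DESC_F_INDIRECT 4

/* Mask used by the check before the patch (name is hypothetical). */
#define OLD_ALLOWED (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE)

/* Mask introduced by the patch. */
#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \
			  VRING_DESC_F_NEXT)

/* Hypothetical helper: true if 'flags' has any bit outside 'allowed'. */
static bool has_unexpected_flags(uint16_t flags, uint16_t allowed)
{
	return (flags & (uint16_t)~allowed) != 0;
}

/* Small self-check tying the helper back to the patch description. */
static void self_check(void)
{
	uint16_t chained = VRING_DESC_F_NEXT | VRING_DESC_F_WRITE;

	/* Old mask: a copied descriptor carrying _NEXT is rejected. */
	assert(has_unexpected_flags(chained, OLD_ALLOWED));
	/* New mask: the same descriptor passes validation. */
	assert(!has_unexpected_flags(chained, VHOST_DESC_FLAGS));
}
```

Calling self_check() from a main() exercises both cases. It says nothing about the separate crash after guest reboot, which this patch is not expected to fix.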


On Wed, 2020-01-22 at 20:32 +0100, Christian Borntraeger wrote:
>
> On 20.01.20 07:27, Michael S. Tsirkin wrote:
> > On Tue, Jan 07, 2020 at 01:16:50PM +0100, Christian Borntraeger wrote:
> > > On 07.01.20 12:55, Michael S. Tsirkin wrote:
> > >
> > > > I pushed batched-v3 - same head but bisect should work now.
> > > >
> > >
> > > With
> > > commit 38ced0208491103b50f1056f0d1c8f28e2e13d08 (HEAD)
> > > Author: Michael S. Tsirkin <[email protected]>
> > > AuthorDate: Wed Dec 11 12:19:26 2019 -0500
> > > Commit: Michael S. Tsirkin <[email protected]>
> > > CommitDate: Tue Jan 7 06:52:42 2020 -0500
> > >
> > > vhost: use batched version by default
> > >
> > >
> > > I have exactly one successful ping and then the network inside the guest is broken (no packet
> > > anymore).
> >
> > Does anything appear in host's dmesg when this happens?
>
> I think there was nothing, but I am not sure. I would need to redo the test if this is important to know.
>
> >
> > > So you could consider this commit broken (but in a different way and also without any
> > > guest reboot necessary).
> > >
> > >
> > > bisect log:
> > > git bisect start
> > > # bad: [d2f6175f52062ee51ee69754a6925608213475d2] vhost: use vhost_desc instead of vhost_log
> > > git bisect bad d2f6175f52062ee51ee69754a6925608213475d2
> > > # good: [d1281e3a562ec6a08f944a876481dd043ba739b9] virtio-blk: remove VIRTIO_BLK_F_SCSI support
> > > git bisect good d1281e3a562ec6a08f944a876481dd043ba739b9
> > > # good: [fac7c0f46996e32d996f5c46121df24a6b95ec3b] vhost: option to fetch descriptors through an independent
> > > struct
> > > git bisect good fac7c0f46996e32d996f5c46121df24a6b95ec3b
> > > # bad: [539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc] vhost: batching fetches
> > > git bisect bad 539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc

2020-02-06 15:14:28

by Christian Borntraeger

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot



On 06.02.20 15:22, [email protected] wrote:
> Hi Christian.
>
> Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?
>
> It will not solve your first random crash but it should help with the lost of network connectivity.
>
> Please let me know how does it goes.


38ced0208491 + this seems to be ok.

Not sure if you can make out anything of this (and the previous git bisect log)

2020-02-06 22:08:57

by Michael S. Tsirkin

Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Thu, Feb 06, 2020 at 03:22:39PM +0100, [email protected] wrote:
> Hi Christian.
>
> Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?
>
> It will not solve your first random crash but it should help with the lost of network connectivity.
>
> Please let me know how does it goes.
>
> Thanks!
>
> From 99f0f543f3939dbe803988c9153a95616ccccacd Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= <[email protected]>
> Date: Thu, 6 Feb 2020 15:13:42 +0100
> Subject: [PATCH] vhost: filter valid vhost descriptors flags
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Previous commit copy _NEXT flag, and it complains if a copied descriptor
> contains it.
>
> Signed-off-by: Eugenio Pérez <[email protected]>
> ---
> drivers/vhost/vhost.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 27ae5b4872a0..56c5253056ee 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2125,6 +2125,8 @@ static void pop_split_desc(struct vhost_virtqueue *vq)
> --vq->ndescs;
> }
>
> +#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \
> + VRING_DESC_F_NEXT)
> static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, u16 id)
> {
> struct vhost_desc *h;
> @@ -2134,7 +2136,7 @@ static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc,
> h = &vq->descs[vq->ndescs++];
> h->addr = vhost64_to_cpu(vq, desc->addr);
> h->len = vhost32_to_cpu(vq, desc->len);
> - h->flags = vhost16_to_cpu(vq, desc->flags);
> + h->flags = vhost16_to_cpu(vq, desc->flags) & VHOST_DESC_FLAGS;
> h->id = id;
>
> return 0;



> @@ -2343,7 +2345,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
> struct vhost_desc *desc = &vq->descs[i];
> int access;
>
> - if (desc->flags & ~(VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE)) {
> + if (desc->flags & ~VHOST_DESC_FLAGS) {
> vq_err(vq, "Unexpected flags: 0x%x at descriptor id 0x%x\n",
> desc->flags, desc->id);
> ret = -EINVAL;
> --
> 2.18.1

Thanks for catching this!

Do we need the 1st chunk though?

It seems preferable to just muck with flags in 1 place, when we
validate them ...
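
Editorial illustration (not part of the original mail): one possible reading of the question above is that masking the flags at copy time makes the later validation unable to see unknown bits. The sketch below shows that effect in standalone C. The helper names copy_flags_masked(), copy_flags_verbatim() and validate_flags() are hypothetical, and whether this is the concern the author had in mind is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Split-ring descriptor flag bits, as in include/uapi/linux/virtio_ring.h. */
#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_WRITE    2
#define VRING_DESC_F_INDIRECT 4

#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \
			  VRING_DESC_F_NEXT)

/* Hypothetical: copy with masking, as the first hunk of the patch does. */
static uint16_t copy_flags_masked(uint16_t guest_flags)
{
	return guest_flags & VHOST_DESC_FLAGS;
}

/* Hypothetical: copy the flags unchanged and leave checking to validation. */
static uint16_t copy_flags_verbatim(uint16_t guest_flags)
{
	return guest_flags;
}

/* Hypothetical: the single validation point; 0 if acceptable, -1 if not. */
static int validate_flags(uint16_t flags)
{
	return (flags & ~VHOST_DESC_FLAGS) ? -1 : 0;
}

/* Small self-check comparing the two copy strategies. */
static void self_check(void)
{
	/* 0x8 is not one of the three known flag bits. */
	uint16_t bogus = 0x8 | VRING_DESC_F_WRITE;

	/* Masked copy: the unknown bit is dropped, so validation passes. */
	assert(validate_flags(copy_flags_masked(bogus)) == 0);
	/* Verbatim copy: validation still sees and rejects the unknown bit. */
	assert(validate_flags(copy_flags_verbatim(bogus)) == -1);
}
```

With the verbatim copy, the extended mask in the validation hunk is the only place that decides which flags are acceptable.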

>
> On Wed, 2020-01-22 at 20:32 +0100, Christian Borntraeger wrote:
> >
> > On 20.01.20 07:27, Michael S. Tsirkin wrote:
> > > On Tue, Jan 07, 2020 at 01:16:50PM +0100, Christian Borntraeger wrote:
> > > > On 07.01.20 12:55, Michael S. Tsirkin wrote:
> > > >
> > > > > I pushed batched-v3 - same head but bisect should work now.
> > > > >
> > > >
> > > > With
> > > > commit 38ced0208491103b50f1056f0d1c8f28e2e13d08 (HEAD)
> > > > Author: Michael S. Tsirkin <[email protected]>
> > > > AuthorDate: Wed Dec 11 12:19:26 2019 -0500
> > > > Commit: Michael S. Tsirkin <[email protected]>
> > > > CommitDate: Tue Jan 7 06:52:42 2020 -0500
> > > >
> > > > vhost: use batched version by default
> > > >
> > > >
> > > > I have exactly one successful ping and then the network inside the guest is broken (no packet
> > > > anymore).
> > >
> > > Does anything appear in host's dmesg when this happens?
> >
> > I think there was nothing, but I am not sure. I would need to redo the test if this is important to know.
> >
> > >
> > > > So you could consider this commit broken (but in a different way and also without any
> > > > guest reboot necessary).
> > > >
> > > >
> > > > bisect log:
> > > > git bisect start
> > > > # bad: [d2f6175f52062ee51ee69754a6925608213475d2] vhost: use vhost_desc instead of vhost_log
> > > > git bisect bad d2f6175f52062ee51ee69754a6925608213475d2
> > > > # good: [d1281e3a562ec6a08f944a876481dd043ba739b9] virtio-blk: remove VIRTIO_BLK_F_SCSI support
> > > > git bisect good d1281e3a562ec6a08f944a876481dd043ba739b9
> > > > # good: [fac7c0f46996e32d996f5c46121df24a6b95ec3b] vhost: option to fetch descriptors through an independent
> > > > struct
> > > > git bisect good fac7c0f46996e32d996f5c46121df24a6b95ec3b
> > > > # bad: [539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc] vhost: batching fetches
> > > > git bisect bad 539eb9d738f048cd7be61f404e8f9c7d9d2ff3cc

2020-02-06 22:20:03

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Thu, Feb 06, 2020 at 04:12:21PM +0100, Christian Borntraeger wrote:
>
>
> On 06.02.20 15:22, [email protected] wrote:
> > Hi Christian.
> >
> > Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?
> >
> > It will not solve your first random crash but it should help with the lost of network connectivity.
> >
> > Please let me know how does it goes.
>
>
> 38ced0208491 + this seem to be ok.
>
> Not sure if you can make out anything of this (and the previous git bisect log)

Yes it does - it tells us that this is just a bad split-up of patches,
and that there's still a real bug that caused the worse crashes :)

So I just pushed batch-v4.
I expect that will fail, and bisect to give us
vhost: batching fetches
Can you try that please?


2020-02-07 07:49:45

by Christian Borntraeger

[permalink] [raw]
Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

Also adding Cornelia.


On 06.02.20 23:17, Michael S. Tsirkin wrote:
> On Thu, Feb 06, 2020 at 04:12:21PM +0100, Christian Borntraeger wrote:
>>
>>
>> On 06.02.20 15:22, [email protected] wrote:
>>> Hi Christian.
>>>
>>> Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?
>>>
>>> It will not solve your first random crash but it should help with the lost of network connectivity.
>>>
>>> Please let me know how does it goes.
>>
>>
>> 38ced0208491 + this seem to be ok.
>>
>> Not sure if you can make out anything of this (and the previous git bisect log)
>
> Yes it does - that this is just bad split-up of patches, and there's
> still a real bug that caused worse crashes :)
>
> So I just pushed batch-v4.
> I expect that will fail, and bisect to give us
> vhost: batching fetches
> Can you try that please?
>

yes.

eccb852f1fe6bede630e2e4f1a121a81e34354ab is the first bad commit
commit eccb852f1fe6bede630e2e4f1a121a81e34354ab
Author: Michael S. Tsirkin <[email protected]>
Date: Mon Oct 7 06:11:18 2019 -0400

vhost: batching fetches

With this patch applied, new and old code perform identically.

Lots of extra optimizations are now possible, e.g.
we can fetch multiple heads with copy_from/to_user now.
We can get rid of maintaining the log array. Etc etc.

Signed-off-by: Michael S. Tsirkin <[email protected]>

drivers/vhost/test.c | 2 +-
drivers/vhost/vhost.c | 39 ++++++++++++++++++++++++++++++++++-----
drivers/vhost/vhost.h | 4 +++-
3 files changed, 38 insertions(+), 7 deletions(-)


>


2020-02-07 08:00:12

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Fri, Feb 07, 2020 at 08:47:14AM +0100, Christian Borntraeger wrote:
> Also adding Cornelia.
>
>
> On 06.02.20 23:17, Michael S. Tsirkin wrote:
> > On Thu, Feb 06, 2020 at 04:12:21PM +0100, Christian Borntraeger wrote:
> >>
> >>
> >> On 06.02.20 15:22, [email protected] wrote:
> >>> Hi Christian.
> >>>
> >>> Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?
> >>>
> >>> It will not solve your first random crash but it should help with the lost of network connectivity.
> >>>
> >>> Please let me know how does it goes.
> >>
> >>
> >> 38ced0208491 + this seem to be ok.
> >>
> >> Not sure if you can make out anything of this (and the previous git bisect log)
> >
> > Yes it does - that this is just bad split-up of patches, and there's
> > still a real bug that caused worse crashes :)
> >
> > So I just pushed batch-v4.
> > I expect that will fail, and bisect to give us
> > vhost: batching fetches
> > Can you try that please?
> >
>
> yes.
>
> eccb852f1fe6bede630e2e4f1a121a81e34354ab is the first bad commit
> commit eccb852f1fe6bede630e2e4f1a121a81e34354ab
> Author: Michael S. Tsirkin <[email protected]>
> Date: Mon Oct 7 06:11:18 2019 -0400
>
> vhost: batching fetches
>
> With this patch applied, new and old code perform identically.
>
> Lots of extra optimizations are now possible, e.g.
> we can fetch multiple heads with copy_from/to_user now.
> We can get rid of maintaining the log array. Etc etc.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
>
> drivers/vhost/test.c | 2 +-
> drivers/vhost/vhost.c | 39 ++++++++++++++++++++++++++++++++++-----
> drivers/vhost/vhost.h | 4 +++-
> 3 files changed, 38 insertions(+), 7 deletions(-)
>


And the symptom is still the same - random crashes
after a bit of traffic, right?

> >
>

2020-02-07 08:14:47

by Christian Borntraeger

[permalink] [raw]
Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot



On 07.02.20 08:58, Michael S. Tsirkin wrote:
> On Fri, Feb 07, 2020 at 08:47:14AM +0100, Christian Borntraeger wrote:
>> Also adding Cornelia.
>>
>>
>> On 06.02.20 23:17, Michael S. Tsirkin wrote:
>>> On Thu, Feb 06, 2020 at 04:12:21PM +0100, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 06.02.20 15:22, [email protected] wrote:
>>>>> Hi Christian.
>>>>>
>>>>> Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?
>>>>>
>>>>> It will not solve your first random crash but it should help with the lost of network connectivity.
>>>>>
>>>>> Please let me know how does it goes.
>>>>
>>>>
>>>> 38ced0208491 + this seem to be ok.
>>>>
>>>> Not sure if you can make out anything of this (and the previous git bisect log)
>>>
>>> Yes it does - that this is just bad split-up of patches, and there's
>>> still a real bug that caused worse crashes :)
>>>
>>> So I just pushed batch-v4.
>>> I expect that will fail, and bisect to give us
>>> vhost: batching fetches
>>> Can you try that please?
>>>
>>
>> yes.
>>
>> eccb852f1fe6bede630e2e4f1a121a81e34354ab is the first bad commit
>> commit eccb852f1fe6bede630e2e4f1a121a81e34354ab
>> Author: Michael S. Tsirkin <[email protected]>
>> Date: Mon Oct 7 06:11:18 2019 -0400
>>
>> vhost: batching fetches
>>
>> With this patch applied, new and old code perform identically.
>>
>> Lots of extra optimizations are now possible, e.g.
>> we can fetch multiple heads with copy_from/to_user now.
>> We can get rid of maintaining the log array. Etc etc.
>>
>> Signed-off-by: Michael S. Tsirkin <[email protected]>
>>
>> drivers/vhost/test.c | 2 +-
>> drivers/vhost/vhost.c | 39 ++++++++++++++++++++++++++++++++++-----
>> drivers/vhost/vhost.h | 4 +++-
>> 3 files changed, 38 insertions(+), 7 deletions(-)
>>
>
>
> And the symptom is still the same - random crashes
> after a bit of traffic, right?

Random guest crashes after a reboot of the guests, as if vhost were still
writing into now-stale buffers.

2020-02-07 08:56:16

by Cornelia Huck

[permalink] [raw]
Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Fri, 7 Feb 2020 09:13:14 +0100
Christian Borntraeger <[email protected]> wrote:

> On 07.02.20 08:58, Michael S. Tsirkin wrote:
> > On Fri, Feb 07, 2020 at 08:47:14AM +0100, Christian Borntraeger wrote:
> >> Also adding Cornelia.
> >>
> >>
> >> On 06.02.20 23:17, Michael S. Tsirkin wrote:
> >>> On Thu, Feb 06, 2020 at 04:12:21PM +0100, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 06.02.20 15:22, [email protected] wrote:
> >>>>> Hi Christian.
> >>>>>
> >>>>> Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?
> >>>>>
> >>>>> It will not solve your first random crash but it should help with the lost of network connectivity.
> >>>>>
> >>>>> Please let me know how does it goes.
> >>>>
> >>>>
> >>>> 38ced0208491 + this seem to be ok.
> >>>>
> >>>> Not sure if you can make out anything of this (and the previous git bisect log)
> >>>
> >>> Yes it does - that this is just bad split-up of patches, and there's
> >>> still a real bug that caused worse crashes :)
> >>>
> >>> So I just pushed batch-v4.
> >>> I expect that will fail, and bisect to give us
> >>> vhost: batching fetches
> >>> Can you try that please?
> >>>
> >>
> >> yes.
> >>
> >> eccb852f1fe6bede630e2e4f1a121a81e34354ab is the first bad commit
> >> commit eccb852f1fe6bede630e2e4f1a121a81e34354ab
> >> Author: Michael S. Tsirkin <[email protected]>
> >> Date: Mon Oct 7 06:11:18 2019 -0400
> >>
> >> vhost: batching fetches
> >>
> >> With this patch applied, new and old code perform identically.
> >>
> >> Lots of extra optimizations are now possible, e.g.
> >> we can fetch multiple heads with copy_from/to_user now.
> >> We can get rid of maintaining the log array. Etc etc.
> >>
> >> Signed-off-by: Michael S. Tsirkin <[email protected]>
> >>
> >> drivers/vhost/test.c | 2 +-
> >> drivers/vhost/vhost.c | 39 ++++++++++++++++++++++++++++++++++-----
> >> drivers/vhost/vhost.h | 4 +++-
> >> 3 files changed, 38 insertions(+), 7 deletions(-)
> >>
> >
> >
> > And the symptom is still the same - random crashes
> > after a bit of traffic, right?
>
> random guest crashes after a reboot of the guests. As if vhost would still
> write into now stale buffers.
>

I'm late to the party; but where is that commit located? Or has it been
dropped again already?

2020-02-07 10:09:12

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: vhost changes (batched) in linux-next after 12/13 trigger random crashes in KVM guests after reboot

On Fri, Feb 07, 2020 at 09:53:53AM +0100, Cornelia Huck wrote:
> On Fri, 7 Feb 2020 09:13:14 +0100
> Christian Borntraeger <[email protected]> wrote:
>
> > On 07.02.20 08:58, Michael S. Tsirkin wrote:
> > > On Fri, Feb 07, 2020 at 08:47:14AM +0100, Christian Borntraeger wrote:
> > >> Also adding Cornelia.
> > >>
> > >>
> > >> On 06.02.20 23:17, Michael S. Tsirkin wrote:
> > >>> On Thu, Feb 06, 2020 at 04:12:21PM +0100, Christian Borntraeger wrote:
> > >>>>
> > >>>>
> > >>>> On 06.02.20 15:22, [email protected] wrote:
> > >>>>> Hi Christian.
> > >>>>>
> > >>>>> Could you try this patch on top of ("38ced0208491 vhost: use batched version by default")?
> > >>>>>
> > >>>>> It will not solve your first random crash but it should help with the lost of network connectivity.
> > >>>>>
> > >>>>> Please let me know how does it goes.
> > >>>>
> > >>>>
> > >>>> 38ced0208491 + this seem to be ok.
> > >>>>
> > >>>> Not sure if you can make out anything of this (and the previous git bisect log)
> > >>>
> > >>> Yes it does - that this is just bad split-up of patches, and there's
> > >>> still a real bug that caused worse crashes :)
> > >>>
> > >>> So I just pushed batch-v4.
> > >>> I expect that will fail, and bisect to give us
> > >>> vhost: batching fetches
> > >>> Can you try that please?
> > >>>
> > >>
> > >> yes.
> > >>
> > >> eccb852f1fe6bede630e2e4f1a121a81e34354ab is the first bad commit
> > >> commit eccb852f1fe6bede630e2e4f1a121a81e34354ab
> > >> Author: Michael S. Tsirkin <[email protected]>
> > >> Date: Mon Oct 7 06:11:18 2019 -0400
> > >>
> > >> vhost: batching fetches
> > >>
> > >> With this patch applied, new and old code perform identically.
> > >>
> > >> Lots of extra optimizations are now possible, e.g.
> > >> we can fetch multiple heads with copy_from/to_user now.
> > >> We can get rid of maintaining the log array. Etc etc.
> > >>
> > >> Signed-off-by: Michael S. Tsirkin <[email protected]>
> > >>
> > >> drivers/vhost/test.c | 2 +-
> > >> drivers/vhost/vhost.c | 39 ++++++++++++++++++++++++++++++++++-----
> > >> drivers/vhost/vhost.h | 4 +++-
> > >> 3 files changed, 38 insertions(+), 7 deletions(-)
> > >>
> > >
> > >
> > > And the symptom is still the same - random crashes
> > > after a bit of traffic, right?
> >
> > random guest crashes after a reboot of the guests. As if vhost would still
> > write into now stale buffers.
> >
>
> I'm late to the party; but where is that commit located? Or has it been
> dropped again already?

my vhost tree. Tag batch-v4.