Hi All,
Commit 22540ca3d00d2990a4148a13b92209c3dc5422db causes a Windows KVM
guest running under QEMU with a VFIO passthrough GPU to stall randomly
while using the GPU, leading the guest to assume that the driver has
hung. Reverting this commit resolves the problem.
The host system is configured with the following kernel arguments which
may be related:
isolcpus=0-5,24-29,6-11,30-35 rcu_nocbs=0-5,24-29,6-11,30-35
The system is an AMD Threadripper 2970WX on a Gigabyte x399 AORUS Gaming
7 board.
It has two GPUs, each passed through to a separate KVM guest: one is an
AMD Radeon 7 in a Linux guest, the other is a GeForce 1080Ti in a
Windows guest.
The cores used for these two guests are isolated from the host for
performance reasons.
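
For reference, the CPU list passed to isolcpus= and rcu_nocbs= above can
be expanded to check exactly which host CPUs are isolated. This is a
minimal Python sketch, not a kernel tool: the helper names
parse_cpu_list and to_ranges are hypothetical, and it handles only plain
numbers and simple "first-last" ranges, not every form the kernel
accepts.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0-5,24-29" into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            # Ranges are inclusive on both ends.
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def to_ranges(cpus):
    """Collapse sorted CPU numbers back into compact range notation."""
    ranges = []
    start = prev = None
    for cpu in cpus:
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# The list from the kernel command line in this report.
isolated = parse_cpu_list("0-5,24-29,6-11,30-35")
print(len(isolated))        # 24
print(to_ranges(isolated))  # 0-11,24-35
```

The four ranges collapse to two contiguous blocks, 0-11 and 24-35, so
24 logical CPUs are isolated from the host scheduler.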
Any insight as to why this is occurring would be appreciated. If you
need any more information or would like patches tested, please let me
know.
Kind Regards,
Geoffrey McRae
HostFission
https://hostfission.com
On 2020-07-26 23:30, Alex Williamson wrote:
> On Sun, 26 Jul 2020 17:49:07 +1000
> Geoffrey McRae <[email protected]> wrote:
>
>> Hi All,
>>
>> The commit 22540ca3d00d2990a4148a13b92209c3dc5422db causes a Windows
>> KVM
>> guest running under QEMU with a VFIO passthrough GPU to randomly stall
>> when using the GPU leading to the guest assuming that the driver has
>> hung. Reverting this commit resolves the problem.
>
> Please double check this commit ID, I can't find it in mainline or
> linux-next. Thanks,
>
> Alex
Confirmed:
https://github.com/torvalds/linux/commit/22540ca3d00d2990a4148a13b92209c3dc5422db
>
>> The host system is configured with the following kernel arguments
>> which
>> may be related:
>> isolcpus=0-5,24-29,6-11,30-35 rcu_nocbs=0-5,24-29,6-11,30-35
>>
>> The system is an AMD Threadripper 2970WX on a Gigabyte x399 AORUS
>> Gaming
>> 7 board.
>> It has two GPUs each being passed through to two separate KVM guests,
>> one is an AMD Radeon 7 in a Linux guest, the other is a GeForce 1080Ti
>> in a Windows guest.
>> The cores used for these two guests are isolated from the host for
>> performance reasons.
>>
>> Any insight as to why this is occurring would be appreciated. If you
>> need any more information or would like to test patches please let me
>> know.
>>
>> Kind Regards,
>> Geoffrey McRae
>> HostFission
>>
>> https://hostfission.com
>>
On Sun, 26 Jul 2020 17:49:07 +1000
Geoffrey McRae <[email protected]> wrote:
> Hi All,
>
> The commit 22540ca3d00d2990a4148a13b92209c3dc5422db causes a Windows KVM
> guest running under QEMU with a VFIO passthrough GPU to randomly stall
> when using the GPU leading to the guest assuming that the driver has
> hung. Reverting this commit resolves the problem.
Please double check this commit ID, I can't find it in mainline or
linux-next. Thanks,
Alex
> The host system is configured with the following kernel arguments which
> may be related:
> isolcpus=0-5,24-29,6-11,30-35 rcu_nocbs=0-5,24-29,6-11,30-35
>
> The system is an AMD Threadripper 2970WX on a Gigabyte x399 AORUS Gaming
> 7 board.
> It has two GPUs each being passed through to two separate KVM guests,
> one is an AMD Radeon 7 in a Linux guest, the other is a GeForce 1080Ti
> in a Windows guest.
> The cores used for these two guests are isolated from the host for
> performance reasons.
>
> Any insight as to why this is occurring would be appreciated. If you
> need any more information or would like to test patches please let me
> know.
>
> Kind Regards,
> Geoffrey McRae
> HostFission
>
> https://hostfission.com
>
On 2020-07-26 23:32, Geoffrey McRae wrote:
> On 2020-07-26 23:30, Alex Williamson wrote:
>> On Sun, 26 Jul 2020 17:49:07 +1000
>> Geoffrey McRae <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> The commit 22540ca3d00d2990a4148a13b92209c3dc5422db causes a Windows
>>> KVM
>>> guest running under QEMU with a VFIO passthrough GPU to randomly
>>> stall
>>> when using the GPU leading to the guest assuming that the driver has
>>> hung. Reverting this commit resolves the problem.
>>
>> Please double check this commit ID, I can't find it in mainline or
>> linux-next. Thanks,
>>
>> Alex
>
> Confirmed:
>
> https://github.com/torvalds/linux/commit/22540ca3d00d2990a4148a13b92209c3dc5422db
Sorry, I just noticed my error; it should be:
aa202f1f56960c60e7befaa0f49c72b8fa11b0a8
>
>>
>>> The host system is configured with the following kernel arguments
>>> which
>>> may be related:
>>> isolcpus=0-5,24-29,6-11,30-35 rcu_nocbs=0-5,24-29,6-11,30-35
>>>
>>> The system is an AMD Threadripper 2970WX on a Gigabyte x399 AORUS
>>> Gaming
>>> 7 board.
>>> It has two GPUs each being passed through to two separate KVM guests,
>>> one is an AMD Radeon 7 in a Linux guest, the other is a GeForce
>>> 1080Ti
>>> in a Windows guest.
>>> The cores used for these two guests are isolated from the host for
>>> performance reasons.
>>>
>>> Any insight as to why this is occurring would be appreciated. If you
>>> need any more information or would like to test patches please let me
>>> know.
>>>
>>> Kind Regards,
>>> Geoffrey McRae
>>> HostFission
>>>
>>> https://hostfission.com
>>>
Quoting Alex Williamson (2020-07-26 14:30:52)
> On Sun, 26 Jul 2020 17:49:07 +1000
> Geoffrey McRae <[email protected]> wrote:
>
> > Hi All,
> >
> > The commit 22540ca3d00d2990a4148a13b92209c3dc5422db causes a Windows KVM
> > guest running under QEMU with a VFIO passthrough GPU to randomly stall
> > when using the GPU leading to the guest assuming that the driver has
> > hung. Reverting this commit resolves the problem.
>
> Please double check this commit ID, I can't find it in mainline or
> linux-next. Thanks,
See commit aa202f1f5696 ("workqueue: don't use wq_select_unbound_cpu()
for bound works"). 22540ca3 is the cherry-pick into v5.4.26.
-Chris
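
The mix-up above between a stable cherry-pick and its mainline commit
can usually be resolved from the backport's changelog, since stable
backports conventionally carry a line such as "commit <sha> upstream."
or "[ Upstream commit <sha> ]". The following is a minimal Python
sketch of extracting that ID. The function name upstream_sha is
hypothetical, and SAMPLE is an illustrative changelog in that format,
not the verbatim text of 22540ca3.

```python
import re
from typing import Optional

# Matches either "commit <sha> upstream." or "[ Upstream commit <sha> ]"
# on a line of its own.
_UPSTREAM_RE = re.compile(
    r"^\s*(?:\[\s*Upstream commit ([0-9a-f]{7,40})\s*\]"
    r"|commit ([0-9a-f]{7,40}) upstream\.?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def upstream_sha(message: str) -> Optional[str]:
    """Return the mainline commit ID named in a backport changelog, if any."""
    match = _UPSTREAM_RE.search(message)
    if match is None:
        return None
    return (match.group(1) or match.group(2)).lower()


# Illustrative changelog only; the body text is a placeholder.
SAMPLE = """workqueue: don't use wq_select_unbound_cpu() for bound works

commit aa202f1f56960c60e7befaa0f49c72b8fa11b0a8 upstream.

(placeholder for the rest of the changelog)
"""

print(upstream_sha(SAMPLE))
```

In practice the changelog text could come from "git show -s
--format=%B <sha>" run in a stable tree; a result of None suggests the
commit is either a mainline commit itself or a stable-only change.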