In our Armv8-A server (Qualcomm Amberwing, non-VHE), after applying
Christoffer's timer optimization patchset ("Optimize arch timer register
handling"), the guest hangs during kernel boot.
The root cause appears to be as follows:
1. kvm_arch_timer_handler() overwrites vtimer->cnt_ctl with the current
cntv_ctl register value, and then misses some cases where it should
update the timer's irq (irq.level) because kvm_timer_irq_can_fire() is
false.
2. This causes kvm_vcpu_check_block() to return 0 instead of -EINTR:
kvm_vcpu_check_block
kvm_cpu_has_pending_timer
kvm_timer_is_pending
kvm_timer_should_fire
3. Thus, the KVM hyp code cannot break out of the loop in
kvm_vcpu_block() (the halt-poll path) and the guest hangs forever.
Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
Signed-off-by: Jia He <[email protected]>
---
virt/kvm/arm/arch_timer.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index f9555b1..bb86433 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
vtimer = vcpu_vtimer(vcpu);
if (!vtimer->irq.level) {
- vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
if (kvm_timer_irq_can_fire(vtimer))
kvm_timer_update_irq(vcpu, true, vtimer);
}
--
2.7.4
Hi Jia,
On 13/12/17 07:00, Jia He wrote:
> In our Armv8a server (qualcomm Amberwing, non VHE), after applying
> Christoffer's timer optimizing patchset(Optimize arch timer register
> handling), the guest is hang during kernel booting.
>
> The error root cause might be as follows:
> 1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
> cntv_ctl register value. And then it missed some cases to update timer's
> irq (irq.level) when kvm_timer_irq_can_fire() is false
> 2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
> kvm_vcpu_check_block
> kvm_cpu_has_pending_timer
> kvm_timer_is_pending
> kvm_timer_should_fire
> 3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block (halt
> poll process) and the guest is hang forever
>
> Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
> Signed-off-by: Jia He <[email protected]>
> ---
> virt/kvm/arm/arch_timer.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index f9555b1..bb86433 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> vtimer = vcpu_vtimer(vcpu);
>
> if (!vtimer->irq.level) {
> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> if (kvm_timer_irq_can_fire(vtimer))
> kvm_timer_update_irq(vcpu, true, vtimer);
> }
>
Which patches are you looking at? The current code in mainline looks
like this:
vtimer = vcpu_vtimer(vcpu);
vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
if (kvm_timer_irq_can_fire(vtimer))
kvm_timer_update_irq(vcpu, true, vtimer);
I'd suggest you use mainline and report if this doesn't work.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
Marc,
On 13/12/17 09:56, Marc Zyngier wrote:
> Hi Jia,
>
> On 13/12/17 07:00, Jia He wrote:
>> In our Armv8a server (qualcomm Amberwing, non VHE), after applying
>> Christoffer's timer optimizing patchset(Optimize arch timer register
>> handling), the guest is hang during kernel booting.
>>
>> The error root cause might be as follows:
>> 1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
>> cntv_ctl register value. And then it missed some cases to update timer's
>> irq (irq.level) when kvm_timer_irq_can_fire() is false
>> 2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
>> kvm_vcpu_check_block
>> kvm_cpu_has_pending_timer
>> kvm_timer_is_pending
>> kvm_timer_should_fire
>> 3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block (halt
>> poll process) and the guest is hang forever
>>
>> Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
>> Signed-off-by: Jia He <[email protected]>
>> ---
>> virt/kvm/arm/arch_timer.c | 1 -
>> 1 file changed, 1 deletion(-)
>>
>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> index f9555b1..bb86433 100644
>> --- a/virt/kvm/arm/arch_timer.c
>> +++ b/virt/kvm/arm/arch_timer.c
>> @@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>> vtimer = vcpu_vtimer(vcpu);
>>
>> if (!vtimer->irq.level) {
>> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
>> if (kvm_timer_irq_can_fire(vtimer))
>> kvm_timer_update_irq(vcpu, true, vtimer);
>> }
>>
>
> Which patches are you looking at? The current code in mainline looks
> like this:
>
> vtimer = vcpu_vtimer(vcpu);
>
> vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> if (kvm_timer_irq_can_fire(vtimer))
> kvm_timer_update_irq(vcpu, true, vtimer);
>
> I'd suggest you use mainline and report if this doesn't work
the removal of the if (!vtimer->irq.level) test happened in:
[PATCH v7 3/8] KVM: arm/arm64: Don't cache the timer IRQ level
which is not upstream.
Thanks
Eric
>
> Thanks,
>
> M.
>
On Tue, Dec 12, 2017 at 11:00:07PM -0800, Jia He wrote:
> In our Armv8a server (qualcomm Amberwing, non VHE), after applying
> Christoffer's timer optimizing patchset(Optimize arch timer register
> handling), the guest is hang during kernel booting.
>
> The error root cause might be as follows:
> 1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
> cntv_ctl register value. And then it missed some cases to update timer's
> irq (irq.level) when kvm_timer_irq_can_fire() is false
Why should it set the irq level to true when the timer cannot fire?
> 2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
> kvm_vcpu_check_block
> kvm_cpu_has_pending_timer
> kvm_timer_is_pending
> kvm_timer_should_fire
> 3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block (halt
> poll process) and the guest is hang forever
This is just a polling loop which will expire after some time, so it
shouldn't halt the guest indefinitely, but merely slow it down for some
while, if we have a bug. Is that the behavior you're seeing or are you
seeing the guest coming to a complete halt?
>
> Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
> Signed-off-by: Jia He <[email protected]>
> ---
> virt/kvm/arm/arch_timer.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index f9555b1..bb86433 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> vtimer = vcpu_vtimer(vcpu);
>
> if (!vtimer->irq.level) {
> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
This fix is clearly not correct, as it would prevent forwarding timer
interrupts in some cases.
> if (kvm_timer_irq_can_fire(vtimer))
> kvm_timer_update_irq(vcpu, true, vtimer);
> }
> --
> 2.7.4
>
I actually don't see how the above scenario you painted can happen.
If you're in the polling loop, that means that the timer state is loaded
on the vcpu, and that means you can take interrupts from the timer, and
when you take interrupts, you will set the irq.level.
And here's the first bit of logic in kvm_timer_is_pending():
if (vtimer->irq.level || ptimer->irq.level)
return true;
So that would break the loop.
I'm not able to reproduce on my side with a non-VHE platform.
What is the workload you're running to reproduce this, and what is the
exact kernel tree and kernel configuration you're using?
Thanks,
-Christoffer
On Wed, Dec 13, 2017 at 08:56:12AM +0000, Marc Zyngier wrote:
> Hi Jia,
>
> On 13/12/17 07:00, Jia He wrote:
> > In our Armv8a server (qualcomm Amberwing, non VHE), after applying
> > Christoffer's timer optimizing patchset(Optimize arch timer register
> > handling), the guest is hang during kernel booting.
> >
> > The error root cause might be as follows:
> > 1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
> > cntv_ctl register value. And then it missed some cases to update timer's
> > irq (irq.level) when kvm_timer_irq_can_fire() is false
> > 2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
> > kvm_vcpu_check_block
> > kvm_cpu_has_pending_timer
> > kvm_timer_is_pending
> > kvm_timer_should_fire
> > 3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block (halt
> > poll process) and the guest is hang forever
> >
> > Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
> > Signed-off-by: Jia He <[email protected]>
> > ---
> > virt/kvm/arm/arch_timer.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index f9555b1..bb86433 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > vtimer = vcpu_vtimer(vcpu);
> >
> > if (!vtimer->irq.level) {
> > - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> > if (kvm_timer_irq_can_fire(vtimer))
> > kvm_timer_update_irq(vcpu, true, vtimer);
> > }
> >
>
> Which patches are you looking at? The current code in mainline looks
> like this:
>
> vtimer = vcpu_vtimer(vcpu);
>
> vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> if (kvm_timer_irq_can_fire(vtimer))
> kvm_timer_update_irq(vcpu, true, vtimer);
>
> I'd suggest you use mainline and report if this doesn't work.
>
That looks like you have the level-triggered mapped series applied?
That would be an interesting data point to get from Jia as well though.
Jia, can you try applying this series and see if it helps?
git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v7
Thanks,
-Christoffer
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...
On 13/12/17 09:08, Auger Eric wrote:
> Marc,
> On 13/12/17 09:56, Marc Zyngier wrote:
>> Hi Jia,
>>
>> On 13/12/17 07:00, Jia He wrote:
>>> In our Armv8a server (qualcomm Amberwing, non VHE), after applying
>>> Christoffer's timer optimizing patchset(Optimize arch timer register
>>> handling), the guest is hang during kernel booting.
>>>
>>> The error root cause might be as follows:
>>> 1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
>>> cntv_ctl register value. And then it missed some cases to update timer's
>>> irq (irq.level) when kvm_timer_irq_can_fire() is false
>>> 2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
>>> kvm_vcpu_check_block
>>> kvm_cpu_has_pending_timer
>>> kvm_timer_is_pending
>>> kvm_timer_should_fire
>>> 3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block (halt
>>> poll process) and the guest is hang forever
>>>
>>> Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
>>> Signed-off-by: Jia He <[email protected]>
>>> ---
>>> virt/kvm/arm/arch_timer.c | 1 -
>>> 1 file changed, 1 deletion(-)
>>>
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index f9555b1..bb86433 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>> vtimer = vcpu_vtimer(vcpu);
>>>
>>> if (!vtimer->irq.level) {
>>> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
>>> if (kvm_timer_irq_can_fire(vtimer))
>>> kvm_timer_update_irq(vcpu, true, vtimer);
>>> }
>>>
>>
>> Which patches are you looking at? The current code in mainline looks
>> like this:
>>
>> vtimer = vcpu_vtimer(vcpu);
>>
>> vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
>> if (kvm_timer_irq_can_fire(vtimer))
>> kvm_timer_update_irq(vcpu, true, vtimer);
>>
>> I'd suggest you use mainline and report if this doesn't work
> the removal of if (!vtimer->irq.level) test happened in:
> [PATCH v7 3/8] KVM: arm/arm64: Don't cache the timer IRQ level
>
> which is not upstream.
Ah, my bad (I have that series in my working tree already...).
I still think Jia's approach to this is not quite right. If you don't
update the status of the timer by reading the HW value, how can you
decide whether the timer can fire or not?
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Wed, Dec 13, 2017 at 10:27 AM, Marc Zyngier <[email protected]> wrote:
> On 13/12/17 09:08, Auger Eric wrote:
>> Marc,
>> On 13/12/17 09:56, Marc Zyngier wrote:
>>> Hi Jia,
>>>
>>> On 13/12/17 07:00, Jia He wrote:
>>>> In our Armv8a server (qualcomm Amberwing, non VHE), after applying
>>>> Christoffer's timer optimizing patchset(Optimize arch timer register
>>>> handling), the guest is hang during kernel booting.
>>>>
>>>> The error root cause might be as follows:
>>>> 1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
>>>> cntv_ctl register value. And then it missed some cases to update timer's
>>>> irq (irq.level) when kvm_timer_irq_can_fire() is false
>>>> 2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
>>>> kvm_vcpu_check_block
>>>> kvm_cpu_has_pending_timer
>>>> kvm_timer_is_pending
>>>> kvm_timer_should_fire
>>>> 3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block (halt
>>>> poll process) and the guest is hang forever
>>>>
>>>> Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
>>>> Signed-off-by: Jia He <[email protected]>
>>>> ---
>>>> virt/kvm/arm/arch_timer.c | 1 -
>>>> 1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>>> index f9555b1..bb86433 100644
>>>> --- a/virt/kvm/arm/arch_timer.c
>>>> +++ b/virt/kvm/arm/arch_timer.c
>>>> @@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>> vtimer = vcpu_vtimer(vcpu);
>>>>
>>>> if (!vtimer->irq.level) {
>>>> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
>>>> if (kvm_timer_irq_can_fire(vtimer))
>>>> kvm_timer_update_irq(vcpu, true, vtimer);
>>>> }
>>>>
>>>
>>> Which patches are you looking at? The current code in mainline looks
>>> like this:
>>>
>>> vtimer = vcpu_vtimer(vcpu);
>>>
>>> vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
>>> if (kvm_timer_irq_can_fire(vtimer))
>>> kvm_timer_update_irq(vcpu, true, vtimer);
>>>
>>> I'd suggest you use mainline and report if this doesn't work
>> the removal of if (!vtimer->irq.level) test happened in:
>> [PATCH v7 3/8] KVM: arm/arm64: Don't cache the timer IRQ level
>>
>> which is not upstream.
> Ah, my bad (I have that series in my working tree already...).
>
> I still think Jia's approach to this is not quite right. If you don't
> update the status of the timer by reading the HW value, how can you
> decide whether the timer can fire or not?
>
Exactly. We need to know the exact kernel source, the symptoms, and how
to reproduce it, and then trace what's going on. We may need to tweak
kvm_timer_is_pending(), but I don't yet see a case where it breaks.
Thanks,
-Christoffer
Hi Christoffer,
I have tried your newer level-mapped-v7 branch, but the bug is still
there. There is no special load on either the host or the guest. The
guest (kernel 4.14) often hangs during boot.
The guest kernel log:
[ OK ] Reached target Remote File Systems.
Starting File System Check on /dev/mapper/fedora-root...
[ OK ] Started File System Check on /dev/mapper/fedora-root.
Mounting /sysroot...
[ 2.670764] SGI XFS with ACLs, security attributes, no debug enabled
[ 2.678180] XFS (dm-0): Mounting V5 Filesystem
[ 2.740364] XFS (dm-0): Ending clean mount
[ OK ] Mounted /sysroot.
[ OK ] Reached target Initrd Root File System.
Starting Reload Configuration from the Real Root...
[ 61.288215] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 61.290791] 1-...!: (0 ticks this GP) idle=574/0/0 softirq=5/5 fqs=1
[ 61.293664] (detected by 0, t=6002 jiffies, g=-263, c=-264, q=39760)
[ 61.296480] Task dump for CPU 1:
[ 61.297938] swapper/1 R running task 0 0 1 0x00000020
[ 61.300643] Call trace:
[ 61.301260] __switch_to+0x6c/0x78
[ 61.302095] cpu_number+0x0/0x8
[ 61.302867] rcu_sched kthread starved for 6000 jiffies!
g18446744073709551353 c18446744073709551352 f0x0 RCU_GP_WAIT_FQS(3)
->state=0x402 ->cpu=1
[ 61.305941] rcu_sched I 0 8 2 0x00000020
[ 61.307250] Call trace:
[ 61.307854] __switch_to+0x6c/0x78
[ 61.308693] __schedule+0x268/0x8f0
[ 61.309545] schedule+0x2c/0x88
[ 61.310325] schedule_timeout+0x84/0x3b8
[ 61.311278] rcu_gp_kthread+0x4d4/0x7d8
[ 61.312213] kthread+0x134/0x138
[ 61.313001] ret_from_fork+0x10/0x1c
Maybe my previous patch is not good enough; thanks for your comments.
I dug into it further. Do you think the code logic below is possibly
problematic?
vtimer_save_state         (vtimer->loaded = false, cntv_ctl is 0)
kvm_arch_timer_handler    (reads cntv_ctl and sets vtimer->cnt_ctl = 0)
vtimer_restore_state      (writes vtimer->cnt_ctl to cntv_ctl, so
                           cntv_ctl will be 0 forever)
If the above analysis is reasonable, how about the patch below? It has
already been tested on my arm64 server.
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index f9555b1..ee6dd3f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -99,7 +99,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq,
void *dev_id)
}
vtimer = vcpu_vtimer(vcpu);
- if (!vtimer->irq.level) {
+ if (vtimer->loaded && !vtimer->irq.level) {
vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
if (kvm_timer_irq_can_fire(vtimer))
kvm_timer_update_irq(vcpu, true, vtimer);
Cheers,
Jia
On 12/13/2017 5:18 PM, Christoffer Dall Wrote:
> On Tue, Dec 12, 2017 at 11:00:07PM -0800, Jia He wrote:
>> In our Armv8a server (qualcomm Amberwing, non VHE), after applying
>> Christoffer's timer optimizing patchset(Optimize arch timer register
>> handling), the guest is hang during kernel booting.
>>
>> The error root cause might be as follows:
>> 1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
>> cntv_ctl register value. And then it missed some cases to update timer's
>> irq (irq.level) when kvm_timer_irq_can_fire() is false
> Why should it set the irq level to true when the timer cannot fire?
>
>> 2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
>> kvm_vcpu_check_block
>> kvm_cpu_has_pending_timer
>> kvm_timer_is_pending
>> kvm_timer_should_fire
>> 3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block (halt
>> poll process) and the guest is hang forever
> This is just a polling loop which will expire after some time, so it
> shouldn't halt the guest indefinitely, but merely slow it down for some
> while, if we have a bug. Is that the behavior you're seeing or are you
> seeing the guest coming to a complete halt?
>
>> Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit")
>> Signed-off-by: Jia He <[email protected]>
>> ---
>> virt/kvm/arm/arch_timer.c | 1 -
>> 1 file changed, 1 deletion(-)
>>
>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> index f9555b1..bb86433 100644
>> --- a/virt/kvm/arm/arch_timer.c
>> +++ b/virt/kvm/arm/arch_timer.c
>> @@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>> vtimer = vcpu_vtimer(vcpu);
>>
>> if (!vtimer->irq.level) {
>> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> This fix is clearly not correct, as it would prevent forwarding timer
> interrupts in some cases.
>
>> if (kvm_timer_irq_can_fire(vtimer))
>> kvm_timer_update_irq(vcpu, true, vtimer);
>> }
>> --
>> 2.7.4
>>
> I actually don't see how the above scenario you painted can happen.
>
> If you're in the polling loop, that means that the timer state is loaded
> on the vcpu, and that means you can take interrupts from the timer, and
> when you take interrupts, you will set the irq.level.
>
> And here's the first bit of logic in kvm_timer_is_pending():
>
> if (vtimer->irq.level || ptimer->irq.level)
> return true;
>
> So that would break the loop.
>
> I'm not able to reproduce on my side with a non-VHE platform.
>
> What is the workload you're running to reproduce this, and what is the
> exact kernel tree and kernel configuration you're using?
>
> Thanks,
> -Christoffer
>
>
>
Hi
On 12/14/2017 12:57 PM, Jia He Wrote:
> Hi Christoffer
>
> I have tried your newer level-mapped-v7 branch, but bug is still there.
>
> There is no special load in both host and guest. The guest (kernel
> 4.14) is often hanging when booting
>
> the guest kernel log
>
> [ OK ] Reached target Remote File Systems.
> Starting File System Check on /dev/mapper/fedora-root...
> [ OK ] Started File System Check on /dev/mapper/fedora-root.
> Mounting /sysroot...
> [ 2.670764] SGI XFS with ACLs, security attributes, no debug enabled
> [ 2.678180] XFS (dm-0): Mounting V5 Filesystem
> [ 2.740364] XFS (dm-0): Ending clean mount
> [ OK ] Mounted /sysroot.
> [ OK ] Reached target Initrd Root File System.
> Starting Reload Configuration from the Real Root...
> [ 61.288215] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 61.290791] 1-...!: (0 ticks this GP) idle=574/0/0 softirq=5/5 fqs=1
> [ 61.293664] (detected by 0, t=6002 jiffies, g=-263, c=-264, q=39760)
> [ 61.296480] Task dump for CPU 1:
> [ 61.297938] swapper/1 R running task 0 0 1 0x00000020
> [ 61.300643] Call trace:
> [ 61.301260] __switch_to+0x6c/0x78
> [ 61.302095] cpu_number+0x0/0x8
> [ 61.302867] rcu_sched kthread starved for 6000 jiffies!
> g18446744073709551353 c18446744073709551352 f0x0 RCU_GP_WAIT_FQS(3)
> ->state=0x402 ->cpu=1
> [ 61.305941] rcu_sched I 0 8 2 0x00000020
> [ 61.307250] Call trace:
> [ 61.307854] __switch_to+0x6c/0x78
> [ 61.308693] __schedule+0x268/0x8f0
> [ 61.309545] schedule+0x2c/0x88
> [ 61.310325] schedule_timeout+0x84/0x3b8
> [ 61.311278] rcu_gp_kthread+0x4d4/0x7d8
> [ 61.312213] kthread+0x134/0x138
> [ 61.313001] ret_from_fork+0x10/0x1c
>
> Maybe my previous patch is not perfect enough, thanks for your comments.
>
> I digged it futher more, do you think below code logic is possibly
> problematic?
>
>
> vtimer_save_state (vtimer->loaded = false, cntv_ctl is 0)
>
> kvm_arch_timer_handler (read cntv_ctl and set vtimer->cnt_ctl = 0)
>
> vtimer_restore_state (write vtimer->cnt_ctl to cntv_ctl,
> then cntv_ctl will
>
> be 0 forever)
Sorry, let me adjust the format to make it easier to read:
vtimer_save_state (vtimer->loaded = false, cntv_ctl is 0)
kvm_arch_timer_handler (read cntv_ctl and set vtimer->cnt_ctl = 0)
vtimer_restore_state (write vtimer->cnt_ctl to cntv_ctl,
then cntv_ctl will be 0 forever)
--
Cheers,
Jia
>
>
> If above analysis is reasonable, how about below patch? already tested
> in my arm64 server.
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index f9555b1..ee6dd3f 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -99,7 +99,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq,
> void *dev_id)
> }
> vtimer = vcpu_vtimer(vcpu);
>
> - if (!vtimer->irq.level) {
> + if (vtimer->loaded && !vtimer->irq.level) {
> vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> if (kvm_timer_irq_can_fire(vtimer))
> kvm_timer_update_irq(vcpu, true, vtimer);
>
> Cheers,
>
> Jia
>
>
>
> On 12/13/2017 5:18 PM, Christoffer Dall Wrote:
>> On Tue, Dec 12, 2017 at 11:00:07PM -0800, Jia He wrote:
>>> In our Armv8a server (qualcomm Amberwing, non VHE), after applying
>>> Christoffer's timer optimizing patchset(Optimize arch timer register
>>> handling), the guest is hang during kernel booting.
>>>
>>> The error root cause might be as follows:
>>> 1. in kvm_arch_timer_handler, it reset vtimer->cnt_ctl with current
>>> cntv_ctl register value. And then it missed some cases to update
>>> timer's
>>> irq (irq.level) when kvm_timer_irq_can_fire() is false
>> Why should it set the irq level to true when the timer cannot fire?
>>
>>> 2. It causes kvm_vcpu_check_block return 0 instead of -EINTR
>>> kvm_vcpu_check_block
>>> kvm_cpu_has_pending_timer
>>> kvm_timer_is_pending
>>> kvm_timer_should_fire
>>> 3. Thus, the kvm hyp code can not break the loop in kvm_vcpu_block
>>> (halt
>>> poll process) and the guest is hang forever
>> This is just a polling loop which will expire after some time, so it
>> shouldn't halt the guest indefinitely, but merely slow it down for some
>> while, if we have a bug. Is that the behavior you're seeing or are you
>> seeing the guest coming to a complete halt?
>>
>>> Fixes: b103cc3f10c0 ("KVM: arm/arm64: Avoid timer save/restore in
>>> vcpu entry/exit")
>>> Signed-off-by: Jia He <[email protected]>
>>> ---
>>> virt/kvm/arm/arch_timer.c | 1 -
>>> 1 file changed, 1 deletion(-)
>>>
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index f9555b1..bb86433 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -100,7 +100,6 @@ static irqreturn_t kvm_arch_timer_handler(int
>>> irq, void *dev_id)
>>> vtimer = vcpu_vtimer(vcpu);
>>> if (!vtimer->irq.level) {
>>> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
>> This fix is clearly not correct, as it would prevent forwarding timer
>> interrupts in some cases.
>>
>>> if (kvm_timer_irq_can_fire(vtimer))
>>> kvm_timer_update_irq(vcpu, true, vtimer);
>>> }
>>> --
>>> 2.7.4
>>>
>> I actually don't see how the above scenario you painted can happen.
>>
>> If you're in the polling loop, that means that the timer state is loaded
>> on the vcpu, and that means you can take interrupts from the timer, and
>> when you take interrupts, you will set the irq.level.
>>
>> And here's the first bit of logic in kvm_timer_is_pending():
>>
>> if (vtimer->irq.level || ptimer->irq.level)
>> return true;
>>
>> So that would break the loop.
>>
>> I'm not able to reproduce on my side with a non-VHE platform.
>>
>> What is the workload you're running to reproduce this, and what is the
>> exact kernel tree and kernel configuration you're using?
>>
>> Thanks,
>> -Christoffer
>>
>>
>>
On Thu, Dec 14, 2017 at 12:57:54PM +0800, Jia He wrote:
Hi Jia,
>
> I have tried your newer level-mapped-v7 branch, but bug is still there.
>
> There is no special load in both host and guest. The guest (kernel
> 4.14) is often hanging when booting
>
> the guest kernel log
>
> [ OK ] Reached target Remote File Systems.
> Starting File System Check on /dev/mapper/fedora-root...
> [ OK ] Started File System Check on /dev/mapper/fedora-root.
> Mounting /sysroot...
> [ 2.670764] SGI XFS with ACLs, security attributes, no debug enabled
> [ 2.678180] XFS (dm-0): Mounting V5 Filesystem
> [ 2.740364] XFS (dm-0): Ending clean mount
> [ OK ] Mounted /sysroot.
> [ OK ] Reached target Initrd Root File System.
> Starting Reload Configuration from the Real Root...
> [ 61.288215] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 61.290791] 1-...!: (0 ticks this GP) idle=574/0/0 softirq=5/5 fqs=1
> [ 61.293664] (detected by 0, t=6002 jiffies, g=-263, c=-264, q=39760)
> [ 61.296480] Task dump for CPU 1:
> [ 61.297938] swapper/1 R running task 0 0 1 0x00000020
> [ 61.300643] Call trace:
> [ 61.301260] __switch_to+0x6c/0x78
> [ 61.302095] cpu_number+0x0/0x8
> [ 61.302867] rcu_sched kthread starved for 6000 jiffies!
> g18446744073709551353 c18446744073709551352 f0x0 RCU_GP_WAIT_FQS(3)
> ->state=0x402 ->cpu=1
> [ 61.305941] rcu_sched I 0 8 2 0x00000020
> [ 61.307250] Call trace:
> [ 61.307854] __switch_to+0x6c/0x78
> [ 61.308693] __schedule+0x268/0x8f0
> [ 61.309545] schedule+0x2c/0x88
> [ 61.310325] schedule_timeout+0x84/0x3b8
> [ 61.311278] rcu_gp_kthread+0x4d4/0x7d8
> [ 61.312213] kthread+0x134/0x138
> [ 61.313001] ret_from_fork+0x10/0x1c
>
> Maybe my previous patch is not perfect enough, thanks for your comments.
>
> I digged it futher more, do you think below code logic is possibly
> problematic?
>
>
> vtimer_save_state        (vtimer->loaded = false, cntv_ctl is 0)
>
> kvm_arch_timer_handler   (read cntv_ctl and set vtimer->cnt_ctl = 0)
>
> vtimer_restore_state     (write vtimer->cnt_ctl to cntv_ctl, then
>                           cntv_ctl will be 0 forever)
>
>
> If above analysis is reasonable
Yes, I think there's something there if the hardware doesn't retire the
signal fast enough...
> how about below patch? already
> tested in my arm64 server.
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index f9555b1..ee6dd3f 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -99,7 +99,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq,
> void *dev_id)
>         }
>         vtimer = vcpu_vtimer(vcpu);
>
> -       if (!vtimer->irq.level) {
> +       if (vtimer->loaded && !vtimer->irq.level) {
>                 vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
>                 if (kvm_timer_irq_can_fire(vtimer))
>                         kvm_timer_update_irq(vcpu, true, vtimer);
>
There's nothing really wrong with that patch, I just didn't think it
would be necessary, as we really shouldn't see interrupts if the timer
is not loaded. Can you confirm that a WARN_ON(!vtimer->loaded) in
kvm_arch_timer_handler() gives you a splat?
Also, could you give the following a try (without your patch):
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 73d262c4712b..4751255345d1 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -367,6 +367,7 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
/* Disable the virtual timer */
write_sysreg_el0(0, cntv_ctl);
+ isb();
vtimer->loaded = false;
out:
Thanks,
-Christoffer
Hi Christoffer
On 12/14/2017 9:09 PM, Christoffer Dall Wrote:
> On Thu, Dec 14, 2017 at 12:57:54PM +0800, Jia He wrote:
> Hi Jia,
>
>> I have tried your newer level-mapped-v7 branch, but bug is still there.
>>
>> There is no special load in both host and guest. The guest (kernel
>> 4.14) is often hanging when booting
>>
>> the guest kernel log
>>
>> [ OK ] Reached target Remote File Systems.
>> Starting File System Check on /dev/mapper/fedora-root...
>> [ OK ] Started File System Check on /dev/mapper/fedora-root.
>> Mounting /sysroot...
>> [ 2.670764] SGI XFS with ACLs, security attributes, no debug enabled
>> [ 2.678180] XFS (dm-0): Mounting V5 Filesystem
>> [ 2.740364] XFS (dm-0): Ending clean mount
>> [ OK ] Mounted /sysroot.
>> [ OK ] Reached target Initrd Root File System.
>> Starting Reload Configuration from the Real Root...
>> [ 61.288215] INFO: rcu_sched detected stalls on CPUs/tasks:
>> [ 61.290791] 1-...!: (0 ticks this GP) idle=574/0/0 softirq=5/5 fqs=1
>> [ 61.293664] (detected by 0, t=6002 jiffies, g=-263, c=-264, q=39760)
>> [ 61.296480] Task dump for CPU 1:
>> [ 61.297938] swapper/1 R running task 0 0 1 0x00000020
>> [ 61.300643] Call trace:
>> [ 61.301260] __switch_to+0x6c/0x78
>> [ 61.302095] cpu_number+0x0/0x8
>> [ 61.302867] rcu_sched kthread starved for 6000 jiffies!
>> g18446744073709551353 c18446744073709551352 f0x0 RCU_GP_WAIT_FQS(3)
>> ->state=0x402 ->cpu=1
>> [ 61.305941] rcu_sched I 0 8 2 0x00000020
>> [ 61.307250] Call trace:
>> [ 61.307854] __switch_to+0x6c/0x78
>> [ 61.308693] __schedule+0x268/0x8f0
>> [ 61.309545] schedule+0x2c/0x88
>> [ 61.310325] schedule_timeout+0x84/0x3b8
>> [ 61.311278] rcu_gp_kthread+0x4d4/0x7d8
>> [ 61.312213] kthread+0x134/0x138
>> [ 61.313001] ret_from_fork+0x10/0x1c
>>
>> Maybe my previous patch is not perfect enough, thanks for your comments.
>>
>> I digged it futher more, do you think below code logic is possibly
>> problematic?
>>
>>
>> vtimer_save_state (vtimer->loaded = false, cntv_ctl is 0)
>>
>> kvm_arch_timer_handler (read cntv_ctl and set vtimer->cnt_ctl = 0)
>>
>> vtimer_restore_state (write vtimer->cnt_ctl to cntv_ctl,
>> then cntv_ctl will
>>
>> be 0 forever)
>>
>>
>> If above analysis is reasonable
> Yes, I think there's something there if the hardware doesn't retire the
> signal fast enough...
>
>> how about below patch? already
>> tested in my arm64 server.
>>
>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> index f9555b1..ee6dd3f 100644
>> --- a/virt/kvm/arm/arch_timer.c
>> +++ b/virt/kvm/arm/arch_timer.c
>> @@ -99,7 +99,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq,
>> void *dev_id)
>> }
>> vtimer = vcpu_vtimer(vcpu);
>>
>> - if (!vtimer->irq.level) {
>> + if (vtimer->loaded && !vtimer->irq.level) {
>> vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
>> if (kvm_timer_irq_can_fire(vtimer))
>> kvm_timer_update_irq(vcpu, true, vtimer);
>>
> There's nothing really wrong with that patch, I just didn't think it
> would be necessary, as we really shouldn't see interrupts if the timer
> is not loaded. Can you confirm that a WARN_ON(!vtimer->loaded) in
> kvm_arch_timer_handler() gives you a splat?
Please see the WARN_ON result (without my patch)
[ 72.171706] WARNING: CPU: 24 PID: 1768 at
arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:101
kvm_arch_timer_handler+0xc0/0xc8
[ 72.182305] Modules linked in: vhost_net tap xt_CHECKSUM
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
ip_tables x_tables vhost_scsi vhost tcm_qla2xxx qla2xxx nvme_fc
nvme_fabrics tcm_fc libfc scsi_transport_fc ib_srpt ib_cm ib_core
iscsi_target_mod tcm_loop target_core_file target_core_iblock
target_core_pscsi target_core_mod binfmt_misc nls_iso8859_1 shpchp
crc32_ce crct10dif_ce i2c_qup dm_multipath autofs4 btrfs zstd_decompress
zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath
linear at803x ixgbe
[ 72.252877] xhci_plat_hcd xhci_hcd usbcore qcom_emac mdio
ahci_platform libahci_platform libahci
[ 72.261733] CPU: 24 PID: 1768 Comm: qemu-system-aar Tainted: G
W 4.15.0-rc3+ #128
[ 72.270412] Hardware name: WIWYNN QDF2400 Reference Evaluation
Platform CV90-LA115-P151/QDF2400 Customer Reference Board, BIOS 0ACJA425
09/07/2017
[ 72.283520] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[ 72.288295] pc : kvm_arch_timer_handler+0xc0/0xc8
[ 72.292984] lr : handle_percpu_devid_irq+0x8c/0x230
[ 72.297842] sp : ffff00000830ff00
[ 72.301141] x29: ffff00000830ff00 x28: ffffc17b9feaff00
[ 72.306436] x27: ffff2e7462531000 x26: ffff000008310000
[ 72.311731] x25: ffff000008300000 x24: ffffc17b80069100
[ 72.317026] x23: 0000000000000003 x22: ffff2e74632248e8
[ 72.322321] x21: ffffc17b80054c00 x20: 0000000000000000
[ 72.327616] x19: ffffc17b8ac37dc0 x18: 0000000000000010
[ 72.332911] x17: 000000000000000a x16: 0000000000007fff
[ 72.338207] x15: ffffffffffffffff x14: 6d202c6666666666
[ 72.343502] x13: 6666667830203a73 x12: 656c6379635f7861
[ 72.348797] x11: ffff000009395448 x10: ffff00000862b5b8
[ 72.354092] x9 : 0000000000000040 x8 : ffffc17b60007238
[ 72.359387] x7 : 0000000000000000 x6 : ffffc17b80054c00
[ 72.364682] x5 : ffffc17b60007250 x4 : 0000930756ce0000
[ 72.369977] x3 : ffff2e74619cfff0 x2 : 0000930756ce0000
[ 72.375273] x1 : 0000000000000000 x0 : ffffc17b8ac399c0
[ 72.380568] Call trace:
[ 72.383000] kvm_arch_timer_handler+0xc0/0xc8
[ 72.387340] handle_percpu_devid_irq+0x8c/0x230
[ 72.391853] generic_handle_irq+0x34/0x50
[ 72.395846] __handle_domain_irq+0x68/0xc0
[ 72.399926] gic_handle_irq+0xcc/0x188
[ 72.403658] el1_irq+0xd8/0x180
[ 72.406785] hrtimer_try_to_cancel+0x0/0x160
[ 72.411037] kvm_timer_vcpu_put+0x3c/0x50
[ 72.415030] kvm_arch_vcpu_put+0x20/0x50
[ 72.418937] vcpu_put+0x20/0x40
[ 72.422061] kvm_vcpu_ioctl+0x244/0x7b8
[ 72.425882] do_vfs_ioctl+0xc4/0x988
[ 72.429440] SyS_ioctl+0x94/0xa8
[ 72.432652] el0_svc_naked+0x20/0x24
[ 72.436210] ---[ end trace 11d0c8bba284e766 ]---
> Also, could you give the following a try (without your patch):
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 73d262c4712b..4751255345d1 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -367,6 +367,7 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
>
> /* Disable the virtual timer */
> write_sysreg_el0(0, cntv_ctl);
> + isb();
No luck, the bug is still there
Cheers,
Jia
On Thu, Dec 14, 2017 at 11:28:04PM +0800, Jia He wrote:
>
> On 12/14/2017 9:09 PM, Christoffer Dall Wrote:
> >On Thu, Dec 14, 2017 at 12:57:54PM +0800, Jia He wrote:
> >Hi Jia,
> >
> >>I have tried your newer level-mapped-v7 branch, but bug is still there.
> >>
> >>There is no special load in both host and guest. The guest (kernel
> >>4.14) is often hanging when booting
> >>
> >> [...]
> >>
> >>Maybe my previous patch is not perfect enough, thanks for your comments.
> >>
> >>I dug into it further. Do you think the code logic below is possibly
> >>problematic?
> >>
> >>
> >>vtimer_save_state (vtimer->loaded = false, cntv_ctl is 0)
> >>
> >>kvm_arch_timer_handler (read cntv_ctl and set vtimer->cnt_ctl = 0)
> >>
> >>vtimer_restore_state (write vtimer->cnt_ctl to cntv_ctl,
> >>then cntv_ctl will
> >>
> >> be 0 forever)
> >>
> >>
> >>If above analysis is reasonable
> >Yes, I think there's something there if the hardware doesn't retire the
> >signal fast enough...
> >
> >>how about below patch? already
> >>tested in my arm64 server.
> >>
> >>diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >>index f9555b1..ee6dd3f 100644
> >>--- a/virt/kvm/arm/arch_timer.c
> >>+++ b/virt/kvm/arm/arch_timer.c
> >>@@ -99,7 +99,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq,
> >>void *dev_id)
> >> }
> >> vtimer = vcpu_vtimer(vcpu);
> >>
> >>- if (!vtimer->irq.level) {
> >>+ if (vtimer->loaded && !vtimer->irq.level) {
> >> vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> >> if (kvm_timer_irq_can_fire(vtimer))
> >> kvm_timer_update_irq(vcpu, true, vtimer);
> >>
> >There's nothing really wrong with that patch, I just didn't think it
> >would be necessary, as we really shouldn't see interrupts if the timer
> >is not loaded. Can you confirm that a WARN_ON(!vtimer->loaded) in
> >kvm_arch_timer_handler() gives you a splat?
> Please see the WARN_ON result (without my patch)
> [ 72.171706] WARNING: CPU: 24 PID: 1768 at
> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:101
> kvm_arch_timer_handler+0xc0/0xc8
>
> >Also, could you give the following a try (without your patch):
> >
> >diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >index 73d262c4712b..4751255345d1 100644
> >--- a/virt/kvm/arm/arch_timer.c
> >+++ b/virt/kvm/arm/arch_timer.c
> >@@ -367,6 +367,7 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
> > /* Disable the virtual timer */
> > write_sysreg_el0(0, cntv_ctl);
> >+ isb();
> No luck, the bug is still there
>
ok, so this is a slightly different approach to what you were trying to
do. Can you please give this a try and let me know how it goes?
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 73d262c4712b..544ed15fbbb3 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -46,7 +46,7 @@ static const struct kvm_irq_level default_vtimer_irq = {
.level = 1,
};
-static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
+static bool kvm_timer_irq_can_fire(u32 cnt_ctl);
static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
struct arch_timer_context *timer_ctx);
static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
@@ -94,6 +94,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
{
struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
struct arch_timer_context *vtimer;
+ u32 cnt_ctl;
if (!vcpu) {
pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
@@ -101,8 +102,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
}
vtimer = vcpu_vtimer(vcpu);
- vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
- if (kvm_timer_irq_can_fire(vtimer))
+ cnt_ctl = read_sysreg_el0(cntv_ctl);
+ if (kvm_timer_irq_can_fire(cnt_ctl))
kvm_timer_update_irq(vcpu, true, vtimer);
if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
@@ -148,10 +149,10 @@ static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
return 0;
}
-static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
+static bool kvm_timer_irq_can_fire(u32 cnt_ctl)
{
- return !(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
- (timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
+ return !(cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
+ (cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
}
/*
@@ -164,10 +165,10 @@ static u64 kvm_timer_earliest_exp(struct kvm_vcpu *vcpu)
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
- if (kvm_timer_irq_can_fire(vtimer))
+ if (kvm_timer_irq_can_fire(vtimer->cnt_ctl))
min_virt = kvm_timer_compute_delta(vtimer);
- if (kvm_timer_irq_can_fire(ptimer))
+ if (kvm_timer_irq_can_fire(ptimer->cnt_ctl))
min_phys = kvm_timer_compute_delta(ptimer);
/* If none of timers can fire, then return 0 */
@@ -231,7 +232,7 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
{
u64 cval, now;
- if (!kvm_timer_irq_can_fire(timer_ctx))
+ if (!kvm_timer_irq_can_fire(timer_ctx->cnt_ctl))
return false;
cval = timer_ctx->cnt_cval;
@@ -306,7 +307,7 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu)
* don't need to have a soft timer scheduled for the future. If the
* timer cannot fire at all, then we also don't need a soft timer.
*/
- if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer)) {
+ if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer->cnt_ctl)) {
soft_timer_cancel(&timer->phys_timer, NULL);
return;
}
@@ -367,6 +368,7 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
/* Disable the virtual timer */
write_sysreg_el0(0, cntv_ctl);
+ isb();
vtimer->loaded = false;
out:
@@ -398,7 +400,8 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
* If both timers are not capable of raising interrupts (disabled or
* masked), then there's no more work for us to do.
*/
- if (!kvm_timer_irq_can_fire(vtimer) && !kvm_timer_irq_can_fire(ptimer))
+ if (!kvm_timer_irq_can_fire(vtimer->cnt_ctl) &&
+ !kvm_timer_irq_can_fire(ptimer->cnt_ctl))
return;
/*
Thanks,
-Christoffer
On 12/14/2017 11:45 PM, Christoffer Dall Wrote:
> On Thu, Dec 14, 2017 at 11:28:04PM +0800, Jia He wrote:
>> On 12/14/2017 9:09 PM, Christoffer Dall Wrote:
>>> On Thu, Dec 14, 2017 at 12:57:54PM +0800, Jia He wrote:
>>> Hi Jia,
>>>
>>>> I have tried your newer level-mapped-v7 branch, but bug is still there.
[...]
>>
> ok, so this is a slightly different approach to what you were trying to
> do. Can you please give this a try and let me know how it goes?
>
This patch fixes the bug on our platform.
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 73d262c4712b..544ed15fbbb3 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -46,7 +46,7 @@ static const struct kvm_irq_level default_vtimer_irq = {
> .level = 1,
> };
>
> -static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
> +static bool kvm_timer_irq_can_fire(u32 cnt_ctl);
> static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> struct arch_timer_context *timer_ctx);
> static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
> @@ -94,6 +94,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> {
> struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> struct arch_timer_context *vtimer;
> + u32 cnt_ctl;
>
> if (!vcpu) {
> pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
> @@ -101,8 +102,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> }
> vtimer = vcpu_vtimer(vcpu);
>
> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> - if (kvm_timer_irq_can_fire(vtimer))
> + cnt_ctl = read_sysreg_el0(cntv_ctl);
> + if (kvm_timer_irq_can_fire(cnt_ctl))
> kvm_timer_update_irq(vcpu, true, vtimer);
IIUC, your patch makes kvm_arch_timer_handler never change vtimer->cnt_ctl.
>
> if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> @@ -148,10 +149,10 @@ static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
> return 0;
> }
>
> -static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
> +static bool kvm_timer_irq_can_fire(u32 cnt_ctl)
> {
> - return !(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> - (timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
> + return !(cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> + (cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
> }
>
> /*
> @@ -164,10 +165,10 @@ static u64 kvm_timer_earliest_exp(struct kvm_vcpu *vcpu)
> struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>
> - if (kvm_timer_irq_can_fire(vtimer))
> + if (kvm_timer_irq_can_fire(vtimer->cnt_ctl))
> min_virt = kvm_timer_compute_delta(vtimer);
>
> - if (kvm_timer_irq_can_fire(ptimer))
> + if (kvm_timer_irq_can_fire(ptimer->cnt_ctl))
> min_phys = kvm_timer_compute_delta(ptimer);
>
> /* If none of timers can fire, then return 0 */
> @@ -231,7 +232,7 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
> {
> u64 cval, now;
>
> - if (!kvm_timer_irq_can_fire(timer_ctx))
> + if (!kvm_timer_irq_can_fire(timer_ctx->cnt_ctl))
> return false;
>
> cval = timer_ctx->cnt_cval;
> @@ -306,7 +307,7 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu)
> * don't need to have a soft timer scheduled for the future. If the
> * timer cannot fire at all, then we also don't need a soft timer.
> */
> - if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer)) {
> + if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer->cnt_ctl)) {
> soft_timer_cancel(&timer->phys_timer, NULL);
> return;
> }
> @@ -367,6 +368,7 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
>
> /* Disable the virtual timer */
> write_sysreg_el0(0, cntv_ctl);
> + isb();
My only concern is whether this isb() is required here.
Sorry if this is a stupid question; I understand little about the Arm
architecture's memory barriers. But it seems isb() flushes all
prefetched instructions. Do you think that, if a timer interrupt
arrives, the CPU will use the previously prefetched instructions?
Cheers,
Jia
On 15/12/17 02:27, Jia He wrote:
>
>
[...]
>> @@ -367,6 +368,7 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
>>
>> /* Disable the virtual timer */
>> write_sysreg_el0(0, cntv_ctl);
>> + isb();
> My only concern is whether this isb() is required here.
> Sorry if this is a stupid question; I understand little about the Arm
> architecture's memory barriers. But it seems isb() flushes all
> prefetched instructions. Do you think that, if a timer interrupt
> arrives, the CPU will use the previously prefetched instructions?
This barrier has little to do with prefetch. It just guarantees that the
code after the isb() is now running with a disabled virtual timer.
Otherwise, a CPU can freely reorder the write_sysreg() until the next
context synchronization event.
An interrupt coming between the write and the barrier will also act as a
context synchronization event. For more details, see the ARMv8 ARM (the
glossary has a section on the concept).
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Fri, Dec 15, 2017 at 10:27:02AM +0800, Jia He wrote:
[...]
> >diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >index 73d262c4712b..544ed15fbbb3 100644
> >--- a/virt/kvm/arm/arch_timer.c
> >+++ b/virt/kvm/arm/arch_timer.c
> >@@ -46,7 +46,7 @@ static const struct kvm_irq_level default_vtimer_irq = {
> > .level = 1,
> > };
> >-static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
> >+static bool kvm_timer_irq_can_fire(u32 cnt_ctl);
> > static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> > struct arch_timer_context *timer_ctx);
> > static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
> >@@ -94,6 +94,7 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > {
> > struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > struct arch_timer_context *vtimer;
> >+ u32 cnt_ctl;
> > if (!vcpu) {
> > pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
> >@@ -101,8 +102,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > }
> > vtimer = vcpu_vtimer(vcpu);
> >- vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> >- if (kvm_timer_irq_can_fire(vtimer))
> >+ cnt_ctl = read_sysreg_el0(cntv_ctl);
> >+ if (kvm_timer_irq_can_fire(cnt_ctl))
> > kvm_timer_update_irq(vcpu, true, vtimer);
> IIUC, your patch makes kvm_arch_timer_handler never changesvtimer->cnt_ctl
Yes, that's the idea.
Meanwhile, I think I thought of a cleaner way to do this. Could you
test the following two patches on your platform as well?
From 3a594a3aa222bd64a86f6c6afcb209c9be20d5c5 Mon Sep 17 00:00:00 2001
From: Christoffer Dall <[email protected]>
Date: Thu, 14 Dec 2017 19:54:50 +0100
Subject: [PATCH 1/2] KVM: arm/arm64: Properly handle arch-timer IRQs after
vtimer_save_state
The recent timer rework was assuming that once the timer was disabled,
we should no longer see any interrupts from the timer. This assumption
turns out to not be true, and instead we have to handle the case when
the timer ISR runs even after the timer has been disabled.
This requires a couple of changes:
First, we should never overwrite the cached guest state of the timer
control register when the ISR runs, because KVM may have disabled its
timers when doing vcpu_put(), even though the guest still had the timer
enabled.
Second, we shouldn't assume that the timer is actually firing just
because we see an ISR, but we should check the ISTATUS field of the
timer control register to understand if the hardware timer is really
firing or not.
Signed-off-by: Christoffer Dall <[email protected]>
---
virt/kvm/arm/arch_timer.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index aa9adfafe12b..792bcf6277b6 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -92,16 +92,21 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
{
struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
struct arch_timer_context *vtimer;
+ u32 cnt_ctl;
- if (!vcpu) {
- pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
- return IRQ_NONE;
- }
- vtimer = vcpu_vtimer(vcpu);
+ /*
+ * We may see a timer interrupt after vcpu_put() has been called which
+ * sets the CPU's vcpu pointer to NULL, because even though the timer
+ * has been disabled in vtimer_save_state(), the signal may not have
+ * been retired from the interrupt controller yet.
+ */
+ if (!vcpu)
+ return IRQ_HANDLED;
+ vtimer = vcpu_vtimer(vcpu);
if (!vtimer->irq.level) {
- vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
- if (kvm_timer_irq_can_fire(vtimer))
+ cnt_ctl = read_sysreg_el0(cntv_ctl);
+ if (cnt_ctl & ARCH_TIMER_CTRL_IT_STAT)
kvm_timer_update_irq(vcpu, true, vtimer);
}
From ed96302b47d209024814df116994f64dc8f07c96 Mon Sep 17 00:00:00 2001
From: Christoffer Dall <[email protected]>
Date: Fri, 15 Dec 2017 00:30:12 +0100
Subject: [PATCH 2/2] KVM: arm/arm64: Fix timer enable flow
When enabling the timer on the first run, we fail to ever restore the
state and mark it as loaded. That means, that in the initial entry to
the VCPU ioctl, unless we exit to userspace for some reason such as a
pending signal, if the guest programs a timer and blocks, we will wait
forever, because we never read back the hardware state (the loaded flag
is not set), and so we think the timer is disabled, and we never
schedule a background soft timer.
The end result? The VCPU blocks forever, and the only solution is to
kill the thread.
Signed-off-by: Christoffer Dall <[email protected]>
---
virt/kvm/arm/arch_timer.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 792bcf6277b6..8869658e6983 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -843,10 +843,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
no_vgic:
preempt_disable();
timer->enabled = 1;
- if (!irqchip_in_kernel(vcpu->kvm))
- kvm_timer_vcpu_load_user(vcpu);
- else
- kvm_timer_vcpu_load_vgic(vcpu);
+ kvm_timer_vcpu_load(vcpu);
preempt_enable();
return 0;
Thanks,
-Christoffer
On Fri, Dec 15, 2017 at 09:09:05AM +0000, Marc Zyngier wrote:
> On 15/12/17 02:27, Jia He wrote:
> >
> >
>
> [...]
>
> >> @@ -367,6 +368,7 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
> >>
> >> /* Disable the virtual timer */
> >> write_sysreg_el0(0, cntv_ctl);
> >> + isb();
> > > My only concern is whether this isb() is required here.
> > > Sorry if this is a stupid question; I understand little about the Arm
> > > architecture's memory barriers. But it seems isb() flushes all
> > > prefetched instructions. Do you think that, if a timer interrupt
> > > arrives, the CPU will use the previously prefetched instructions?
>
>
> This barrier has little to do with prefetch. It just guarantees that the
> code after the isb() is now running with a disabled virtual timer.
> Otherwise, a CPU can freely reorder the write_sysreg() until the next
> context synchronization event.
>
> An interrupt coming between the write and the barrier will also act as a
> context synchronization event. For more details, see the ARMv8 ARM (the
> glossary has a section on the concept).
>
So since an ISB doesn't guarantee that the timer will actually be
disabled, is it a waste of energy to have it, or should we keep it as a
best effort kind of thing?
Thanks,
-Christoffer
On 15/12/17 10:10, Christoffer Dall wrote:
> On Fri, Dec 15, 2017 at 09:09:05AM +0000, Marc Zyngier wrote:
>> On 15/12/17 02:27, Jia He wrote:
>> [...]
>
> So since an ISB doesn't guarantee that the timer will actually be
> disabled, is it a waste of energy to have it, or should we keep it as a
> best effort kind of thing?
nit: the ISB does offer that guarantee. It is just that the guarantee
doesn't extend to an interrupt that has already been signalled.
The main issue I have with not having an ISB is that it makes it harder
to think of when the disabling actually happens. The disabling could be
delayed for a very long time, and would make things harder to debug if
they were going wrong.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Fri, Dec 15, 2017 at 10:33:48AM +0000, Marc Zyngier wrote:
> On 15/12/17 10:10, Christoffer Dall wrote:
> > On Fri, Dec 15, 2017 at 09:09:05AM +0000, Marc Zyngier wrote:
> >> On 15/12/17 02:27, Jia He wrote:
> >> [...]
> >
> > So since an ISB doesn't guarantee that the timer will actually be
> > disabled, is it a waste of energy to have it, or should we keep it as a
> > best effort kind of thing?
>
> nit: the ISB does offer that guarantee. It is just that the guarantee
> doesn't extend to an interrupt that has already been signalled.
right, I should have said that it doesn't guarantee that an already
signalled interrupt will have been retired prior to enabling interrupts
on the CPU again.
>
> The main issue I have with not having an ISB is that it makes it harder
> to think of when the disabling actually happens. The disabling could be
> delayed for a very long time, and would make things harder to debug if
> they were going wrong.
>
Yes, it definitely indicates the intention of the code and the flow, and
the fact that we have to work around delayed signals in the ISR is then
something we can describe with a comment there.
I'll add an ISB in the other version of the patch I sent before.
Thanks,
-Christoffer
Hi Christoffer,
Sorry for the late reply; I thought you would send out a v2 with the
isb(), but it seems not.
On 12/15/2017 6:04 PM, Christoffer Dall Wrote:
> On Fri, Dec 15, 2017 at 10:27:02AM +0800, Jia He wrote:
>
> [...]
[...]
>
> Meanwhile, I think I thought of a cleaner way to do this. Could you
> test the following two patches on your platform as well?
>
> From 3a594a3aa222bd64a86f6c6afcb209c9be20d5c5 Mon Sep 17 00:00:00 2001
> From: Christoffer Dall <[email protected]>
> Date: Thu, 14 Dec 2017 19:54:50 +0100
> Subject: [PATCH 1/2] KVM: arm/arm64: Properly handle arch-timer IRQs after
> vtimer_save_state
>
> The recent timer rework was assuming that once the timer was disabled,
> we should no longer see any interrupts from the timer. This assumption
> turns out to not be true, and instead we have to handle the case when
> the timer ISR runs even after the timer has been disabled.
>
> This requires a couple of changes:
>
> First, we should never overwrite the cached guest state of the timer
> control register when the ISR runs, because KVM may have disabled its
> timers when doing vcpu_put(), even though the guest still had the timer
> enabled.
>
> Second, we shouldn't assume that the timer is actually firing just
> because we see an ISR, but we should check the ISTATUS field of the
> timer control register to understand if the hardware timer is really
> firing or not.
>
> Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Jia He <[email protected]>
> ---
> virt/kvm/arm/arch_timer.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index aa9adfafe12b..792bcf6277b6 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -92,16 +92,21 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> {
> struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> struct arch_timer_context *vtimer;
> + u32 cnt_ctl;
>
> - if (!vcpu) {
> - pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
> - return IRQ_NONE;
> - }
> - vtimer = vcpu_vtimer(vcpu);
> + /*
> + * We may see a timer interrupt after vcpu_put() has been called which
> + * sets the CPU's vcpu pointer to NULL, because even though the timer
> + * has been disabled in vtimer_save_state(), the signal may not have
> + * been retired from the interrupt controller yet.
> + */
> + if (!vcpu)
> + return IRQ_HANDLED;
>
> + vtimer = vcpu_vtimer(vcpu);
> if (!vtimer->irq.level) {
> - vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> - if (kvm_timer_irq_can_fire(vtimer))
> + cnt_ctl = read_sysreg_el0(cntv_ctl);
> + if (cnt_ctl & ARCH_TIMER_CTRL_IT_STAT)
> kvm_timer_update_irq(vcpu, true, vtimer);
> }
>
>
>
> >From ed96302b47d209024814df116994f64dc8f07c96 Mon Sep 17 00:00:00 2001
> From: Christoffer Dall <[email protected]>
> Date: Fri, 15 Dec 2017 00:30:12 +0100
> Subject: [PATCH 2/2] KVM: arm/arm64: Fix timer enable flow
>
> When enabling the timer on the first run, we fail to ever restore the
> state and mark it as loaded. That means, that in the initial entry to
> the VCPU ioctl, unless we exit to userspace for some reason such as a
> pending signal, if the guest programs a timer and blocks, we will wait
> forever, because we never read back the hardware state (the loaded flag
> is not set), and so we think the timer is disabled, and we never
> schedule a background soft timer.
>
> The end result? The VCPU blocks forever, and the only solution is to
> kill the thread.
>
> Signed-off-by: Christoffer Dall <[email protected]>
> ---
> virt/kvm/arm/arch_timer.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 792bcf6277b6..8869658e6983 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -843,10 +843,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
> no_vgic:
> preempt_disable();
> timer->enabled = 1;
> - if (!irqchip_in_kernel(vcpu->kvm))
> - kvm_timer_vcpu_load_user(vcpu);
> - else
> - kvm_timer_vcpu_load_vgic(vcpu);
> + kvm_timer_vcpu_load(vcpu);
> preempt_enable();
>
> return 0;
>
>
> Thanks,
> -Christoffer
>
I have tested your two patches on my QDF2400 server 10 times; the
guest boots up without any issues.
Without these patches, the guest always hangs during boot.
--
Cheers,
Jia
On Thu, Dec 21, 2017 at 05:16:48PM +0800, Jia He wrote:
>
> Sorry for the late reply. I had thought you would send out a v2 with
> the isb(); it seems that has not happened.
>
I did:
https://lists.cs.columbia.edu/pipermail/kvmarm/2017-December/029078.html
>
> On 12/15/2017 6:04 PM, Christoffer Dall wrote:
> >On Fri, Dec 15, 2017 at 10:27:02AM +0800, Jia He wrote:
> >
> >[...]
> [...]
> >
> >Meanwhile, I think I thought of a cleaner way to do this. Could you
> >test the following two patches on your platform as well?
> >
> >>From 3a594a3aa222bd64a86f6c6afcb209c9be20d5c5 Mon Sep 17 00:00:00 2001
> >From: Christoffer Dall <[email protected]>
> >Date: Thu, 14 Dec 2017 19:54:50 +0100
> >Subject: [PATCH 1/2] KVM: arm/arm64: Properly handle arch-timer IRQs after
> > vtimer_save_state
> >
> >The recent timer rework was assuming that once the timer was disabled,
> >we should no longer see any interrupts from the timer. This assumption
> >turns out to not be true, and instead we have to handle the case when
> >the timer ISR runs even after the timer has been disabled.
> >
> >This requires a couple of changes:
> >
> >First, we should never overwrite the cached guest state of the timer
> >control register when the ISR runs, because KVM may have disabled its
> >timers when doing vcpu_put(), even though the guest still had the timer
> >enabled.
> >
> >Second, we shouldn't assume that the timer is actually firing just
> >because we see an ISR, but we should check the ISTATUS field of the
> >timer control register to understand if the hardware timer is really
> >firing or not.
> >
> >Signed-off-by: Christoffer Dall <[email protected]>
>
> Signed-off-by: Jia He <[email protected]>
>
Did you write parts of this patch, such that I should have included your
Signed-off-by? Or did you mean to provide another tag?
Anyway, these patches have already been pulled, so I hope we can live
with them as they are.
> >---
> > virt/kvm/arm/arch_timer.c | 19 ++++++++++++-------
> > 1 file changed, 12 insertions(+), 7 deletions(-)
> >
> >diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >index aa9adfafe12b..792bcf6277b6 100644
> >--- a/virt/kvm/arm/arch_timer.c
> >+++ b/virt/kvm/arm/arch_timer.c
> >@@ -92,16 +92,21 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > {
> > struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > struct arch_timer_context *vtimer;
> >+ u32 cnt_ctl;
> >- if (!vcpu) {
> >- pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
> >- return IRQ_NONE;
> >- }
> >- vtimer = vcpu_vtimer(vcpu);
> >+ /*
> >+ * We may see a timer interrupt after vcpu_put() has been called which
> >+ * sets the CPU's vcpu pointer to NULL, because even though the timer
> >+ * has been disabled in vtimer_save_state(), the signal may not have
> >+ * been retired from the interrupt controller yet.
> >+ */
> >+ if (!vcpu)
> >+ return IRQ_HANDLED;
> >+ vtimer = vcpu_vtimer(vcpu);
> > if (!vtimer->irq.level) {
> >- vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> >- if (kvm_timer_irq_can_fire(vtimer))
> >+ cnt_ctl = read_sysreg_el0(cntv_ctl);
> >+ if (cnt_ctl & ARCH_TIMER_CTRL_IT_STAT)
> > kvm_timer_update_irq(vcpu, true, vtimer);
> > }
> >
> >
> >>From ed96302b47d209024814df116994f64dc8f07c96 Mon Sep 17 00:00:00 2001
> >From: Christoffer Dall <[email protected]>
> >Date: Fri, 15 Dec 2017 00:30:12 +0100
> >Subject: [PATCH 2/2] KVM: arm/arm64: Fix timer enable flow
> >
> >When enabling the timer on the first run, we fail to ever restore the
> >state and mark it as loaded. That means, that in the initial entry to
> >the VCPU ioctl, unless we exit to userspace for some reason such as a
> >pending signal, if the guest programs a timer and blocks, we will wait
> >forever, because we never read back the hardware state (the loaded flag
> >is not set), and so we think the timer is disabled, and we never
> >schedule a background soft timer.
> >
> >The end result? The VCPU blocks forever, and the only solution is to
> >kill the thread.
> >
> >Signed-off-by: Christoffer Dall <[email protected]>
> >---
> > virt/kvm/arm/arch_timer.c | 5 +----
> > 1 file changed, 1 insertion(+), 4 deletions(-)
> >
> >diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >index 792bcf6277b6..8869658e6983 100644
> >--- a/virt/kvm/arm/arch_timer.c
> >+++ b/virt/kvm/arm/arch_timer.c
> >@@ -843,10 +843,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
> > no_vgic:
> > preempt_disable();
> > timer->enabled = 1;
> >- if (!irqchip_in_kernel(vcpu->kvm))
> >- kvm_timer_vcpu_load_user(vcpu);
> >- else
> >- kvm_timer_vcpu_load_vgic(vcpu);
> >+ kvm_timer_vcpu_load(vcpu);
> > preempt_enable();
> > return 0;
> >
> >
> >Thanks,
> >-Christoffer
> >
> I have tested your two patches on my QDF2400 server 10 times; the
> guest boots up without any issues.
> Without these patches, the guest always hangs during boot.
>
Thanks for this, it's comforting to know.
Thanks,
-Christoffer