2017-07-12 15:01:36

Subject: [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock

The virtual time base (VTB) is a register that advances only while the
guest is running. Any exit from guest to host stops the VTB (it is saved
and restored by KVM). But if an I/O operation causes the guest to exit to
the host, the guest's watchdog
(watchdog_timer_fn -> is_softlockup -> get_timestamp -> running_clock)
also needs to include the time elapsed in the host, so get_vtb is not
correct in this case.

Also, TB_OFFSET is correctly saved and restored by qemu since commit [1],
so we can use get_tb here.

[1] http://git.qemu.org/?p=qemu.git;a=commit;h=42043e4f1

Signed-off-by: Jia He <[email protected]>
---
arch/powerpc/kernel/time.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index fe6f3a2..c542dd3 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -695,16 +695,15 @@ notrace unsigned long long sched_clock(void)
 unsigned long long running_clock(void)
 {
 	/*
-	 * Don't read the VTB as a host since KVM does not switch in host
-	 * timebase into the VTB when it takes a guest off the CPU, reading the
-	 * VTB would result in reading 'last switched out' guest VTB.
+	 * Use get_tb instead of get_vtb for guest since the TB_OFFSET has been
+	 * well saved/restored when qemu does suspend/resume.
 	 *
 	 * Host kernels are often compiled with CONFIG_PPC_PSERIES checked, it
 	 * would be unsafe to rely only on the #ifdef above.
 	 */
 	if (firmware_has_feature(FW_FEATURE_LPAR) &&
 	    cpu_has_feature(CPU_FTR_ARCH_207S))
-		return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
+		return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;

 	/*
 	 * This is a next best approximation without a VTB.
--
2.9.3

2017-07-12 22:45:40

by Benjamin Herrenschmidt

Subject: Re: [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock

On Wed, 2017-07-12 at 23:01 +0800, Jia He wrote:
> The virtual time base (VTB) is a register that advances only while the
> guest is running. Any exit from guest to host stops the VTB (it is saved
> and restored by KVM). But if an I/O operation causes the guest to exit to
> the host, the guest's watchdog
> (watchdog_timer_fn -> is_softlockup -> get_timestamp -> running_clock)
> also needs to include the time elapsed in the host, so get_vtb is not
> correct in this case.
>
> Also, TB_OFFSET is correctly saved and restored by qemu since commit [1],
> so we can use get_tb here.

That completely defeats the purpose here... This was done specifically
to exploit the VTB which doesn't count in hypervisor mode.

>
> [1] http://git.qemu.org/?p=qemu.git;a=commit;h=42043e4f1
>
> Signed-off-by: Jia He <[email protected]>
> ---
> arch/powerpc/kernel/time.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index fe6f3a2..c542dd3 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -695,16 +695,15 @@ notrace unsigned long long sched_clock(void)
> unsigned long long running_clock(void)
> {
> /*
> - * Don't read the VTB as a host since KVM does not switch in host
> - * timebase into the VTB when it takes a guest off the CPU, reading the
> - * VTB would result in reading 'last switched out' guest VTB.
> + * Use get_tb instead of get_vtb for guest since the TB_OFFSET has been
> + * well saved/restored when qemu does suspend/resume.
> *
> * Host kernels are often compiled with CONFIG_PPC_PSERIES checked, it
> * would be unsafe to rely only on the #ifdef above.
> */
> if (firmware_has_feature(FW_FEATURE_LPAR) &&
> cpu_has_feature(CPU_FTR_ARCH_207S))
> - return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> + return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
>
> /*
> * This is a next best approximation without a VTB.

2017-07-13 06:55:51

by Jia He

Subject: Re: [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock

Hi Ben,
I added some printk logs to watchdog_timer_fn in the guest:
[ 16.025222] get_vtb=8236291881, get_tb=13756711357, get_timestamp=4
[ 20.025624] get_vtb=9745285807, get_tb=15804711283, get_timestamp=7
[ 24.025042] get_vtb=11518119641, get_tb=17852711085, get_timestamp=10
[ 28.024074] get_vtb=13192704319, get_tb=19900711071, get_timestamp=13
[ 32.024086] get_vtb=14856516982, get_tb=21948711066, get_timestamp=16
[ 36.024075] get_vtb=16569127618, get_tb=23996711078, get_timestamp=20
[ 40.024138] get_vtb=17008865823, get_tb=26044718418, get_timestamp=20
[ 44.023993] get_vtb=17020637241, get_tb=28092716383, get_timestamp=20
[ 48.023996] get_vtb=17022857170, get_tb=30140718472, get_timestamp=20
[ 52.023996] get_vtb=17024268541, get_tb=32188718432, get_timestamp=20
[ 56.023996] get_vtb=17036577783, get_tb=34236718077, get_timestamp=20
[ 60.023996] get_vtb=17037829743, get_tb=36284718437, get_timestamp=20
[ 64.023992] get_vtb=17039846747, get_tb=38332716609, get_timestamp=20
[ 68.023991] get_vtb=17041448345, get_tb=40380715903, get_timestamp=20

get_timestamp (which uses get_vtb(); the unit is seconds) falls behind the
printk time. You can also clearly see that get_vtb increments much more
slowly than get_tb.

Without this patch, I think some softlockup warnings might be missed in
the guest.

-Jia

2017-07-13 21:52:10

by Benjamin Herrenschmidt

Subject: Re: [PATCH] powerpc/time: use get_tb instead of get_vtb in running_clock

On Thu, 2017-07-13 at 14:55 +0800, hejianet wrote:
> Hi Ben
> I add some printk logs in watchdog_timer_fn in the guest
> [   16.025222] get_vtb=8236291881, get_tb=13756711357, get_timestamp=4
> [   20.025624] get_vtb=9745285807, get_tb=15804711283, get_timestamp=7
> [   24.025042] get_vtb=11518119641, get_tb=17852711085, get_timestamp=10
> [   28.024074] get_vtb=13192704319, get_tb=19900711071, get_timestamp=13
> [   32.024086] get_vtb=14856516982, get_tb=21948711066, get_timestamp=16
> [   36.024075] get_vtb=16569127618, get_tb=23996711078, get_timestamp=20
> [   40.024138] get_vtb=17008865823, get_tb=26044718418, get_timestamp=20
> [   44.023993] get_vtb=17020637241, get_tb=28092716383, get_timestamp=20
> [   48.023996] get_vtb=17022857170, get_tb=30140718472, get_timestamp=20
> [   52.023996] get_vtb=17024268541, get_tb=32188718432, get_timestamp=20
> [   56.023996] get_vtb=17036577783, get_tb=34236718077, get_timestamp=20
> [   60.023996] get_vtb=17037829743, get_tb=36284718437, get_timestamp=20
> [   64.023992] get_vtb=17039846747, get_tb=38332716609, get_timestamp=20
> [   68.023991] get_vtb=17041448345, get_tb=40380715903, get_timestamp=20
>
> get_timestamp (which uses get_vtb(); the unit is seconds) falls behind the
> printk time. You can also clearly see that get_vtb increments much more
> slowly than get_tb.

But that is the entire point of vtb ... because it only counts when
running in the guest, not the time spent in the host.

> Without this patch, I think some softlockup warnings might be missed in
> the guest.

Ugh? We already have too many of these! On the contrary, we don't
want the guest to start spewing soft lockup warnings because the host
did not schedule it enough. The guest soft lockup warnings should be
specifically about something bad happening inside the guest.

Your patch seems to defeat the whole purpose of that running_clock()
function unless I'm somewhat mistaken.

Ben.