2020-05-28 19:48:47

by Miklos Szeredi

[permalink] [raw]
Subject: system time goes weird in kvm guest after host suspend/resume

Bisected it to:

b95a8a27c300 ("x86/vdso: Use generic VDSO clock mode storage")

The effect observed is that after the host is resumed, the clock in
the guest is somewhat in the future and is stopped. I.e. repeated
date(1) invocations show the same time.

Attaching the .config

Thanks,
Miklos


Attachments:
.config (56.79 kB)

2020-05-28 20:45:32

by Thomas Gleixner

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

Miklos Szeredi <[email protected]> writes:
> Bisected it to:
>
> b95a8a27c300 ("x86/vdso: Use generic VDSO clock mode storage")
>
> The effect observed is that after the host is resumed, the clock in
> the guest is somewhat in the future and is stopped. I.e. repeated
> date(1) invocations show the same time.

TBH, the bisect does not make any sense at all. It's renaming the
constants and moving the storage space and I just read it line for line
again that the result is equivalent. I'll have a look once the merge
window dust settles a bit.

Thanks,

tglx




2020-05-29 09:54:07

by Miklos Szeredi

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On Thu, May 28, 2020 at 10:43 PM Thomas Gleixner <[email protected]> wrote:
>
> Miklos Szeredi <[email protected]> writes:
> > Bisected it to:
> >
> > b95a8a27c300 ("x86/vdso: Use generic VDSO clock mode storage")
> >
> > The effect observed is that after the host is resumed, the clock in
> > the guest is somewhat in the future and is stopped. I.e. repeated
> > date(1) invocations show the same time.
>
> TBH, the bisect does not make any sense at all. It's renaming the
> constants and moving the storage space and I just read it line for line
> again that the result is equivalent. I'll have a look once the merge
> window dust settles a bit.

Yet, reverting just that single commit against latest linus tree fixes
the issue. Which I think is a pretty good indication that that commit
*is* doing something.

The jump forward is around 35 minutes; that seems to be consistent as well.

Thanks,
Miklos

2020-05-29 10:00:13

by Miklos Szeredi

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On Fri, May 29, 2020 at 11:51 AM Miklos Szeredi <[email protected]> wrote:
>
> On Thu, May 28, 2020 at 10:43 PM Thomas Gleixner <[email protected]> wrote:
> >
> > Miklos Szeredi <[email protected]> writes:
> > > Bisected it to:
> > >
> > > b95a8a27c300 ("x86/vdso: Use generic VDSO clock mode storage")
> > >
> > > The effect observed is that after the host is resumed, the clock in
> > > the guest is somewhat in the future and is stopped. I.e. repeated
> > > date(1) invocations show the same time.
> >
> > TBH, the bisect does not make any sense at all. It's renaming the
> > constants and moving the storage space and I just read it line for line
> > again that the result is equivalent. I'll have a look once the merge
> > window dust settles a bit.
>
> Yet, reverting just that single commit against latest linus tree fixes
> the issue. Which I think is a pretty good indication that that commit
> *is* doing something.
>
> The jump forward is around 35 minutes; that seems to be consistent as well.

Oh, and here's a dmesg extract for the good case:

[ 26.402239] clocksource: timekeeping watchdog on CPU0: Marking
clocksource 'tsc' as unstable because the skew is too large:
[ 26.407029] clocksource: 'kvm-clock' wd_now:
635480f3c wd_last: 3ce94a718 mask: ffffffffffffffff
[ 26.407632] clocksource: 'tsc' cs_now:
92d2e5d08 cs_last: 81305ceee mask: ffffffffffffffff
[ 26.409097] tsc: Marking TSC unstable due to clocksource watchdog

and the bad one:

[ 36.667576] clocksource: timekeeping watchdog on CPU1: Marking
clocksource 'tsc' as unstable because the skew is too large:
[ 36.690441] clocksource: 'kvm-clock' wd_now:
89885027c wd_last: 3ea987282 mask: ffffffffffffffff
[ 36.690994] clocksource: 'tsc' cs_now:
95666ec22 cs_last: 84e747930 mask: ffffffffffffffff
[ 36.691901] tsc: Marking TSC unstable due to clocksource watchdog

Thanks,
Miklos

2020-05-29 12:23:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

Miklos,

Miklos Szeredi <[email protected]> writes:
> On Fri, May 29, 2020 at 11:51 AM Miklos Szeredi <[email protected]> wrote:
>> On Thu, May 28, 2020 at 10:43 PM Thomas Gleixner <[email protected]> wrote:
>> >
>> > Miklos Szeredi <[email protected]> writes:
>> > > Bisected it to:
>> > >
>> > > b95a8a27c300 ("x86/vdso: Use generic VDSO clock mode storage")
>> > >
>> > > The effect observed is that after the host is resumed, the clock in
>> > > the guest is somewhat in the future and is stopped. I.e. repeated
>> > > date(1) invocations show the same time.
>> >
>> > TBH, the bisect does not make any sense at all. It's renaming the
>> > constants and moving the storage space and I just read it line for line
>> > again that the result is equivalent. I'll have a look once the merge
>> > window dust settles a bit.
>>
>> Yet, reverting just that single commit against latest linus tree fixes
>> the issue. Which I think is a pretty good indication that that commit
>> *is* doing something.

A revert on top of Linus latest surely does something, it disables VDSO
because clocksource.vdso_clock_mode becomes NONE.

That's a data point maybe, but it clearly does not restore the situation
_before_ that commit.

>> The jump forward is around 35 minutes; that seems to be consistent as
>> well.
>
> Oh, and here's a dmesg extract for the good case:
>
> [ 26.402239] clocksource: timekeeping watchdog on CPU0: Marking
> clocksource 'tsc' as unstable because the skew is too large:
> [ 26.407029] clocksource: 'kvm-clock' wd_now:
> 635480f3c wd_last: 3ce94a718 mask: ffffffffffffffff
> [ 26.407632] clocksource: 'tsc' cs_now:
> 92d2e5d08 cs_last: 81305ceee mask: ffffffffffffffff
> [ 26.409097] tsc: Marking TSC unstable due to clocksource watchdog
>
> and the bad one:
>
> [ 36.667576] clocksource: timekeeping watchdog on CPU1: Marking
> clocksource 'tsc' as unstable because the skew is too large:
> [ 36.690441] clocksource: 'kvm-clock' wd_now:
> 89885027c wd_last: 3ea987282 mask: ffffffffffffffff
> [ 36.690994] clocksource: 'tsc' cs_now:
> 95666ec22 cs_last: 84e747930 mask: ffffffffffffffff
> [ 36.691901] tsc: Marking TSC unstable due to clocksource watchdog

And the difference is? It's 10 seconds later and the detection happens
on CPU1 and not on CPU0. I really don't see what you are reading out of
this.

Can you please describe the setup of this test?

- Host kernel version
- Guest kernel version
- Is the revert done on the host or guest or both?
- Test flow is:

Boot host, start guest, suspend host, resume host, guest is screwed

correct?

Thanks,

tglx

2020-05-29 12:36:30

by Miklos Szeredi

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On Fri, May 29, 2020 at 2:21 PM Thomas Gleixner <[email protected]> wrote:
>
> Miklos,
>
> Miklos Szeredi <[email protected]> writes:
> > On Fri, May 29, 2020 at 11:51 AM Miklos Szeredi <[email protected]> wrote:
> >> On Thu, May 28, 2020 at 10:43 PM Thomas Gleixner <[email protected]> wrote:
> >> >
> >> > Miklos Szeredi <[email protected]> writes:
> >> > > Bisected it to:
> >> > >
> >> > > b95a8a27c300 ("x86/vdso: Use generic VDSO clock mode storage")
> >> > >
> >> > > The effect observed is that after the host is resumed, the clock in
> >> > > the guest is somewhat in the future and is stopped. I.e. repeated
> >> > > date(1) invocations show the same time.
> >> >
> >> > TBH, the bisect does not make any sense at all. It's renaming the
> >> > constants and moving the storage space and I just read it line for line
> >> > again that the result is equivalent. I'll have a look once the merge
> >> > window dust settles a bit.
> >>
> >> Yet, reverting just that single commit against latest linus tree fixes
> >> the issue. Which I think is a pretty good indication that that commit
> >> *is* doing something.
>
> A revert on top of Linus latest surely does something, it disables VDSO
> because clocksource.vdso_clock_mode becomes NONE.
>
> That's a data point maybe, but it clearly does not restore the situation
> _before_ that commit.
>
> >> The jump forward is around 35 minutes; that seems to be consistent as
> >> well.
> >
> > Oh, and here's a dmesg extract for the good case:
> >
> > [ 26.402239] clocksource: timekeeping watchdog on CPU0: Marking
> > clocksource 'tsc' as unstable because the skew is too large:
> > [ 26.407029] clocksource: 'kvm-clock' wd_now:
> > 635480f3c wd_last: 3ce94a718 mask: ffffffffffffffff
> > [ 26.407632] clocksource: 'tsc' cs_now:
> > 92d2e5d08 cs_last: 81305ceee mask: ffffffffffffffff
> > [ 26.409097] tsc: Marking TSC unstable due to clocksource watchdog
> >
> > and the bad one:
> >
> > [ 36.667576] clocksource: timekeeping watchdog on CPU1: Marking
> > clocksource 'tsc' as unstable because the skew is too large:
> > [ 36.690441] clocksource: 'kvm-clock' wd_now:
> > 89885027c wd_last: 3ea987282 mask: ffffffffffffffff
> > [ 36.690994] clocksource: 'tsc' cs_now:
> > 95666ec22 cs_last: 84e747930 mask: ffffffffffffffff
> > [ 36.691901] tsc: Marking TSC unstable due to clocksource watchdog
>
> And the difference is? It's 10 seconds later and the detection happens
> on CPU1 and not on CPU0. I really don't see what you are reading out of
> this.

I didn't even try to interpret this. Just reporting what I'm seeing.

> Can you please describe the setup of this test?
>
> - Host kernel version
> - Guest kernel version
> - Is the revert done on the host or guest or both?
> - Test flow is:
>
> Boot host, start guest, suspend host, resume host, guest is screwed
>
> correct?

Yep.

Thanks,
Miklos

2020-05-29 12:36:58

by Miklos Szeredi

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On Fri, May 29, 2020 at 2:31 PM Miklos Szeredi <[email protected]> wrote:

> > Can you please describe the setup of this test?
> >
> > - Host kernel version

5.5.16-100.fc30.x86_64

> > - Guest kernel version

75caf310d16c ("Merge branch 'akpm' (patches from Andrew)")

> > - Is the revert done on the host or guest or both?

Guest.

> > - Test flow is:
> >
> > Boot host, start guest, suspend host, resume host, guest is screwed
> >
> > correct?
>
> Yep.
>
> Thanks,
> Miklos

2020-06-03 15:46:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

Miklos Szeredi <[email protected]> writes:
> On Fri, May 29, 2020 at 2:31 PM Miklos Szeredi <[email protected]> wrote:
>
>> > Can you please describe the setup of this test?
>> >
>> > - Host kernel version
>
> 5.5.16-100.fc30.x86_64
>
>> > - Guest kernel version
>
> 75caf310d16c ("Merge branch 'akpm' (patches from Andrew)")
>
>> > - Is the revert done on the host or guest or both?
>
> Guest.
>
>> > - Test flow is:
>> >
>> > Boot host, start guest, suspend host, resume host, guest is screwed
>> >
>> > correct?
>>
>> Yep.

Of course this does not reproduce here. What kind of host is this
running on? Can you provide a full demsg of the host please from boot to
post resume?

Thanks,

tglx

2020-06-03 15:49:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

Thomas Gleixner <[email protected]> writes:

> Miklos Szeredi <[email protected]> writes:
>> On Fri, May 29, 2020 at 2:31 PM Miklos Szeredi <[email protected]> wrote:
>>
>>> > Can you please describe the setup of this test?
>>> >
>>> > - Host kernel version
>>
>> 5.5.16-100.fc30.x86_64
>>
>>> > - Guest kernel version
>>
>> 75caf310d16c ("Merge branch 'akpm' (patches from Andrew)")
>>
>>> > - Is the revert done on the host or guest or both?
>>
>> Guest.
>>
>>> > - Test flow is:
>>> >
>>> > Boot host, start guest, suspend host, resume host, guest is screwed
>>> >
>>> > correct?
>>>
>>> Yep.
>
> Of course this does not reproduce here. What kind of host is this
> running on? Can you provide a full demsg of the host please from boot to
> post resume?

Plus /proc/cpuinfo please (one CPU is sufficient)

2020-06-04 17:36:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

Miklos,

Thomas Gleixner <[email protected]> writes:
>> Of course this does not reproduce here. What kind of host is this
>> running on? Can you provide a full demsg of the host please from boot to
>> post resume?
>
> Plus /proc/cpuinfo please (one CPU is sufficient)

thanks for providing the data. Unfortunately not really helpful. The
host has a non-stop TSC and the dmesg does not contain anything which
sheds light on this.

I grabbed a similar machine, installed a guest with 5.7 kernel and I'm
still unable to reproduce. No idea yet how to get down to the real root
cause of this.

Thanks,

tglx


2020-06-04 19:31:56

by Miklos Szeredi

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On Thu, Jun 4, 2020 at 7:30 PM Thomas Gleixner <[email protected]> wrote:
>
> Miklos,
>
> Thomas Gleixner <[email protected]> writes:
> >> Of course this does not reproduce here. What kind of host is this
> >> running on? Can you provide a full demsg of the host please from boot to
> >> post resume?
> >
> > Plus /proc/cpuinfo please (one CPU is sufficient)
>
> thanks for providing the data. Unfortunately not really helpful. The
> host has a non-stop TSC and the dmesg does not contain anything which
> sheds light on this.
>
> I grabbed a similar machine, installed a guest with 5.7 kernel and I'm
> still unable to reproduce. No idea yet how to get down to the real root
> cause of this.

Well, I have neither. But more investigation turned up some interesting things.

time(2) returns good time, while clock_gettime(2) returns bad time.
Here's an example:

time=1591298725 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
time=1591298726 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
time=1591298727 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
time=1591298728 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
time=1591298729 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582

As you can see, only time(2) is updated, the others remain the same.
date(1) uses clock_gettime(CLOCK_REALTIME) so that shows the bad date.

When the correct time reaches the value returned by CLOCK_REALTIME,
the value jumps exactly 2199 seconds.

Does that make any sense?

Thanks,
Miklos

2020-06-04 22:29:29

by Paolo Bonzini

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On 04/06/20 21:28, Miklos Szeredi wrote:
> time(2) returns good time, while clock_gettime(2) returns bad time.
> Here's an example:
>
> time=1591298725 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> time=1591298726 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> time=1591298727 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> time=1591298728 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> time=1591298729 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
>
> As you can see, only time(2) is updated, the others remain the same.
> date(1) uses clock_gettime(CLOCK_REALTIME) so that shows the bad date.
>
> When the correct time reaches the value returned by CLOCK_REALTIME,
> the value jumps exactly 2199 seconds.

clockid_to_kclock(CLOCK_REALTIME) is &clock_realtime, so clock_gettime
calls ktime_get_real_ts64, which is:


do {
seq = read_seqcount_begin(&tk_core.seq);

ts->tv_sec = tk->xtime_sec;
nsecs = timekeeping_get_ns(&tk->tkr_mono);

} while (read_seqcount_retry(&tk_core.seq, seq));

ts->tv_nsec = 0;
timespec64_add_ns(ts, nsecs);

time(2) instead should actually be gettimeofday(2), which just returns
tk->xtime_sec. So the problem is the nanosecond part which is off by
2199*10^9 nanoseconds, and that is suspiciously close to 2^31...

Paolo

2020-06-05 07:38:30

by Miklos Szeredi

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On Thu, Jun 4, 2020 at 10:14 PM Paolo Bonzini <[email protected]> wrote:
>
> On 04/06/20 21:28, Miklos Szeredi wrote:
> > time(2) returns good time, while clock_gettime(2) returns bad time.
> > Here's an example:
> >
> > time=1591298725 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> > time=1591298726 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> > time=1591298727 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> > time=1591298728 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> > time=1591298729 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> >
> > As you can see, only time(2) is updated, the others remain the same.
> > date(1) uses clock_gettime(CLOCK_REALTIME) so that shows the bad date.
> >
> > When the correct time reaches the value returned by CLOCK_REALTIME,
> > the value jumps exactly 2199 seconds.
>
> clockid_to_kclock(CLOCK_REALTIME) is &clock_realtime, so clock_gettime
> calls ktime_get_real_ts64, which is:
>
>
> do {
> seq = read_seqcount_begin(&tk_core.seq);
>
> ts->tv_sec = tk->xtime_sec;
> nsecs = timekeeping_get_ns(&tk->tkr_mono);
>
> } while (read_seqcount_retry(&tk_core.seq, seq));
>
> ts->tv_nsec = 0;
> timespec64_add_ns(ts, nsecs);
>
> time(2) instead should actually be gettimeofday(2), which just returns
> tk->xtime_sec. So the problem is the nanosecond part which is off by
> 2199*10^9 nanoseconds, and that is suspiciously close to 2^31...

Yep: looking at the nanosecond values as well, the difference is
exactly 2199023255552 which is 2^41.

Thanks,
Miklos

2020-06-05 08:14:22

by Paolo Bonzini

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On 05/06/20 09:35, Miklos Szeredi wrote:
>> time(2) instead should actually be gettimeofday(2), which just returns
>> tk->xtime_sec. So the problem is the nanosecond part which is off by
>> 2199*10^9 nanoseconds, and that is suspiciously close to 2^31...
> Yep: looking at the nanosecond values as well, the difference is
> exactly 2199023255552 which is 2^41.

Umpf, I was off by 3, it's not related to 2^31. But yeah the cause of
the bug seems to be the botched nanosecond part, which I'm sure is not
supposed to be much bigger than 10^9.

Paolo

2020-06-05 10:14:24

by Thomas Gleixner

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

Paolo Bonzini <[email protected]> writes:
> On 04/06/20 21:28, Miklos Szeredi wrote:
>> time(2) returns good time, while clock_gettime(2) returns bad time.
>> Here's an example:
>>
>> time=1591298725 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
>> time=1591298726 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
>> time=1591298727 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
>> time=1591298728 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
>> time=1591298729 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
>>
>> As you can see, only time(2) is updated, the others remain the same.
>> date(1) uses clock_gettime(CLOCK_REALTIME) so that shows the bad date.
>>
>> When the correct time reaches the value returned by CLOCK_REALTIME,
>> the value jumps exactly 2199 seconds.

Which value jumps?

> clockid_to_kclock(CLOCK_REALTIME) is &clock_realtime, so clock_gettime
> calls ktime_get_real_ts64, which is:
>
>
> do {
> seq = read_seqcount_begin(&tk_core.seq);
>
> ts->tv_sec = tk->xtime_sec;
> nsecs = timekeeping_get_ns(&tk->tkr_mono);
>
> } while (read_seqcount_retry(&tk_core.seq, seq));
>
> ts->tv_nsec = 0;
> timespec64_add_ns(ts, nsecs);
>
> time(2) instead should actually be gettimeofday(2), which just returns
> tk->xtime_sec.

time(2) is either handled in the VDSO or it is handled via syscall and
yes, it's only looking at the xtime_sec value.

gettimeofday(2) returns seconds and microseconds. It's using the same
mechanism as clock_gettime(2) and divides the nanoseconds part by 1000.

> So the problem is the nanosecond part which is off by
> 2199*10^9 nanoseconds, and that is suspiciously close to 2^31...

Not really. It's 2^41.

I can actually now reproduce, but I won't be able to investigate that
before monday.

Thanks,

tglx

2020-06-05 10:24:11

by Miklos Szeredi

[permalink] [raw]
Subject: Re: system time goes weird in kvm guest after host suspend/resume

On Fri, Jun 5, 2020 at 12:12 PM Thomas Gleixner <[email protected]> wrote:
>
> Paolo Bonzini <[email protected]> writes:
> > On 04/06/20 21:28, Miklos Szeredi wrote:
> >> time(2) returns good time, while clock_gettime(2) returns bad time.
> >> Here's an example:
> >>
> >> time=1591298725 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> >> time=1591298726 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> >> time=1591298727 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> >> time=1591298728 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> >> time=1591298729 RT=1591300383 MONO=39582 MONO_RAW=39582 BOOT=39582
> >>
> >> As you can see, only time(2) is updated, the others remain the same.
> >> date(1) uses clock_gettime(CLOCK_REALTIME) so that shows the bad date.
> >>
> >> When the correct time reaches the value returned by CLOCK_REALTIME,
> >> the value jumps exactly 2199 seconds.
>
> Which value jumps?

The time value returned by clock_gettime() and gettimeofday() jumps by
2199s after the resume of host, and then again and again jumps with a
period of 2^41 ns.

Only time() returns the correct value and that appears to use the VDSO
since there's no time() syscall in strace.

> I can actually now reproduce, but I won't be able to investigate that
> before monday.

No problem. Good news that you can reproduce it now.

Thanks,
Miklos