2018-03-01 16:54:24

by Thomas Gleixner

[permalink] [raw]
Subject: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

While working through the melted spectrum induced backlog I found that last
years discussion about unifying clock MONOTONIC and clock BOOTTIME has died
out and obviously nobody had time to do testing on that half baken RFC patch:

https://lkml.kernel.org/r/alpine.DEB.2.20.1711180123360.2186@nanos

So I sat down and revisited it a bit more carefully than last time. The
following series is the result of this work. The main differences are:

1) Provide CLOCK_MONOTONIC_ACTIVE which has the current CLOCK_MONOTONIC
behaviour, i.e. it's monotonic time since boot minus the time spent in
suspend.

The reason why I introduced it is that currently it's possible for user
space to determine the time spent in suspend by subtracting clock
monotonic and clock boot time time stamps. When CLOCK_MONOTONIC behaves
like CLOCK_BOOTTIME then this is not longer possible.

As a side effect when CLOCK_MONOTONIC_ACTIVE is supported by a kernel
then this is also an indicator that CLOCK_MONOTONIC and CLOCK_BOOTTIME
are identical. So it's possible to determine this w/o playing games
with kernel versions etc. Older kernels simply return -EiNVAL if
CLOCK_MONOTONIC_ACTIVE is requested in clock_gettime().

2) Fixed a few places which I missed last time, which means also that my
warning in the original changelog was justified.

3) Consolidated the core users to get rid of the seperate implementations
for the two clocks

Note, there are user space visible side effects. The difference between
CLOCK_MONOTONIC and CLOCK_BOOTTIME is well documented and it's unclear
whether applications rely on it or not.

This really needs lot of testing, documentation updates and more input from
userspace folks to make a final decision.

Thanks,

tglx

---

Documentation/trace/ftrace.txt | 14 +-----
drivers/input/evdev.c | 7 ---
include/linux/hrtimer.h | 2
include/linux/timekeeper_internal.h | 2
include/linux/timekeeping.h | 37 +++++------------
include/uapi/linux/time.h | 1
kernel/time/hrtimer.c | 16 -------
kernel/time/posix-stubs.c | 2
kernel/time/posix-timers.c | 26 ++++--------
kernel/time/tick-common.c | 15 ++++++
kernel/time/tick-internal.h | 6 ++
kernel/time/tick-sched.c | 9 ++++
kernel/time/timekeeping.c | 78 ++++++++++++++++++------------------
kernel/time/timekeeping.h | 1
kernel/trace/trace.c | 2
15 files changed, 104 insertions(+), 114 deletions(-)













2018-03-01 17:24:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Thu, Mar 1, 2018 at 8:33 AM, Thomas Gleixner <[email protected]> wrote:
>
> This really needs lot of testing, documentation updates and more input from
> userspace folks to make a final decision.

Honestly, I don't think we'd get the testing this kind of change needs
except by just trying it.

I'm willing to merge this in the 4.17 merge window, with the
understanding that if people end up reporting issues, we may just have
to revert it all, and chalk it up to a learning experience - and add
the appropriate commentary in the kernel code about exactly what it
was that depended on that MONO/BOOT difference.

One non-technical thing I would ask: use some other word than
"conflate". Maybe just "combine". Or better yet, "unify".

"Conflate" technically and historically means the same thing as
combine, but has very much gathered a side meaning of "confuse".

So yes, "conflate" is indeed about mixing or combining, but it's
typically used in the sense of a *bad* combination or mixing. So
"trying to conflate two issues" means "trying to mix two issues that
are not the same into one".

So "unify" and "conflate" mean both the same thing and almost exactly
the opposite at the same time.

And yes, you will find dictionaries (and linguists) that hold purely
to the old meaning. As always, there are fogeys that can't get over
the fact that meanings meander and change.

Linus

2018-03-01 18:42:35

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Thu, 1 Mar 2018, Linus Torvalds wrote:
> On Thu, Mar 1, 2018 at 8:33 AM, Thomas Gleixner <[email protected]> wrote:
> >
> > This really needs lot of testing, documentation updates and more input from
> > userspace folks to make a final decision.
>
> Honestly, I don't think we'd get the testing this kind of change needs
> except by just trying it.
>
> I'm willing to merge this in the 4.17 merge window, with the
> understanding that if people end up reporting issues, we may just have
> to revert it all, and chalk it up to a learning experience - and add
> the appropriate commentary in the kernel code about exactly what it
> was that depended on that MONO/BOOT difference.

Fair enough. So we maybe just merge the first two patches and merge the
cleanups and consolidation patches when we feel good enough.

I surely can queue the whole lot in next, but from PTI the experience I
know how good the test coverage is. 4.14.stable would be the ideal testing
ground. /me runs fast and hides

> One non-technical thing I would ask: use some other word than
> "conflate". Maybe just "combine". Or better yet, "unify".
>
> "Conflate" technically and historically means the same thing as
> combine, but has very much gathered a side meaning of "confuse".
>
> So yes, "conflate" is indeed about mixing or combining, but it's
> typically used in the sense of a *bad* combination or mixing. So
> "trying to conflate two issues" means "trying to mix two issues that
> are not the same into one".
>
> So "unify" and "conflate" mean both the same thing and almost exactly
> the opposite at the same time.
>
> And yes, you will find dictionaries (and linguists) that hold purely
> to the old meaning. As always, there are fogeys that can't get over
> the fact that meanings meander and change.

I'm old enough to have learned that conflate means unify or combine, but
I'm still not old enough to be stubborn about it :)

Thanks,

tglx

2018-03-01 18:52:05

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Thu, 1 Mar 2018 19:41:35 +0100 (CET)
Thomas Gleixner <[email protected]> wrote:

> I'm old enough to have learned that conflate means unify or combine, but
> I'm still not old enough to be stubborn about it :)

You need to watch more American Cable News channels to know what
"conflate" means today.

-- Steve

2018-03-01 19:12:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Thu, 1 Mar 2018, Thomas Gleixner wrote:
> On Thu, 1 Mar 2018, Linus Torvalds wrote:
> > On Thu, Mar 1, 2018 at 8:33 AM, Thomas Gleixner <[email protected]> wrote:
> > >
> > > This really needs lot of testing, documentation updates and more input from
> > > userspace folks to make a final decision.
> >
> > Honestly, I don't think we'd get the testing this kind of change needs
> > except by just trying it.
> >
> > I'm willing to merge this in the 4.17 merge window, with the
> > understanding that if people end up reporting issues, we may just have
> > to revert it all, and chalk it up to a learning experience - and add
> > the appropriate commentary in the kernel code about exactly what it
> > was that depended on that MONO/BOOT difference.
>
> Fair enough. So we maybe just merge the first two patches and merge the
> cleanups and consolidation patches when we feel good enough.
>
> I surely can queue the whole lot in next, but from PTI the experience I
> know how good the test coverage is. 4.14.stable would be the ideal testing
> ground. /me runs fast and hides

That said, at least the people who are asking for that should provide
testing results _before_ this gets applied or merged upstream.

Thanks,

tglx

2018-03-13 06:37:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME


* Linus Torvalds <[email protected]> wrote:

> On Thu, Mar 1, 2018 at 8:33 AM, Thomas Gleixner <[email protected]> wrote:
> >
> > This really needs lot of testing, documentation updates and more input from
> > userspace folks to make a final decision.
>
> Honestly, I don't think we'd get the testing this kind of change needs
> except by just trying it.
>
> I'm willing to merge this in the 4.17 merge window, with the
> understanding that if people end up reporting issues, we may just have
> to revert it all, and chalk it up to a learning experience - and add
> the appropriate commentary in the kernel code about exactly what it
> was that depended on that MONO/BOOT difference.
>
> One non-technical thing I would ask: use some other word than
> "conflate". Maybe just "combine". Or better yet, "unify".

Ok, I have edited all the changelogs accordingly (and also flipped around the
'clock MONOTONIC' language to the more readable 'the MONOTONIC clock' variant),
the resulting titles are (in order):

72199320d49d: timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
d6ed449afdb3: timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
f2d6fdbfd238: Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
d6c7270e913d: timekeeping: Remove boot time specific code
7250a4047aa6: posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
127bfa5f4342: hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
92af4dcb4e1c: tracing: Unify the "boot" and "mono" tracing clocks

I'll push these out after testing.

Thanks,

Ingo

2018-03-13 18:12:42

by John Stultz

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Mon, Mar 12, 2018 at 11:36 PM, Ingo Molnar <[email protected]> wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
>> On Thu, Mar 1, 2018 at 8:33 AM, Thomas Gleixner <[email protected]> wrote:
>> >
>> > This really needs lot of testing, documentation updates and more input from
>> > userspace folks to make a final decision.
>>
>> Honestly, I don't think we'd get the testing this kind of change needs
>> except by just trying it.
>>
>> I'm willing to merge this in the 4.17 merge window, with the
>> understanding that if people end up reporting issues, we may just have
>> to revert it all, and chalk it up to a learning experience - and add
>> the appropriate commentary in the kernel code about exactly what it
>> was that depended on that MONO/BOOT difference.
>>
>> One non-technical thing I would ask: use some other word than
>> "conflate". Maybe just "combine". Or better yet, "unify".
>
> Ok, I have edited all the changelogs accordingly (and also flipped around the
> 'clock MONOTONIC' language to the more readable 'the MONOTONIC clock' variant),
> the resulting titles are (in order):
>
> 72199320d49d: timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
> d6ed449afdb3: timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
> f2d6fdbfd238: Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
> d6c7270e913d: timekeeping: Remove boot time specific code
> 7250a4047aa6: posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
> 127bfa5f4342: hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
> 92af4dcb4e1c: tracing: Unify the "boot" and "mono" tracing clocks
>
> I'll push these out after testing.

I'm still anxious about userspace effects given how much I've seen the
current behavior documented, and wouldn't pushed for this myself (I'm
a worrier), but at least I'm not seeing any failures in initial
testing w/ kselftest so far.

thanks
-john

2018-04-20 04:39:33

by David Herrmann

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

Hey

On Tue, Mar 13, 2018 at 7:11 PM, John Stultz <[email protected]> wrote:
> On Mon, Mar 12, 2018 at 11:36 PM, Ingo Molnar <[email protected]> wrote:
>> Ok, I have edited all the changelogs accordingly (and also flipped around the
>> 'clock MONOTONIC' language to the more readable 'the MONOTONIC clock' variant),
>> the resulting titles are (in order):
>>
>> 72199320d49d: timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
>> d6ed449afdb3: timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
>> f2d6fdbfd238: Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
>> d6c7270e913d: timekeeping: Remove boot time specific code
>> 7250a4047aa6: posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
>> 127bfa5f4342: hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
>> 92af4dcb4e1c: tracing: Unify the "boot" and "mono" tracing clocks
>>
>> I'll push these out after testing.
>
> I'm still anxious about userspace effects given how much I've seen the
> current behavior documented, and wouldn't pushed for this myself (I'm
> a worrier), but at least I'm not seeing any failures in initial
> testing w/ kselftest so far.

I get lots of timer-errors on Arch-Linux booting current master, after
a suspend/resume cycle. Just a selection of errors I see on resume:

systemd[1]: systemd-journald.service: Main process exited,
code=dumped, status=6/ABRT
rtkit-daemon[742]: The canary thread is apparently starving. Taking action.
systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
kernel: e1000e 0000:00:1f.6: Failed to restore TIMINCA clock rate delta: -22

Lots of crashes with SIGABRT due to these.

I did not bisect it, but it sounds related to me. Also, user-space
uses CLOCK_MONOTONIC for watchdog timers. That is, a process is
required to respond to a watchdog-request in a given MONOTONIC
time-frame. If this jumps during suspend/resume, watchdogs will fire
immediately. I don't see how this can work with the new MONOTONIC
behavior?

Thanks
David

2018-04-20 05:46:22

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On (04/20/18 06:37), David Herrmann wrote:
>
> I get lots of timer-errors on Arch-Linux booting current master, after
> a suspend/resume cycle. Just a selection of errors I see on resume:

Hello David,
Any chance you can revert the patches in question and test? I'm running
ARCH (4.17.0-rc1-dbg-00042-gaa03ddd9c434) and suspend/resume cycle does
not trigger any errors. Except for this one

kernel: do_IRQ: 0.55 No irq handler for vector

> systemd[1]: systemd-journald.service: Main process exited,
> code=dumped, status=6/ABRT
> rtkit-daemon[742]: The canary thread is apparently starving. Taking action.
> systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
> systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
> kernel: e1000e 0000:00:1f.6: Failed to restore TIMINCA clock rate delta: -22
>
> Lots of crashes with SIGABRT due to these.
>
> I did not bisect it, but it sounds related to me. Also, user-space
> uses CLOCK_MONOTONIC for watchdog timers. That is, a process is
> required to respond to a watchdog-request in a given MONOTONIC
> time-frame. If this jumps during suspend/resume, watchdogs will fire
> immediately. I don't see how this can work with the new MONOTONIC
> behavior?

-ss

2018-04-20 06:50:31

by David Herrmann

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

Hi

On Fri, Apr 20, 2018 at 7:44 AM, Sergey Senozhatsky
<[email protected]> wrote:
> On (04/20/18 06:37), David Herrmann wrote:
>>
>> I get lots of timer-errors on Arch-Linux booting current master, after
>> a suspend/resume cycle. Just a selection of errors I see on resume:
>
> Hello David,
> Any chance you can revert the patches in question and test? I'm running
> ARCH (4.17.0-rc1-dbg-00042-gaa03ddd9c434) and suspend/resume cycle does
> not trigger any errors. Except for this one
>
> kernel: do_IRQ: 0.55 No irq handler for vector

I can easily reproduce it by sleeping for >5min, so the systemd
watchdog timers are triggered. The patches don't revert cleanly, so I
didn't look into booting without them, yet. I will try just linking
the monotonic clock to the monotonic_active clock later.

Also, doesn't this hunk in 72199320d49d need a 'break;':

diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index b258bee13b02..6259dbc0191a 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -73,6 +73,8 @@ int do_clock_gettime(clockid_t which_clock, struct
timespec64 *tp)
case CLOCK_BOOTTIME:
get_monotonic_boottime64(tp);
break;
+ case CLOCK_MONOTONIC_ACTIVE:
+ ktime_get_active_ts64(tp);
default:
return -EINVAL;
}

2018-04-24 00:51:00

by Genki Sky

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

Hello,

I came across this thread for same reason as [0]: Daemons getting
killed by systemd on resume (after >WatchdogSec seconds of
suspending). I'm using master branch of systemd and the kernel. As
mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
not include suspend time.

Correct me if I'm mistaken, but I don't see the ambiguity of whether
this patch series breaks systemd. If it's implemented correctly, you'd
hope it *would* break it!

As a random end user, I see three options to get suspend/resume
working again on my laptop:

(A) Change systemd to keep track of the difference between
CLOCK_MONOTONIC and CLOCK_MONOTONIC_ACTIVE, using their difference via
clock_gettime(). This seems to be what the author of this patch series
intends (?).

(B) Implement timerfd_*(2) for CLOCK_MONOTONIC_ACTIVE in the kernel.
Do a sed s/CLOCK_MONOTONIC/&_ACTIVE in systemd source code.

(C) Do a 90% reverting of this patch series. Just introduce
CLOCK_MONOTONIC_ACTIVE as the "what you should use", document and
publicize this fact, and sometime in the future (monotonically
speaking :) finally unify CLOCK_MONOTONIC and CLOCK_BOOTTIME.

Thoughts?

Also, agreed, the missing break in do_clock_gettime() should be fixed,
as David mentioned in [1]. Is someone already patching this?

[0]: http://lkml.kernel.org/r/CANq1E4Tf27FJHTrO4ZVtWhYK=DLmwzszK=njOpgXZZXqzAOunA@mail.gmail.com
[1]: http://lkml.kernel.org/r/CANq1E4QppGAaU5PYbGpuC00S6wGQcAt70Z7CntZHiLC6748z2A@mail.gmail.com

Genki

2018-04-24 02:47:45

by Genki Sky

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

Quoting Genki Sky (2018/04/23 20:40:36 -0400)
> I came across this thread for same reason as [0]: Daemons getting
> killed by systemd on resume (after >WatchdogSec seconds of
> suspending). I'm using master branch of systemd and the kernel. As
> mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
> not include suspend time.
>
> Correct me if I'm mistaken, but I don't see the ambiguity of whether
> this patch series breaks systemd. If it's implemented correctly, you'd
> hope it *would* break it!

This sounded a little weak on re-reading, sorry. So, I just confirmed
that after booting a "git revert -m 1 680014d6d1da", the issue no
longer appears. (I.e., a suspend for >WatchDog sec doesn't result in
any daemon getting killed).

Let me know if I can help in any way.

2018-04-24 03:05:16

by John Stultz

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Mon, Apr 23, 2018 at 7:45 PM, Genki Sky <[email protected]> wrote:
> Quoting Genki Sky (2018/04/23 20:40:36 -0400)
>> I came across this thread for same reason as [0]: Daemons getting
>> killed by systemd on resume (after >WatchdogSec seconds of
>> suspending). I'm using master branch of systemd and the kernel. As
>> mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
>> not include suspend time.
>>
>> Correct me if I'm mistaken, but I don't see the ambiguity of whether
>> this patch series breaks systemd. If it's implemented correctly, you'd
>> hope it *would* break it!
>
> This sounded a little weak on re-reading, sorry. So, I just confirmed
> that after booting a "git revert -m 1 680014d6d1da", the issue no
> longer appears. (I.e., a suspend for >WatchDog sec doesn't result in
> any daemon getting killed).
>
> Let me know if I can help in any way.

Yea, this is the sort of thing I was worried about.

Thomas: I think reverting this change is needed.

thanks
-john

2018-04-24 08:11:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Mon, 23 Apr 2018, John Stultz wrote:

> On Mon, Apr 23, 2018 at 7:45 PM, Genki Sky <[email protected]> wrote:
> > Quoting Genki Sky (2018/04/23 20:40:36 -0400)
> >> I came across this thread for same reason as [0]: Daemons getting
> >> killed by systemd on resume (after >WatchdogSec seconds of
> >> suspending). I'm using master branch of systemd and the kernel. As
> >> mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
> >> not include suspend time.
> >>
> >> Correct me if I'm mistaken, but I don't see the ambiguity of whether
> >> this patch series breaks systemd. If it's implemented correctly, you'd
> >> hope it *would* break it!
> >
> > This sounded a little weak on re-reading, sorry. So, I just confirmed
> > that after booting a "git revert -m 1 680014d6d1da", the issue no
> > longer appears. (I.e., a suspend for >WatchDog sec doesn't result in
> > any daemon getting killed).
> >
> > Let me know if I can help in any way.
>
> Yea, this is the sort of thing I was worried about.
>
> Thomas: I think reverting this change is needed.

Sigh. I hoped that something like this would be catched before I sent the
pull request by those who were actually interested in this change...

I'll try to distangle it.

Thanks,

tglx


2018-04-24 14:33:03

by Genki Sky

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

Sorry to have been the bearer of bad news :(. Again, I just have my
user hat on here. It does seem like this unifying would have been nice
to have. And even, more compliant with the POSIX definition of
MONOTONIC...

On that note, maybe it is still worth introducing MONOTONIC_ACTIVE,
but just as an alias for MONOTONIC for now. It's also more
self-documenting. Then sometime in the future, if people switch over,
remove BOOTTIME and make MONOTONIC like BOOTTIME. Though this doesn't
help simplify the code, I know.

Thanks again,
Genki

2018-04-24 15:03:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Tue, 24 Apr 2018, Genki Sky wrote:
> Sorry to have been the bearer of bad news :(.

No problem. We're not shooting the messengers

> Again, I just have my user hat on here. It does seem like this unifying
> would have been nice to have. And even, more compliant with the POSIX
> definition of MONOTONIC...

Yes, that was the idea

> On that note, maybe it is still worth introducing MONOTONIC_ACTIVE,
> but just as an alias for MONOTONIC for now. It's also more
> self-documenting. Then sometime in the future, if people switch over,
> remove BOOTTIME and make MONOTONIC like BOOTTIME. Though this doesn't
> help simplify the code, I know.

It doesn't and it does not make applications magically make use of
MONOTONIC_ACTIVE. We're in a trap here.....

Thanks,

tglx

2018-04-25 06:51:41

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Tue 2018-04-24 10:09:28, Thomas Gleixner wrote:
> On Mon, 23 Apr 2018, John Stultz wrote:
>
> > On Mon, Apr 23, 2018 at 7:45 PM, Genki Sky <[email protected]> wrote:
> > > Quoting Genki Sky (2018/04/23 20:40:36 -0400)
> > >> I came across this thread for same reason as [0]: Daemons getting
> > >> killed by systemd on resume (after >WatchdogSec seconds of
> > >> suspending). I'm using master branch of systemd and the kernel. As
> > >> mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
> > >> not include suspend time.
> > >>
> > >> Correct me if I'm mistaken, but I don't see the ambiguity of whether
> > >> this patch series breaks systemd. If it's implemented correctly, you'd
> > >> hope it *would* break it!
> > >
> > > This sounded a little weak on re-reading, sorry. So, I just confirmed
> > > that after booting a "git revert -m 1 680014d6d1da", the issue no
> > > longer appears. (I.e., a suspend for >WatchDog sec doesn't result in
> > > any daemon getting killed).
> > >
> > > Let me know if I can help in any way.
> >
> > Yea, this is the sort of thing I was worried about.
> >
> > Thomas: I think reverting this change is needed.
>
> Sigh. I hoped that something like this would be catched before I sent the
> pull request by those who were actually interested in this change...

Well, we had two regressions in -next this cycle... I reported both
but bisections were not easy and noone was really interested.

See

Re: linux-next on x60: network manager often complains "network is
disabled" after resume

Subject: Re: CLOCK_MONOTONIC, BOOTTIME, suspend and screensaver
regression

Can we expect this to be sorted out in v4.17-rc3? -next?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (1.83 kB)
signature.asc (188.00 B)
Digital signature
Download all attachments

2018-04-25 08:54:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Tuesday, April 24, 2018 10:09:28 AM CEST Thomas Gleixner wrote:
> On Mon, 23 Apr 2018, John Stultz wrote:
>
> > On Mon, Apr 23, 2018 at 7:45 PM, Genki Sky <[email protected]> wrote:
> > > Quoting Genki Sky (2018/04/23 20:40:36 -0400)
> > >> I came across this thread for same reason as [0]: Daemons getting
> > >> killed by systemd on resume (after >WatchdogSec seconds of
> > >> suspending). I'm using master branch of systemd and the kernel. As
> > >> mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
> > >> not include suspend time.
> > >>
> > >> Correct me if I'm mistaken, but I don't see the ambiguity of whether
> > >> this patch series breaks systemd. If it's implemented correctly, you'd
> > >> hope it *would* break it!
> > >
> > > This sounded a little weak on re-reading, sorry. So, I just confirmed
> > > that after booting a "git revert -m 1 680014d6d1da", the issue no
> > > longer appears. (I.e., a suspend for >WatchDog sec doesn't result in
> > > any daemon getting killed).
> > >
> > > Let me know if I can help in any way.
> >
> > Yea, this is the sort of thing I was worried about.
> >
> > Thomas: I think reverting this change is needed.
>
> Sigh. I hoped that something like this would be catched before I sent the
> pull request by those who were actually interested in this change...

The "git revert -m 1 680014d6d1da" makes resume issues on my Aspire S5
go away (cf. https://marc.info/?l=linux-kernel&m=152460804018920&w=2).

I'll try with just the "monotonic" vs "boottime" clock changes reverted.

> I'll try to distangle it.

Cool.

Please let me know if you need any help.


2018-04-25 08:57:31

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Wednesday, April 25, 2018 8:50:15 AM CEST Pavel Machek wrote:
>
> --ReaqsoxgOBHFXBhH
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: inline
> Content-Transfer-Encoding: quoted-printable
>
> On Tue 2018-04-24 10:09:28, Thomas Gleixner wrote:
> > On Mon, 23 Apr 2018, John Stultz wrote:
> >=20
> > > On Mon, Apr 23, 2018 at 7:45 PM, Genki Sky <[email protected]> wrote:
> > > > Quoting Genki Sky (2018/04/23 20:40:36 -0400)
> > > >> I came across this thread for same reason as [0]: Daemons getting
> > > >> killed by systemd on resume (after >WatchdogSec seconds of
> > > >> suspending). I'm using master branch of systemd and the kernel. As
> > > >> mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
> > > >> not include suspend time.
> > > >>
> > > >> Correct me if I'm mistaken, but I don't see the ambiguity of whether
> > > >> this patch series breaks systemd. If it's implemented correctly, you=
> 'd
> > > >> hope it *would* break it!
> > > >
> > > > This sounded a little weak on re-reading, sorry. So, I just confirmed
> > > > that after booting a "git revert -m 1 680014d6d1da", the issue no
> > > > longer appears. (I.e., a suspend for >WatchDog sec doesn't result in
> > > > any daemon getting killed).
> > > >
> > > > Let me know if I can help in any way.
> > >=20
> > > Yea, this is the sort of thing I was worried about.
> > >=20
> > > Thomas: I think reverting this change is needed.
> >=20
> > Sigh. I hoped that something like this would be catched before I sent the
> > pull request by those who were actually interested in this change...
>
> Well, we had two regressions in -next this cycle... I reported both
> but bisections were not easy and noone was really interested.

It's more that it was difficult to correlate the reported symptoms with a
particular set of kernel changes.


2018-04-25 09:51:52

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Wednesday, April 25, 2018 10:52:18 AM CEST Rafael J. Wysocki wrote:
> On Tuesday, April 24, 2018 10:09:28 AM CEST Thomas Gleixner wrote:
> > On Mon, 23 Apr 2018, John Stultz wrote:
> >
> > > On Mon, Apr 23, 2018 at 7:45 PM, Genki Sky <[email protected]> wrote:
> > > > Quoting Genki Sky (2018/04/23 20:40:36 -0400)
> > > >> I came across this thread for same reason as [0]: Daemons getting
> > > >> killed by systemd on resume (after >WatchdogSec seconds of
> > > >> suspending). I'm using master branch of systemd and the kernel. As
> > > >> mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
> > > >> not include suspend time.
> > > >>
> > > >> Correct me if I'm mistaken, but I don't see the ambiguity of whether
> > > >> this patch series breaks systemd. If it's implemented correctly, you'd
> > > >> hope it *would* break it!
> > > >
> > > > This sounded a little weak on re-reading, sorry. So, I just confirmed
> > > > that after booting a "git revert -m 1 680014d6d1da", the issue no
> > > > longer appears. (I.e., a suspend for >WatchDog sec doesn't result in
> > > > any daemon getting killed).
> > > >
> > > > Let me know if I can help in any way.
> > >
> > > Yea, this is the sort of thing I was worried about.
> > >
> > > Thomas: I think reverting this change is needed.
> >
> > Sigh. I hoped that something like this would be catched before I sent the
> > pull request by those who were actually interested in this change...
>
> The "git revert -m 1 680014d6d1da" makes resume issues on my Aspire S5
> go away (cf. https://marc.info/?l=linux-kernel&m=152460804018920&w=2).
>
> I'll try with just the "monotonic" vs "boottime" clock changes reverted.

FWICS (so far) system resume still works here with the commits between
d6ed449afdb3 (timekeeping: Make the MONOTONIC clock behave like the BOOTTIME
clock) and 127bfa5f4342 (hrtimer: Unify MONOTONIC and BOOTTIME clock behavior)
inclusive reverted on top of 4.17-rc2. [I probably should revert commit
92af4dcb4e1c (tracing: Unify the "boot" and "mono" tracing clocks) too, but
it doesn't revert cleanly and it doesn't apprear to affect things here too.]


2018-04-25 13:04:43

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Wed, 25 Apr 2018, Rafael J. Wysocki wrote:
> On Wednesday, April 25, 2018 10:52:18 AM CEST Rafael J. Wysocki wrote:
> > On Tuesday, April 24, 2018 10:09:28 AM CEST Thomas Gleixner wrote:
> > > On Mon, 23 Apr 2018, John Stultz wrote:
> > >
> > > > On Mon, Apr 23, 2018 at 7:45 PM, Genki Sky <[email protected]> wrote:
> > > > > Quoting Genki Sky (2018/04/23 20:40:36 -0400)
> > > > >> I came across this thread for same reason as [0]: Daemons getting
> > > > >> killed by systemd on resume (after >WatchdogSec seconds of
> > > > >> suspending). I'm using master branch of systemd and the kernel. As
> > > > >> mentioned, systemd uses CLOCK_MONOTONIC, originally expecting it to
> > > > >> not include suspend time.
> > > > >>
> > > > >> Correct me if I'm mistaken, but I don't see the ambiguity of whether
> > > > >> this patch series breaks systemd. If it's implemented correctly, you'd
> > > > >> hope it *would* break it!
> > > > >
> > > > > This sounded a little weak on re-reading, sorry. So, I just confirmed
> > > > > that after booting a "git revert -m 1 680014d6d1da", the issue no
> > > > > longer appears. (I.e., a suspend for >WatchDog sec doesn't result in
> > > > > any daemon getting killed).
> > > > >
> > > > > Let me know if I can help in any way.
> > > >
> > > > Yea, this is the sort of thing I was worried about.
> > > >
> > > > Thomas: I think reverting this change is needed.
> > >
> > > Sigh. I hoped that something like this would be catched before I sent the
> > > pull request by those who were actually interested in this change...
> >
> > The "git revert -m 1 680014d6d1da" makes resume issues on my Aspire S5
> > go away (cf. https://marc.info/?l=linux-kernel&m=152460804018920&w=2).
> >
> > I'll try with just the "monotonic" vs "boottime" clock changes reverted.
>
> FWICS (so far) system resume still works here with the commits between
> d6ed449afdb3 (timekeeping: Make the MONOTONIC clock behave like the BOOTTIME
> clock) and 127bfa5f4342 (hrtimer: Unify MONOTONIC and BOOTTIME clock behavior)
> inclusive reverted on top of 4.17-rc2. [I probably should revert commit
> 92af4dcb4e1c (tracing: Unify the "boot" and "mono" tracing clocks) too, but
> it doesn't revert cleanly and it doesn't apprear to affect things here too.]

Right, it does not matter. The real interesting one is d6ed449afdb3.

Thanks,

tglx

2018-04-26 07:07:30

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Wed, 2018-04-25 at 15:03 +0200, Thomas Gleixner wrote:
> Right, it does not matter. The real interesting one is d6ed449afdb3.

FWIW, three boxen here suspend/resume fine, but repeatably exhibit the
below after a very few minute suspend, and a short bisect fingered your
suspect. Distro is opensuse 42.3.

[ 211.113902] Restarting tasks ... done.
[ 211.114817] PM: suspend exit
[ 212.312993] systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 212.313363] systemd-coredump[7264]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.0aa39276decf4f1ab6fda3464e31f9dd.582.1524720954000000.


2018-04-26 07:43:51

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Thu, 26 Apr 2018, Mike Galbraith wrote:
> On Wed, 2018-04-25 at 15:03 +0200, Thomas Gleixner wrote:
> > Right, it does not matter. The real interesting one is d6ed449afdb3.
>
> FWIW, three boxen here suspend/resume fine, but repeatably exhibit the
> below after a very few minute suspend, and a short bisect fingered your
> suspect. Distro is opensuse 42.3.
>
> [ 211.113902] Restarting tasks ... done.
> [ 211.114817] PM: suspend exit
> [ 212.312993] systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal corrupted or uncleanly shut down, renaming and replacing.
> [ 212.313363] systemd-coredump[7264]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.0aa39276decf4f1ab6fda3464e31f9dd.582.1524720954000000.
>

Huch, that rather looks like a genuine application bug.

Thanks,

tglx


2018-04-26 08:37:43

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Thu, Apr 26, 2018 at 9:42 AM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 26 Apr 2018, Mike Galbraith wrote:
>> On Wed, 2018-04-25 at 15:03 +0200, Thomas Gleixner wrote:
>> > Right, it does not matter. The real interesting one is d6ed449afdb3.
>>
>> FWIW, three boxen here suspend/resume fine, but repeatably exhibit the
>> below after a very few minute suspend, and a short bisect fingered your
>> suspect. Distro is opensuse 42.3.
>>
>> [ 211.113902] Restarting tasks ... done.
>> [ 211.114817] PM: suspend exit
>> [ 212.312993] systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal corrupted or uncleanly shut down, renaming and replacing.
>> [ 212.313363] systemd-coredump[7264]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.0aa39276decf4f1ab6fda3464e31f9dd.582.1524720954000000.
>>
>
> Huch, that rather looks like a genuine application bug.

Well, say you set a timer to wake you up in X seconds. When you wake
up, you look at a clock and see that Y seconds have passed and Y is
much greater than X. I guess you'd think that something's wrong. :-)

2018-04-26 08:52:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Thu, 26 Apr 2018, Rafael J. Wysocki wrote:

> On Thu, Apr 26, 2018 at 9:42 AM, Thomas Gleixner <[email protected]> wrote:
> > On Thu, 26 Apr 2018, Mike Galbraith wrote:
> >> On Wed, 2018-04-25 at 15:03 +0200, Thomas Gleixner wrote:
> >> > Right, it does not matter. The real interesting one is d6ed449afdb3.
> >>
> >> FWIW, three boxen here suspend/resume fine, but repeatably exhibit the
> >> below after a very few minute suspend, and a short bisect fingered your
> >> suspect. Distro is opensuse 42.3.
> >>
> >> [ 211.113902] Restarting tasks ... done.
> >> [ 211.114817] PM: suspend exit
> >> [ 212.312993] systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal corrupted or uncleanly shut down, renaming and replacing.
> >> [ 212.313363] systemd-coredump[7264]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.0aa39276decf4f1ab6fda3464e31f9dd.582.1524720954000000.
> >>
> >
> > Huch, that rather looks like a genuine application bug.
>
> Well, say you set a timer to wake you up in X seconds. When you wake
> up, you look at a clock and see that Y seconds have passed and Y is
> much greater than X. I guess you'd think that something's wrong. :-)

And that makes you coredump, right? Brilliant choice.

Thanks,

tglx


2018-04-26 09:04:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC/RFT patch 0/7] timekeeping: Unify clock MONOTONIC and clock BOOTTIME

On Thu, Apr 26, 2018 at 10:51 AM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 26 Apr 2018, Rafael J. Wysocki wrote:
>
>> On Thu, Apr 26, 2018 at 9:42 AM, Thomas Gleixner <[email protected]> wrote:
>> > On Thu, 26 Apr 2018, Mike Galbraith wrote:
>> >> On Wed, 2018-04-25 at 15:03 +0200, Thomas Gleixner wrote:
>> >> > Right, it does not matter. The real interesting one is d6ed449afdb3.
>> >>
>> >> FWIW, three boxen here suspend/resume fine, but repeatably exhibit the
>> >> below after a very few minute suspend, and a short bisect fingered your
>> >> suspect. Distro is opensuse 42.3.
>> >>
>> >> [ 211.113902] Restarting tasks ... done.
>> >> [ 211.114817] PM: suspend exit
>> >> [ 212.312993] systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal corrupted or uncleanly shut down, renaming and replacing.
>> >> [ 212.313363] systemd-coredump[7264]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.0aa39276decf4f1ab6fda3464e31f9dd.582.1524720954000000.
>> >>
>> >
>> > Huch, that rather looks like a genuine application bug.
>>
>> Well, say you set a timer to wake you up in X seconds. When you wake
>> up, you look at a clock and see that Y seconds have passed and Y is
>> much greater than X. I guess you'd think that something's wrong. :-)
>
> And that makes you coredump, right? Brilliant choice.

That wouldn't be my choice, but some people do make choices like that
even in their personal lives ...