2021-12-15 17:31:11

by info

Subject: CLOCK_MONOTONIC after suspend

Hi,



I have a comment/question related to this admittedly quite old commit:

- Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME,
https://github.com/torvalds/linux/commit/a3ed0e4393d6885b4af7ce84b437dc696490a530#diff-2278494fe0e3426f0e89d14f1f09e5e24923dc29a0f973250081f70416ade7dc



It states "(...) As reported by several folks systemd and other
applications rely on the documented behaviour of CLOCK_MONOTONIC on
Linux and break with the above changes. After resume daemons time out
and other timeout related issues are observed. Rafael compiled this
list: (...)".



From a user space perspective, similar issues can still be observed. I
guess these apparent time jumps happen because user space is frozen
before the kernel goes to sleep on suspend, and is unfrozen after the
kernel wakes up on resume:

dT = (T_KS_asleep - T_US_asleep) + (T_US_awake - T_KS_awake)

where T is a point in time, KS is kernel space and US is user space.



With a simple user space program that prints the monotonic time every
100 ms along with the time of day, I did some measurements on my
notebook. They reveal the following discrepancies (time gaps) between the
last timestamp written before suspend and the first timestamp after
resume:



dT in [s]      #1     #2     #3     #4     #5     #6     #7
Suspend2RAM    6.409  6.423  7.451  3.444  7.815  5.655  7.178
Suspend2Disk   5.228  2.683  5.072  5.198  4.806  5.763  6.908
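
For reference, a measurement loop of this kind could look roughly like
the following Python sketch. It is not the program used for the numbers
above; the function names, the 1 s reporting threshold and the output
format are assumptions made for illustration. It samples CLOCK_MONOTONIC
and CLOCK_REALTIME once per period and flags any step in the monotonic
clock that is much larger than the period.

```python
import time

def sample_clocks():
    """Return (monotonic, realtime) readings in seconds."""
    return (time.clock_gettime(time.CLOCK_MONOTONIC),
            time.clock_gettime(time.CLOCK_REALTIME))

def monotonic_gap(prev_mono, cur_mono, period=0.1):
    """Amount by which CLOCK_MONOTONIC advanced beyond one expected period."""
    return (cur_mono - prev_mono) - period

def run(iterations, period=0.1, threshold=1.0):
    """Print one line per sample; mark samples that follow a large gap."""
    prev_mono, _ = sample_clocks()
    for _ in range(iterations):
        time.sleep(period)
        mono, real = sample_clocks()
        gap = monotonic_gap(prev_mono, mono, period)
        wall = time.strftime("%H:%M:%S", time.localtime(real))
        # threshold is an arbitrary cut-off (assumption) to ignore
        # ordinary scheduling jitter.
        marker = f"  <-- gap {gap:.3f} s" if gap > threshold else ""
        print(f"{mono:12.3f}  {wall}{marker}")
        prev_mono = mono
```

Calling run(6000) would sample for roughly ten minutes; suspending and
resuming the machine during that time should produce one marked line
per suspend cycle.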



Is this effect known and accepted or is there some way to prevent or
mitigate it?



Thanks,

Dirk



2021-12-15 19:18:03

by Thomas Gleixner

Subject: Re: CLOCK_MONOTONIC after suspend

Dirk,

On Wed, Dec 15 2021 at 18:30, [email protected] wrote:
> dT = (T_KS_asleep - T_US_asleep) + (T_US_awake - T_KS_awake) // T: point
> in time, KS: kernel space, US: user space
>
> With a simple user space program that prints out the monotonic time each
> 100ms along with the day time, I did some measurements on my notebook.
> It reveals the following discrepancies (time gaps) between the last time
> stamp written before suspend and the first time stamp after resume:
>
> dT in [s]      #1     #2     #3     #4     #5     #6     #7
> Suspend2RAM    6.409  6.423  7.451  3.444  7.815  5.655  7.178
> Suspend2Disk   5.228  2.683  5.072  5.198  4.806  5.763  6.908
>
> Is this effect known and accepted or is there some way to prevent or
> mitigate it?

there is not much the kernel can do about that.

Timekeeping can only stop at the very latest moment and has to resume
immediately when the CPU comes back. That's a matter of internal
correctness.

Yes, user space has to be frozen first in order to make that work and is
obviously unfrozen last. So the timeline looks like this:

T0 suspend is initiated
T1 user space freeze
T2 kernel shuts down - timekeeping freeze
T3 kernel resumes - timekeeping resume
T4 user space unfreeze

So the deltas T2 - T1 and T4 - T3 are what matter for your user space
program. Those deltas heavily depend on the number of drivers,
outstanding disk operations etc. So your mileage will vary.
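
As a rough illustration of that timeline, user space can separate the
two parts of a suspend cycle by reading CLOCK_BOOTTIME next to
CLOCK_MONOTONIC, since CLOCK_BOOTTIME also counts the time the system is
suspended. The Python sketch below is only an approximation under stated
assumptions: the function names are made up for this example, and the
"before" and "after" samples are taken by a periodic task, so they land
slightly before T1 and slightly after T4.

```python
import time

def read_clocks():
    """Return (monotonic, boottime) in seconds. CLOCK_BOOTTIME is Linux-specific."""
    return (time.clock_gettime(time.CLOCK_MONOTONIC),
            time.clock_gettime(time.CLOCK_BOOTTIME))

def split_gap(mono_before, mono_after, boot_before, boot_after):
    """Split the interval between two samples taken around a suspend cycle.

    Returns (visible, suspended):
      visible   ~ (T2 - T1) + (T4 - T3), the jump a CLOCK_MONOTONIC user sees
      suspended ~ (T3 - T2), the time during which timekeeping was frozen
    """
    mono_delta = mono_after - mono_before   # excludes time spent suspended
    boot_delta = boot_after - boot_before   # includes time spent suspended
    return mono_delta, boot_delta - mono_delta
```

With hypothetical readings, split_gap(100.0, 106.409, 100.0, 706.409)
gives a visible jump of about 6.409 s and about 600 s of actual suspend
time. Only the first number is something the freeze and unfreeze
ordering is responsible for.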

Thanks,

tglx