2021-02-10 10:45:46

by Hikaru Nishida

[permalink] [raw]
Subject: [RFC PATCH 0/2] Introduce a way to adjust CLOCK_BOOTTIME from userspace for VM guests

From: Hikaru Nishida <[email protected]>


Hi folks,

We'd like to add a sysfs interface that enable us to advance
CLOCK_BOOTTIME from userspace. The use case of this change is that
adjusting guest's CLOCK_BOOTTIME as host suspends to ensure that the
guest can notice the device has been suspended.
We have an application that rely on the difference between
CLOCK_BOOTTIME and CLOCK_MONOTONIC to detect whether the device went
suspend or not. However, the logic did not work well on VM environment
since most VMs are pausing the VM guests instead of actually suspending
them on the host's suspension.
With following patches, we can adjust CLOCK_BOOTTIME without actually
suspending guest and make the app working as intended.
I think this feature is also useful for other VM solutions since there
was no way to do this from userspace.

As far as I checked, it is working as expected but is there any concern
about this change? If so, please let me know.

Thanks,
Hikaru Nishida


Hikaru Nishida (2):
timekeeping: Add timekeeping_adjust_boottime
drivers/virt: introduce CLOCK_BOOTTIME adjustment sysfs interface
driver

drivers/virt/Kconfig | 9 ++++++
drivers/virt/Makefile | 1 +
drivers/virt/boottime_adj.c | 57 +++++++++++++++++++++++++++++++++++++
include/linux/timekeeping.h | 2 ++
kernel/time/timekeeping.c | 26 +++++++++++++++++
5 files changed, 95 insertions(+)
create mode 100644 drivers/virt/boottime_adj.c

--
2.30.0.478.g8a0d178c01-goog


2021-02-10 10:45:50

by Hikaru Nishida

[permalink] [raw]
Subject: [RFC PATCH 2/2] drivers/virt: introduce CLOCK_BOOTTIME adjustment sysfs interface driver

From: Hikaru Nishida <[email protected]>

This adds a sysfs interface /sys/kernel/boottime_adj to enable advancing
CLOCK_BOOTTIME from the userspace without actual susupend/resume cycles.

This gives a way to mitigate CLOCK_BOOTTIME divergence between guest
and host on virtualized environments after suspend/resume cycles on
the host.

We observed an issue of a guest application that expects there is a gap
between CLOCK_BOOTTIME and CLOCK_MONOTONIC after the device is suspended
to detect whether the device went into suspend or not.
Since the guest is paused instead of being actually suspended during the
host's suspension, guest kernel doesn't advance CLOCK_BOOTTIME correctly
and there is no way to correct that.

To solve the problem, this change introduces a way to modify a gap
between those clocks and align the timer behavior to host's one.

Signed-off-by: Hikaru Nishida <[email protected]>
---

drivers/virt/Kconfig | 9 ++++++
drivers/virt/Makefile | 1 +
drivers/virt/boottime_adj.c | 57 +++++++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+)
create mode 100644 drivers/virt/boottime_adj.c

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 80c5f9c16ec1..149b4e763e4d 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -13,6 +13,15 @@ menuconfig VIRT_DRIVERS

if VIRT_DRIVERS

+config BOOTTIME_ADJUSTMENT
+ tristate "CLOCK_BOOTTIME adjustment sysfs interface"
+ help
+ The CLOCK_BOOTTIME adjustment sysfs interface driver
+ provides a sysfs interface ( /sys/kernel/boottime_adj )
+ to enable adjusting CLOCK_BOOTTIME from the userspace.
+
+ If unsure, say N.
+
config FSL_HV_MANAGER
tristate "Freescale hypervisor management driver"
depends on FSL_SOC
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index f28425ce4b39..1bbb476ddba9 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -3,6 +3,7 @@
# Makefile for drivers that support virtualization
#

+obj-$(CONFIG_BOOTTIME_ADJUSTMENT) += boottime_adj.o
obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o
obj-y += vboxguest/

diff --git a/drivers/virt/boottime_adj.c b/drivers/virt/boottime_adj.c
new file mode 100644
index 000000000000..9cc717d8accc
--- /dev/null
+++ b/drivers/virt/boottime_adj.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * CLOCK_BOOTTIME Adjustment Interface Driver
+ */
+
+#include <linux/kobject.h>
+#include <linux/module.h>
+#include <linux/timekeeping.h>
+
+static struct kobject *kobj_boottime_adj;
+
+/*
+ * Write to /sys/kernel/boottime_adj advances CLOCK_BOOTTIME by given delta.
+ */
+static ssize_t boottime_adj_write(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf,
+ size_t count)
+{
+ int error;
+ struct timespec64 delta;
+
+ if (sscanf(buf, "%lld %ld", &delta.tv_sec, &delta.tv_nsec) != 2)
+ return -EINVAL;
+
+ error = timekeeping_adjust_boottime(&delta);
+ if (error)
+ return error;
+
+ pr_info("%s: CLOCK_BOOTTIME has been advanced by %+lld seconds and %+ld nanoseconds\n",
+ __func__, delta.tv_sec, delta.tv_nsec);
+ return count;
+}
+
+static struct kobj_attribute boottime_adj_attr =
+__ATTR(boottime_adj, 0200, NULL, boottime_adj_write);
+
+static int __init boottime_adj_init(void)
+{
+ int error;
+
+ error = sysfs_create_file(kernel_kobj, &boottime_adj_attr.attr);
+ if (error) {
+ pr_warn("%s: failed to init\n", __func__);
+ return error;
+ }
+ return 0;
+}
+
+static void __exit boottime_adj_cleanup(void)
+{
+ kobject_put(kobj_boottime_adj);
+}
+
+module_init(boottime_adj_init);
+module_exit(boottime_adj_cleanup);
+MODULE_DESCRIPTION("CLOCK_BOOTTIME adjustment interface driver");
+MODULE_LICENSE("GPL");
--
2.30.0.478.g8a0d178c01-goog

2021-02-10 10:53:20

by Greg KH

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] drivers/virt: introduce CLOCK_BOOTTIME adjustment sysfs interface driver

On Wed, Feb 10, 2021 at 07:39:08PM +0900, Hikaru Nishida wrote:
> From: Hikaru Nishida <[email protected]>
>
> This adds a sysfs interface /sys/kernel/boottime_adj to enable advancing
> CLOCK_BOOTTIME from the userspace without actual susupend/resume cycles.
>
> This gives a way to mitigate CLOCK_BOOTTIME divergence between guest
> and host on virtualized environments after suspend/resume cycles on
> the host.
>
> We observed an issue of a guest application that expects there is a gap
> between CLOCK_BOOTTIME and CLOCK_MONOTONIC after the device is suspended
> to detect whether the device went into suspend or not.
> Since the guest is paused instead of being actually suspended during the
> host's suspension, guest kernel doesn't advance CLOCK_BOOTTIME correctly
> and there is no way to correct that.
>
> To solve the problem, this change introduces a way to modify a gap
> between those clocks and align the timer behavior to host's one.
>
> Signed-off-by: Hikaru Nishida <[email protected]>
> ---
>
> drivers/virt/Kconfig | 9 ++++++
> drivers/virt/Makefile | 1 +
> drivers/virt/boottime_adj.c | 57 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 67 insertions(+)
> create mode 100644 drivers/virt/boottime_adj.c

No Documentation/ABI/ update for your new sysfs file? Please fix...

2021-02-10 10:55:09

by Greg KH

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Introduce a way to adjust CLOCK_BOOTTIME from userspace for VM guests

On Wed, Feb 10, 2021 at 07:39:06PM +0900, Hikaru Nishida wrote:
> From: Hikaru Nishida <[email protected]>
>
>
> Hi folks,
>
> We'd like to add a sysfs interface that enable us to advance
> CLOCK_BOOTTIME from userspace.

Why sysfs? What device is the clock that you are attaching to here?
Why not use the existing kernel api to adjust clocks?

thanks,

greg k-h

2021-02-10 10:56:19

by Greg KH

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] drivers/virt: introduce CLOCK_BOOTTIME adjustment sysfs interface driver

On Wed, Feb 10, 2021 at 07:39:08PM +0900, Hikaru Nishida wrote:
> From: Hikaru Nishida <[email protected]>
>
> This adds a sysfs interface /sys/kernel/boottime_adj to enable advancing
> CLOCK_BOOTTIME from the userspace without actual susupend/resume cycles.
>
> This gives a way to mitigate CLOCK_BOOTTIME divergence between guest
> and host on virtualized environments after suspend/resume cycles on
> the host.
>
> We observed an issue of a guest application that expects there is a gap
> between CLOCK_BOOTTIME and CLOCK_MONOTONIC after the device is suspended
> to detect whether the device went into suspend or not.
> Since the guest is paused instead of being actually suspended during the
> host's suspension, guest kernel doesn't advance CLOCK_BOOTTIME correctly
> and there is no way to correct that.
>
> To solve the problem, this change introduces a way to modify a gap
> between those clocks and align the timer behavior to host's one.
>
> Signed-off-by: Hikaru Nishida <[email protected]>
> ---
>
> drivers/virt/Kconfig | 9 ++++++
> drivers/virt/Makefile | 1 +
> drivers/virt/boottime_adj.c | 57 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 67 insertions(+)
> create mode 100644 drivers/virt/boottime_adj.c
>
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index 80c5f9c16ec1..149b4e763e4d 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -13,6 +13,15 @@ menuconfig VIRT_DRIVERS
>
> if VIRT_DRIVERS
>
> +config BOOTTIME_ADJUSTMENT
> + tristate "CLOCK_BOOTTIME adjustment sysfs interface"
> + help
> + The CLOCK_BOOTTIME adjustment sysfs interface driver
> + provides a sysfs interface ( /sys/kernel/boottime_adj )
> + to enable adjusting CLOCK_BOOTTIME from the userspace.
> +
> + If unsure, say N.
> +
> config FSL_HV_MANAGER
> tristate "Freescale hypervisor management driver"
> depends on FSL_SOC
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index f28425ce4b39..1bbb476ddba9 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -3,6 +3,7 @@
> # Makefile for drivers that support virtualization
> #
>
> +obj-$(CONFIG_BOOTTIME_ADJUSTMENT) += boottime_adj.o
> obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o
> obj-y += vboxguest/
>
> diff --git a/drivers/virt/boottime_adj.c b/drivers/virt/boottime_adj.c
> new file mode 100644
> index 000000000000..9cc717d8accc
> --- /dev/null
> +++ b/drivers/virt/boottime_adj.c
> @@ -0,0 +1,57 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later

Do you really mean "or later"? I have to ask...

> +/*
> + * CLOCK_BOOTTIME Adjustment Interface Driver

No copyright, nice! Your company lawyers will be having a word with
you... :(

> + */
> +
> +#include <linux/kobject.h>
> +#include <linux/module.h>
> +#include <linux/timekeeping.h>
> +
> +static struct kobject *kobj_boottime_adj;
> +
> +/*
> + * Write to /sys/kernel/boottime_adj advances CLOCK_BOOTTIME by given delta.
> + */
> +static ssize_t boottime_adj_write(struct kobject *kobj,
> + struct kobj_attribute *attr, const char *buf,
> + size_t count)
> +{
> + int error;
> + struct timespec64 delta;
> +
> + if (sscanf(buf, "%lld %ld", &delta.tv_sec, &delta.tv_nsec) != 2)
> + return -EINVAL;

sysfs is "one value per file", but as you did not provide documentation
on what this does, I can't tell if you really are following this or not.

> +
> + error = timekeeping_adjust_boottime(&delta);
> + if (error)
> + return error;
> +
> + pr_info("%s: CLOCK_BOOTTIME has been advanced by %+lld seconds and %+ld nanoseconds\n",
> + __func__, delta.tv_sec, delta.tv_nsec);

If kernels are working normally, no messages are printed to the log. Do
not allow userspace to spam things.

> + return count;
> +}
> +
> +static struct kobj_attribute boottime_adj_attr =
> +__ATTR(boottime_adj, 0200, NULL, boottime_adj_write);

__ATTR_RO() please.

> +
> +static int __init boottime_adj_init(void)
> +{
> + int error;
> +
> + error = sysfs_create_file(kernel_kobj, &boottime_adj_attr.attr);
> + if (error) {
> + pr_warn("%s: failed to init\n", __func__);

why warn again?

> + return error;
> + }
> + return 0;
> +}
> +
> +static void __exit boottime_adj_cleanup(void)
> +{
> + kobject_put(kobj_boottime_adj);

You just called put on a kobject you never initialized?????

Are you sure this file has been tested?

odd...

greg k-h

2021-02-10 11:28:19

by Alexander Graf

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Introduce a way to adjust CLOCK_BOOTTIME from userspace for VM guests



On 10.02.21 11:39, Hikaru Nishida wrote:
>
> From: Hikaru Nishida <[email protected]>
>
>
> Hi folks,
>
> We'd like to add a sysfs interface that enable us to advance
> CLOCK_BOOTTIME from userspace. The use case of this change is that
> adjusting guest's CLOCK_BOOTTIME as host suspends to ensure that the
> guest can notice the device has been suspended.
> We have an application that rely on the difference between
> CLOCK_BOOTTIME and CLOCK_MONOTONIC to detect whether the device went
> suspend or not. However, the logic did not work well on VM environment
> since most VMs are pausing the VM guests instead of actually suspending
> them on the host's suspension.
> With following patches, we can adjust CLOCK_BOOTTIME without actually
> suspending guest and make the app working as intended.
> I think this feature is also useful for other VM solutions since there
> was no way to do this from userspace.
>
> As far as I checked, it is working as expected but is there any concern
> about this change? If so, please let me know.

I don't fully grasp why you want the guest to manually adjust its
CLOCK_BOOTTIME. Wouldn't it make more sense to extend kvmclock's notion
of wall clock time to tell you about suspended vs executed wall clock?


Alex




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



2021-02-10 13:14:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] drivers/virt: introduce CLOCK_BOOTTIME adjustment sysfs interface driver

On Wed, Feb 10 2021 at 19:39, Hikaru Nishida wrote:
> From: Hikaru Nishida <[email protected]>
>
> This adds a sysfs interface /sys/kernel/boottime_adj to enable advancing
> CLOCK_BOOTTIME from the userspace without actual susupend/resume cycles.
>
> This gives a way to mitigate CLOCK_BOOTTIME divergence between guest
> and host on virtualized environments after suspend/resume cycles on
> the host.
>
> We observed an issue of a guest application that expects there is a gap
> between CLOCK_BOOTTIME and CLOCK_MONOTONIC after the device is suspended
> to detect whether the device went into suspend or not.
> Since the guest is paused instead of being actually suspended during the
> host's suspension, guest kernel doesn't advance CLOCK_BOOTTIME correctly
> and there is no way to correct that.
>
> To solve the problem, this change introduces a way to modify a gap
> between those clocks and align the timer behavior to host's one.

That's not a solution, that's a bandaid and just creating a horrible
user space ABI which we can't get rid off anymore.

The whole approach of virt vs. pausing and timekeeping is busted as I
pointed out several times before. Just papering over it with random
interfaces which fiddle with the timekeeping internals is not going to
happen.

Thanks,

tglx

2021-02-10 13:34:16

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Introduce a way to adjust CLOCK_BOOTTIME from userspace for VM guests

On Wed, Feb 10, 2021 at 11:39 AM Hikaru Nishida <[email protected]> wrote:
> From: Hikaru Nishida <[email protected]>
>
> We'd like to add a sysfs interface that enable us to advance
> CLOCK_BOOTTIME from userspace. The use case of this change is that
> adjusting guest's CLOCK_BOOTTIME as host suspends to ensure that the
> guest can notice the device has been suspended.
> We have an application that rely on the difference between
> CLOCK_BOOTTIME and CLOCK_MONOTONIC to detect whether the device went
> suspend or not. However, the logic did not work well on VM environment
> since most VMs are pausing the VM guests instead of actually suspending
> them on the host's suspension.
> With following patches, we can adjust CLOCK_BOOTTIME without actually
> suspending guest and make the app working as intended.
> I think this feature is also useful for other VM solutions since there
> was no way to do this from userspace.
>
> As far as I checked, it is working as expected but is there any concern
> about this change? If so, please let me know.

I think the correct internal interface to call would be
timekeeping_inject_sleeptime64(), which changes boottime in a
safe way.

Not sure what should call it, but kvmclock as Alex suggested might
be the right place.

Arnd