Currently, we often use NTP (which syncs time against a remote network
clock) to sync time in a VM. But the precision of NTP is limited by network
delay, so it is difficult to sync time with high precision.
The kvm virtual ptp clock (ptp_kvm) offers another way to sync time in a VM,
as the reference clock is located in the host instead of on the network.
It targets time sync between guest and host in a virtualization
environment; in this way, we can also keep the time of all the VMs running
on the same host in sync. In general, the delay of communication between
host and guest is quite small, so ptp_kvm can offer time sync precision
on the order of nanoseconds. Please keep in mind that ptp_kvm limits
itself to being a channel which transmits the host clock to the guest,
and leaves the time sync job to a userspace application in the VM,
e.g. chrony.
How ptp_kvm works:
After ptp_kvm is initialized, there will be a new device node under
/dev called ptp%d. A guest userspace service, like chrony, can use this
device to get the host walltime and, depending on the interface it
calls, also the counter cycle. The guest userspace service can then use
this data to do the time sync for the guest.
Here is a rough sketch to show how kvm ptp works.
|----------------------------|              |--------------------------|
|       guest userspace      |              |          host            |
|ioctl -> /dev/ptp%d         |              |                          |
|       ^   |                |              |                          |
|----------------------------|              |                          |
|       |   | guest kernel   |              |                          |
|       |   V      (get host walltime/counter cycle)                   |
|  kvm ptp API (hypercall) - |- - - - - - ->| hypercall service        |
|                            |<- - - - - - -|                          |
|----------------------------|              |--------------------------|
1. The time sync service in guest userspace calls the ptp device using ioctl.
2. The guest kernel ptp_kvm API gets this request, then invokes a hypercall
to request the host walltime/counter cycle from the host kernel.
3. The ptp_kvm host hypercall service responds to the request and sends back
the data.
4. ptp copies the data to userspace.
This ptp_kvm implementation focuses on steps 2 and 3: step 2 runs in the
guest kernel, while step 3 runs in the host kernel.
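Step 1 can be sketched from the userspace side as below. This is only a
minimal illustration, not what chrony actually does: it assumes the guest
uses the existing PTP_SYS_OFFSET_PRECISE ioctl from <linux/ptp_clock.h>,
and the helper names read_cross_timestamp() and ptp_time_to_ns() are made
up for this example. The device path is supplied by the caller.

```c
#include <fcntl.h>
#include <linux/ptp_clock.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Convert a PTP timestamp (seconds + nanoseconds) to nanoseconds. */
static long long ptp_time_to_ns(const struct ptp_clock_time *t)
{
	return (long long)t->sec * 1000000000LL + t->nsec;
}

/*
 * Hypothetical helper for step 1: ask the PTP device for a cross timestamp.
 * On success, *host_ns holds the device (host) time and *guest_ns the guest
 * CLOCK_REALTIME value the kernel correlated with it. Returns 0 or -1.
 */
static int read_cross_timestamp(const char *path, long long *host_ns,
				long long *guest_ns)
{
	struct ptp_sys_offset_precise off;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(&off, 0, sizeof(off));
	ret = ioctl(fd, PTP_SYS_OFFSET_PRECISE, &off);
	close(fd);
	if (ret < 0)
		return -1;

	*host_ns = ptp_time_to_ns(&off.device);
	*guest_ns = ptp_time_to_ns(&off.sys_realtime);
	return 0;
}
```

The difference host_ns - guest_ns is the offset a time sync service would
then feed into its own clock discipline; ptp_kvm itself does not adjust
the guest clock.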
change log:
from v11 to v12:
(1) rebase code on 5.7_rc6 and rebase 2 patches from Will Deacon,
1/11 and 2/11, as these patches introduce the discovery mechanism for
vendor smccc services.
(2) rebase ptp_kvm hypercall service from standard smccc to vendor
smccc and add ptp_kvm to the vendor smccc service discovery mechanism.
(3) add detail of why we need ptp_kvm and how ptp_kvm works in the cover
letter.
from v10 to v11:
(1) rebase code on 5.7_rc2.
(2) remove support for arm32, as kvm support for arm32 will be
removed [1]
(3) add error report in ptp_kvm initialization.
from v9 to v10:
(1) change code base to v5.5.
(2) enable ptp_kvm both for arm32 and arm64.
(3) let the user choose whether the virtual counter or the physical counter
is returned when using the crosstimestamp mode of ptp_kvm for arm/arm64.
(4) extend input argument for getcrosstimestamp API.
from v8 to v9:
(1) move ptp_kvm.h to driver/ptp/
(2) make the license declaration of ptp_kvm.h the same as that of the
other header files in the same directory.
from v7 to v8:
(1) separate adding clocksource id for arm_arch_counter as a
single patch.
(2) update commit message for patch 4/8.
(3) refine patch 7/8 and patch 8/8 to make them more independent.
from v6 to v7:
(1) include the omitted clocksource_id.h in last version.
(2) reorder the header file in patch.
(3) refine some words in commit message to make it more impersonal.
from v5 to v6:
(1) apply Mark's patch[4] to get SMCCC conduit.
(2) add mechanism to recognize the current clocksource by adding a
clocksource_id value into struct clocksource instead of the method in
patch-v5.
(3) rename kvm_arch_ptp_get_clock_fn into
kvm_arch_ptp_get_crosststamp.
from v4 to v5:
(1) remove hvc delay compensation as it should be left to userspace.
(2) check current clocksource in hvc call service.
(3) expose current clocksource by adding it to
system_time_snapshot.
(4) add helper to check if clocksource is arm_arch_counter.
(5) rename kvm_ptp.c to ptp_kvm_common.c
from v3 to v4:
(1) fix clocksource of ptp_kvm to arch_sys_counter.
(2) move kvm_arch_ptp_get_clock_fn into arm_arch_timer.c
(3) subtract cntvoff before return cycles from host.
(4) use ktime_get_snapshot instead of getnstimeofday and
get_current_counterval to return time and counter value.
(5) split ktime and counter into two 32-bit block respectively
to avoid Y2038-safe issue.
(6) set time compensation to device time as half of the delay of
hvc call.
(7) add ARM_ARCH_TIMER as dependency of ptp_kvm for
arm64.
from v2 to v3:
(1) fix some issues in commit log.
(2) add some receivers in send list.
from v1 to v2:
(1) move arch-specific code from arch/ to driver/ptp/
(2) offer mechanism to inform userspace if ptp_kvm service is
available.
(3) separate ptp_kvm code for arm64 into hypervisor part and
guest part.
(4) add API to expose monotonic clock and counter value.
(5) refine code: remove unnecessary parts and restructure.
[1] https://patchwork.kernel.org/cover/11373351/
Jianyong Wu (8):
psci: export psci conduit get helper.
ptp: Reorganize ptp_kvm modules to make it arch-independent.
clocksource: Add clocksource id for arm arch counter
psci: Add hypercall service for ptp_kvm.
ptp: arm64: Enable ptp_kvm for arm/arm64
ptp: extend input argument for getcrosstimestamp API
arm64: add mechanism to let user choose which counter to return
arm64: Add kvm capability check extension for ptp_kvm
Thomas Gleixner (1):
time: Add mechanism to recognize clocksource in time_get_snapshot
drivers/clocksource/arm_arch_timer.c | 33 ++++++++
drivers/firmware/psci/psci.c | 1 +
drivers/net/ethernet/intel/e1000e/ptp.c | 3 +-
drivers/ptp/Kconfig | 2 +-
drivers/ptp/Makefile | 1 +
drivers/ptp/ptp_chardev.c | 8 +-
drivers/ptp/ptp_kvm.h | 11 +++
drivers/ptp/ptp_kvm_arm64.c | 53 ++++++++++++
drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 85 ++++++--------------
drivers/ptp/ptp_kvm_x86.c | 89 +++++++++++++++++++++
include/linux/arm-smccc.h | 21 +++++
include/linux/clocksource.h | 6 ++
include/linux/clocksource_ids.h | 12 +++
include/linux/ptp_clock_kernel.h | 3 +-
include/linux/timekeeping.h | 12 +--
include/uapi/linux/kvm.h | 1 +
include/uapi/linux/ptp_clock.h | 4 +-
kernel/time/clocksource.c | 3 +
kernel/time/timekeeping.c | 1 +
virt/kvm/arm/arm.c | 1 +
virt/kvm/arm/hypercalls.c | 44 +++++++++-
21 files changed, 322 insertions(+), 72 deletions(-)
create mode 100644 drivers/ptp/ptp_kvm.h
create mode 100644 drivers/ptp/ptp_kvm_arm64.c
rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (60%)
create mode 100644 drivers/ptp/ptp_kvm_x86.c
create mode 100644 include/linux/clocksource_ids.h
--
2.17.1
From: Will Deacon <[email protected]>
Although the SMCCC specification provides some limited functionality for
describing the presence of hypervisor and firmware services, this is
generally applicable only to functions designated as "Arm Architecture
Service Functions" and no portable discovery mechanism is provided for
standard hypervisor services, despite having a designated range of
function identifiers reserved by the specification.
In an attempt to avoid the need for additional firmware changes every
time a new function is added, introduce a UID to identify the service
provider as being compatible with KVM. Once this has been established,
additional services can be discovered via a feature bitmap.
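The feature bitmap lookup described above can be modelled in plain C as
follows. This is a standalone userspace sketch under stated assumptions:
struct hyp_services and both helper names are hypothetical stand-ins for
the kernel bitmap helpers, and NUM_FUNCS mirrors ARM_SMCCC_KVM_NUM_FUNCS.
The point it shows is that function N maps to bit (N % 32) of the
(N / 32)th returned 32-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_FUNCS 128	/* mirrors ARM_SMCCC_KVM_NUM_FUNCS */

/* Hypothetical stand-in for the 128-bit service bitmap kept by the guest. */
struct hyp_services {
	uint32_t word[NUM_FUNCS / 32];
};

/* Record the four 32-bit feature registers returned by the hypervisor. */
static void hyp_services_init(struct hyp_services *s, const uint32_t regs[4])
{
	for (int i = 0; i < NUM_FUNCS / 32; i++)
		s->word[i] = regs[i];
}

/* Function N is available if bit (N % 32) of register (N / 32) is set. */
static bool hyp_service_available(const struct hyp_services *s,
				  uint32_t func_id)
{
	if (func_id >= NUM_FUNCS)
		return false;

	return s->word[func_id / 32] & (UINT32_C(1) << (func_id % 32));
}
```

Note that an out-of-range function id is reported as "not available"
rather than as an error code, since the result is a boolean.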
Cc: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Jianyong Wu <[email protected]>
---
arch/arm64/include/asm/hypervisor.h | 11 +++++++++
arch/arm64/kernel/setup.c | 36 +++++++++++++++++++++++++++++
include/linux/arm-smccc.h | 26 +++++++++++++++++++++
3 files changed, 73 insertions(+)
diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index f9cc1d021791..91e4bd890819 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -2,6 +2,17 @@
#ifndef _ASM_ARM64_HYPERVISOR_H
#define _ASM_ARM64_HYPERVISOR_H
+#include <linux/arm-smccc.h>
#include <asm/xen/hypervisor.h>
+static inline bool kvm_arm_hyp_service_available(u32 func_id)
+{
+ extern DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS);
+
+ if (func_id >= ARM_SMCCC_KVM_NUM_FUNCS)
+ return false;
+
+ return test_bit(func_id, __kvm_arm_hyp_services);
+}
+
#endif
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 3fd2c11c09fc..80bb78953df2 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -7,6 +7,7 @@
*/
#include <linux/acpi.h>
+#include <linux/arm-smccc.h>
#include <linux/export.h>
#include <linux/kernel.h>
#include <linux/stddef.h>
@@ -276,6 +277,40 @@ arch_initcall(reserve_memblock_reserved_regions);
u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
+DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS) = { };
+
+static void __init kvm_init_hyp_services(void)
+{
+ int i;
+ struct arm_smccc_res res;
+
+ if (psci_ops.smccc_version == SMCCC_VERSION_1_0)
+ return;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
+ if (res.a0 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 ||
+ res.a1 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 ||
+ res.a2 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 ||
+ res.a3 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3)
+ return;
+
+ memset(&res, 0, sizeof(res));
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID, &res);
+ for (i = 0; i < 32; ++i) {
+ if (res.a0 & BIT(i))
+ set_bit(i + (32 * 0), __kvm_arm_hyp_services);
+ if (res.a1 & BIT(i))
+ set_bit(i + (32 * 1), __kvm_arm_hyp_services);
+ if (res.a2 & BIT(i))
+ set_bit(i + (32 * 2), __kvm_arm_hyp_services);
+ if (res.a3 & BIT(i))
+ set_bit(i + (32 * 3), __kvm_arm_hyp_services);
+ }
+
+ pr_info("KVM hypervisor services detected (0x%08lx 0x%08lx 0x%08lx 0x%08lx)\n",
+ res.a3, res.a2, res.a1, res.a0);
+}
+
void __init setup_arch(char **cmdline_p)
{
init_mm.start_code = (unsigned long) _text;
@@ -344,6 +379,7 @@ void __init setup_arch(char **cmdline_p)
else
psci_acpi_init();
+ kvm_init_hyp_services();
init_bootcpu_ops();
smp_init_cpus();
smp_build_mpidr_hash();
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 59494df0f55b..bdc0124a064a 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -46,11 +46,14 @@
#define ARM_SMCCC_OWNER_OEM 3
#define ARM_SMCCC_OWNER_STANDARD 4
#define ARM_SMCCC_OWNER_STANDARD_HYP 5
+#define ARM_SMCCC_OWNER_VENDOR_HYP 6
#define ARM_SMCCC_OWNER_TRUSTED_APP 48
#define ARM_SMCCC_OWNER_TRUSTED_APP_END 49
#define ARM_SMCCC_OWNER_TRUSTED_OS 50
#define ARM_SMCCC_OWNER_TRUSTED_OS_END 63
+#define ARM_SMCCC_FUNC_QUERY_CALL_UID 0xff01
+
#define ARM_SMCCC_QUIRK_NONE 0
#define ARM_SMCCC_QUIRK_QCOM_A6 1 /* Save/restore register a6 */
@@ -77,6 +80,29 @@
ARM_SMCCC_SMC_32, \
0, 0x7fff)
+#define ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_FUNC_QUERY_CALL_UID)
+
+/* KVM UID value: 28b46fb6-2ec5-11e9-a9ca-4b564d003a74 */
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 0xb66fb428U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 0xe911c52eU
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 0x564bcaa9U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3 0x743a004dU
+
+/* KVM "vendor specific" services */
+#define ARM_SMCCC_KVM_FUNC_FEATURES 0
+#define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
+#define ARM_SMCCC_KVM_NUM_FUNCS 128
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_FEATURES)
+
#ifndef __ASSEMBLY__
#include <linux/linkage.h>
--
2.17.1
From: Will Deacon <[email protected]>
We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.
Cc: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Jianyong Wu <[email protected]>
---
virt/kvm/arm/hypercalls.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
index 550dfa3e53cd..db6dce3d0e23 100644
--- a/virt/kvm/arm/hypercalls.c
+++ b/virt/kvm/arm/hypercalls.c
@@ -12,13 +12,13 @@
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
{
u32 func_id = smccc_get_function(vcpu);
- long val = SMCCC_RET_NOT_SUPPORTED;
+ long val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
gpa_t gpa;
switch (func_id) {
case ARM_SMCCC_VERSION_FUNC_ID:
- val = ARM_SMCCC_VERSION_1_1;
+ val[0] = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case KVM_BP_HARDEN_UNKNOWN:
break;
case KVM_BP_HARDEN_WA_NEEDED:
- val = SMCCC_RET_SUCCESS;
+ val[0] = SMCCC_RET_SUCCESS;
break;
case KVM_BP_HARDEN_NOT_REQUIRED:
- val = SMCCC_RET_NOT_REQUIRED;
+ val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
@@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case KVM_SSBD_UNKNOWN:
break;
case KVM_SSBD_KERNEL:
- val = SMCCC_RET_SUCCESS;
+ val[0] = SMCCC_RET_SUCCESS;
break;
case KVM_SSBD_FORCE_ENABLE:
case KVM_SSBD_MITIGATED:
- val = SMCCC_RET_NOT_REQUIRED;
+ val[0] = SMCCC_RET_NOT_REQUIRED;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
- val = SMCCC_RET_SUCCESS;
+ val[0] = SMCCC_RET_SUCCESS;
break;
}
break;
case ARM_SMCCC_HV_PV_TIME_FEATURES:
- val = kvm_hypercall_pv_features(vcpu);
+ val[0] = kvm_hypercall_pv_features(vcpu);
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
if (gpa != GPA_INVALID)
- val = gpa;
+ val[0] = gpa;
+ break;
+ case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+ val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+ val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+ val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+ val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+ break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+ val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
default:
return kvm_psci_call(vcpu);
}
- smccc_set_retval(vcpu, val, 0, 0, 0);
+ smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
return 1;
}
--
2.17.1
Export arm_smccc_1_1_get_conduit so that modules can use the smccc helpers
which depend on it.
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Jianyong Wu <[email protected]>
---
drivers/firmware/psci/psci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index 2937d44b5df4..fd3c88f21b6a 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -64,6 +64,7 @@ enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
return psci_ops.conduit;
}
+EXPORT_SYMBOL(arm_smccc_1_1_get_conduit);
typedef unsigned long (psci_fn)(unsigned long, unsigned long,
unsigned long, unsigned long);
--
2.17.1
Currently, the ptp_kvm module is implemented only for x86 and includes a
large amount of arch-specific code. This patch moves all of that code
into a new arch-specific file in the same directory.
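The split between common and arch code can be illustrated with the small
userspace model below. It is a sketch only: the patch links the
kvm_arch_ptp_*() hooks by name, while this example passes the hook as a
function pointer so that it can be compiled on its own. All function
names here are hypothetical, and the fake backend returns a fixed time
instead of issuing a hypercall.

```c
#include <errno.h>
#include <time.h>

/* Hypothetical arch backend: a real one would issue a hypercall here. */
static int fake_arch_ptp_get_clock(struct timespec *ts)
{
	ts->tv_sec = 1590000000;
	ts->tv_nsec = 123;
	return 0;
}

/* Hypothetical arch backend on a host without the service. */
static int failing_arch_ptp_get_clock(struct timespec *ts)
{
	(void)ts;
	return -1;
}

/* Common layer: arch-agnostic, it only knows the hook's contract. */
static int common_ptp_gettime(int (*arch_get_clock)(struct timespec *),
			      struct timespec *ts)
{
	struct timespec tspec;

	if (arch_get_clock(&tspec) != 0)
		return -EOPNOTSUPP;

	*ts = tspec;
	return 0;
}
```

The common layer never touches pvclock or SMCCC details; any backend
failure is reported to the PTP core as -EOPNOTSUPP.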
Signed-off-by: Jianyong Wu <[email protected]>
---
drivers/ptp/Makefile | 1 +
drivers/ptp/ptp_kvm.h | 11 +++
drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 80 +++++-------------
drivers/ptp/ptp_kvm_x86.c | 89 +++++++++++++++++++++
4 files changed, 122 insertions(+), 59 deletions(-)
create mode 100644 drivers/ptp/ptp_kvm.h
rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (63%)
create mode 100644 drivers/ptp/ptp_kvm_x86.c
diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index 7aff75f745dc..baac6f5b243b 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -4,6 +4,7 @@
#
ptp-y := ptp_clock.o ptp_chardev.o ptp_sysfs.o
+ptp_kvm-y := ptp_kvm_$(ARCH).o ptp_kvm_common.o
obj-$(CONFIG_PTP_1588_CLOCK) += ptp.o
obj-$(CONFIG_PTP_1588_CLOCK_DTE) += ptp_dte.o
obj-$(CONFIG_PTP_1588_CLOCK_INES) += ptp_ines.o
diff --git a/drivers/ptp/ptp_kvm.h b/drivers/ptp/ptp_kvm.h
new file mode 100644
index 000000000000..4bf1802bbeb8
--- /dev/null
+++ b/drivers/ptp/ptp_kvm.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Virtual PTP 1588 clock for use with KVM guests
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ */
+
+int kvm_arch_ptp_init(void);
+int kvm_arch_ptp_get_clock(struct timespec64 *ts);
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle,
+ struct timespec64 *tspec, void *cs);
diff --git a/drivers/ptp/ptp_kvm.c b/drivers/ptp/ptp_kvm_common.c
similarity index 63%
rename from drivers/ptp/ptp_kvm.c
rename to drivers/ptp/ptp_kvm_common.c
index fc7d0b77e118..4fdd8ab11a28 100644
--- a/drivers/ptp/ptp_kvm.c
+++ b/drivers/ptp/ptp_kvm_common.c
@@ -8,15 +8,16 @@
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kernel.h>
+#include <linux/slab.h>
#include <linux/module.h>
#include <uapi/linux/kvm_para.h>
#include <asm/kvm_para.h>
-#include <asm/pvclock.h>
-#include <asm/kvmclock.h>
#include <uapi/asm/kvm_para.h>
#include <linux/ptp_clock_kernel.h>
+#include "ptp_kvm.h"
+
struct kvm_ptp_clock {
struct ptp_clock *ptp_clock;
struct ptp_clock_info caps;
@@ -24,56 +25,29 @@ struct kvm_ptp_clock {
DEFINE_SPINLOCK(kvm_ptp_lock);
-static struct pvclock_vsyscall_time_info *hv_clock;
-
-static struct kvm_clock_pairing clock_pair;
-static phys_addr_t clock_pair_gpa;
-
static int ptp_kvm_get_time_fn(ktime_t *device_time,
struct system_counterval_t *system_counter,
void *ctx)
{
- unsigned long ret;
+ unsigned long ret, cycle;
struct timespec64 tspec;
- unsigned version;
- int cpu;
- struct pvclock_vcpu_time_info *src;
+ struct clocksource *cs;
spin_lock(&kvm_ptp_lock);
preempt_disable_notrace();
- cpu = smp_processor_id();
- src = &hv_clock[cpu].pvti;
-
- do {
- /*
- * We are using a TSC value read in the hosts
- * kvm_hc_clock_pairing handling.
- * So any changes to tsc_to_system_mul
- * and tsc_shift or any other pvclock
- * data invalidate that measurement.
- */
- version = pvclock_read_begin(src);
-
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
- clock_pair_gpa,
- KVM_CLOCK_PAIRING_WALLCLOCK);
- if (ret != 0) {
- pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
- spin_unlock(&kvm_ptp_lock);
- preempt_enable_notrace();
- return -EOPNOTSUPP;
- }
-
- tspec.tv_sec = clock_pair.sec;
- tspec.tv_nsec = clock_pair.nsec;
- ret = __pvclock_read_cycles(src, clock_pair.tsc);
- } while (pvclock_read_retry(src, version));
+ ret = kvm_arch_ptp_get_crosststamp(&cycle, &tspec, &cs);
+ if (ret != 0) {
+ pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
+ spin_unlock(&kvm_ptp_lock);
+ preempt_enable_notrace();
+ return -EOPNOTSUPP;
+ }
preempt_enable_notrace();
- system_counter->cycles = ret;
- system_counter->cs = &kvm_clock;
+ system_counter->cycles = cycle;
+ system_counter->cs = cs;
*device_time = timespec64_to_ktime(tspec);
@@ -116,17 +90,13 @@ static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
spin_lock(&kvm_ptp_lock);
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
- clock_pair_gpa,
- KVM_CLOCK_PAIRING_WALLCLOCK);
+ ret = kvm_arch_ptp_get_clock(&tspec);
if (ret != 0) {
pr_err_ratelimited("clock offset hypercall ret %lu\n", ret);
spin_unlock(&kvm_ptp_lock);
return -EOPNOTSUPP;
}
- tspec.tv_sec = clock_pair.sec;
- tspec.tv_nsec = clock_pair.nsec;
spin_unlock(&kvm_ptp_lock);
memcpy(ts, &tspec, sizeof(struct timespec64));
@@ -166,21 +136,13 @@ static void __exit ptp_kvm_exit(void)
static int __init ptp_kvm_init(void)
{
- long ret;
-
- if (!kvm_para_available())
- return -ENODEV;
-
- clock_pair_gpa = slow_virt_to_phys(&clock_pair);
- hv_clock = pvclock_get_pvti_cpu0_va();
+ int ret;
- if (!hv_clock)
- return -ENODEV;
-
- ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
- KVM_CLOCK_PAIRING_WALLCLOCK);
- if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
- return -ENODEV;
+ ret = kvm_arch_ptp_init();
+ if (ret) {
+ pr_err("failed to initialize ptp_kvm\n");
+ return -EOPNOTSUPP;
+ }
kvm_ptp_clock.caps = ptp_kvm_caps;
diff --git a/drivers/ptp/ptp_kvm_x86.c b/drivers/ptp/ptp_kvm_x86.c
new file mode 100644
index 000000000000..aabed1b08a0d
--- /dev/null
+++ b/drivers/ptp/ptp_kvm_x86.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtual PTP 1588 clock for use with KVM guests
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ */
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <asm/pvclock.h>
+#include <asm/kvmclock.h>
+#include <linux/module.h>
+#include <uapi/asm/kvm_para.h>
+#include <uapi/linux/kvm_para.h>
+#include <linux/ptp_clock_kernel.h>
+
+phys_addr_t clock_pair_gpa;
+struct kvm_clock_pairing clock_pair;
+struct pvclock_vsyscall_time_info *hv_clock;
+
+int kvm_arch_ptp_init(void)
+{
+ int ret;
+
+ if (!kvm_para_available())
+ return -ENODEV;
+
+ clock_pair_gpa = slow_virt_to_phys(&clock_pair);
+ hv_clock = pvclock_get_pvti_cpu0_va();
+ if (!hv_clock)
+ return -ENODEV;
+
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
+ KVM_CLOCK_PAIRING_WALLCLOCK);
+ if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
+ return -ENODEV;
+
+ return 0;
+}
+
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
+{
+ long ret;
+
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+ clock_pair_gpa,
+ KVM_CLOCK_PAIRING_WALLCLOCK);
+ if (ret != 0)
+ return -EOPNOTSUPP;
+
+ ts->tv_sec = clock_pair.sec;
+ ts->tv_nsec = clock_pair.nsec;
+
+ return 0;
+}
+
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *tspec,
+ struct clocksource **cs)
+{
+ unsigned long ret;
+ unsigned int version;
+ int cpu;
+ struct pvclock_vcpu_time_info *src;
+
+ cpu = smp_processor_id();
+ src = &hv_clock[cpu].pvti;
+
+ do {
+ /*
+ * We are using a TSC value read in the hosts
+ * kvm_hc_clock_pairing handling.
+ * So any changes to tsc_to_system_mul
+ * and tsc_shift or any other pvclock
+ * data invalidate that measurement.
+ */
+ version = pvclock_read_begin(src);
+
+ ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+ clock_pair_gpa,
+ KVM_CLOCK_PAIRING_WALLCLOCK);
+ tspec->tv_sec = clock_pair.sec;
+ tspec->tv_nsec = clock_pair.nsec;
+ *cycle = __pvclock_read_cycles(src, clock_pair.tsc);
+ } while (pvclock_read_retry(src, version));
+
+ *cs = &kvm_clock;
+
+ return 0;
+}
--
2.17.1
From: Thomas Gleixner <[email protected]>
System time snapshots are not conveying information about the current
clocksource which was used, but callers like the PTP KVM guest
implementation have the requirement to evaluate the clocksource type to
select the appropriate mechanism.
Introduce a clocksource id field in struct clocksource which is by default
set to CSID_GENERIC (0). Clocksource implementations can set that field to
a value which allows the clocksource to be identified.
Store the clocksource id of the current clocksource in the
system_time_snapshot so callers can evaluate which clocksource was used to
take the snapshot and act accordingly.
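A caller-side check could look roughly like the userspace model below.
It is only a sketch: struct snapshot_model, CSID_EXAMPLE_COUNTER and both
helper names are hypothetical and exist purely for illustration; only
CSID_GENERIC and CSID_MAX come from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the id enum; CSID_EXAMPLE_COUNTER is a made-up entry. */
enum clocksource_ids {
	CSID_GENERIC = 0,
	CSID_EXAMPLE_COUNTER,
	CSID_MAX,
};

/* Reduced model of system_time_snapshot: cycles plus the source id. */
struct snapshot_model {
	uint64_t cycles;
	enum clocksource_ids cs_id;
};

/* Accept the cycle value only if it came from the expected clocksource. */
static bool snapshot_cycles_usable(const struct snapshot_model *snap,
				   enum clocksource_ids expected)
{
	return snap->cs_id == expected;
}

/* Mirrors the registration-time check: out-of-range ids fall back. */
static enum clocksource_ids sanitize_cs_id(unsigned int id)
{
	return id >= CSID_MAX ? CSID_GENERIC : (enum clocksource_ids)id;
}
```

A consumer would typically ignore the cycle value when the id does not
match and fall back to the wall-clock time alone.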
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jianyong Wu <[email protected]>
---
include/linux/clocksource.h | 6 ++++++
include/linux/clocksource_ids.h | 11 +++++++++++
include/linux/timekeeping.h | 12 +++++++-----
kernel/time/clocksource.c | 3 +++
kernel/time/timekeeping.c | 1 +
5 files changed, 28 insertions(+), 5 deletions(-)
create mode 100644 include/linux/clocksource_ids.h
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143db6523..80d2a7e39630 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -17,6 +17,7 @@
#include <linux/timer.h>
#include <linux/init.h>
#include <linux/of.h>
+#include <linux/clocksource_ids.h>
#include <asm/div64.h>
#include <asm/io.h>
@@ -62,6 +63,10 @@ struct module;
* 400-499: Perfect
* The ideal clocksource. A must-use where
* available.
+ * @id: Defaults to CSID_GENERIC. The id value is captured
+ * in certain snapshot functions to allow callers to
+ * validate the clocksource from which the snapshot was
+ * taken.
* @flags: Flags describing special properties
* @enable: Optional function to enable the clocksource
* @disable: Optional function to disable the clocksource
@@ -100,6 +105,7 @@ struct clocksource {
const char *name;
struct list_head list;
int rating;
+ enum clocksource_ids id;
enum vdso_clock_mode vdso_clock_mode;
unsigned long flags;
diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
new file mode 100644
index 000000000000..4d8e19e05328
--- /dev/null
+++ b/include/linux/clocksource_ids.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CLOCKSOURCE_IDS_H
+#define _LINUX_CLOCKSOURCE_IDS_H
+
+/* Enum to give clocksources a unique identifier */
+enum clocksource_ids {
+ CSID_GENERIC = 0,
+ CSID_MAX,
+};
+
+#endif
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index b27e2ffa96c1..70e771862d20 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -3,6 +3,7 @@
#define _LINUX_TIMEKEEPING_H
#include <linux/errno.h>
+#include <linux/clocksource_ids.h>
/* Included from linux/ktime.h */
@@ -232,11 +233,12 @@ extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
* @cs_was_changed_seq: The sequence number of clocksource change events
*/
struct system_time_snapshot {
- u64 cycles;
- ktime_t real;
- ktime_t raw;
- unsigned int clock_was_set_seq;
- u8 cs_was_changed_seq;
+ u64 cycles;
+ ktime_t real;
+ ktime_t raw;
+ enum clocksource_ids cs_id;
+ unsigned int clock_was_set_seq;
+ u8 cs_was_changed_seq;
};
/*
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 7cb09c4cf21c..a8f65b3e4ec8 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -928,6 +928,9 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
clocksource_arch_init(cs);
+ if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
+ cs->id = CSID_GENERIC;
+
#ifdef CONFIG_GENERIC_VDSO_CLOCK_MODE
if (cs->vdso_clock_mode < 0 ||
cs->vdso_clock_mode >= VDSO_CLOCKMODE_MAX) {
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9ebaab13339d..a2e46b0151b6 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -979,6 +979,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
do {
seq = read_seqcount_begin(&tk_core.seq);
now = tk_clock_read(&tk->tkr_mono);
+ systime_snapshot->cs_id = tk->tkr_mono.clock->id;
systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
base_real = ktime_add(tk->tkr_mono.base,
--
2.17.1
Add a clocksource id for the arm arch counter so that it can be identified
easily in the ptp_kvm implementation for arm.
Signed-off-by: Jianyong Wu <[email protected]>
---
drivers/clocksource/arm_arch_timer.c | 2 ++
include/linux/clocksource_ids.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 2204a444e801..0f44f296ed17 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -16,6 +16,7 @@
#include <linux/cpu_pm.h>
#include <linux/clockchips.h>
#include <linux/clocksource.h>
+#include <linux/clocksource_ids.h>
#include <linux/interrupt.h>
#include <linux/of_irq.h>
#include <linux/of_address.h>
@@ -191,6 +192,7 @@ static u64 arch_counter_read_cc(const struct cyclecounter *cc)
static struct clocksource clocksource_counter = {
.name = "arch_sys_counter",
+ .id = CSID_ARM_ARCH_COUNTER,
.rating = 400,
.read = arch_counter_read,
.mask = CLOCKSOURCE_MASK(56),
diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
index 4d8e19e05328..16775d7d8f8d 100644
--- a/include/linux/clocksource_ids.h
+++ b/include/linux/clocksource_ids.h
@@ -5,6 +5,7 @@
/* Enum to give clocksources a unique identifier */
enum clocksource_ids {
CSID_GENERIC = 0,
+ CSID_ARM_ARCH_COUNTER,
CSID_MAX,
};
--
2.17.1
Currently, there is no mechanism to keep time in sync between guest and host
in an arm64 virtualization environment. Time in the guest will drift compared
with the host after boot, as they may each use a different third-party time
source to correct their time. The time deviation will be on the order
of milliseconds. But in some scenarios, such as cloud environments, we ask
for higher time precision.
The kvm ptp clock, which uses the host clock as the reference
clock to sync time between guest and host, has already been adopted by x86,
where it improves time sync precision from milliseconds to nanoseconds.
This patch enables kvm ptp on arm64 and improves clock sync precision
significantly.
A comparison of test results with and without kvm ptp on arm64 is
as follows. The results are derived from the output of the command 'chronyc
sources'. Pay particular attention to the last sample column, which shows
the offset between the local clock and the source at the last measurement.
no kvm ptp in guest:
MS Name/IP address Stratum Poll Reach LastRx Last sample
========================================================================
^* dns1.synet.edu.cn 2 6 377 13 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 21 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 29 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 37 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 45 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 53 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 61 +1040us[+1581us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 4 -130us[ +796us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 12 -130us[ +796us] +/- 21ms
^* dns1.synet.edu.cn 2 6 377 20 -130us[ +796us] +/- 21ms
in host:
MS Name/IP address Stratum Poll Reach LastRx Last sample
========================================================================
^* 120.25.115.20 2 7 377 72 -470us[ -603us] +/- 18ms
^* 120.25.115.20 2 7 377 92 -470us[ -603us] +/- 18ms
^* 120.25.115.20 2 7 377 112 -470us[ -603us] +/- 18ms
^* 120.25.115.20 2 7 377 2 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 22 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 43 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 63 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 83 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 103 +872ns[-6808ns] +/- 17ms
^* 120.25.115.20 2 7 377 123 +872ns[-6808ns] +/- 17ms
dns1.synet.edu.cn is the network reference clock for the guest and
120.25.115.20 is the network reference clock for the host. We can't get the
clock error between guest and host directly, but a rough estimate
is on the order of hundreds of us to ms.
with kvm ptp in guest:
chrony has been disabled in the host to remove the disturbance from the
network clock.
MS Name/IP address Stratum Poll Reach LastRx Last sample
========================================================================
* PHC0 0 3 377 8 -7ns[ +1ns] +/- 3ns
* PHC0 0 3 377 8 +1ns[ +16ns] +/- 3ns
* PHC0 0 3 377 6 -4ns[ -0ns] +/- 6ns
* PHC0 0 3 377 6 -8ns[ -12ns] +/- 5ns
* PHC0 0 3 377 5 +2ns[ +4ns] +/- 4ns
* PHC0 0 3 377 13 +2ns[ +4ns] +/- 4ns
* PHC0 0 3 377 12 -4ns[ -6ns] +/- 4ns
* PHC0 0 3 377 11 -8ns[ -11ns] +/- 6ns
* PHC0 0 3 377 10 -14ns[ -20ns] +/- 4ns
* PHC0 0 3 377 8 +4ns[ +5ns] +/- 4ns
PHC0 is the ptp clock which uses the host clock as its source
clock. So we can say with confidence that the clock error between host and
guest is on the order of ns.
Signed-off-by: Jianyong Wu <[email protected]>
---
drivers/clocksource/arm_arch_timer.c | 22 ++++++++++++
drivers/ptp/Kconfig | 2 +-
drivers/ptp/ptp_kvm_arm64.c | 53 ++++++++++++++++++++++++++++
3 files changed, 76 insertions(+), 1 deletion(-)
create mode 100644 drivers/ptp/ptp_kvm_arm64.c
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 0f44f296ed17..848613261508 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -1641,3 +1641,25 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
}
TIMER_ACPI_DECLARE(arch_timer, ACPI_SIG_GTDT, arch_timer_acpi_init);
#endif
+
+#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_KVM)
+#include <linux/arm-smccc.h>
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *ts,
+ struct clocksource **cs)
+{
+ struct arm_smccc_res hvc_res;
+ ktime_t ktime_overall;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID, &hvc_res);
+ if ((int)(hvc_res.a0) < 0)
+ return -EOPNOTSUPP;
+
+ ktime_overall = (long long)hvc_res.a0 << 32 | hvc_res.a1;
+ *ts = ktime_to_timespec64(ktime_overall);
+ *cycle = (long long)hvc_res.a2 << 32 | hvc_res.a3;
+ *cs = &clocksource_counter;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_arch_ptp_get_crosststamp);
+#endif
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 86400c708150..0733c8c61541 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -106,7 +106,7 @@ config PTP_1588_CLOCK_PCH
config PTP_1588_CLOCK_KVM
tristate "KVM virtual PTP clock"
depends on PTP_1588_CLOCK
- depends on KVM_GUEST && X86
+ depends on KVM_GUEST && X86 || ARM64 && ARM_ARCH_TIMER && ARM_PSCI_FW
default y
help
This driver adds support for using kvm infrastructure as a PTP
diff --git a/drivers/ptp/ptp_kvm_arm64.c b/drivers/ptp/ptp_kvm_arm64.c
new file mode 100644
index 000000000000..2781a0f7cad2
--- /dev/null
+++ b/drivers/ptp/ptp_kvm_arm64.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Virtual PTP 1588 clock for use with KVM guests
+ * Copyright (C) 2019 ARM Ltd.
+ * All Rights Reserved
+ */
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <asm/hypervisor.h>
+#include <linux/module.h>
+#include <linux/psci.h>
+#include <linux/arm-smccc.h>
+#include <linux/timecounter.h>
+#include <linux/sched/clock.h>
+#include <asm/arch_timer.h>
+
+int kvm_arch_ptp_init(void)
+{
+ struct arm_smccc_res hvc_res;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
+ &hvc_res);
+ if (!(hvc_res.a0 & BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP)))
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+int kvm_arch_ptp_get_clock_generic(struct timespec64 *ts,
+ struct arm_smccc_res *hvc_res)
+{
+ ktime_t ktime_overall;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+ hvc_res);
+ if ((int)(hvc_res->a0) < 0)
+ return -EOPNOTSUPP;
+
+ ktime_overall = (long long)hvc_res->a0 << 32 | hvc_res->a1;
+ *ts = ktime_to_timespec64(ktime_overall);
+
+ return 0;
+}
+
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
+{
+ struct arm_smccc_res hvc_res;
+
+ kvm_arch_ptp_get_clock_generic(ts, &hvc_res);
+
+ return 0;
+}
--
2.17.1
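For reference, the feature probe in kvm_arch_ptp_init() comes down to testing
one bit of the bitmap that the host returns in a0. Below is a minimal userspace
sketch of that test. The names kvm_feature_present, FEAT_FEATURES and FEAT_PTP
are hypothetical stand-ins for the kernel's BIT() and
ARM_SMCCC_KVM_FUNC_KVM_PTP; they are not part of this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, mirroring the ARM_SMCCC_KVM_FUNC_* numbering. */
enum { FEAT_FEATURES = 0, FEAT_PTP = 1 };

/* True only when bit `nr` is set in the feature bitmap returned by the host. */
static bool kvm_feature_present(uint64_t bitmap, unsigned int nr)
{
	return (bitmap & (UINT64_C(1) << nr)) != 0;
}
```

An AND is needed to isolate the bit. An OR with a non-zero mask is always
non-zero, so a probe written that way would succeed on any host.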
In general, a VM uses the virtual counter, whereas the host uses the
physical counter. But in some special scenarios, such as nested
virtualization, the physical counter may be used by the VM. An interface is
added to the ptp_kvm driver to let the user choose which counter
the host should return.
To use this feature, call the PTP_EXTTS_REQUEST(2) ioctl with the
PTP_KVM_ARM_PHY_COUNTER bit set in the flags of its argument, then call the
PTP_SYS_OFFSET_PRECISE(2) ioctl to get the cross timestamp; the physical
counter will be returned. If the bit is not set, or PTP_EXTTS_REQUEST2 is
never called, the virtual counter is returned by default.
Signed-off-by: Jianyong Wu <[email protected]>
Suggested-by: Marc Zyngier <[email protected]>
---
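As a reading aid for the second step described above, here is a minimal
userspace sketch of fetching one cross timestamp with the existing
PTP_SYS_OFFSET_PRECISE ioctl. It deliberately does not use the
PTP_KVM_ARM_PHY_COUNTER flag proposed by this patch, since that flag is not in
released headers. The helper names ptp_time_to_ns and phc_precise_offset_ns
are hypothetical, and the device path is supplied by the caller.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ptp_clock.h>

/* Convert a struct ptp_clock_time (seconds + nanoseconds) to nanoseconds. */
static int64_t ptp_time_to_ns(const struct ptp_clock_time *t)
{
	return (int64_t)t->sec * 1000000000LL + (int64_t)t->nsec;
}

/*
 * Read one precise cross timestamp from a PHC device node and store the
 * offset (device time minus system realtime) in nanoseconds.
 * Returns 0 on success, -1 if the node cannot be opened or the ioctl fails,
 * for example because the driver has no getcrosststamp callback.
 */
static int phc_precise_offset_ns(const char *path, int64_t *offset_ns)
{
	struct ptp_sys_offset_precise xts;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(&xts, 0, sizeof(xts));
	ret = ioctl(fd, PTP_SYS_OFFSET_PRECISE, &xts);
	close(fd);
	if (ret < 0)
		return -1;

	*offset_ns = ptp_time_to_ns(&xts.device) -
		     ptp_time_to_ns(&xts.sys_realtime);
	return 0;
}
```

In a guest with ptp_kvm loaded, a caller would pass the node created by the
driver (for example /dev/ptp0, assuming it is the first PHC) and feed the
offset to its own clock discipline loop.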
drivers/clocksource/arm_arch_timer.c | 13 ++++++++++++-
drivers/ptp/ptp_chardev.c | 25 +++++++++++++++++++++++++
drivers/ptp/ptp_kvm_common.c | 7 ++++---
include/uapi/linux/ptp_clock.h | 4 +++-
4 files changed, 44 insertions(+), 5 deletions(-)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 06959b901b0d..75a3bb118201 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -1650,7 +1650,18 @@ int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *ts,
struct arm_smccc_res hvc_res;
ktime_t ktime_overall;
- arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID, &hvc_res);
+ /*
+ * an argument is passed as the first argument (a1) to determine whether
+ * the virtual counter or the physical counter should be passed back.
+ */
+ if (ctx && *ctx)
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+ ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID,
+ &hvc_res);
+ else
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+ &hvc_res);
+
if ((int)(hvc_res.a0) < 0)
return -EOPNOTSUPP;
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index fef72f29f3c8..8b0a7b328bcd 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -123,6 +123,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
struct timespec64 ts;
int enable, err = 0;
+#ifdef CONFIG_ARM64
+ static long flag;
+#endif
switch (cmd) {
case PTP_CLOCK_GETCAPS:
@@ -149,6 +152,24 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
err = -EFAULT;
break;
}
+
+#ifdef CONFIG_ARM64
+ /*
+ * Just using this ioctl to tell kvm ptp driver to get PHC
+ * with physical counter, so if bit PTP_KVM_ARM_PHY_COUNTER
+ * is set then just exit directly.
+ * In most cases, we just need virtual counter from host and
+ * there is limited scenario using this to get physical counter
+ * in guest.
+ * Be careful to use this as there is no way to set it back
+ * unless you reinstall the module.
+ * This is only for arm64.
+ */
+ if (req.extts.flags & PTP_KVM_ARM_PHY_COUNTER) {
+ flag = 1;
+ break;
+ }
+#endif
if (cmd == PTP_EXTTS_REQUEST2) {
/* Tell the drivers to check the flags carefully. */
req.extts.flags |= PTP_STRICT_FLAGS;
@@ -235,7 +256,11 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
err = -EOPNOTSUPP;
break;
}
+#ifdef CONFIG_ARM64
+ err = ptp->info->getcrosststamp(ptp->info, &xtstamp, &flag);
+#else
err = ptp->info->getcrosststamp(ptp->info, &xtstamp, NULL);
+#endif
if (err)
break;
diff --git a/drivers/ptp/ptp_kvm_common.c b/drivers/ptp/ptp_kvm_common.c
index 4fdd8ab11a28..39367e230176 100644
--- a/drivers/ptp/ptp_kvm_common.c
+++ b/drivers/ptp/ptp_kvm_common.c
@@ -36,7 +36,7 @@ static int ptp_kvm_get_time_fn(ktime_t *device_time,
spin_lock(&kvm_ptp_lock);
preempt_disable_notrace();
- ret = kvm_arch_ptp_get_crosststamp(&cycle, &tspec, &cs);
+ ret = kvm_arch_ptp_get_crosststamp(&cycle, &tspec, &cs, ctx);
if (ret != 0) {
pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
spin_unlock(&kvm_ptp_lock);
@@ -57,9 +57,10 @@ static int ptp_kvm_get_time_fn(ktime_t *device_time,
}
static int ptp_kvm_getcrosststamp(struct ptp_clock_info *ptp,
- struct system_device_crosststamp *xtstamp)
+ struct system_device_crosststamp *xtstamp,
+ long *flag)
{
- return get_device_system_crosststamp(ptp_kvm_get_time_fn, NULL,
+ return get_device_system_crosststamp(ptp_kvm_get_time_fn, flag,
NULL, xtstamp);
}
diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
index 9dc9d0079e98..71e388a82244 100644
--- a/include/uapi/linux/ptp_clock.h
+++ b/include/uapi/linux/ptp_clock.h
@@ -32,6 +32,7 @@
#define PTP_RISING_EDGE (1<<1)
#define PTP_FALLING_EDGE (1<<2)
#define PTP_STRICT_FLAGS (1<<3)
+#define PTP_KVM_ARM_PHY_COUNTER (1<<4)
#define PTP_EXTTS_EDGES (PTP_RISING_EDGE | PTP_FALLING_EDGE)
/*
@@ -40,7 +41,8 @@
#define PTP_EXTTS_VALID_FLAGS (PTP_ENABLE_FEATURE | \
PTP_RISING_EDGE | \
PTP_FALLING_EDGE | \
- PTP_STRICT_FLAGS)
+ PTP_STRICT_FLAGS | \
+ PTP_KVM_ARM_PHY_COUNTER)
/*
* flag fields valid for the original PTP_EXTTS_REQUEST ioctl.
--
2.17.1
Let userspace check whether the host provides the kvm ptp service.
Before migrating VMs to another host, the VMM may check whether this
capability is available to decide what to do next.
Signed-off-by: Jianyong Wu <[email protected]>
Suggested-by: Marc Zyngier <[email protected]>
---
include/uapi/linux/kvm.h | 1 +
virt/kvm/arm/arm.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 428c7dde6b4b..668049ad78e1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1017,6 +1017,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_S390_VCPU_RESETS 179
#define KVM_CAP_S390_PROTECTED 180
#define KVM_CAP_PPC_SECURE_GUEST 181
+#define KVM_CAP_ARM_KVM_PTP 182
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..4726a88949f5 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -195,6 +195,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
case KVM_CAP_ARM_NISV_TO_USER:
case KVM_CAP_ARM_INJECT_EXT_DABT:
+ case KVM_CAP_ARM_KVM_PTP:
r = 1;
break;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
--
2.17.1
The ptp_kvm module gets this service through an SMCCC call.
The service provides the host's real time and counter cycle to the guest.
It also lets the caller choose whether the cycle of the virtual counter or of
the physical counter is returned.
Signed-off-by: Jianyong Wu <[email protected]>
---
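The return convention used by this service (reg0/reg1 carry the high and low
halves of the host ktime, reg2/reg3 the high and low halves of the cycle
count) can be sketched in plain C as follows. The function names ptp_pack and
ptp_unpack are hypothetical and only illustrate the split and the matching
reassembly on the guest side.

```c
#include <stdint.h>

/* Host side: split two 64-bit values into four 32-bit return registers. */
static void ptp_pack(uint64_t ktime_ns, uint64_t cycles, uint32_t val[4])
{
	val[0] = (uint32_t)(ktime_ns >> 32);	/* high 32 bits of ktime */
	val[1] = (uint32_t)ktime_ns;		/* low 32 bits of ktime */
	val[2] = (uint32_t)(cycles >> 32);	/* high 32 bits of cycles */
	val[3] = (uint32_t)cycles;		/* low 32 bits of cycles */
}

/* Guest side: rebuild the 64-bit values from the four registers. */
static void ptp_unpack(const uint32_t val[4], uint64_t *ktime_ns,
		       uint64_t *cycles)
{
	*ktime_ns = (uint64_t)val[0] << 32 | val[1];
	*cycles = (uint64_t)val[2] << 32 | val[3];
}
```

Widening each high half to 64 bits before shifting is what keeps the upper
bits from being lost.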
include/linux/arm-smccc.h | 14 ++++++++++++
virt/kvm/Kconfig | 4 ++++
virt/kvm/arm/hypercalls.c | 47 +++++++++++++++++++++++++++++++++++++++
3 files changed, 65 insertions(+)
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index bdc0124a064a..badadc390809 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -94,6 +94,8 @@
/* KVM "vendor specific" services */
#define ARM_SMCCC_KVM_FUNC_FEATURES 0
+#define ARM_SMCCC_KVM_FUNC_KVM_PTP 1
+#define ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY 2
#define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
#define ARM_SMCCC_KVM_NUM_FUNCS 128
@@ -103,6 +105,18 @@
ARM_SMCCC_OWNER_VENDOR_HYP, \
ARM_SMCCC_KVM_FUNC_FEATURES)
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_KVM_PTP)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY)
+
#ifndef __ASSEMBLY__
#include <linux/linkage.h>
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index aad9284c043a..bf820811e815 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -60,3 +60,7 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE
config HAVE_KVM_NO_POLL
bool
+
+config ARM64_KVM_PTP_HOST
+ def_bool y
+ depends on ARM64 && KVM
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
index db6dce3d0e23..c964122f8dae 100644
--- a/virt/kvm/arm/hypercalls.c
+++ b/virt/kvm/arm/hypercalls.c
@@ -3,6 +3,7 @@
#include <linux/arm-smccc.h>
#include <linux/kvm_host.h>
+#include <linux/clocksource_ids.h>
#include <asm/kvm_emulate.h>
@@ -11,6 +12,10 @@
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
{
+#ifdef CONFIG_ARM64_KVM_PTP_HOST
+ struct system_time_snapshot systime_snapshot;
+ u64 cycles;
+#endif
u32 func_id = smccc_get_function(vcpu);
u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
u32 feature;
@@ -70,7 +75,49 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
+
+#ifdef CONFIG_ARM64_KVM_PTP_HOST
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
+#endif
break;
+
+#ifdef CONFIG_ARM64_KVM_PTP_HOST
+ /*
+ * This serves virtual kvm_ptp.
+ * Four values will be passed back.
+ * reg0 stores high 32-bit host ktime;
+ * reg1 stores low 32-bit host ktime;
+ * reg2 stores high 32-bit difference of host cycles and cntvoff;
+ * reg3 stores low 32-bit difference of host cycles and cntvoff.
+ */
+ case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+ /*
+ * system time and counter value must captured in the same
+ * time to keep consistency and precision.
+ */
+ ktime_get_snapshot(&systime_snapshot);
+ if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
+ break;
+ val[0] = upper_32_bits(systime_snapshot.real);
+ val[1] = lower_32_bits(systime_snapshot.real);
+ /*
+ * which of virtual counter or physical counter being
+ * asked for is decided by the first argument.
+ */
+ feature = smccc_get_arg1(vcpu);
+ switch (feature) {
+ case ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID:
+ cycles = systime_snapshot.cycles;
+ break;
+ default:
+ cycles = systime_snapshot.cycles -
+ vcpu_vtimer(vcpu)->cntvoff;
+ }
+ val[2] = upper_32_bits(cycles);
+ val[3] = lower_32_bits(cycles);
+ break;
+#endif
+
default:
return kvm_psci_call(vcpu);
}
--
2.17.1
Sometimes we may need to tell the getcrosststamp callback how to
behave. Extend the input arguments of the getcrosststamp API to offer
finer control over the operation.
Signed-off-by: Jianyong Wu <[email protected]>
---
drivers/clocksource/arm_arch_timer.c | 2 +-
drivers/net/ethernet/intel/e1000e/ptp.c | 3 ++-
drivers/ptp/ptp_chardev.c | 2 +-
drivers/ptp/ptp_kvm.h | 2 +-
drivers/ptp/ptp_kvm_x86.c | 2 +-
include/linux/ptp_clock_kernel.h | 3 ++-
6 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 848613261508..06959b901b0d 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -1645,7 +1645,7 @@ TIMER_ACPI_DECLARE(arch_timer, ACPI_SIG_GTDT, arch_timer_acpi_init);
#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_KVM)
#include <linux/arm-smccc.h>
int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *ts,
- struct clocksource **cs)
+ struct clocksource **cs, long *ctx)
{
struct arm_smccc_res hvc_res;
ktime_t ktime_overall;
diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c
index 439fda2f5368..4f98894316e9 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -150,7 +150,8 @@ static int e1000e_phc_get_syncdevicetime(ktime_t *device,
* clock values in ns.
**/
static int e1000e_phc_getcrosststamp(struct ptp_clock_info *ptp,
- struct system_device_crosststamp *xtstamp)
+ struct system_device_crosststamp *xtstamp,
+ long *arg)
{
struct e1000_adapter *adapter = container_of(ptp, struct e1000_adapter,
ptp_clock_info);
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 93d574faf1fe..fef72f29f3c8 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -235,7 +235,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
err = -EOPNOTSUPP;
break;
}
- err = ptp->info->getcrosststamp(ptp->info, &xtstamp);
+ err = ptp->info->getcrosststamp(ptp->info, &xtstamp, NULL);
if (err)
break;
diff --git a/drivers/ptp/ptp_kvm.h b/drivers/ptp/ptp_kvm.h
index 4bf1802bbeb8..ccceacbe8398 100644
--- a/drivers/ptp/ptp_kvm.h
+++ b/drivers/ptp/ptp_kvm.h
@@ -8,4 +8,4 @@
int kvm_arch_ptp_init(void);
int kvm_arch_ptp_get_clock(struct timespec64 *ts);
int kvm_arch_ptp_get_crosststamp(unsigned long *cycle,
- struct timespec64 *tspec, void *cs);
+ struct timespec64 *tspec, struct clocksource **cs, long *ctx);
diff --git a/drivers/ptp/ptp_kvm_x86.c b/drivers/ptp/ptp_kvm_x86.c
index aabed1b08a0d..54cf2c78b2e0 100644
--- a/drivers/ptp/ptp_kvm_x86.c
+++ b/drivers/ptp/ptp_kvm_x86.c
@@ -55,7 +55,7 @@ int kvm_arch_ptp_get_clock(struct timespec64 *ts)
}
int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *tspec,
- struct clocksource **cs)
+ struct clocksource **cs, long *ctx)
{
unsigned long ret;
unsigned int version;
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index c602670bbffb..ba765647e54b 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -133,7 +133,8 @@ struct ptp_clock_info {
int (*gettimex64)(struct ptp_clock_info *ptp, struct timespec64 *ts,
struct ptp_system_timestamp *sts);
int (*getcrosststamp)(struct ptp_clock_info *ptp,
- struct system_device_crosststamp *cts);
+ struct system_device_crosststamp *cts,
+ long *flag);
int (*settime64)(struct ptp_clock_info *p, const struct timespec64 *ts);
int (*enable)(struct ptp_clock_info *ptp,
struct ptp_clock_request *request, int on);
--
2.17.1
On Fri, May 22, 2020 at 04:37:16PM +0800, Jianyong Wu wrote:
> Export arm_smccc_1_1_get_conduit then modules can use smccc helper which
> adopts it.
>
> Acked-by: Mark Rutland <[email protected]>
> Signed-off-by: Jianyong Wu <[email protected]>
> ---
> drivers/firmware/psci/psci.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 2937d44b5df4..fd3c88f21b6a 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -64,6 +64,7 @@ enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
>
> return psci_ops.conduit;
> }
> +EXPORT_SYMBOL(arm_smccc_1_1_get_conduit);
>
I have moved this into drivers/firmware/smccc/smccc.c [1]
Please update this accordingly.
Also this series has been floating on the list for a while now; it is time to
drop "RFC" unless anyone has a strong objection to the idea here.
--
Regards,
Sudeep
[1] https://git.kernel.org/arm64/c/f2ae97062a48
On 22/05/2020 09:37, Jianyong Wu wrote:
> ptp_kvm modules will get this service through smccc call.
> The service offers real time and counter cycle of host for guest.
> Also let caller determine which cycle of virtual counter or physical counter
> to return.
>
> Signed-off-by: Jianyong Wu <[email protected]>
> ---
> include/linux/arm-smccc.h | 14 ++++++++++++
> virt/kvm/Kconfig | 4 ++++
> virt/kvm/arm/hypercalls.c | 47 +++++++++++++++++++++++++++++++++++++++
> 3 files changed, 65 insertions(+)
>
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index bdc0124a064a..badadc390809 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -94,6 +94,8 @@
>
> /* KVM "vendor specific" services */
> #define ARM_SMCCC_KVM_FUNC_FEATURES 0
> +#define ARM_SMCCC_KVM_FUNC_KVM_PTP 1
> +#define ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY 2
> #define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
> #define ARM_SMCCC_KVM_NUM_FUNCS 128
>
> @@ -103,6 +105,18 @@
> ARM_SMCCC_OWNER_VENDOR_HYP, \
> ARM_SMCCC_KVM_FUNC_FEATURES)
>
> +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID \
> + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> + ARM_SMCCC_SMC_32, \
> + ARM_SMCCC_OWNER_VENDOR_HYP, \
> + ARM_SMCCC_KVM_FUNC_KVM_PTP)
> +
> +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID \
> + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> + ARM_SMCCC_SMC_32, \
> + ARM_SMCCC_OWNER_VENDOR_HYP, \
> + ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY)
> +
> #ifndef __ASSEMBLY__
>
> #include <linux/linkage.h>
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index aad9284c043a..bf820811e815 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -60,3 +60,7 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE
>
> config HAVE_KVM_NO_POLL
> bool
> +
> +config ARM64_KVM_PTP_HOST
> + def_bool y
> + depends on ARM64 && KVM
> diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
> index db6dce3d0e23..c964122f8dae 100644
> --- a/virt/kvm/arm/hypercalls.c
> +++ b/virt/kvm/arm/hypercalls.c
> @@ -3,6 +3,7 @@
>
> #include <linux/arm-smccc.h>
> #include <linux/kvm_host.h>
> +#include <linux/clocksource_ids.h>
>
> #include <asm/kvm_emulate.h>
>
> @@ -11,6 +12,10 @@
>
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> {
> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> + struct system_time_snapshot systime_snapshot;
> + u64 cycles;
> +#endif
> u32 func_id = smccc_get_function(vcpu);
> u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
> u32 feature;
> @@ -70,7 +75,49 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> break;
> case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> +
> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
> +#endif
> break;
> +
> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> + /*
> + * This serves virtual kvm_ptp.
> + * Four values will be passed back.
> + * reg0 stores high 32-bit host ktime;
> + * reg1 stores low 32-bit host ktime;
> + * reg2 stores high 32-bit difference of host cycles and cntvoff;
> + * reg3 stores low 32-bit difference of host cycles and cntvoff.
> + */
> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> + /*
> + * system time and counter value must captured in the same
> + * time to keep consistency and precision.
> + */
> + ktime_get_snapshot(&systime_snapshot);
> + if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
> + break;
> + val[0] = upper_32_bits(systime_snapshot.real);
> + val[1] = lower_32_bits(systime_snapshot.real);
> + /*
> + * which of virtual counter or physical counter being
> + * asked for is decided by the first argument.
> + */
> + feature = smccc_get_arg1(vcpu);
> + switch (feature) {
> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID:
> + cycles = systime_snapshot.cycles;
> + break;
> + default:
There's something a bit odd here.
ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID and
ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID look like they should be names
of separate (top-level) functions, but actually the _PHY_ one is a
parameter for the first. If the intention is to have a parameter then it
would be better to pick a better name for the _PHY_ define and not
define it using ARM_SMCCC_CALL_VAL.
Second the use of "default:" means that there's no possibility to later
extend this interface for more clocks if needed in the future.
Alternatively you could indeed implement as two top-level functions and
change this to a...
switch (func_id)
... along with multiple case labels as the functions would obviously be
mostly the same.
Also a minor style issue - you might want to consider splitting this
into its own function.
Finally I do think it would be useful to add some documentation of the
new SMC calls. It would be easier to review the interface based on that
documentation rather than trying to reverse-engineer the interface from
the code.
Steve
> + cycles = systime_snapshot.cycles -
> + vcpu_vtimer(vcpu)->cntvoff;
> + }
> + val[2] = upper_32_bits(cycles);
> + val[3] = lower_32_bits(cycles);
> + break;
> +#endif
> +
> default:
> return kvm_psci_call(vcpu);
> }
>
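The alternative raised in the review above (two top-level function IDs,
dispatched with multiple case labels, and unknown IDs rejected instead of
falling into a default clock) might look roughly like the sketch below. The
IDs FUNC_PTP_VIRT and FUNC_PTP_PHYS, the RET_NOT_SUPPORTED value and the
helper name ptp_select_cycles are hypothetical; they are not the values
produced by ARM_SMCCC_CALL_VAL() in the kernel.

```c
#include <stdint.h>

/* Hypothetical function IDs and error code, for illustration only. */
enum { FUNC_PTP_VIRT = 1, FUNC_PTP_PHYS = 2 };
#define RET_NOT_SUPPORTED (-1)

/*
 * Pick the counter value to return for a given function ID.
 * host_cycles is the physical counter reading, cntvoff the guest's
 * virtual counter offset. Returns 0 on success and RET_NOT_SUPPORTED
 * for IDs that are not known, leaving room for more clocks later.
 */
static int ptp_select_cycles(uint32_t func_id, uint64_t host_cycles,
			     uint64_t cntvoff, uint64_t *cycles)
{
	switch (func_id) {
	case FUNC_PTP_PHYS:
		*cycles = host_cycles;
		return 0;
	case FUNC_PTP_VIRT:
		*cycles = host_cycles - cntvoff;
		return 0;
	default:
		return RET_NOT_SUPPORTED;
	}
}
```

Keeping the selection in its own function also covers the style remark about
splitting the handler out of kvm_hvc_call_handler().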
On Fri, May 22, 2020 at 04:37:22PM +0800, Jianyong Wu wrote:
> sometimes we may need tell getcrosstimestamp call back how to perform
> itself. Extending input arguments for getcrosstimestamp API to offer more
> exquisite control for the operation.
This text does not offer any justification for the change in API.
> diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
> index c602670bbffb..ba765647e54b 100644
> --- a/include/linux/ptp_clock_kernel.h
> +++ b/include/linux/ptp_clock_kernel.h
> @@ -133,7 +133,8 @@ struct ptp_clock_info {
> int (*gettimex64)(struct ptp_clock_info *ptp, struct timespec64 *ts,
> struct ptp_system_timestamp *sts);
> int (*getcrosststamp)(struct ptp_clock_info *ptp,
> - struct system_device_crosststamp *cts);
> + struct system_device_crosststamp *cts,
> + long *flag);
Well, you ignored the kernel doc completely. But in any case, I must
NAK this completely opaque and mysterious change. You want to add a
random pointer to some flag? I don't think so.
Thanks,
Richard
On Fri, May 22, 2020 at 04:37:23PM +0800, Jianyong Wu wrote:
> To use this feature, you should call PTP_EXTTS_REQUEST(2) ioctl with flag
> set bit PTP_KVM_ARM_PHY_COUNTER in its argument then call
> PTP_SYS_OFFSET_PRECISE(2) ioctl to get the cross timestamp and phyical
> counter will return. If the bit not set or no call for PTP_EXTTS_REQUEST2,
> virtual counter will return by default.
I'm sorry, but NAK on this completely bizarre twisting of the user
space API.
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index fef72f29f3c8..8b0a7b328bcd 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -123,6 +123,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> struct timespec64 ts;
> int enable, err = 0;
>
> +#ifdef CONFIG_ARM64
> + static long flag;
> +#endif
> switch (cmd) {
>
> case PTP_CLOCK_GETCAPS:
> @@ -149,6 +152,24 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> err = -EFAULT;
> break;
> }
> +
> +#ifdef CONFIG_ARM64
> + /*
> + * Just using this ioctl to tell kvm ptp driver to get PHC
> + * with physical counter, so if bit PTP_KVM_ARM_PHY_COUNTER
> + * is set then just exit directly.
> + * In most cases, we just need virtual counter from host and
> + * there is limited scenario using this to get physical counter
> + * in guest.
> + * Be careful to use this as there is no way to set it back
> + * unless you reinstall the module.
> + * This is only for arm64.
> + */
> + if (req.extts.flags & PTP_KVM_ARM_PHY_COUNTER) {
> + flag = 1;
> + break;
> + }
> +#endif
This file contains the generic PTP Hardware Clock character device
implementation. It is no place for platform specific hacks.
Sorry,
Richard
On Fri, May 22, 2020 at 04:37:23PM +0800, Jianyong Wu wrote:
> In general, vm inside will use virtual counter compered with host use
> phyical counter. But in some special scenarios, like nested
> virtualization, phyical counter maybe used by vm. A interface added in
> ptp_kvm driver to offer a mechanism to let user choose which counter
> should be return from host.
Sounds like you have two time sources, one for normal guest, and one
for nested. Why not simply offer the correct one to user space
automatically? If that cannot be done, then just offer two PHC
devices with descriptive names.
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index fef72f29f3c8..8b0a7b328bcd 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -123,6 +123,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> struct timespec64 ts;
> int enable, err = 0;
>
> +#ifdef CONFIG_ARM64
> + static long flag;
static? This is not going to fly.
> + * In most cases, we just need virtual counter from host and
> + * there is limited scenario using this to get physical counter
> + * in guest.
> + * Be careful to use this as there is no way to set it back
> + * unless you reinstall the module.
How on earth is the user supposed to know this?
From your description, this "flag" really should be a module
parameter.
Thanks,
Richard
Hi Sudeep,
> -----Original Message-----
> From: Sudeep Holla <[email protected]>
> Sent: Friday, May 22, 2020 9:12 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Mark Rutland
> <[email protected]>; [email protected]; Suzuki Poulose
> <[email protected]>; Steven Price <[email protected]>; Justin
> He <[email protected]>; Wei Chen <[email protected]>;
> [email protected]; Steve Capper <[email protected]>; linux-
> [email protected]; Kaly Xin <[email protected]>; nd <[email protected]>;
> Sudeep Holla <[email protected]>; [email protected];
> [email protected]
> Subject: Re: [RFC PATCH v12 03/11] psci: export smccc conduit get helper.
>
> On Fri, May 22, 2020 at 04:37:16PM +0800, Jianyong Wu wrote:
> > Export arm_smccc_1_1_get_conduit then modules can use smccc helper
> > which adopts it.
> >
> > Acked-by: Mark Rutland <[email protected]>
> > Signed-off-by: Jianyong Wu <[email protected]>
> > ---
> > drivers/firmware/psci/psci.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> > index 2937d44b5df4..fd3c88f21b6a 100644
> > --- a/drivers/firmware/psci/psci.c
> > +++ b/drivers/firmware/psci/psci.c
> > @@ -64,6 +64,7 @@ enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
> >
> > return psci_ops.conduit;
> > }
> > +EXPORT_SYMBOL(arm_smccc_1_1_get_conduit);
> >
>
> I have moved this into drivers/firmware/smccc/smccc.c [1] Please update
> this accordingly.
OK, I will drop this patch in the next version.
>
> Also this series is floating on the list for a while now, it is time to drop "RFC"
> unless anyone has strong objection to the idea here.
Yeah.
>
Thanks
Jianyong
> --
> Regards,
> Sudeep
>
> [1] https://git.kernel.org/arm64/c/f2ae97062a48
Hi Steven,
> -----Original Message-----
> From: Steven Price <[email protected]>
> Sent: Friday, May 22, 2020 10:18 PM
> To: Jianyong Wu <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Mark Rutland <[email protected]>;
> [email protected]; Suzuki Poulose <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Steve Capper
> <[email protected]>; Kaly Xin <[email protected]>; Justin He
> <[email protected]>; Wei Chen <[email protected]>; nd <[email protected]>
> Subject: Re: [RFC PATCH v12 07/11] psci: Add hypercall service for kvm ptp.
>
> On 22/05/2020 09:37, Jianyong Wu wrote:
> > ptp_kvm modules will get this service through smccc call.
> > The service offers real time and counter cycle of host for guest.
> > Also let caller determine which cycle of virtual counter or physical
> > counter to return.
> >
> > Signed-off-by: Jianyong Wu <[email protected]>
> > ---
> > include/linux/arm-smccc.h | 14 ++++++++++++
> > virt/kvm/Kconfig | 4 ++++
> > virt/kvm/arm/hypercalls.c | 47 +++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 65 insertions(+)
> >
> > diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> > index bdc0124a064a..badadc390809 100644
> > --- a/include/linux/arm-smccc.h
> > +++ b/include/linux/arm-smccc.h
> > @@ -94,6 +94,8 @@
> >
> > /* KVM "vendor specific" services */
> > #define ARM_SMCCC_KVM_FUNC_FEATURES 0
> > +#define ARM_SMCCC_KVM_FUNC_KVM_PTP 1
> > +#define ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY 2
> > #define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
> > #define ARM_SMCCC_KVM_NUM_FUNCS 128
> >
> > @@ -103,6 +105,18 @@
> > ARM_SMCCC_OWNER_VENDOR_HYP, \
> > ARM_SMCCC_KVM_FUNC_FEATURES)
> >
> > +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID \
> > + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> > + ARM_SMCCC_SMC_32, \
> > + ARM_SMCCC_OWNER_VENDOR_HYP, \
> > + ARM_SMCCC_KVM_FUNC_KVM_PTP)
> > +
> > +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID \
> > + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> > + ARM_SMCCC_SMC_32, \
> > + ARM_SMCCC_OWNER_VENDOR_HYP, \
> > + ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY)
> > +
> > #ifndef __ASSEMBLY__
> >
> > #include <linux/linkage.h>
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index aad9284c043a..bf820811e815 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -60,3 +60,7 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE
> >
> > config HAVE_KVM_NO_POLL
> > bool
> > +
> > +config ARM64_KVM_PTP_HOST
> > + def_bool y
> > + depends on ARM64 && KVM
> > diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
> > index db6dce3d0e23..c964122f8dae 100644
> > --- a/virt/kvm/arm/hypercalls.c
> > +++ b/virt/kvm/arm/hypercalls.c
> > @@ -3,6 +3,7 @@
> >
> > #include <linux/arm-smccc.h>
> > #include <linux/kvm_host.h>
> > +#include <linux/clocksource_ids.h>
> >
> > #include <asm/kvm_emulate.h>
> >
> > @@ -11,6 +12,10 @@
> >
> > int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > {
> > +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> > + struct system_time_snapshot systime_snapshot;
> > + u64 cycles;
> > +#endif
> > u32 func_id = smccc_get_function(vcpu);
> > u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
> > u32 feature;
> > @@ -70,7 +75,49 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> > break;
> > case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> > val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> > +
> > +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> > + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
> > +#endif
> > break;
> > +
> > +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> > + /*
> > + * This serves virtual kvm_ptp.
> > + * Four values will be passed back.
> > + * reg0 stores high 32-bit host ktime;
> > + * reg1 stores low 32-bit host ktime;
> > + * reg2 stores high 32-bit difference of host cycles and cntvoff;
> > + * reg3 stores low 32-bit difference of host cycles and cntvoff.
> > + */
> > + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > + /*
> > + * system time and counter value must captured in the same
> > + * time to keep consistency and precision.
> > + */
> > + ktime_get_snapshot(&systime_snapshot);
> > + if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
> > + break;
> > + val[0] = upper_32_bits(systime_snapshot.real);
> > + val[1] = lower_32_bits(systime_snapshot.real);
> > + /*
> > + * which of virtual counter or physical counter being
> > + * asked for is decided by the first argument.
> > + */
> > + feature = smccc_get_arg1(vcpu);
> > + switch (feature) {
> > + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID:
> > + cycles = systime_snapshot.cycles;
> > + break;
> > + default:
>
> There's something a bit odd here.
>
> ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID and
> ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID look like they should
> be names of separate (top-level) functions, but actually the _PHY_ one is a
> parameter for the first. If the intention is to have a parameter then it would
> be better to pick a better name for the _PHY_ define and not define it using
> ARM_SMCCC_CALL_VAL.
>
Yeah, _PHY_ does not mean the same thing as _PTP_FUNC_ID, so I think it should have a different name.
What about ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_COUNTER?
> Second the use of "default:" means that there's no possibility to later extend
> this interface for more clocks if needed in the future.
>
I think we can add more clocks by adding more cases. The "default" case means the default clock is used when no first argument is given.
> Alternatively you could indeed implement as two top-level functions and
> change this to a...
>
> switch (func_id)
>
> ... along with multiple case labels as the functions would obviously be mostly
> the same.
>
> Also a minor style issue - you might want to consider splitting this into it's
> own function.
>
I think "switch (feature)" may be better, as _PHY_ is not really a function ID. Just like:
"
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
feature = smccc_get_arg1(vcpu);
switch (feature) {
case ARM_SMCCC_ARCH_WORKAROUND_1:
...
"
> Finally I do think it would be useful to add some documentation of the new
> SMC calls. It would be easier to review the interface based on that
> documentation rather than trying to reverse-engineer the interface from the
> code.
>
Yeah, more documentation is needed here.
Thanks
Jianyong
> Steve
>
> > + cycles = systime_snapshot.cycles -
> > + vcpu_vtimer(vcpu)->cntvoff;
> > + }
> > + val[2] = upper_32_bits(cycles);
> > + val[3] = lower_32_bits(cycles);
> > + break;
> > +#endif
> > +
> > default:
> > return kvm_psci_call(vcpu);
> > }
> >
Hi Richard,
> -----Original Message-----
> From: Richard Cochran <[email protected]>
> Sent: Sunday, May 24, 2020 10:11 AM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Mark Rutland <[email protected]>; [email protected];
> Suzuki Poulose <[email protected]>; Steven Price
> <[email protected]>; [email protected]; linux-arm-
> [email protected]; [email protected];
> [email protected]; Steve Capper <[email protected]>; Kaly Xin
> <[email protected]>; Justin He <[email protected]>; Wei Chen
> <[email protected]>; nd <[email protected]>
> Subject: Re: [RFC PATCH v12 10/11] arm64: add mechanism to let user choose
> which counter to return
>
> On Fri, May 22, 2020 at 04:37:23PM +0800, Jianyong Wu wrote:
> > In general, vm inside will use virtual counter compered with host use
> > phyical counter. But in some special scenarios, like nested
> > virtualization, phyical counter maybe used by vm. A interface added in
> > ptp_kvm driver to offer a mechanism to let user choose which counter
> > should be return from host.
>
> Sounds like you have two time sources, one for normal guest, and one for
> nested. Why not simply offer the correct one to user space automatically? If
> that cannot be done, then just offer two PHC devices with descriptive names.
>
It's a good idea, but in most cases the physical counter will not be used, so it's better not to keep two ptp devices around all the time.
How about adding an extra member to struct ptp_clock_info to serve as a flag? Then we can control this flag using an ioctl to determine the counter type.
This way, no extra arguments are needed in .getcrosststamp, but we would still need specific code in ptp_ioctl to implement it, as in this patch.
Alternatively, we could make the flag a module parameter, which is easier to implement.
@[email protected] WDYT?
> > diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> > index fef72f29f3c8..8b0a7b328bcd 100644
> > --- a/drivers/ptp/ptp_chardev.c
> > +++ b/drivers/ptp/ptp_chardev.c
> > @@ -123,6 +123,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> > struct timespec64 ts;
> > int enable, err = 0;
> >
> > +#ifdef CONFIG_ARM64
> > + static long flag;
>
> static? This is not going to fly.
This needs to be removed.
>
> > + * In most cases, we just need virtual counter from host and
> > + * there is limited scenario using this to get physical counter
> > + * in guest.
> > + * Be careful to use this as there is no way to set it back
> > + * unless you reinstall the module.
>
> How on earth is the user supposed to know this?
>
Yeah, it's odd and should be removed.
> From your description, this "flag" really should be a module parameter.
Maybe using the flag as a module parameter is a better way.
Thanks
Jianyong
>
> Thanks,
> Richard
Hi Richard,
> -----Original Message-----
> From: Richard Cochran <[email protected]>
> Sent: Monday, May 25, 2020 2:16 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Mark Rutland <[email protected]>;
> [email protected]; Suzuki Poulose <[email protected]>; Steven Price
> <[email protected]>; [email protected]; linux-arm-
> [email protected]; [email protected];
> [email protected]; Steve Capper <[email protected]>; Kaly Xin
> <[email protected]>; Justin He <[email protected]>; Wei Chen
> <[email protected]>; nd <[email protected]>
> Subject: Re: [RFC PATCH v12 10/11] arm64: add mechanism to let user choose
> which counter to return
>
> On Mon, May 25, 2020 at 04:50:28AM +0000, Jianyong Wu wrote:
> > How about adding an extra argument in struct ptp_clock_info to serve as a flag, then we can control this flag using IOCTL to determine the counter type.
>
> no, No, NO!
>
Ok,
> > > From your description, this "flag" really should be a module parameter.
> > Maybe use flag as a module parameter is a better way.
>
> Yes.
>
That's fine with me, if @[email protected] is not against it.
Thanks
Jianyong
> Thanks,
> Richard
On Mon, May 25, 2020 at 04:50:28AM +0000, Jianyong Wu wrote:
> How about adding an extra argument in struct ptp_clock_info to serve as a flag, then we can control this flag using IOCTL to determine the counter type.
no, No, NO!
> > From your description, this "flag" really should be a module parameter.
> Maybe use flag as a module parameter is a better way.
Yes.
Thanks,
Richard
On 2020-05-24 03:11, Richard Cochran wrote:
> On Fri, May 22, 2020 at 04:37:23PM +0800, Jianyong Wu wrote:
>> In general, vm inside will use virtual counter compered with host use
>> phyical counter. But in some special scenarios, like nested
>> virtualization, phyical counter maybe used by vm. A interface added in
>> ptp_kvm driver to offer a mechanism to let user choose which counter
>> should be return from host.
>
> Sounds like you have two time sources, one for normal guest, and one
> for nested. Why not simply offer the correct one to user space
> automatically? If that cannot be done, then just offer two PHC
> devices with descriptive names.
There is no such thing as a distinction between nested or non-nested.
Both counters are available to the guest at all times, and said guest
can choose whichever it wants to use. So the hypervisor (KVM) has to
support both counters as a reference.
For a Linux guest, we always know which reference we're using (the
virtual counter). So it is pointless to expose the choice to userspace
at all.
>
>> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
>> index fef72f29f3c8..8b0a7b328bcd 100644
>> --- a/drivers/ptp/ptp_chardev.c
>> +++ b/drivers/ptp/ptp_chardev.c
>> @@ -123,6 +123,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned
>> int cmd, unsigned long arg)
>> struct timespec64 ts;
>> int enable, err = 0;
>>
>> +#ifdef CONFIG_ARM64
>> + static long flag;
>
> static? This is not going to fly.
>
>> + * In most cases, we just need virtual counter from host and
>> + * there is limited scenario using this to get physical counter
>> + * in guest.
>> + * Be careful to use this as there is no way to set it back
>> + * unless you reinstall the module.
>
> How on earth is the user supposed to know this?
>
> From your description, this "flag" really should be a module
> parameter.
Not even that. If anything, the driver can obtain full knowledge of
which counter is in use without any help. And the hard truth is that
it is *always* the virtual counter as far as Linux is concerned.
M.
--
Jazz is not dead. It just smells funny...
Hi Marc,
> -----Original Message-----
> From: Marc Zyngier <[email protected]>
> Sent: Monday, May 25, 2020 5:17 PM
> To: Richard Cochran <[email protected]>; Jianyong Wu
> <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Mark Rutland <[email protected]>; [email protected]; Suzuki Poulose
> <[email protected]>; Steven Price <[email protected]>; linux-
> [email protected]; [email protected];
> [email protected]; [email protected]; Steve Capper
> <[email protected]>; Kaly Xin <[email protected]>; Justin He
> <[email protected]>; Wei Chen <[email protected]>; nd <[email protected]>
> Subject: Re: [RFC PATCH v12 10/11] arm64: add mechanism to let user
> choose which counter to return
>
> On 2020-05-24 03:11, Richard Cochran wrote:
> > On Fri, May 22, 2020 at 04:37:23PM +0800, Jianyong Wu wrote:
> >> In general, vm inside will use virtual counter compered with host use
> >> phyical counter. But in some special scenarios, like nested
> >> virtualization, phyical counter maybe used by vm. A interface added
> >> in ptp_kvm driver to offer a mechanism to let user choose which
> >> counter should be return from host.
> >
> > Sounds like you have two time sources, one for normal guest, and one
> > for nested. Why not simply offer the correct one to user space
> > automatically? If that cannot be done, then just offer two PHC
> > devices with descriptive names.
>
> There is no such thing as a distinction between nested or non-nested.
> Both counters are available to the guest at all times, and said guest can
> choose whichever it wants to use. So the hypervisor (KVM) has to support
> both counters as a reference.
>
It's great that we can decide which counter to return in the guest kernel. So we can drop the code, including patches 9/11 and 10/11, that exposes the interface for userspace to make the decision.
> For a Linux guest, we always know which reference we're using (the virtual
> counter). So it is pointless to expose the choice to userspace at all.
>
So we should throw away the code that decides the counter type in the Linux driver and just keep the hypercall service in Linux that provides both the virtual and the physical counter, to serve non-Linux guests.
Am I right?
> >
> >> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> >> index fef72f29f3c8..8b0a7b328bcd 100644
> >> --- a/drivers/ptp/ptp_chardev.c
> >> +++ b/drivers/ptp/ptp_chardev.c
> >> @@ -123,6 +123,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned
> >> int cmd, unsigned long arg)
> >> struct timespec64 ts;
> >> int enable, err = 0;
> >>
> >> +#ifdef CONFIG_ARM64
> >> + static long flag;
> >
> > static? This is not going to fly.
> >
> >> + * In most cases, we just need virtual counter from host and
> >> + * there is limited scenario using this to get physical counter
> >> + * in guest.
> >> + * Be careful to use this as there is no way to set it back
> >> + * unless you reinstall the module.
> >
> > How on earth is the user supposed to know this?
> >
> > From your description, this "flag" really should be a module
> > parameter.
>
> Not even that. If anything, the driver can obtain full knowledge of which
> counter is in use without any help. And the hard truth is that it is
> *always* the virtual counter as far as Linux is concerned.
Good!
Thanks
Jianyong
>
> M.
> --
> Jazz is not dead. It just smells funny...
On 2020-05-25 15:18, Jianyong Wu wrote:
> Hi Marc,
>
>> -----Original Message-----
>> From: Marc Zyngier <[email protected]>
>> Sent: Monday, May 25, 2020 5:17 PM
>> To: Richard Cochran <[email protected]>; Jianyong Wu
>> <[email protected]>
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected];
>> Mark Rutland <[email protected]>; [email protected]; Suzuki Poulose
>> <[email protected]>; Steven Price <[email protected]>; linux-
>> [email protected]; [email protected];
>> [email protected]; [email protected]; Steve Capper
>> <[email protected]>; Kaly Xin <[email protected]>; Justin He
>> <[email protected]>; Wei Chen <[email protected]>; nd <[email protected]>
>> Subject: Re: [RFC PATCH v12 10/11] arm64: add mechanism to let user
>> choose which counter to return
>>
>> On 2020-05-24 03:11, Richard Cochran wrote:
>> > On Fri, May 22, 2020 at 04:37:23PM +0800, Jianyong Wu wrote:
>> >> In general, vm inside will use virtual counter compered with host use
>> >> phyical counter. But in some special scenarios, like nested
>> >> virtualization, phyical counter maybe used by vm. A interface added
>> >> in ptp_kvm driver to offer a mechanism to let user choose which
>> >> counter should be return from host.
>> >
>> > Sounds like you have two time sources, one for normal guest, and one
>> > for nested. Why not simply offer the correct one to user space
>> > automatically? If that cannot be done, then just offer two PHC
>> > devices with descriptive names.
>>
>> There is no such thing as a distinction between nested or non-nested.
>> Both counters are available to the guest at all times, and said guest
>> can
>> choose whichever it wants to use. So the hypervisor (KVM) has to
>> support
>> both counters as a reference.
>>
> It's great that we can decide which counter to return in guest kernel.
> So we can abandon these code, including patch 9/11 and 10/11, that
> expose the interface to userspace to do the decision.
>
>> For a Linux guest, we always know which reference we're using (the
>> virtual
>> counter). So it is pointless to expose the choice to userspace at all.
>>
> So, we should throw these code of deciding counter type in linux
> driver away and just keep the hypercall service of providing both
> virtual counter and physical counter in linux to server non-linux
> guest.
> Am I right?
Exactly. We control Linux, and so far nothing is using the physical
counter directly. It is only using the virtual counter.
On the other side, this is *only* Linux. Other operating systems
will need to pick the reference clock that matches their own.
If one day we change Linux to use the physical counter, we'll
have to do the same thing.
M.
--
Jazz is not dead. It just smells funny...
On 25/05/2020 03:11, Jianyong Wu wrote:
> Hi Steven,
Hi Jianyong,
[...]
>>> diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
>>> index db6dce3d0e23..c964122f8dae 100644
>>> --- a/virt/kvm/arm/hypercalls.c
>>> +++ b/virt/kvm/arm/hypercalls.c
>>> @@ -3,6 +3,7 @@
>>>
>>> #include <linux/arm-smccc.h>
>>> #include <linux/kvm_host.h>
>>> +#include <linux/clocksource_ids.h>
>>>
>>> #include <asm/kvm_emulate.h>
>>>
>>> @@ -11,6 +12,10 @@
>>>
>>> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>>> {
>>> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
>>> + struct system_time_snapshot systime_snapshot;
>>> + u64 cycles;
>>> +#endif
>>> u32 func_id = smccc_get_function(vcpu);
>>> u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
>>> u32 feature;
>>> @@ -70,7 +75,49 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>>> break;
>>> case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
>>> val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
>>> +
>>> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
>>> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
>>> +#endif
>>> break;
>>> +
>>> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
>>> + /*
>>> + * This serves virtual kvm_ptp.
>>> + * Four values will be passed back.
>>> + * reg0 stores high 32-bit host ktime;
>>> + * reg1 stores low 32-bit host ktime;
>>> + * reg2 stores high 32-bit difference of host cycles and cntvoff;
>>> + * reg3 stores low 32-bit difference of host cycles and cntvoff.
>>> + */
>>> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
>>> + /*
>>> + * system time and counter value must captured in the same
>>> + * time to keep consistency and precision.
>>> + */
>>> + ktime_get_snapshot(&systime_snapshot);
>>> + if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
>>> + break;
>>> + val[0] = upper_32_bits(systime_snapshot.real);
>>> + val[1] = lower_32_bits(systime_snapshot.real);
>>> + /*
>>> + * which of virtual counter or physical counter being
>>> + * asked for is decided by the first argument.
>>> + */
>>> + feature = smccc_get_arg1(vcpu);
>>> + switch (feature) {
>>> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID:
>>> + cycles = systime_snapshot.cycles;
>>> + break;
>>> + default:
>>
>> There's something a bit odd here.
>>
>> ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID and
>> ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID look like they should
>> be names of separate (top-level) functions, but actually the _PHY_ one is a
>> parameter for the first. If the intention is to have a parameter then it would
>> be better to pick a better name for the _PHY_ define and not define it using
>> ARM_SMCCC_CALL_VAL.
>>
> Yeah, _PHY_ is not the same meaning with _PTP_FUNC_ID, so I think it should be a different name.
> What about ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_COUNTER?
Personally I'd go with something much shorter, e.g. ARM_PTP_PHY_COUNTER.
This is just an argument to an SMCCC call so there's no need for most of
the prefix, indeed if (for whatever reason) there was a non-SMCCC
mechanism added to do the same thing it would be reasonable to reuse the
same values.
>> Second the use of "default:" means that there's no possibility to later extend
>> this interface for more clocks if needed in the future.
>>
> I think we can add more clocks by adding more cases, this "default" means we can use no first arg to determine the default clock.
The problem with the 'default' is it means it's not possible to probe
whether the kernel supports any more clocks. If we used a different
value (that the kernel doesn't support) then we end up in the default
case and have no idea whether the clock value is the one we requested or
not.
It's generally better when defining an ABI to explicitly return an error
for unknown parameters, that way a future user of the ABI can discover
whether the call did what was expected or not.
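A minimal sketch of that idea follows. It is plain userspace C rather than kernel code, and the selector values, the error code and the helper name are all hypothetical (none of them come from the patch); the only thing it is meant to show is an explicit error path for unknown selectors instead of a silent "default:" fallback.

```c
#include <stdint.h>

/* Hypothetical selector values and error code, for illustration only. */
#define PTP_COUNTER_VIRT	0u
#define PTP_COUNTER_PHYS	1u
#define PTP_RET_NOT_SUPPORTED	(-1)

/*
 * Pick the cycle value to hand back to the guest.
 * host_cycles stands in for the snapshot's cycle count and cntvoff for
 * the guest's virtual counter offset.
 * Returns 0 on success, PTP_RET_NOT_SUPPORTED for an unknown selector,
 * in which case *cycles is left untouched.
 */
static int ptp_select_cycles(uint32_t selector, uint64_t host_cycles,
			     uint64_t cntvoff, uint64_t *cycles)
{
	switch (selector) {
	case PTP_COUNTER_VIRT:
		*cycles = host_cycles - cntvoff;
		return 0;
	case PTP_COUNTER_PHYS:
		*cycles = host_cycles;
		return 0;
	default:
		/* Unknown clock: report it so the caller can probe support. */
		return PTP_RET_NOT_SUPPORTED;
	}
}
```

With this shape, a guest that asks for a clock the host does not know about gets an error back and can fall back, rather than silently receiving a different counter.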
>> Alternatively you could indeed implement as two top-level functions and
>> change this to a...
>>
>> switch (func_id)
>>
>> ... along with multiple case labels as the functions would obviously be mostly
>> the same.
>>
>> Also a minor style issue - you might want to consider splitting this into it's
>> own function.
>>
> I think "switch (feature)" maybe better as this _PHY_ is not like a function id. Just like:
> "
> case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
> feature = smccc_get_arg1(vcpu);
> switch (feature) {
> case ARM_SMCCC_ARCH_WORKAROUND_1:
> ...
> "
I'm happy either way - it's purely that the definition/naming of
ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID made it look like that was the
intention. My preference would be to stick with the 'feature' approach
as above because there's no need to "use up" the top-level SMCCC calls
(but equally there's a large space so we'd have to work very hard to run
out... ;) )
>> Finally I do think it would be useful to add some documentation of the new
>> SMC calls. It would be easier to review the interface based on that
>> documentation rather than trying to reverse-engineer the interface from the
>> code.
>>
> Yeah, more doc needed here.
Thanks, I think it's a good idea to get the ABI nailed down before
worrying too much about the code, and it's easier to discuss based on
documentation rather than code.
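As one possible starting point for such documentation, the return layout described in the patch comment (reg0/reg1 carry the high/low 32 bits of host ktime, reg2/reg3 the high/low 32 bits of the cycle count) can be sketched as below. This is plain userspace C, and the struct and helper names are hypothetical; it only illustrates the split and reassembly of the two 64-bit values.

```c
#include <stdint.h>

/* Hypothetical container for the four 32-bit return registers. */
struct ptp_regs {
	uint32_t r[4];
};

/* Host side: r0/r1 = high/low ktime, r2/r3 = high/low cycles. */
static struct ptp_regs ptp_pack(uint64_t ktime, uint64_t cycles)
{
	struct ptp_regs regs = {
		.r = {
			(uint32_t)(ktime >> 32),
			(uint32_t)ktime,
			(uint32_t)(cycles >> 32),
			(uint32_t)cycles,
		},
	};

	return regs;
}

/* Guest side: rebuild the 64-bit values from the register halves. */
static void ptp_unpack(const struct ptp_regs *regs,
		       uint64_t *ktime, uint64_t *cycles)
{
	*ktime = ((uint64_t)regs->r[0] << 32) | regs->r[1];
	*cycles = ((uint64_t)regs->r[2] << 32) | regs->r[3];
}
```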
Thanks,
Steve
On Mon, May 25, 2020 at 01:37:56AM +0000, Jianyong Wu wrote:
> Hi Sudeep,
>
> > -----Original Message-----
> > From: Sudeep Holla <[email protected]>
> > Sent: Friday, May 22, 2020 9:12 PM
> > To: Jianyong Wu <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; Mark Rutland
> > <[email protected]>; [email protected]; Suzuki Poulose
> > <[email protected]>; Steven Price <[email protected]>; Justin
> > He <[email protected]>; Wei Chen <[email protected]>;
> > [email protected]; Steve Capper <[email protected]>; linux-
> > [email protected]; Kaly Xin <[email protected]>; nd <[email protected]>;
> > Sudeep Holla <[email protected]>; [email protected];
> > [email protected]
> > Subject: Re: [RFC PATCH v12 03/11] psci: export smccc conduit get helper.
> >
> > On Fri, May 22, 2020 at 04:37:16PM +0800, Jianyong Wu wrote:
> > > Export arm_smccc_1_1_get_conduit then modules can use smccc helper
> > > which adopts it.
> > >
> > > Acked-by: Mark Rutland <[email protected]>
> > > Signed-off-by: Jianyong Wu <[email protected]>
> > > ---
> > > drivers/firmware/psci/psci.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> > > > index 2937d44b5df4..fd3c88f21b6a 100644
> > > > --- a/drivers/firmware/psci/psci.c
> > > > +++ b/drivers/firmware/psci/psci.c
> > > > @@ -64,6 +64,7 @@ enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
> > >
> > > return psci_ops.conduit;
> > > }
> > > +EXPORT_SYMBOL(arm_smccc_1_1_get_conduit);
> > >
> >
> > I have moved this into drivers/firmware/smccc/smccc.c [1] Please update
> > this accordingly.
>
> Ok, I will remove this patch next version.
You may need it still, just that this patch won't apply as the function
is moved to a new file.
--
Regards,
Sudeep
Hi Sudeep,
> -----Original Message-----
> From: Sudeep Holla <[email protected]>
> Sent: Tuesday, May 26, 2020 6:10 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Mark Rutland
> <[email protected]>; [email protected]; Suzuki Poulose
> <[email protected]>; Steven Price <[email protected]>; Justin
> He <[email protected]>; Wei Chen <[email protected]>;
> [email protected]; Steve Capper <[email protected]>; linux-
> [email protected]; Kaly Xin <[email protected]>; nd <[email protected]>;
> Sudeep Holla <[email protected]>; [email protected];
> [email protected]
> Subject: Re: [RFC PATCH v12 03/11] psci: export smccc conduit get helper.
>
> On Mon, May 25, 2020 at 01:37:56AM +0000, Jianyong Wu wrote:
> > Hi Sudeep,
> >
> > > -----Original Message-----
> > > From: Sudeep Holla <[email protected]>
> > > Sent: Friday, May 22, 2020 9:12 PM
> > > To: Jianyong Wu <[email protected]>
> > > Cc: [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; Mark Rutland <[email protected]>;
> > > [email protected]; Suzuki Poulose <[email protected]>; Steven
> > > Price <[email protected]>; Justin He <[email protected]>; Wei
> > > Chen <[email protected]>; [email protected]; Steve Capper
> > > <[email protected]>; linux- [email protected]; Kaly Xin
> > > <[email protected]>; nd <[email protected]>; Sudeep Holla
> > > <[email protected]>; [email protected];
> > > [email protected]
> > > Subject: Re: [RFC PATCH v12 03/11] psci: export smccc conduit get helper.
> > >
> > > On Fri, May 22, 2020 at 04:37:16PM +0800, Jianyong Wu wrote:
> > > > > Export arm_smccc_1_1_get_conduit then modules can use smccc helper
> > > > > which adopts it.
> > > >
> > > > Acked-by: Mark Rutland <[email protected]>
> > > > Signed-off-by: Jianyong Wu <[email protected]>
> > > > ---
> > > > drivers/firmware/psci/psci.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > > diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> > > > > index 2937d44b5df4..fd3c88f21b6a 100644
> > > > > --- a/drivers/firmware/psci/psci.c
> > > > > +++ b/drivers/firmware/psci/psci.c
> > > > > @@ -64,6 +64,7 @@ enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
> > > >
> > > > return psci_ops.conduit;
> > > > }
> > > > +EXPORT_SYMBOL(arm_smccc_1_1_get_conduit);
> > > >
> > >
> > > I have moved this into drivers/firmware/smccc/smccc.c [1] Please
> > > update this accordingly.
> >
> > Ok, I will remove this patch next version.
>
> You may need it still, just that this patch won't apply as the function is moved
> to a new file.
>
Yeah, thanks for the reminder!
Thanks
Jianyong
> --
> Regards,
> Sudeep
Hi Steven,
> -----Original Message-----
> From: Steven Price <[email protected]>
> Sent: Tuesday, May 26, 2020 7:02 PM
> To: Jianyong Wu <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Mark Rutland <[email protected]>;
> [email protected]; Suzuki Poulose <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Steve Capper
> <[email protected]>; Kaly Xin <[email protected]>; Justin He
> <[email protected]>; Wei Chen <[email protected]>; nd <[email protected]>
> Subject: Re: [RFC PATCH v12 07/11] psci: Add hypercall service for kvm ptp.
>
> On 25/05/2020 03:11, Jianyong Wu wrote:
> > Hi Steven,
>
> Hi Jianyong,
>
> [...]
> >>> diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
> >>> index db6dce3d0e23..c964122f8dae 100644
> >>> --- a/virt/kvm/arm/hypercalls.c
> >>> +++ b/virt/kvm/arm/hypercalls.c
> >>> @@ -3,6 +3,7 @@
> >>>
> >>> #include <linux/arm-smccc.h>
> >>> #include <linux/kvm_host.h>
> >>> +#include <linux/clocksource_ids.h>
> >>>
> >>> #include <asm/kvm_emulate.h>
> >>>
> >>> @@ -11,6 +12,10 @@
> >>>
> >>> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >>> {
> >>> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> >>> + struct system_time_snapshot systime_snapshot;
> >>> + u64 cycles;
> >>> +#endif
> >>> u32 func_id = smccc_get_function(vcpu);
> >>> u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
> >>> u32 feature;
> >>> @@ -70,7 +75,49 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >>> break;
> >>> case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> >>> val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> >>> +
> >>> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> >>> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
> >>> +#endif
> >>> break;
> >>> +
> >>> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> >>> + /*
> >>> + * This serves virtual kvm_ptp.
> >>> + * Four values will be passed back.
> >>> + * reg0 stores high 32-bit host ktime;
> >>> + * reg1 stores low 32-bit host ktime;
> >>> + * reg2 stores high 32-bit difference of host cycles and cntvoff;
> >>> + * reg3 stores low 32-bit difference of host cycles and cntvoff.
> >>> + */
> >>> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> >>> + /*
> >>> + * system time and counter value must captured in the same
> >>> + * time to keep consistency and precision.
> >>> + */
> >>> + ktime_get_snapshot(&systime_snapshot);
> >>> + if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
> >>> + break;
> >>> + val[0] = upper_32_bits(systime_snapshot.real);
> >>> + val[1] = lower_32_bits(systime_snapshot.real);
> >>> + /*
> >>> + * which of virtual counter or physical counter being
> >>> + * asked for is decided by the first argument.
> >>> + */
> >>> + feature = smccc_get_arg1(vcpu);
> >>> + switch (feature) {
> >>> + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID:
> >>> + cycles = systime_snapshot.cycles;
> >>> + break;
> >>> + default:
> >>
> >> There's something a bit odd here.
> >>
> >> ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID and
> >> ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID look like they should be
> >> names of separate (top-level) functions, but actually the _PHY_ one
> >> is a parameter for the first. If the intention is to have a parameter
> >> then it would be better to pick a better name for the _PHY_ define
> >> and not define it using ARM_SMCCC_CALL_VAL.
> >>
> > Yeah, _PHY_ is not the same meaning with _PTP_FUNC_ID, so I think it should be a different name.
> > What about ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_COUNTER?
>
> Personally I'd go with something much shorter, e.g.
> ARM_PTP_PHY_COUNTER.
> This is just an argument to an SMCCC call so there's no need for most of the
> prefix, indeed if (for whatever reason) there was a non-SMCCC mechanism
> added to do the same thing it would be reasonable to reuse the same values.
>
OK, this shorter name is better.
> >> Second the use of "default:" means that there's no possibility to
> >> later extend this interface for more clocks if needed in the future.
> >>
> > I think we can add more clocks by adding more cases, this "default" means we can use no first arg to determine the default clock.
>
> The problem with the 'default' is it means it's not possible to probe whether
> the kernel supports any more clocks. If we used a different value (that the
> kernel doesn't support) then we end up in the default case and have no idea
> whether the clock value is the one we requested or not.
>
Yeah, that makes more sense. The call should return an exact value back to the user.
> It's generally better when defining an ABI to explicitly return an error for
> unknown parameters, that way a future user of the ABI can discover
> whether the call did what was expected or not.
>
ok. I will fix it.
> >> Alternatively you could indeed implement as two top-level functions
> >> and change this to a...
> >>
> >> switch (func_id)
> >>
> >> ... along with multiple case labels as the functions would obviously
> >> be mostly the same.
> >>
> >> Also a minor style issue - you might want to consider splitting this
> >> into it's own function.
> >>
> > I think "switch (feature)" maybe better as this _PHY_ is not like a function id. Just like:
> > "
> > case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
> > feature = smccc_get_arg1(vcpu);
> > switch (feature) {
> > case ARM_SMCCC_ARCH_WORKAROUND_1:
> > ...
> > "
>
> I'm happy either way - it's purely that the definition/naming of
> ARM_SMCCC_VENDOR_HYP_KVM_PTP_PHY_FUNC_ID made it look like that
> was the intention. My preference would be to stick with the 'feature'
> approach as above because there's no need to "use up" the top-level SMCCC
> calls (but equally there's a large space so we'd have to work very hard to run
> out... ;) )
>
We can change the name to "_PHY_COUNTER", but it will remain at the same level as "_FUNC_ID", as
it will still occupy a slot reserved for vendor SMCCC calls.
Take ARM_SMCCC_ARCH_WORKAROUND_1 as an example:
#define ARM_SMCCC_ARCH_WORKAROUND_1 \
	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
			   ARM_SMCCC_SMC_32, \
			   0, 0x8000)
From its definition, it looks like an ARCH SMCCC call ID.
> >> Finally I do think it would be useful to add some documentation of
> >> the new SMC calls. It would be easier to review the interface based
> >> on that documentation rather than trying to reverse-engineer the
> >> interface from the code.
> >>
> > Yeah, more doc needed here.
>
> Thanks, I think it's a good idea to get the ABI nailed down before worrying
> too much about the code, and it's easier to discuss based on documentation
> rather than code.
>
Yeah, documentation here would help code review.
Thanks
Jianyong
> Thanks,
>
> Steve
Jianyong Wu <[email protected]> writes:
> From: Thomas Gleixner <[email protected]>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 7cb09c4cf21c..a8f65b3e4ec8 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -928,6 +928,9 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
>
> clocksource_arch_init(cs);
>
> +if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
> + cs->id = CSID_GENERIC;
> +
This is white space damaged and certainly not from me.
Hi Thomas,
> -----Original Message-----
> From: Thomas Gleixner <[email protected]>
> Sent: Friday, May 29, 2020 12:36 AM
> To: Jianyong Wu <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; Mark Rutland <[email protected]>;
> [email protected]; Suzuki Poulose <[email protected]>; Steven Price
> <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Steve Capper
> <[email protected]>; Kaly Xin <[email protected]>; Justin He
> <[email protected]>; Wei Chen <[email protected]>; Jianyong Wu
> <[email protected]>; nd <[email protected]>
> Subject: Re: [RFC PATCH v12 05/11] time: Add mechanism to recognize
> clocksource in time_get_snapshot
>
> Jianyong Wu <[email protected]> writes:
> > From: Thomas Gleixner <[email protected]>
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 7cb09c4cf21c..a8f65b3e4ec8 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -928,6 +928,9 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
> >
> > clocksource_arch_init(cs);
> >
> > +if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
> > + cs->id = CSID_GENERIC;
> > +
>
> This is white space damaged and certainly not from me.
Sorry, I will fix it.
Thanks
Jianyong