Date: Fri, 20 Sep 2019 15:27:12 +0900
In-Reply-To: <20190920062713.78503-1-suleiman@google.com>
Message-Id: <20190920062713.78503-2-suleiman@google.com>
References: <20190920062713.78503-1-suleiman@google.com>
X-Mailer: git-send-email 2.23.0.351.gc4317032e6-goog
Subject: [RFC 1/2] kvm: Mechanism to copy host timekeeping parameters into guest.
From: Suleiman Souhlal
To: pbonzini@redhat.com, rkrcmar@redhat.com, tglx@linutronix.de
Cc: john.stultz@linaro.org, sboyd@kernel.org, linux-kernel@vger.kernel.org,
    kvm@vger.kernel.org, Suleiman Souhlal
X-Mailing-List: linux-kernel@vger.kernel.org

This is used to synchronize time between host and guest.

The guest requests the guest physical address at which it wants the data
placed by writing that address to the MSR_KVM_TIMEKEEPER_EN MSR.

We maintain a shadow copy of the host timekeeper that is updated whenever
the timekeeper is updated, and that is then copied into guest memory
before each guest entry.

It currently assumes the host clocksource is "tsc".

Signed-off-by: Suleiman Souhlal
---
 arch/x86/include/asm/kvm_host.h      |   3 +
 arch/x86/include/asm/pvclock-abi.h   |  27 ++++++
 arch/x86/include/uapi/asm/kvm_para.h |   1 +
 arch/x86/kvm/x86.c                   | 121 +++++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdc16b0aa7c6..b1b4c3a80b8d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -666,7 +666,10 @@ struct kvm_vcpu_arch {
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned int hw_tsc_khz;
 	struct gfn_to_hva_cache pv_time;
+	struct gfn_to_hva_cache pv_timekeeper_g2h;
+	struct pvclock_timekeeper pv_timekeeper;
 	bool pv_time_enabled;
+	bool pv_timekeeper_enabled;
 	/* set guest stopped flag in pvclock flags field */
 	bool pvclock_set_guest_stopped_request;
 
diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 1436226efe3e..2809008b9b26 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,6 +40,33 @@ struct pvclock_wall_clock {
 	u32   nsec;
 } __attribute__((__packed__));
 
+struct pvclock_read_base {
+	u64 mask;
+	u64 cycle_last;
+	u32 mult;
+	u32 shift;
+	u64 xtime_nsec;
+	u64 base;
+} __attribute__((__packed__));
+
+struct pvclock_timekeeper {
+	u64 gen;
+	u64 flags;
+	struct pvclock_read_base tkr_mono;
+	struct pvclock_read_base tkr_raw;
+	u64 xtime_sec;
+	u64 ktime_sec;
+	u64 wall_to_monotonic_sec;
+	u64 wall_to_monotonic_nsec;
+	u64 offs_real;
+	u64 offs_boot;
+	u64 offs_tai;
+	u64 raw_sec;
+	u64 tsc_offset;
+} __attribute__((__packed__));
+
+#define	PVCLOCK_TIMEKEEPER_ENABLED (1 << 0)
+
 #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
 #define PVCLOCK_GUEST_STOPPED	(1 << 1)
 /* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..3ebb1d87db3a 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -50,6 +50,7 @@
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
+#define MSR_KVM_TIMEKEEPER_EN	0x4b564d06
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91602d310a3f..06a940a74005 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -157,6 +157,8 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+static atomic_t pv_timekeepers_nr;
+
 #define KVM_NR_SHARED_MSRS 16
 
 struct kvm_shared_msrs_global {
@@ -2621,6 +2623,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		break;
 	}
+	case MSR_KVM_TIMEKEEPER_EN:
+		if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
+		    &vcpu->arch.pv_timekeeper_g2h, data,
+		    sizeof(struct pvclock_timekeeper)))
+			vcpu->arch.pv_timekeeper_enabled = false;
+		else {
+			vcpu->arch.pv_timekeeper_enabled = true;
+			atomic_inc(&pv_timekeepers_nr);
+		}
+		break;
 	case MSR_KVM_ASYNC_PF_EN:
 		if (kvm_pv_enable_async_pf(vcpu, data))
 			return 1;
@@ -6965,6 +6977,109 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 	.handle_intel_pt_intr	= kvm_handle_intel_pt_intr,
 };
 
+static DEFINE_SPINLOCK(shadow_pvtk_lock);
+static struct pvclock_timekeeper shadow_pvtk;
+
+static void
+pvclock_copy_read_base(struct pvclock_read_base *pvtkr,
+    struct tk_read_base *tkr)
+{
+	pvtkr->cycle_last = tkr->cycle_last;
+	pvtkr->mult = tkr->mult;
+	pvtkr->shift = tkr->shift;
+	pvtkr->mask = tkr->mask;
+	pvtkr->xtime_nsec = tkr->xtime_nsec;
+	pvtkr->base = tkr->base;
+}
+
+static void
+kvm_copy_into_pvtk(struct kvm_vcpu *vcpu)
+{
+	struct pvclock_timekeeper *pvtk;
+	unsigned long flags;
+
+	if (!vcpu->arch.pv_timekeeper_enabled)
+		return;
+
+	pvtk = &vcpu->arch.pv_timekeeper;
+	if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_TSC) {
+		pvtk->flags |= PVCLOCK_TIMEKEEPER_ENABLED;
+		spin_lock_irqsave(&shadow_pvtk_lock, flags);
+		pvtk->tkr_mono = shadow_pvtk.tkr_mono;
+		pvtk->tkr_raw = shadow_pvtk.tkr_raw;
+
+		pvtk->xtime_sec = shadow_pvtk.xtime_sec;
+		pvtk->ktime_sec = shadow_pvtk.ktime_sec;
+		pvtk->wall_to_monotonic_sec =
+		    shadow_pvtk.wall_to_monotonic_sec;
+		pvtk->wall_to_monotonic_nsec =
+		    shadow_pvtk.wall_to_monotonic_nsec;
+		pvtk->offs_real = shadow_pvtk.offs_real;
+		pvtk->offs_boot = shadow_pvtk.offs_boot;
+		pvtk->offs_tai = shadow_pvtk.offs_tai;
+		pvtk->raw_sec = shadow_pvtk.raw_sec;
+		spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
+
+		pvtk->tsc_offset = kvm_x86_ops->read_l1_tsc_offset(vcpu);
+	} else
+		pvtk->flags &= ~PVCLOCK_TIMEKEEPER_ENABLED;
+
+	BUILD_BUG_ON(offsetof(struct pvclock_timekeeper, gen) != 0);
+
+	/*
+	 * Make the gen count odd to indicate we are in the process of
+	 * updating.
+	 */
+	vcpu->arch.pv_timekeeper.gen++;
+	vcpu->arch.pv_timekeeper.gen |= 1;
+
+	/*
+	 * See comment in kvm_guest_time_update() for why we have to do
+	 * multiple writes.
+	 */
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
+
+	smp_wmb();
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper));
+
+	smp_wmb();
+
+	vcpu->arch.pv_timekeeper.gen++;
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_timekeeper_g2h,
+	    &vcpu->arch.pv_timekeeper, sizeof(vcpu->arch.pv_timekeeper.gen));
+}
+
+static void
+update_shadow_pvtk(struct timekeeper *tk)
+{
+	struct pvclock_timekeeper *pvtk;
+	unsigned long flags;
+
+	pvtk = &shadow_pvtk;
+
+	if (atomic_read(&pv_timekeepers_nr) == 0 ||
+	    pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+		return;
+
+	spin_lock_irqsave(&shadow_pvtk_lock, flags);
+	pvclock_copy_read_base(&pvtk->tkr_mono, &tk->tkr_mono);
+	pvclock_copy_read_base(&pvtk->tkr_raw, &tk->tkr_raw);
+
+	pvtk->xtime_sec = tk->xtime_sec;
+	pvtk->ktime_sec = tk->ktime_sec;
+	pvtk->wall_to_monotonic_sec = tk->wall_to_monotonic.tv_sec;
+	pvtk->wall_to_monotonic_nsec = tk->wall_to_monotonic.tv_nsec;
+	pvtk->offs_real = tk->offs_real;
+	pvtk->offs_boot = tk->offs_boot;
+	pvtk->offs_tai = tk->offs_tai;
+	pvtk->raw_sec = tk->raw_sec;
+	spin_unlock_irqrestore(&shadow_pvtk_lock, flags);
+}
+
 #ifdef CONFIG_X86_64
 static void pvclock_gtod_update_fn(struct work_struct *work)
 {
@@ -6993,6 +7108,7 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 	struct timekeeper *tk = priv;
 
 	update_pvclock_gtod(tk);
+	update_shadow_pvtk(tk);
 
 	/* disable master clock if host does not trust, or does not
 	 * use, TSC based clocksource.
@@ -7809,6 +7925,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
+	kvm_copy_into_pvtk(vcpu);
+
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
 			kvm_x86_ops->get_vmcs12_pages(vcpu);
@@ -8891,6 +9009,9 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 
 	kvmclock_reset(vcpu);
 
+	if (vcpu->arch.pv_timekeeper_enabled)
+		atomic_dec(&pv_timekeepers_nr);
+
 	kvm_x86_ops->vcpu_free(vcpu);
 	free_cpumask_var(wbinvd_dirty_mask);
 }
-- 
2.23.0.237.gc6a4ce50a0-goog
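
For readers following the gen handling in kvm_copy_into_pvtk(): the host
makes gen odd, writes the payload, then makes gen even again, so a reader
is expected to retry until it sees the same even gen before and after its
copy. The guest-side reader is not part of this patch, so the following is
only a user-space sketch of that protocol under stated assumptions. All
demo_* names, the trimmed-down struct, and the bounded retry count are
hypothetical, and C11 atomics stand in for the kernel's smp_wmb() and the
matching read barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct pvclock_timekeeper. */
struct demo_pvtk {
	_Atomic uint64_t gen;	/* odd while the writer is mid-update */
	uint64_t flags;
	uint64_t xtime_sec;
	uint64_t tsc_offset;
};

/* Private copy the reader fills in. */
struct demo_snapshot {
	uint64_t flags;
	uint64_t xtime_sec;
	uint64_t tsc_offset;
};

#define DEMO_PVTK_ENABLED (1ULL << 0)

/* Writer: bump gen to an odd value before touching the payload. */
static void demo_host_begin(struct demo_pvtk *p)
{
	uint64_t g = atomic_load_explicit(&p->gen, memory_order_relaxed);

	atomic_store_explicit(&p->gen, (g + 1) | 1, memory_order_relaxed);
	/* Stand-in for smp_wmb(): order the gen store before payload stores. */
	atomic_thread_fence(memory_order_release);
}

/* Writer: bump gen back to an even value once the payload is complete. */
static void demo_host_end(struct demo_pvtk *p)
{
	uint64_t g = atomic_load_explicit(&p->gen, memory_order_relaxed);

	assert(g & 1);	/* an update must be in progress */
	atomic_store_explicit(&p->gen, g + 1, memory_order_release);
}

/*
 * Reader: copy the payload and accept it only if gen was even and did not
 * change across the copy. Returns false after max_tries failed attempts;
 * the bound is a demo convenience, a real reader would likely loop.
 */
static bool demo_guest_read(struct demo_pvtk *p, struct demo_snapshot *out,
			    int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		uint64_t g1 = atomic_load_explicit(&p->gen,
						   memory_order_acquire);

		if (g1 & 1)
			continue;	/* writer is mid-update, retry */

		out->flags = p->flags;
		out->xtime_sec = p->xtime_sec;
		out->tsc_offset = p->tsc_offset;

		/* Order the payload loads before re-reading gen. */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&p->gen, memory_order_relaxed) == g1)
			return true;
	}
	return false;
}
```

A caller would check the enabled bit in the returned snapshot (here
DEMO_PVTK_ENABLED, mirroring PVCLOCK_TIMEKEEPER_ENABLED) before trusting
the other fields, since the host clears that flag when its clocksource is
not TSC.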