Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp21493180rwd; Thu, 29 Jun 2023 17:25:50 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5VcxDKw2UB0Ae8gvd5YBYvUiGhrwYfouOarCTE1ZxvokLU/ZyH6/5/5qTk34Q2y2g9gEJv X-Received: by 2002:a54:4199:0:b0:3a2:572a:eaa6 with SMTP id 25-20020a544199000000b003a2572aeaa6mr969209oiy.5.1688084750735; Thu, 29 Jun 2023 17:25:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1688084750; cv=none; d=google.com; s=arc-20160816; b=hfMA4bojFJ/jtONfUodv09Wkg3AbbPpH5MIPtlxFIDLsGOkjfvINKn/nv+s/iMbPm3 1SPOIFhlHjEJqO1dQwz51ODobDoGInKsFnv7CR3G7DEA91IQCaM0/4UPVLT/BqqSGV6Y yQ3e8az2RiBzUY1m5v9UF6i5MUGtpSiIkAkcqWm8zM9vvGxuBPx7wjVjc5bhqNNiOXux B6wY1bqbdcIKBgXpeonympPOjEvDrzfB3gqPYjWU8j7EC8Ta/RspQ6+RQCcXNlnZ2Fuk Q/MiPKtPZDsf78+fr4NxbWWV/1LMTHv+Ulvs50sxiRKRR1GGw5RtKRAFLtJXjVqQ9Pr2 a0Sg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:from:subject :message-id:references:mime-version:in-reply-to:date:dkim-signature; bh=CLAK2aaJMqt8JMt9rFyLuXa7SLZsuXSqltLsJC9Ut/I=; fh=2yF6HM7b0vYMAO3b42DDz8o8t9Rf0Sr5V1eHsI3U31U=; b=S/G2jajIhBeS8QCoPWEx2kMqB2+koynzl0Ghmi2mIB5BVM6qxTs+4rHUl5su5bwBwH qzKk8IHfnhWgdf1S/nGWvV0JrYdBTXJX/3hbJOrklMXTCQ9gnrqAhy2mXnxadK+L4Urn e22j597IqdOn9GG/wIfhVOPKP3GNZIymXxnk+r6keljdcwfZXHJLCnk1uQwUzWiATbQy SBdXMoaIXnQZWembbfK1L0toLeN74bwatFeU4whzX2LhFVkfIwQsAMucngq13aojNSkC YqsJJGOi1L3hEHXKnRjkcN4yxO+mf3uQtV9Vi3+Ff6fJmhvpFBfO+9hhY1tSRIquuEuK xAew== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=pafJgNqS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 23-20020a17090a01d700b00244930ffe45si11602036pjd.0.2023.06.29.17.25.32; Thu, 29 Jun 2023 17:25:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=pafJgNqS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232079AbjF3ALL (ORCPT + 99 others); Thu, 29 Jun 2023 20:11:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231638AbjF3ALJ (ORCPT ); Thu, 29 Jun 2023 20:11:09 -0400 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6338FE4B for ; Thu, 29 Jun 2023 17:11:08 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id 98e67ed59e1d1-262e2cb725eso885959a91.0 for ; Thu, 29 Jun 2023 17:11:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1688083868; x=1690675868; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=CLAK2aaJMqt8JMt9rFyLuXa7SLZsuXSqltLsJC9Ut/I=; b=pafJgNqSlBfLIW/qP07meHoC4gZ7MF1KwU9BqieXtUH4Vh0Wj5OFtwZXoEUjt7NtI6 oQ0Cy7ROucQoeyCtenftvliy0LnSZMFN18ofzMCCKELi9585Vwr8ffJoZgA6qx4jf2g2 7RNcuQusOSh/RXk2d0VvBk5eKleqddXMRMF/FK8ZgPxJKPzl9Nxwp28xuuk2J9MvNDQr 9RPO5JquPNQ3XNQx/n3Fv2l+LK9Q5wv6Stxv2+davIiFfFzcE4vuYb6226s76NERC8lv sltZCMLVNWZFEjFNz+r+sqwzIFWy7tC7MsRRYJsORDr/7ZYKP5+tnqldgLzK5l0ElgN3 PVCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688083868; x=1690675868; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=CLAK2aaJMqt8JMt9rFyLuXa7SLZsuXSqltLsJC9Ut/I=; b=NVIB8ARwWQjo1LImOgOr1pD7KaPLzS2Qwk90G3sr1vkRYP2APjxuUuOGHVRnw4XQgn 860s05W1Gqk0U8QK/Rlaj8EgNDxzTn3XDi3xwM7Ie02IHIzOuuJQeDtXlm1CcXQvcdW2 T52x2uVR/89h/rZPW5Q4QXxhOXWILRs8bz1UG/NfG84UmDXPyK27einySc2nuHBDE7Km l9RcvHYVDjtKpiTpOOaADKZ4vvrZbbJrXvdYzYsIZTV2gs8bQ2r5Rv/RfqDrNoA5EkND N+h1XU581FcAieXh5q2cTvpPWfjb/59eWGwpmhSZkKhceEhSFRzZGUU55sDu/cajtSPu Pc7g== X-Gm-Message-State: ABy/qLaPYrZsHcMIOSriuxXdbOcv/MEInnbB6M7vV00pQ6VGH6Fqis+r Va9CwyWt8onfg4vBMY9M0JZFpwoRJIg= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90b:d82:b0:25b:809a:c7a with SMTP id bg2-20020a17090b0d8200b0025b809a0c7amr416383pjb.3.1688083867787; Thu, 29 Jun 2023 17:11:07 -0700 (PDT) Date: Thu, 29 Jun 2023 17:11:06 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230504120042.785651-1-rkagan@amazon.de> Message-ID: Subject: Re: [PATCH] KVM: x86: vPMU: truncate counter value to allowed width From: Sean Christopherson To: Jim Mattson Cc: Paolo Bonzini , Eric Hankland , Roman Kagan , kvm@vger.kernel.org, Dave Hansen , Like Xu , x86@kernel.org, Thomas Gleixner , linux-kernel@vger.kernel.org, "H. Peter Anvin" , Borislav Petkov , Ingo Molnar , Mingwei Zhang Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org +Mingwei On Thu, Jun 29, 2023, Jim Mattson wrote: > On Mon, Jun 5, 2023 at 5:51=E2=80=AFPM Sean Christopherson wrote: > > > > On Thu, May 04, 2023, Roman Kagan wrote: > > > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > > > index 5c7bbf03b599..6a91e1afef5a 100644 > > > --- a/arch/x86/kvm/pmu.h > > > +++ b/arch/x86/kvm/pmu.h > > > @@ -60,6 +60,12 @@ static inline u64 pmc_read_counter(struct kvm_pmc = *pmc) > > > return counter & pmc_bitmask(pmc); > > > } > > > > > > +static inline void pmc_set_counter(struct kvm_pmc *pmc, u64 val) > > > +{ > > > + pmc->counter +=3D val - pmc_read_counter(pmc); > > > > Ugh, not your code, but I don't see how the current code can possibly b= e correct. > > > > The above unpacks to > > > > counter =3D pmc->counter; > > if (pmc->perf_event && !pmc->is_paused) > > counter +=3D perf_event_read_value(pmc->perf_event, > > &enabled, &running); > > pmc->counter +=3D val - (counter & pmc_bitmask(pmc)); > > > > which distills down to > > > > counter =3D 0; > > if (pmc->perf_event && !pmc->is_paused) > > counter +=3D perf_event_read_value(pmc->perf_event, > > &enabled, &running); > > pmc->counter =3D val - (counter & pmc_bitmask(pmc)); > > > > or more succinctly > > > > if (pmc->perf_event && !pmc->is_paused) > > val -=3D perf_event_read_value(pmc->perf_event, &enable= d, &running); > > > > pmc->counter =3D val; > > > > which is obviously wrong. E.g. if the guest writes '0' to an active co= unter, the > > adjustment will cause pmc->counter to be loaded with a large (in unsign= ed terms) > > value, and thus quickly overflow after a write of '0'. >=20 > This weird construct goes all the way back to commit f5132b01386b > ("KVM: Expose a version 2 architectural PMU to a guests"). Paolo > killed it in commit 2924b52117b2 ("KVM: x86/pmu: do not mask the value > that is written to fixed PMUs"), perhaps by accident. Eric then > resurrected it in commit 4400cf546b4b ("KVM: x86: Fix perfctr WRMSR > for running counters"). >=20 > It makes no sense to me. WRMSR should just set the new value of the > counter, regardless of the old value or whether or not it is running. Heh, didn't stop you from giving Eric's patch a thumbs-up[*] :-) Thanks to Eric's testcase [Wow, tests do help! We should try writing more = of them!], I finally figured out what's going on. I wrongly assumed perf_event_read_v= alue() is destructive, but it's not, it just reads the current value. So on a WRM= SR, KVM offsets the value with the current perf event, and then *mostly* adjust= s for it when reading the counter. But that is obviously super fragile because it means pmc->counter must neve= r be read directly unless the perf event is paused and the accumulated counter h= as been propagated to pmc->counter. Blech. I fiddled with a variety of things, but AFAICT the easiest solution is also= the most obviously correct: set perf's count to the guest's count. Lightly tes= ted patch below. On a related topic, Mingwei also appears to have found another bug: prev_co= unter needs to be set when the counter is written, i.e. my proposed pmc_write_cou= nter() also needs to update prev_counter. Though that also raises the question of whether or not zeroing prev_counter= in reprogram_counter() is correct. Unless I'm missing something, reprogram_co= unter() should also set pmc->prev_counter to pmc->counter when the counter is succe= ssfully (re)enabled. And Jim also pointed out that prev_counter needs to be set even when KVM fa= ils to enable a perf event (software counting should still work). [*] https://lore.kernel.org/all/CALMp9eRfeFFb6n22Uf4R2Pf8WW7BVLX_Vuf04WFwiM= trk14Y-Q@mail.gmail.com --- arch/x86/kvm/pmu.h | 8 ++++++++ arch/x86/kvm/svm/pmu.c | 2 +- arch/x86/kvm/vmx/pmu_intel.c | 4 ++-- include/linux/perf_event.h | 2 ++ kernel/events/core.c | 11 +++++++++++ 5 files changed, 24 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index 7d9ba301c090..ba91a78e4dc1 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -74,6 +74,14 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc) return counter & pmc_bitmask(pmc); } =20 +static inline void pmc_write_counter(struct kvm_pmc *pmc, u64 val) +{ + if (pmc->perf_event && !pmc->is_paused) + perf_event_set_count(pmc->perf_event, val); + + pmc->counter =3D val; +} + static inline void pmc_release_perf_event(struct kvm_pmc *pmc) { if (pmc->perf_event) { diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c index cef5a3d0abd0..373ff6a6687b 100644 --- a/arch/x86/kvm/svm/pmu.c +++ b/arch/x86/kvm/svm/pmu.c @@ -160,7 +160,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struc= t msr_data *msr_info) /* MSR_PERFCTRn */ pmc =3D get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER); if (pmc) { - pmc->counter +=3D data - pmc_read_counter(pmc); + pmc_write_counter(pmc, data); pmc_update_sample_period(pmc); return 0; } diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 80c769c58a87..18a658aa2a8d 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -406,11 +406,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, s= truct msr_data *msr_info) if (!msr_info->host_initiated && !(msr & MSR_PMC_FULL_WIDTH_BIT)) data =3D (s64)(s32)data; - pmc->counter +=3D data - pmc_read_counter(pmc); + pmc_write_counter(pmc, data); pmc_update_sample_period(pmc); break; } else if ((pmc =3D get_fixed_pmc(pmu, msr))) { - pmc->counter +=3D data - pmc_read_counter(pmc); + pmc_write_counter(pmc, data); pmc_update_sample_period(pmc); break; } else if ((pmc =3D get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) { diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index d5628a7b5eaa..8fcd52a87ba2 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1677,6 +1677,7 @@ extern void perf_event_disable_inatomic(struct perf_e= vent *event); extern void perf_event_task_tick(void); extern int perf_event_account_interrupt(struct perf_event *event); extern int perf_event_period(struct perf_event *event, u64 value); +extern void perf_event_set_count(struct perf_event *event, u64 count); extern u64 perf_event_pause(struct perf_event *event, bool reset); #else /* !CONFIG_PERF_EVENTS: */ static inline void * @@ -1760,6 +1761,7 @@ static inline int perf_event_period(struct perf_event= *event, u64 value) { return -EINVAL; } +static inline perf_event_set_count(struct perf_event *event, u64 count) { = } static inline u64 perf_event_pause(struct perf_event *event, bool reset) { return 0; diff --git a/kernel/events/core.c b/kernel/events/core.c index db016e418931..d368c283eba5 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5646,6 +5646,17 @@ static void _perf_event_reset(struct perf_event *eve= nt) perf_event_update_userpage(event); } =20 +void perf_event_set_count(struct perf_event *event, u64 count) +{ + struct perf_event_context *ctx; + + ctx =3D perf_event_ctx_lock(event); + (void)perf_event_read(event, false); + local64_set(&event->count, count); + perf_event_ctx_unlock(event, ctx); +} +EXPORT_SYMBOL_GPL(perf_event_set_count); + /* Assume it's not an event with inherit set. */ u64 perf_event_pause(struct perf_event *event, bool reset) { base-commit: 5ae85a1bd17b959796f6cc4c1153ceada2cf8f24 --=20