From: Jim Mattson
Date: Fri, 30 Jun 2023 10:16:22 -0700
Subject: Re: [PATCH] KVM: x86: vPMU: truncate counter value to allowed width
To: Mingwei Zhang
Cc: Sean Christopherson, Roman Kagan, Paolo Bonzini, Eric Hankland,
    kvm@vger.kernel.org, Dave Hansen, Like Xu, x86@kernel.org,
    Thomas Gleixner, linux-kernel@vger.kernel.org, "H. Peter Anvin",
    Borislav Petkov, Ingo Molnar
References: <20230504120042.785651-1-rkagan@amazon.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 30, 2023 at 10:08 AM Mingwei Zhang wrote:
>
> On Fri, Jun 30, 2023 at 8:45 AM Sean Christopherson wrote:
> >
> > On Fri, Jun 30, 2023, Roman Kagan wrote:
> > > On Fri, Jun 30, 2023 at 07:28:29AM -0700, Sean Christopherson wrote:
> > > > On Fri, Jun 30, 2023, Roman Kagan wrote:
> > > > > On Thu, Jun 29, 2023 at 05:11:06PM -0700, Sean Christopherson wrote:
> > > > > > @@ -74,6 +74,14 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
> > > > > >  	return counter & pmc_bitmask(pmc);
> > > > > >  }
> > > > > >
> > > > > > +static inline void pmc_write_counter(struct kvm_pmc *pmc, u64 val)
> > > > > > +{
> > > > > > +	if (pmc->perf_event && !pmc->is_paused)
> > > > > > +		perf_event_set_count(pmc->perf_event, val);
> > > > > > +
> > > > > > +	pmc->counter = val;
> > > > >
> > > > > Doesn't this still have the original problem of storing wider value than
> > > > > allowed?
> > > >
> > > > Yes, this was just to fix the counter offset weirdness.  My plan is to apply your
> > > > patch on top.  Sorry for not making that clear.
> > >
> > > Ah, got it, thanks!
> > >
> > > Also I'm now chasing a problem that we occasionally see
> > >
> > > [3939579.462832] Uhhuh. NMI received for unknown reason 30 on CPU 43.
> > > [3939579.462836] Do you have a strange power saving mode enabled?
> > > [3939579.462836] Dazed and confused, but trying to continue
> > >
> > > in the guests when perf is used.  These messages disappear when
> > > 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") is
> > > reverted.  I haven't yet figured out where exactly the culprit is.
> >
> > Can you reverting de0f619564f4 ("KVM: x86/pmu: Defer counter emulated overflow
> > via pmc->prev_counter")?  I suspect the problem is the prev_counter mess.
>
> For sure it is prev_counter issue, I have done some instrumentation as follows:
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 48a0528080ab..946663a42326 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -322,8 +322,11 @@ static void reprogram_counter(struct kvm_pmc *pmc)
>  	if (!pmc_event_is_allowed(pmc))
>  		goto reprogram_complete;
>
> -	if (pmc->counter < pmc->prev_counter)
> +	if (pmc->counter < pmc->prev_counter) {
> +		pr_info("pmc->counter: %llx\tpmc->prev_counter: %llx\n",
> +			pmc->counter, pmc->prev_counter);
>  		__kvm_perf_overflow(pmc, false);
> +	}
>
>  	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
>  		printk_once("kvm pmu: pin control bit is ignored\n");
>
> I find some interesting changes on prev_counter:
>
> [  +7.295348] pmc->counter: 12	pmc->prev_counter: fffffffffb3d
> [  +0.622991] pmc->counter: 3	pmc->prev_counter: fffffffffb1a
> [  +6.943282] pmc->counter: 1	pmc->prev_counter: fffffffff746
> [  +4.483523] pmc->counter: 0	pmc->prev_counter: ffffffffffff
> [ +12.817772] pmc->counter: 0	pmc->prev_counter: ffffffffffff
> [ +21.721233] pmc->counter: 0	pmc->prev_counter: ffffffffffff
>
> The first 3 logs will generate this:
>
> [ +11.811925] Uhhuh. NMI received for unknown reason 20 on CPU 2.
> [  +0.000003] Dazed and confused, but trying to continue
>
> While the last 3 logs won't. This is quite reasonable as looking into
> de0f619564f4 ("KVM: x86/pmu: Defer counter emulated overflow via
> pmc->prev_counter"), counter and prev_counter should only have 1 diff
> in value.

prev_counter isn't actually sync'ed at this point, is it? This comes
back to that "setting a running counter" nonsense. We want to add 1 to
the current counter, but we don't actually know what the current
counter is.

My interpretation of the above is that, in the first three cases, PMU
hardware has already detected an overflow. In the last three cases,
software counting has detected an overflow.

If the last three occur while executing the guest's PMI handler (i.e.
NMIs are blocked), then this could corroborate my conjecture about
IA32_DEBUGCTL.Freeze_PerfMon_On_PMI.

> So, the reasonable suspect should be the stale prev_counter. There
> might be several potential reasons behind this. Jim's theory is the
> highly reasonable one as I did another experiment and found that KVM
> may leave pmu->global_status as '0' when injecting an NMI.
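
For anyone who wants to check the arithmetic behind the six logged
samples, here is a rough user-space sketch. It is not KVM code. It
assumes a 48-bit counter (inferred from the 12-hex-digit prev_counter
values in the log), and counter_mask(), wrapped_delta() and
looks_like_overflow() are hypothetical helpers written only for this
illustration. It computes how far counter sits from prev_counter modulo
the counter width: the last three samples are exactly 1 apart, the
first three are not.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Mask for a counter `width` bits wide (hypothetical helper, not KVM's). */
static uint64_t counter_mask(unsigned int width)
{
	assert(width >= 1 && width <= 64);
	return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* Number of increments from prev to counter, modulo 2^width. */
static uint64_t wrapped_delta(uint64_t counter, uint64_t prev,
			      unsigned int width)
{
	return (counter - prev) & counter_mask(width);
}

/* Same comparison as the instrumented branch in reprogram_counter(). */
static int looks_like_overflow(uint64_t counter, uint64_t prev)
{
	return counter < prev;
}

#ifdef PMC_DEMO_MAIN
int main(void)
{
	/* (counter, prev_counter) pairs copied from the log above. */
	static const uint64_t samples[][2] = {
		{ 0x12, 0xfffffffffb3dULL },
		{ 0x3,  0xfffffffffb1aULL },
		{ 0x1,  0xfffffffff746ULL },
		{ 0x0,  0xffffffffffffULL },
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("counter=%" PRIx64 " prev=%" PRIx64
		       " overflow=%d delta=%" PRIx64 "\n",
		       samples[i][0], samples[i][1],
		       looks_like_overflow(samples[i][0], samples[i][1]),
		       wrapped_delta(samples[i][0], samples[i][1], 48));
	return 0;
}
#endif
```

Build with -DPMC_DEMO_MAIN to print the deltas. Under the 48-bit
assumption, a delta of 1 matches a single emulated increment, while the
larger deltas in the first three samples only show that counter and
prev_counter were far apart when the comparison ran. The sketch says
nothing about which side was stale.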