Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp237472rwb; Wed, 9 Nov 2022 01:34:22 -0800 (PST) X-Google-Smtp-Source: AMsMyM5v0oez1yxQS0sLncDOHECi8hIvvRReK2DOfUrAAhFcL0m59yLH8c/Nle7RuDgzDYCTRaKV X-Received: by 2002:a17:902:e84f:b0:187:2127:cc6 with SMTP id t15-20020a170902e84f00b0018721270cc6mr53053370plg.110.1667986461859; Wed, 09 Nov 2022 01:34:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1667986461; cv=none; d=google.com; s=arc-20160816; b=U9vooNrNx2X3SF8KrmH8r8AZCMyWSdmG2iTJc/kwj+hlcGLCzpb05p5BD3fGCmnvsa 0eCb2WW/wgGHa3E8w8m0eZRyVZRU2CncQTbtHl4mhU8mAgeVKoNgYRfFqzoBmS7FZISR ScZXfDie/6jp76PBlBvuqR4Vlea2bg0tdEbnMVRsL2ojsme7K/1ltALbYqap9hi19cKw ag83s792IpLNvydbAtVsNDXP/Dw4drD9vF3BdFTMioZ9J0p33osg1W2pUOHA0HFA/7uU VzgiLfVaEKzhiJ8vAfT8pQfgnFO76b/MCcMqpJyyotrbw03evUuk3kg2yK9rRHhRx2v8 jw/A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=gfUfiLMSDty5pKmQVoXHf+IXuNDCfniw4Yx5uF+cedc=; b=N61AMpdIe5ysDr4/ArLFbCDUQiX8DVf0DtpXE+6tNKaj45/dWAWfhMPZ02SwUIsYau sTKGcA97zoaz3GizWjzxPXNied1x59C9jlRgzaVnXO1LwYgxUMI3Id5PFUu3ny3nN+mW ZDy6PylYKADq97NJLetxFbSoRGvky9BlZtP+j0dY4TOUIdEb6DsGywSX5UCmmyA+49J2 EqBXd4rEed13DJGY/rRSQ/JdOXr/erCDIWNPdAg+6Ol3VE95Fjrgezde7EEjxIF/3p2l m0jPUKvcfqQs/jC31VcuJ28zO+uUCl2PScOfyBV797cb/MAn3tz82FnmnBBDuWIjCiWY 9BZQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=PnI7rvNS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 21-20020a631755000000b0046f56372127si16552727pgx.468.2022.11.09.01.34.08; Wed, 09 Nov 2022 01:34:21 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=PnI7rvNS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230074AbiKII3K (ORCPT + 93 others); Wed, 9 Nov 2022 03:29:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41592 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229875AbiKII2k (ORCPT ); Wed, 9 Nov 2022 03:28:40 -0500 Received: from mail-pg1-x52b.google.com (mail-pg1-x52b.google.com [IPv6:2607:f8b0:4864:20::52b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 007802673; Wed, 9 Nov 2022 00:28:24 -0800 (PST) Received: by mail-pg1-x52b.google.com with SMTP id 6so7475274pgm.6; Wed, 09 Nov 2022 00:28:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gfUfiLMSDty5pKmQVoXHf+IXuNDCfniw4Yx5uF+cedc=; b=PnI7rvNSYoxT0RIx9PjwUuB4E8+I6ctdIzfJW5WPVkYFdo6fNDbFR5+nkOhGiD7tBC paRtJh+4d2zHmWDnSy4Y/XW2fcRvniC+O8yMkHgBHztLiIMpK5I73nBppDoy6LhhEfJd eMJBqwFEmV7OnrE5H9OFycKhGWh7qmVjtnJuA7uAfS+gQTLSjaeRkkq3RCmR2w6ovpWa M4MmVKtUuyl9Tubbd82EaYTkkxk1Fa2dJ1FHt2W3QQA7PNjHIcHBTf0mFOOCO3Duga2x EDGn6XabJSKE00eTmRCBNurRUy8frYI26f1/9ANQjFvp7ffzsenCgI5MysYXZ3hqSvGs 7Lbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gfUfiLMSDty5pKmQVoXHf+IXuNDCfniw4Yx5uF+cedc=; b=yvZ6iAsymmEYPcrzvxdF1rfflRqDDkMdP+KxUq5VN8D7LkjU/l9ei8fTm579ATxeDN Lg80+eys4gfEq3Z7s58BmGEXhumPmNS8h8ONY83NBuuiP7rgFSOtN1KnNtTQMIQdZk6B fpMs3IkTmLQbvZJXchThnKv+kVX8JOQWtQsrYgAn5RSq1niNUXBREETu2HBXaLytiUS+ EvAA9faQFm11uGthDXIx1KlMuAgEKQjBXy3fqQMsMn1W0D9B5NTC3ivffemYjVBWw6lf Uiqh//c2pwQipaj71MqirqJB9ODVTYXnXcjD4cttKsomfa6K4RmveVQPLEFrHKY3rSUH 55BQ== X-Gm-Message-State: ACrzQf26yvuIgazzz/URlnSv9kJo8zZhAYQnaVrLicIsx3hKp7tiUF9k DnmkmXKcMaA52gZrRnLU2cQ= X-Received: by 2002:a63:1145:0:b0:46a:e00c:579c with SMTP id 5-20020a631145000000b0046ae00c579cmr52119177pgr.279.1667982504412; Wed, 09 Nov 2022 00:28:24 -0800 (PST) Received: from localhost.localdomain ([103.7.29.32]) by smtp.gmail.com with ESMTPSA id b14-20020a63d30e000000b00470537b9b0asm6587700pgg.51.2022.11.09.00.28.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Nov 2022 00:28:24 -0800 (PST) From: Like Xu X-Google-Original-From: Like Xu To: Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 2/3] KVM: x86/pmu: Add PRIR++ and PDist support for SPR and later models Date: Wed, 9 Nov 2022 16:28:01 +0800 Message-Id: <20221109082802.27543-3-likexu@tencent.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221109082802.27543-1-likexu@tencent.com> References: <20221109082802.27543-1-likexu@tencent.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Like Xu The pebs capability on the SPR is basically the same as Ice Lake Server with the exception of two special facilities that have been enhanced and require special handling. Upon triggering a PEBS assist, there will be a finite delay between the time the counter overflows and when the microcode starts to carry out its data collection obligations. Even if the delay is constant in core clock space, it invariably manifest as variable "skids" in instruction address space. On the Ice Lake Server, the Precise Distribution of Instructions Retire (PDIR) facility mitigates the "skid" problem by providing an early indication of when the counter is about to overflow. On SPR, the PDIR counter available (Fixed 0) is unchanged, but the capability is enhanced to Instruction-Accurate PDIR (PDIR++), where PEBS is taken on the next instruction after the one that caused the overflow. SPR also introduces a new Precise Distribution (PDist) facility only on general programmable counter 0. Per Intel SDM, PDist eliminates any skid or shadowing effects from PEBS. With PDist, the PEBS record will be generated precisely upon completion of the instruction or operation that causes the counter to overflow (there is no "wait for next occurrence" by default). In terms of KVM handling, when guest accesses those special counters, the KVM needs to request the same index counters via the perf_event kernel subsystem to ensure that the guest uses the correct pebs hardware counter (PRIR++ or PDist). This is mainly achieved by adjusting the event precise level to the maximum, where the semantics of this magic number is mainly defined by the internal software context of perf_event and it's also backwards compatible as part of the user space interface. Opportunistically, refine confusing comments on TNT+, as the only ones that currently support pebs_ept are Ice Lake server and SPR (GLC+). Signed-off-by: Like Xu --- arch/x86/kvm/pmu.c | 45 +++++++++++++++++++++++++++++++++------------ 1 file changed, 33 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 935c9d80ab50..3df48fc34e97 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -28,9 +28,18 @@ struct x86_pmu_capability __read_mostly kvm_pmu_cap; EXPORT_SYMBOL_GPL(kvm_pmu_cap); -static const struct x86_cpu_id vmx_icl_pebs_cpu[] = { +/* Precise Distribution of Instructions Retired (PDIR) */ +static const struct x86_cpu_id vmx_pebs_pdir_cpu[] = { X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, NULL), X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, NULL), + /* Instruction-Accurate PDIR (PDIR++) */ + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL), + {} +}; + +/* Precise Distribution (PDist) */ +static const struct x86_cpu_id vmx_pebs_pdist_cpu[] = { + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL), {} }; @@ -155,6 +164,28 @@ static void kvm_perf_overflow(struct perf_event *perf_event, kvm_make_request(KVM_REQ_PMU, pmc->vcpu); } +static u64 pmc_get_pebs_precise_level(struct kvm_pmc *pmc) +{ + /* + * For some model specific pebs counters with special capabilities + * (PDIR, PDIR++, PDIST), KVM needs to raise the event precise + * level to the maximum value (currently 3, backwards compatible) + * so that the perf subsystem would assign specific hardware counter + * with that capability for vPMC. + */ + if ((pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu)) || + (pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu))) + return 3; + + /* + * The non-zero precision level of guest event makes the ordinary + * guest event becomes a guest PEBS event and triggers the host + * PEBS PMI handler to determine whether the PEBS overflow PMI + * comes from the host counters or the guest. + */ + return 1; +} + static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config, bool exclude_user, bool exclude_kernel, bool intr) @@ -186,22 +217,12 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config, } if (pebs) { /* - * The non-zero precision level of guest event makes the ordinary - * guest event becomes a guest PEBS event and triggers the host - * PEBS PMI handler to determine whether the PEBS overflow PMI - * comes from the host counters or the guest. - * * For most PEBS hardware events, the difference in the software * precision levels of guest and host PEBS events will not affect * the accuracy of the PEBS profiling result, because the "event IP" * in the PEBS record is calibrated on the guest side. - * - * On Icelake everything is fine. Other hardware (GLC+, TNT+) that - * could possibly care here is unsupported and needs changes. */ - attr.precise_ip = 1; - if (x86_match_cpu(vmx_icl_pebs_cpu) && pmc->idx == 32) - attr.precise_ip = 3; + attr.precise_ip = pmc_get_pebs_precise_level(pmc); } event = perf_event_create_kernel_counter(&attr, -1, current, -- 2.38.1