From: Like Xu
To: Peter Zijlstra, Paolo Bonzini, kvm@vger.kernel.org
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, Kan Liang, luwei.kang@intel.com, Thomas Gleixner,
    wei.w.wang@intel.com, Tony Luck, Stephane Eranian, Mark Gross,
    Srinivas Pandruvada, linux-kernel@vger.kernel.org
Subject: [PATCH v2 04/17] perf: x86/ds: Handle guest PEBS overflow PMI and inject it to guest
Date: Mon, 9 Nov 2020 10:12:41 +0800
Message-Id: <20201109021254.79755-5-like.xu@linux.intel.com>
In-Reply-To: <20201109021254.79755-1-like.xu@linux.intel.com>
References: <20201109021254.79755-1-like.xu@linux.intel.com>

With PEBS virtualization, the PEBS records are delivered to the guest,
but the host still sees the PEBS overflow PMI from guest PEBS counters.
This would normally result in a spurious host PMI, so we need to inject
that PEBS overflow PMI into the guest so that the guest PMI handler can
process the PEBS records.

Check for this case in the host perf PEBS handler. If a PEBS overflow
PMI occurs and it was not generated on the host side (determined by
checking the host DS), a fake event is triggered. The fake event causes
the KVM PMI callback to be called, thereby injecting the PEBS overflow
PMI into the guest.

No matter how many guest PEBS counters have overflowed, triggering one
fake event is enough. The guest PEBS handler retrieves the correct
information from its own PEBS record buffer.

If counter_freezing is disabled on the host, a guest PEBS overflow PMI
would be missed when a PEBS counter is enabled on the host side and,
coincidentally, a host PEBS overflow PMI based on the host DS_AREA is
also triggered right after the vm-exit caused by the guest PEBS overflow
PMI based on the guest DS_AREA. In that case, KVM will disable guest PEBS
before vm-entry once there's a host PEBS counter enabled on the same CPU.

Originally-by: Andi Kleen
Co-developed-by: Kan Liang
Signed-off-by: Kan Liang
Signed-off-by: Like Xu
---
 arch/x86/events/intel/ds.c | 64 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 86848c57b55e..1e759c74bffd 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1721,6 +1721,67 @@ intel_pmu_save_and_restart_reload(struct perf_event *event, int count)
 	return 0;
 }
 
+/*
+ * We may be running with guest PEBS events created by KVM, and the
+ * PEBS records are logged into the guest's DS and invisible to the host.
+ *
+ * In the case of guest PEBS overflow, we only trigger a fake event
+ * to emulate the PEBS overflow PMI for guest PEBS counters in KVM.
+ * The guest will then vm-entry and check the guest DS area to read
+ * the guest PEBS records.
+ *
+ * Without counter_freezing support on the host, the guest PEBS overflow
+ * PMI may be dropped when both the guest and the host use PEBS.
+ * Therefore, KVM will not enable guest PEBS once the host PEBS is enabled
+ * without counter_freezing since it may cause a confusing unknown NMI.
+ *
+ * The contents and other behavior of the guest event do not matter.
+ */
+static int intel_pmu_handle_guest_pebs(struct cpu_hw_events *cpuc,
+				       struct pt_regs *iregs,
+				       struct debug_store *ds)
+{
+	struct perf_sample_data data;
+	struct perf_event *event = NULL;
+	u64 guest_pebs_idxs = cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask;
+	int bit;
+
+	/*
+	 * Ideally, we should check guest DS to understand if it's
+	 * a guest PEBS overflow PMI from guest PEBS counters.
+	 * However, it brings high overhead to retrieve guest DS in host.
+	 * So we check host DS instead for performance.
+	 *
+	 * If PEBS interrupt threshold on host is not exceeded in an NMI, there
+	 * must be a PEBS overflow PMI generated from the guest PEBS counters.
+	 * There is no ambiguity since the reported event in the PMI is guest
+	 * only. It gets handled correctly on a case-by-case basis for each event.
+	 *
+	 * Note: This is based on the assumption that counter_freezing is enabled,
+	 * or KVM disables the co-existence of guest PEBS and host PEBS.
+	 */
+	if (!guest_pebs_idxs || !in_nmi() ||
+	    ds->pebs_index >= ds->pebs_interrupt_threshold)
+		return 0;
+
+	for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs,
+			 INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed) {
+
+		event = cpuc->events[bit];
+		if (!event->attr.precise_ip)
+			continue;
+
+		perf_sample_data_init(&data, 0, event->hw.last_period);
+		if (perf_event_overflow(event, &data, iregs))
+			x86_pmu_stop(event, 0);
+
+		/* Injecting one fake event is enough. */
+		return 1;
+	}
+
+	return 0;
+}
+
 static void __intel_pmu_pebs_event(struct perf_event *event,
 				   struct pt_regs *iregs,
 				   void *base, void *top,
@@ -1954,6 +2015,9 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs)
 	if (!x86_pmu.pebs_active)
 		return;
 
+	if (intel_pmu_handle_guest_pebs(cpuc, iregs, ds))
+		return;
+
 	base = (struct pebs_basic *)(unsigned long)ds->pebs_buffer_base;
 	top = (struct pebs_basic *)(unsigned long)ds->pebs_index;
 
-- 
2.21.3
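
For readers who want to trace the decision made by
intel_pmu_handle_guest_pebs() outside the kernel, the check can be
modelled in user space. This is only a minimal sketch under stated
assumptions, not the kernel implementation: struct model_ds,
struct model_cpuc, model_handle_guest_pebs() and its nr_counters
parameter are hypothetical stand-ins, the in_nmi() test is passed in as
a plain flag, and the function returns the index of the counter that
would carry the single fake event instead of calling
perf_event_overflow().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_COUNTERS 64

/* Hypothetical stand-in for the host debug store fields the patch reads. */
struct model_ds {
	uint64_t pebs_index;
	uint64_t pebs_interrupt_threshold;
};

/* Hypothetical stand-in for the per-CPU PMU state the patch reads. */
struct model_cpuc {
	uint64_t pebs_enabled;
	uint64_t intel_ctrl_host_mask;
	/* Models "event->attr.precise_ip != 0" for the event on each counter. */
	bool precise_ip[MODEL_MAX_COUNTERS];
};

/*
 * Returns the index of the counter chosen to carry the one fake overflow
 * event, or -1 when the PMI is not attributed to guest PEBS.
 */
static int model_handle_guest_pebs(const struct model_cpuc *cpuc,
				   const struct model_ds *ds,
				   bool in_nmi, int nr_counters)
{
	/* Same mask arithmetic as the patch: drop the host-mask bits. */
	uint64_t guest_pebs_idxs = cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask;
	int bit;

	assert(nr_counters >= 0 && nr_counters <= MODEL_MAX_COUNTERS);

	/*
	 * If the host DS buffer has reached its interrupt threshold, the PMI
	 * is treated as a host PEBS overflow and left to the normal drain path.
	 */
	if (!guest_pebs_idxs || !in_nmi ||
	    ds->pebs_index >= ds->pebs_interrupt_threshold)
		return -1;

	for (bit = 0; bit < nr_counters; bit++) {
		if (!(guest_pebs_idxs & (UINT64_C(1) << bit)))
			continue;
		if (!cpuc->precise_ip[bit])
			continue;
		/* One fake event is enough, so stop at the first match. */
		return bit;
	}

	return -1;
}
```

As an illustration, with pebs_enabled = 0x6 and intel_ctrl_host_mask =
0x2, only counter 2 remains in guest_pebs_idxs. If that counter's
precise_ip flag is set, the model picks it while the host pebs_index is
below the threshold, and reports -1 once pebs_index reaches
pebs_interrupt_threshold or when the call is not made in NMI context.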