Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp355899rdb; Tue, 5 Dec 2023 07:19:35 -0800 (PST) X-Google-Smtp-Source: AGHT+IGrdI0f37FrQh5hXopxCqILlfx8bp2SOp7JeBweeg2nQ7HGp1xNC3tq0uKxKuQ2pUVtAv4P X-Received: by 2002:a05:6a00:809:b0:6b7:18c1:c09a with SMTP id m9-20020a056a00080900b006b718c1c09amr1430560pfk.5.1701789574977; Tue, 05 Dec 2023 07:19:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701789574; cv=none; d=google.com; s=arc-20160816; b=VvKTowfpnx5pYb/yOgHB9Er5zDb01XjtjIwaE/41I+lX4CahB4in2deipQpZpEFVee SkJ+mZJD35zuFhE6uwP7Q35/D8wvzld6GQmi7ghGG4X4D/NZemjf2uov7EvyinGdfUnb LePxV8DihXmCCG7+lijUhUFA3rkucpb6fAgTtNhn2MI913MqMJvkHqiI4qff9V+duxyd 8IBUWb7hsT1Xx+iQf/xVingovMXUsuPAqvVhVymumX5hJAfTjt7Dsxw0QBvQ+UqhDCbQ 6IRclhNnTh6fhOO6A9g7eBFe5ROPiVVcsJGzWVSzosg8smBuOYsAh4UZ7LRRvXnFb0jE 7Z+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=oEZfnmwEGuMeGRi9B69oY0jRL5WPBrbAayOwErPIXg0=; fh=owAasD6QInalfosdS8LBqQgsQ/joIDaI5pwKZERwsDU=; b=i8d+4aXdW6CSYMCyNLccfs37qCjsRsCJ0tmCpVKVoM9KONQZjFKkelSGPk2Wz4R16p MycGne37z011NVXrdP3ySO4cLyoi/iNnzR1UZK69fpqnetcM87M+I8dlLdNEjUSfP6Ra S+J3a+SSAw39NiIenEF39mtS/37aDCsYvM4P0gMd4gcSDYpKSW1PilGs/iEa2GT5V9W0 oWK/wm1JDrLESUrRoJQhG3PtNsEOMBcyS1TgpxviHG657jUYhGuM4iujcxUgLdbqXOqx 9srUae54uC+lH1fHUncI1Qtt8rOfuyx0E402OLa7+cBpZ9rFIPEduoX+AywYErbBPoR0 qhPA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=TftyEOVA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from morse.vger.email (morse.vger.email. [2620:137:e000::3:1]) by mx.google.com with ESMTPS id k8-20020a056a00134800b006ce73355c1bsi196314pfu.59.2023.12.05.07.19.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 07:19:34 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) client-ip=2620:137:e000::3:1; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=TftyEOVA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id C9D96801B333; Tue, 5 Dec 2023 07:19:31 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346063AbjLEPTN (ORCPT + 99 others); Tue, 5 Dec 2023 10:19:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346094AbjLEPTL (ORCPT ); Tue, 5 Dec 2023 10:19:11 -0500 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9885B199; Tue, 5 Dec 2023 07:19:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701789556; x=1733325556; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=84s3KFtjKYi+AC7NctrVnTJKab+l+pnnUuTk6VFMeCM=; b=TftyEOVACk3RYWnWl82vRSUtGhuSpBQEWCE9+bG+mc2ENe0KaN6BLEnr WG5vsYdkoNyh0PtUb+dsWCwsOD8u343L6Sulq10Ors1CbfDeEgbQDluHd vS2lgv/WJ5QThNhUq0X0H9EvYVmM8t+ITDq44ySvittNhYrHqT6PqIVqR y+iWWeNW0lASA7h5QlWMVdx1+IIMCExakv0arvar0yP2AuyJaeExnoaSz 3sgcR5GZg+QjleQBIkCJQ/ydZHZFES9OPP2ScY0oWssZjaA5CutrCoALi DqkhTNPqXBH0YEabSlBQxba5+gcN8W6TfBzfXmQkGm/Pbg63yukPWXfo+ A==; X-IronPort-AV: E=McAfee;i="6600,9927,10915"; a="15456275" X-IronPort-AV: E=Sophos;i="6.04,252,1695711600"; d="scan'208";a="15456275" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Dec 2023 07:19:16 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10915"; a="1018238776" X-IronPort-AV: E=Sophos;i="6.04,252,1695711600"; d="scan'208";a="1018238776" Received: from linux.intel.com ([10.54.29.200]) by fmsmga006.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Dec 2023 07:19:15 -0800 Received: from [10.212.98.168] (mnethi-mobl6.amr.corp.intel.com [10.212.98.168]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by linux.intel.com (Postfix) with ESMTPS id 6314D580ABD; Tue, 5 Dec 2023 07:19:14 -0800 (PST) Message-ID: <689846d4-7221-4c6e-bb37-a4d3561634b7@linux.intel.com> Date: Tue, 5 Dec 2023 10:19:13 -0500 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] KVM: x86/pmu: Prevent any host user from enabling PEBS for profiling guest Content-Language: en-US To: Like Xu Cc: Paolo Bonzini , linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Sean Christopherson , Peter Zijlstra , Ian Rogers References: <20231129095055.88060-1-likexu@tencent.com> <6c4bd247-1f81-4b43-9e21-012f831d26b8@linux.intel.com> <84903edf-96a7-40da-8bbb-52511b4ec893@linux.intel.com> <444c0244-e377-4b4d-b3f0-a9404f013b87@linux.intel.com> <812822c4-8f24-4fc5-81eb-335abe46baa5@gmail.com> From: "Liang, Kan" In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Tue, 05 Dec 2023 07:19:31 -0800 (PST) On 2023-12-05 2:24 a.m., Like Xu wrote: > > > On 4/12/2023 11:19 pm, Liang, Kan wrote: >> >> >> On 2023-12-04 3:32 a.m., Like Xu wrote: >>> >>> >>> On 1/12/2023 10:38 pm, Liang, Kan wrote: >>>> >>>> >>>> On 2023-11-30 10:59 p.m., Like Xu wrote: >>>>> On 30/11/2023 11:49 pm, Liang, Kan wrote: >>>>>> >>>>>> >>>>>> On 2023-11-30 2:29 a.m., Like Xu wrote: >>>>>>> On 29/11/2023 10:38 pm, Liang, Kan wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 2023-11-29 4:50 a.m., Like Xu wrote: >>>>>>>>> From: Like Xu >>>>>>>>> >>>>>>>>> Stop using PEBS counters on host to profiling guest. Limit the >>>>>>>>> range of >>>>>>>>> enabled PEBS counters to only those counters enabled from the >>>>>>>>> guest >>>>>>>>> PEBS >>>>>>>>> emulation perspective. >>>>>>>>> >>>>>>>>> If there is a perf-record agent on host that uses perf-tools >>>>>>>>> events >>>>>>>>> like >>>>>>>>> "cpu-cycles:GP" (G for attr.exclude_host, P for max precise event >>>>>>>>> counter) >>>>>>>>> to capture guest performance events, then the guest will be >>>>>>>>> hanged. >>>>>>>>> This is >>>>>>>>> because Intel DS-based PEBS buffer is addressed using the 64-bit >>>>>>>>> linear >>>>>>>>> address of the current {p/v}CPU context based on MSR_IA32_DS_AREA. >>>>>>>>> >>>>>>>>> Any perf user using PEBS counters to profile guest on host is, in >>>>>>>>> perf/core >>>>>>>>> implementation details, trying to set bits on >>>>>>>>> cpuc->intel_ctrl_guest_mask >>>>>>>>> and arr[pebs_enable].guest, much like the guest PEBS emulation >>>>>>>>> behaviour. >>>>>>>>> But the subsequent PEBS memory write, regardless of whether guest >>>>>>>>> PEBS is >>>>>>>>> enabled, can overshoot guest entry and corrupt guest memory. >>>>>>>>> >>>>>>>>> Profiling guest via PEBS-DS buffer on host is not supported at >>>>>>>>> this >>>>>>>>> time. >>>>>>>>> Fix this by filtering the real configured value of >>>>>>>>> arr[pebs_enable].guest >>>>>>>>> with the emulated state of guest enabled PEBS counters, under the >>>>>>>>> condition >>>>>>>>> of none cross-mapped PEBS counters. >>>>>>>> >>>>>>>> So the counter will be silently disabled. The user never knows why >>>>>>>> nothing is sampled. >>>>>>>> Since we don't support the case, profiling guest via PEBS-DS >>>>>>>> buffer on >>>>>>>> host. Maybe we should error out when creating the event. For >>>>>>>> example >>>>>>>> (not tested), >>>>>>> >>>>>>> Test failed. >>>>>>> >>>>>>>> >>>>>>>> diff --git a/arch/x86/events/intel/core.c >>>>>>>> b/arch/x86/events/intel/core.c >>>>>>>> index 3871267d3237..24b90c70737f 100644 >>>>>>>> --- a/arch/x86/events/intel/core.c >>>>>>>> +++ b/arch/x86/events/intel/core.c >>>>>>>> @@ -3958,6 +3958,10 @@ static int intel_pmu_hw_config(struct >>>>>>>> perf_event >>>>>>>> *event) >>>>>>>>              if ((event->attr.config & INTEL_ARCH_EVENT_MASK) == >>>>>>>> INTEL_FIXED_VLBR_EVENT) >>>>>>>>                  return -EINVAL; >>>>>>>> >>>>>>>> +        /* Profiling guest via PEBS-DS buffer on host is not >>>>>>>> supported. */ >>>>>>>> +        if (event->attr.exclude_host) >>>>>>>> +            return -EINVAL; >>>>>>>> + >>>>>>> >>>>>>> Guest PEBS emulation also sets this bit, a typical call stack looks >>>>>>> like: >>>>>>> >>>>>>>        intel_pmu_hw_config+0x441/0x4d0 >>>>>>>        hsw_hw_config+0x12/0xa0 >>>>>>>        x86_pmu_event_init+0x98/0x370 >>>>>>>        perf_try_init_event+0x47/0x130 >>>>>>>        perf_event_alloc+0x446/0xeb0 >>>>>>>        perf_event_create_kernel_counter+0x38/0x190 >>>>>>>        pmc_reprogram_counter.constprop.17+0xd9/0x230 [kvm] >>>>>>>        kvm_pmu_handle_event+0x1a6/0x310 [kvm] >>>>>>>        vcpu_enter_guest+0x1388/0x19b0 [kvm] >>>>>>>        vcpu_run+0x117/0x6c0 [kvm] >>>>>>>        kvm_arch_vcpu_ioctl_run+0x13d/0x4d0 [kvm] >>>>>>>        kvm_vcpu_ioctl+0x301/0x6e0 [kvm] >>>>>>> >>>>>> >>>>>> Oh right, the event from the KVM guest is also exclude_host. >>>>>> So we should only error out with the non-KVM exclude_host PEBS event. >>>>>> >>>>>>> Alternatively, this path is taken when using PEBS-via-PT to profile >>>>>>> guests on host. >>>>>> >>>>>> There is a is_pebs_pt(event), so we can skip the PEBS-via-PT. >>>>>> >>>>>> Seems we just need to distinguish a KVM event and a normal host >>>>>> event. >>>>>> I don't have a better way to do it except using >>>>>> event->overflow_handler_context, which is NULL for a normal host >>>>>> event. >>>>> >>>>> The assumption that event->overflow_handler_context == NULL is unsafe, >>>>> considering .overflow_handler_context hook has its acclaimed >>>>> generality. >>>>> >>>> >>>> Yes, I agree it's very hacky. >>>> >>>> What we need here is a way to distinguish the KVM guest request from >>>> the >>>> others. How about the PF_VCPU flag? >>>> >>>>        if (event->attr.exclude_host && >>>>            !is_pebs_pt(event) && >>>>            (!(event->attach_state & PERF_ATTACH_TASK) || >>>> !(current->flags & >>>> PF_VCPU)) >>>>                return -EINVAL; >>> >>> Unfortunately, the tests are not passed w/ the above diff. >> >> What's the exact failed perf command? > > [0] one vcpu thread running //usr/bin/yes > [1] tools/perf/perf kvm --host --guest \ > --guestkallsyms="..." \ --guestvmlinux="..." \ > record --raw-samples --kcore \ > -e "cpu-cycles:GP" \ > -p `pidof cloud-hypervisor` > >> >> Is it because that the KVM guest is mistakenly killed or the host >> command is not detected? > > Theoretically it should error_out, but instead of just printing > unsupported info, > this command even collects samples pointing to vmx_vmexit, which is also > not > expected, there should be no samples taken for "cpu-cycles:GP" event. If the event creation is rejected by the kernel, the perf tool will automatically remove the P and give another try. What's the perf tool output with -vvv? Do you see "decreasing precise_ip by one"? If the precise_ip is decreased to 0, the event is a non-pebs which can be created successfully. So you will still see the samples. > > The point at which PF_VCPU is set/cleared is not synchronised or even > controllable with the time of perf_event being added and enabled. > > Going the way of PF_VCPU will bring more complexity. Seems we need a new flag to distinguish the KVM created event. > >> >>> But it's good to know that there is PF_VCPU and more things to play >>> around with. >>> >>> The AMD IBS also takes the precise path, but it puts the recorded values >>> on group of MSRs instead of the linear address-based memory. >> >> The check is in the intel_pmu_hw_config(). So AMD isn't impacted. >> >>> >>> The root cause of this issue is the hardware limitation, just like what >>> we did >>> in the commit 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry"), >>> a similar fix should also belong in the context of directly configuring >>> VMX-switch related hardware configuration values. >>> >> >> The above fix is different than this case. IIRC, It's caused by a ucode >> issue. So once there is a guest, SW has to explicitly disable the PEBS. >> Otherwise, the guest crashes even the host doesn't intend to profile the >> guest. >> >> While the case in the patch is apparently a violation of the current >> rules. I think it's better to detect it and error out early. >> >> Thanks, >> Kan > > "The host does not intend to profile the guest" implies that at the time > 26a4f3c08de4 was introduced, we didn't have the capability to use PEBS to > profiling guest, and the error_out message was not prompted to the end-user > when vPEBS is not supported, and the fix to prevent guest crashes appears > in intel_guest_get_msrs(); > > Now vPEBS is supported and not enabled, but at this time, we're moving > the same logic that fixes guest crashes from intel_guest_get_msrs() to > generic intel_pmu_hw_config(). This might not be a good move. > For me, it fixes the same issue in the same way (I should have fixed > it in c59a1f106f5c). The problem is that the current KVM addicts to silently manipulating the MSRs. It doesn't care if the end user is informed properly about the changes. I don't think it's a proper way. Let's say if a user runs a system-wide PEBS event which tries to collect both host and guest, they should only get samples from the host, right? How do they know the reason why there is 0 samples from the guest? > > If we do more implementation to make PEBS on the host profiling guest, > with shared memory and read/write ordering, we don't need to change > intel_pmu_hw_config() any more, any of the required change is not out > of the scope of intel_guest_get_msrs(). The intel_pmu_hw_config() is to check whether the feature is supported. If we support the host profiling guest later, we should remove the check in the intel_pmu_hw_config(). > > To move forward, this fix won't prevent you from exploring more further > fixes in the intel_pmu_hw_config() or elsewhere, could we agree on that ? I think we need a real fix (from error handling to MSR manipulation), not just a workaround. I know the current error handling of perf is not perfect. Some KVM failures may not be able to be explicitly passed to the end user, e.g., if I recall correctly PEBS crossmapping. But it doesn't mean it's not important. We should try our best to notify the end user when it's possible. I think we can patch both intel_pmu_hw_config() and intel_guest_get_msrs() if necessary. Thanks, Kan > >> >>> I haven't find a better location than intel_guest_get_msrs(). >>> >>>> >>>> >>>>> I understand your motivation very well, but this is not the right move >>>>> (based on my previous history of being sprayed by Peter). For >>>>> perf/core, >>>>> in-kernel perf_events should be treated equally, and the semantics of >>>>> kvm_pmu should only be accounted when a perf/core API is only used for >>>>> guest-only path. In this case for KVM perf_events, >>>>> intel_guest_get_msrs() >>>>> and x86_pmu_handle_guest_pebs() have this context. >>>>> >>>>>> >>>>>> diff --git a/arch/x86/events/intel/core.c >>>>>> b/arch/x86/events/intel/core.c >>>>>> index a968708ed1fb..c93a2aaff7c3 100644 >>>>>> --- a/arch/x86/events/intel/core.c >>>>>> +++ b/arch/x86/events/intel/core.c >>>>>> @@ -3958,6 +3958,16 @@ static int intel_pmu_hw_config(struct >>>>>> perf_event >>>>>> *event) >>>>>>             if ((event->attr.config & INTEL_ARCH_EVENT_MASK) == >>>>>> INTEL_FIXED_VLBR_EVENT) >>>>>>                 return -EINVAL; >>>>>> >>>>>> +        /* >>>>>> +         * Profiling guest via PEBS-DS buffer on host is not >>>>>> supported. >>>>>> +         * The event->overflow_handler_context is to distinguish >>>>>> a KVM >>>>>> +         * event and a normal host event. >>>>>> +         */ >>>>>> +        if (event->attr.exclude_host && >>>>>> +            !is_pebs_pt(event) && >>>>>> +            !event->overflow_handler_context) >>>>>> +            return -EINVAL; >>>>>> + >>>>>>             if (!(event->attr.freq || (event->attr.wakeup_events && >>>>>> !event->attr.watermark))) { >>>>>>                 event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD; >>>>>>                 if (!(event->attr.sample_type & >>>>>> >>>>>>> >>>>>>> The status of the guest can only be queried in the NMI handler and >>>>>>> the func >>>>>>> intel_guest_get_msrs() in the perf/core context, where it's easier >>>>>>> and more >>>>>>> centrally to review this part of changes that affects vPMU for >>>>>>> corner >>>>>>> cases. >>>>>>> >>>>>>> Maybe adding print info on the perf-tool side would help. >>>>>>> >>>>>>> For perf-tool users, it will get 0 number of sample for >>>>>>> "cpu-cycles:GP" >>>>>>> events, >>>>>>> just like other uncounted perf-tool events. >>>>>> >>>>>> perf-tool would never know such details, e.g., whether the platform >>>>>> supports PEBS-DS or other PEBS method. It's hard to tell if the 0 is >>>>>> because of an unsupported hardware or nothing sampled in the guest. >>>>> >>>>> It kind of looks like this use case to me: >>>>> >>>>>       perf record -e cycles -e cpu/event=0xf4,umask=0x10/ ./workload # >>>>> ICX >>>>> >>>>> # Total Lost Samples: 0 >>>>> # >>>>> # Samples: 0  of event 'cpu/event=0xf4,umask=0x10/' >>>>> # Event count (approx.): 0 >>>>> >>>>> A end-user has to check if the event-umask combination is supported >>>>> or not, >>>>> or nothing sampled for the workload. Is there any room for >>>>> improvement in >>>>> perf-tool to reduce the pain of this part ? If any, the same thing >>>>> could >>>>> be applied >>>>> to cpu-cycles:GP, isn't it ? >>>> >>>> I don't think we can expect the end user knows such details. Most of >>>> them may even don't know what's PEBS-via-DS. >>> >>> Maybe the following generic reminder helps: >>> >>> # Total Lost Samples: 0 >>> # Note: the event is not counted or unsupported. >>> # >>> # Samples: 0  of event 'cpu/event=0xf4,umask=0x10/' >>> # Event count (approx.): 0 >>> >>>> >>>> Thanks, >>>> Kan >>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Kan >>>>>>> >>>>>>>>              if (!(event->attr.freq || >>>>>>>> (event->attr.wakeup_events && >>>>>>>> !event->attr.watermark))) { >>>>>>>>                  event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD; >>>>>>>>                  if (!(event->attr.sample_type & >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Kan >>>>>>>> >>>>>>>>> >>>>>>>>> Cc: Peter Zijlstra (Intel) >>>>>>>>> Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR >>>>>>>>> emulation for extended PEBS") >>>>>>>>> Signed-off-by: Like Xu >>>>>>>>> --- >>>>>>>>>      arch/x86/events/intel/core.c | 8 +++++++- >>>>>>>>>      1 file changed, 7 insertions(+), 1 deletion(-) >>>>>>>>> >>>>>>>>> diff --git a/arch/x86/events/intel/core.c >>>>>>>>> b/arch/x86/events/intel/core.c >>>>>>>>> index a08f794a0e79..17afd504c35b 100644 >>>>>>>>> --- a/arch/x86/events/intel/core.c >>>>>>>>> +++ b/arch/x86/events/intel/core.c >>>>>>>>> @@ -4103,13 +4103,19 @@ static struct perf_guest_switch_msr >>>>>>>>> *intel_guest_get_msrs(int *nr, void *data) >>>>>>>>>              .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask, >>>>>>>>>          }; >>>>>>>>>      +    /* In any case, clear guest PEBS bits first. */ >>>>>>>>> +    arr[global_ctrl].guest &= ~arr[pebs_enable].guest; >>>>>>>>> + >>>>>>>>>          if (arr[pebs_enable].host) { >>>>>>>>>              /* Disable guest PEBS if host PEBS is enabled. */ >>>>>>>>>              arr[pebs_enable].guest = 0; >>>>>>>>>          } else { >>>>>>>>>              /* Disable guest PEBS thoroughly for cross-mapped >>>>>>>>> PEBS >>>>>>>>> counters. */ >>>>>>>>>              arr[pebs_enable].guest &= >>>>>>>>> ~kvm_pmu->host_cross_mapped_mask; >>>>>>>>> -        arr[global_ctrl].guest &= >>>>>>>>> ~kvm_pmu->host_cross_mapped_mask; >>>>>>>>> + >>>>>>>>> +        /* Prevent any host user from enabling PEBS for profiling >>>>>>>>> guest. */ >>>>>>>>> +        arr[pebs_enable].guest &= (kvm_pmu->pebs_enable & >>>>>>>>> kvm_pmu->global_ctrl); >>>>>>>>> + >>>>>>>>>              /* Set hw GLOBAL_CTRL bits for PEBS counter when >>>>>>>>> it runs >>>>>>>>> for guest */ >>>>>>>>>              arr[global_ctrl].guest |= arr[pebs_enable].guest; >>>>>>>>>          } >>>>>>>>> >>>>>>>>> base-commit: 6803fb00772cc50cd59a66bd8caaee5c84b13fcf >>>