Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp726870rdb; Tue, 5 Dec 2023 19:51:00 -0800 (PST) X-Google-Smtp-Source: AGHT+IF+1XwtlxxboAYjs6o4Ohj5liY1XnB5xDsE1ZfFQuha7ZUWsD5ZzEDdgQSdO2Cq+/eqHDLq X-Received: by 2002:a05:6a00:cc5:b0:6ce:3712:e522 with SMTP id b5-20020a056a000cc500b006ce3712e522mr161551pfv.14.1701834659680; Tue, 05 Dec 2023 19:50:59 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701834659; cv=none; d=google.com; s=arc-20160816; b=u376fT7fQY+p+1NUznkgqaPGB/JNACU8L0DTNgX9Y1KJP2rA6uCoFliE9yszRylL4O EVb6J77WHPz0RzeLjB7TwSaKnQjg8le3QMIuYbAhTnUJIdoycrLfb42s7zLjkcxqcqV9 9ejv3KoRghI5PuQfCktR5SOvQmU10xdyXIeUkmoVwpy+8+OjQL3lCAIm755pFFb1C1xX D0cCFXS8fSwOxvr1hciY2yiIYAQfMT2MBRSi1NpVbSOe5MrdW3XUcGE+zwVWjh7TChUY PKaqQ5hvhxiKSlNFbD/jcTXAFshSmZbEk3/KNEOWddAsV2rYoeWUyu63prwHswQBhwdc /DNw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=IA5narrqLRvX1tvPsZjw9uL7ABrtpaMBuRPT61b57Gw=; fh=O2NjnKagjeOKUmkJ1q9ulEJJttFsJfCAIa4kHyw8jYM=; b=1Cx42RlY8OA086KRTiODYffsuPTpNPw7Py9PzyfIsvBdYxBcmObuNza4Fqf/jedc2K NIDCjXgzrLTt68hahtpKaLeYHqKHNw+BfGgdRNZ9pF4KIUXKV1kkfAJbERfEmnX8H+qO NAjmeZdeCFGgZpv/Bm6ZNfaEeT3tFqdKYFkr3sGE99AZDeb2TYACr4B00FN2wD8da1dR cTrhe52USQWBoWie9akMXQNOLBDyOiAlP/bb+fFt3K6pBipwf/3nJxQZZQzKSEvHHGGL djyqilJ1gPzZxuG27uoFmGV8b8reF/f3FhkgWFybrXPgzJoat00Gya+3dd02DfBee0F2 dwDQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=FoOXeJbb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from groat.vger.email (groat.vger.email. [2620:137:e000::3:5]) by mx.google.com with ESMTPS id y7-20020a62ce07000000b006cbfbb64e00si3761236pfg.138.2023.12.05.19.50.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 19:50:59 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) client-ip=2620:137:e000::3:5; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=FoOXeJbb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 526A3807AC6F; Tue, 5 Dec 2023 19:50:36 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376424AbjLFDtD (ORCPT + 99 others); Tue, 5 Dec 2023 22:49:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56866 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229920AbjLFDtC (ORCPT ); Tue, 5 Dec 2023 22:49:02 -0500 Received: from mail-oo1-xc30.google.com (mail-oo1-xc30.google.com [IPv6:2607:f8b0:4864:20::c30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 232F51AA; Tue, 5 Dec 2023 19:49:07 -0800 (PST) Received: by mail-oo1-xc30.google.com with SMTP id 006d021491bc7-58df5988172so3878252eaf.0; Tue, 05 Dec 2023 19:49:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701834546; x=1702439346; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=IA5narrqLRvX1tvPsZjw9uL7ABrtpaMBuRPT61b57Gw=; b=FoOXeJbbmOZqg9uv0z8DAo2xL65ahNEf1nvh3TIzTmQWoQW76eIYGUqU6n1r2ZAEJm obtGaVp9RFCKDIk6m0G5NnzhCvRyhkP4J9AL8EPPy3acxOqRC2mpiTP1lDUp1XcSnRNh JVkQPTwS3pbs9ezqfNrSquKG0grjKpYJowxLFqXKQ96+7J07Ni7NCmGC/27Ogd7sDxRi CcQPq5vjz431vY4mvV7E/vNwLdu2gFlR3XNPZqrjb5JGUZDfYaMVYtVu7cWch6hvq1B3 mnW2WzSedLYvd3FMDjO22Loke6dJIKqq0rcIGwHC1vtZMxKilSlucM+ohHKin+/D8Dgf I9bg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701834546; x=1702439346; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=IA5narrqLRvX1tvPsZjw9uL7ABrtpaMBuRPT61b57Gw=; b=v8OK0Z3YNbT9Z+9NNIcT7Nw7bOcxIoN3KwkicbEN1rcypLMXJbQFE9GgghW8z55V0q +BRgGaE/cm1Ku+LUzs6/XJXE2w8xd4M82YJXwFU4rlYFJG0w9olUgIaHKpyegz2A4xd7 WkUm9ssOSqfzB3YYsra4QSyefEqibh0iVx90oW5m2+WOEwIIa/gsinTk1OyZabW2kJ2i 56UsPyeKJvUv3Dk+geF2oI163zNVtjLvsmJIJhPIOrL5uLAEYAJ1nJhMaQXLEHr4rBjv nXKlw/sI0cd6mz7PCogG3l6u8fa6HbLoOvWQp/sLgbRql953yr38HaXv1BdyEyhbwgvu 1+zw== X-Gm-Message-State: AOJu0YwnfRfeB8Wwhr9Y4exVebA+3MUcop72ExsfxZ/rqB7/4LQJIy43 kPX1HDn255+JHQyCYpk1L6Y= X-Received: by 2002:a05:6358:720f:b0:16d:fd30:6d14 with SMTP id h15-20020a056358720f00b0016dfd306d14mr505003rwa.17.1701834545914; Tue, 05 Dec 2023 19:49:05 -0800 (PST) Received: from [192.168.255.10] ([43.132.98.47]) by smtp.gmail.com with ESMTPSA id j6-20020a17090276c600b001cf5422f23esm10484020plt.294.2023.12.05.19.49.03 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 05 Dec 2023 19:49:05 -0800 (PST) Message-ID: <8896fd2c-304f-47eb-8bec-929ada251888@gmail.com> Date: Wed, 6 Dec 2023 11:49:01 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] KVM: x86/pmu: Prevent any host user from enabling PEBS for profiling guest Content-Language: en-US To: "Liang, Kan" , Sean Christopherson Cc: Paolo Bonzini , linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Peter Zijlstra , Ian Rogers References: <20231129095055.88060-1-likexu@tencent.com> <6c4bd247-1f81-4b43-9e21-012f831d26b8@linux.intel.com> <84903edf-96a7-40da-8bbb-52511b4ec893@linux.intel.com> <444c0244-e377-4b4d-b3f0-a9404f013b87@linux.intel.com> <812822c4-8f24-4fc5-81eb-335abe46baa5@gmail.com> <689846d4-7221-4c6e-bb37-a4d3561634b7@linux.intel.com> From: Like Xu In-Reply-To: <689846d4-7221-4c6e-bb37-a4d3561634b7@linux.intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Tue, 05 Dec 2023 19:50:36 -0800 (PST) On 5/12/2023 11:19 pm, Liang, Kan wrote: > > > On 2023-12-05 2:24 a.m., Like Xu wrote: >> >> >> On 4/12/2023 11:19 pm, Liang, Kan wrote: >>> >>> >>> On 2023-12-04 3:32 a.m., Like Xu wrote: >>>> >>>> >>>> On 1/12/2023 10:38 pm, Liang, Kan wrote: >>>>> >>>>> >>>>> On 2023-11-30 10:59 p.m., Like Xu wrote: >>>>>> On 30/11/2023 11:49 pm, Liang, Kan wrote: >>>>>>> >>>>>>> >>>>>>> On 2023-11-30 2:29 a.m., Like Xu wrote: >>>>>>>> On 29/11/2023 10:38 pm, Liang, Kan wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2023-11-29 4:50 a.m., Like Xu wrote: >>>>>>>>>> From: Like Xu >>>>>>>>>> >>>>>>>>>> Stop using PEBS counters on host to profiling guest. Limit the >>>>>>>>>> range of >>>>>>>>>> enabled PEBS counters to only those counters enabled from the >>>>>>>>>> guest >>>>>>>>>> PEBS >>>>>>>>>> emulation perspective. >>>>>>>>>> >>>>>>>>>> If there is a perf-record agent on host that uses perf-tools >>>>>>>>>> events >>>>>>>>>> like >>>>>>>>>> "cpu-cycles:GP" (G for attr.exclude_host, P for max precise event >>>>>>>>>> counter) >>>>>>>>>> to capture guest performance events, then the guest will be >>>>>>>>>> hanged. >>>>>>>>>> This is >>>>>>>>>> because Intel DS-based PEBS buffer is addressed using the 64-bit >>>>>>>>>> linear >>>>>>>>>> address of the current {p/v}CPU context based on MSR_IA32_DS_AREA. >>>>>>>>>> >>>>>>>>>> Any perf user using PEBS counters to profile guest on host is, in >>>>>>>>>> perf/core >>>>>>>>>> implementation details, trying to set bits on >>>>>>>>>> cpuc->intel_ctrl_guest_mask >>>>>>>>>> and arr[pebs_enable].guest, much like the guest PEBS emulation >>>>>>>>>> behaviour. >>>>>>>>>> But the subsequent PEBS memory write, regardless of whether guest >>>>>>>>>> PEBS is >>>>>>>>>> enabled, can overshoot guest entry and corrupt guest memory. >>>>>>>>>> >>>>>>>>>> Profiling guest via PEBS-DS buffer on host is not supported at >>>>>>>>>> this >>>>>>>>>> time. >>>>>>>>>> Fix this by filtering the real configured value of >>>>>>>>>> arr[pebs_enable].guest >>>>>>>>>> with the emulated state of guest enabled PEBS counters, under the >>>>>>>>>> condition >>>>>>>>>> of none cross-mapped PEBS counters. >>>>>>>>> >>>>>>>>> So the counter will be silently disabled. The user never knows why >>>>>>>>> nothing is sampled. >>>>>>>>> Since we don't support the case, profiling guest via PEBS-DS >>>>>>>>> buffer on >>>>>>>>> host. Maybe we should error out when creating the event. For >>>>>>>>> example >>>>>>>>> (not tested), >>>>>>>> >>>>>>>> Test failed. >>>>>>>> >>>>>>>>> >>>>>>>>> diff --git a/arch/x86/events/intel/core.c >>>>>>>>> b/arch/x86/events/intel/core.c >>>>>>>>> index 3871267d3237..24b90c70737f 100644 >>>>>>>>> --- a/arch/x86/events/intel/core.c >>>>>>>>> +++ b/arch/x86/events/intel/core.c >>>>>>>>> @@ -3958,6 +3958,10 @@ static int intel_pmu_hw_config(struct >>>>>>>>> perf_event >>>>>>>>> *event) >>>>>>>>>              if ((event->attr.config & INTEL_ARCH_EVENT_MASK) == >>>>>>>>> INTEL_FIXED_VLBR_EVENT) >>>>>>>>>                  return -EINVAL; >>>>>>>>> >>>>>>>>> +        /* Profiling guest via PEBS-DS buffer on host is not >>>>>>>>> supported. */ >>>>>>>>> +        if (event->attr.exclude_host) >>>>>>>>> +            return -EINVAL; >>>>>>>>> + >>>>>>>> >>>>>>>> Guest PEBS emulation also sets this bit, a typical call stack looks >>>>>>>> like: >>>>>>>> >>>>>>>>        intel_pmu_hw_config+0x441/0x4d0 >>>>>>>>        hsw_hw_config+0x12/0xa0 >>>>>>>>        x86_pmu_event_init+0x98/0x370 >>>>>>>>        perf_try_init_event+0x47/0x130 >>>>>>>>        perf_event_alloc+0x446/0xeb0 >>>>>>>>        perf_event_create_kernel_counter+0x38/0x190 >>>>>>>>        pmc_reprogram_counter.constprop.17+0xd9/0x230 [kvm] >>>>>>>>        kvm_pmu_handle_event+0x1a6/0x310 [kvm] >>>>>>>>        vcpu_enter_guest+0x1388/0x19b0 [kvm] >>>>>>>>        vcpu_run+0x117/0x6c0 [kvm] >>>>>>>>        kvm_arch_vcpu_ioctl_run+0x13d/0x4d0 [kvm] >>>>>>>>        kvm_vcpu_ioctl+0x301/0x6e0 [kvm] >>>>>>>> >>>>>>> >>>>>>> Oh right, the event from the KVM guest is also exclude_host. >>>>>>> So we should only error out with the non-KVM exclude_host PEBS event. >>>>>>> >>>>>>>> Alternatively, this path is taken when using PEBS-via-PT to profile >>>>>>>> guests on host. >>>>>>> >>>>>>> There is a is_pebs_pt(event), so we can skip the PEBS-via-PT. >>>>>>> >>>>>>> Seems we just need to distinguish a KVM event and a normal host >>>>>>> event. >>>>>>> I don't have a better way to do it except using >>>>>>> event->overflow_handler_context, which is NULL for a normal host >>>>>>> event. >>>>>> >>>>>> The assumption that event->overflow_handler_context == NULL is unsafe, >>>>>> considering .overflow_handler_context hook has its acclaimed >>>>>> generality. >>>>>> >>>>> >>>>> Yes, I agree it's very hacky. >>>>> >>>>> What we need here is a way to distinguish the KVM guest request from >>>>> the >>>>> others. How about the PF_VCPU flag? >>>>> >>>>>        if (event->attr.exclude_host && >>>>>            !is_pebs_pt(event) && >>>>>            (!(event->attach_state & PERF_ATTACH_TASK) || >>>>> !(current->flags & >>>>> PF_VCPU)) >>>>>                return -EINVAL; >>>> >>>> Unfortunately, the tests are not passed w/ the above diff. >>> >>> What's the exact failed perf command? >> >> [0] one vcpu thread running //usr/bin/yes >> [1] tools/perf/perf kvm --host --guest \ >> --guestkallsyms="..." \ --guestvmlinux="..." \ >> record --raw-samples --kcore \ >> -e "cpu-cycles:GP" \ >> -p `pidof cloud-hypervisor` >> >>> >>> Is it because that the KVM guest is mistakenly killed or the host >>> command is not detected? >> >> Theoretically it should error_out, but instead of just printing >> unsupported info, >> this command even collects samples pointing to vmx_vmexit, which is also >> not >> expected, there should be no samples taken for "cpu-cycles:GP" event. > > If the event creation is rejected by the kernel, the perf tool will > automatically remove the P and give another try. > What's the perf tool output with -vvv? > Do you see "decreasing precise_ip by one"? > > If the precise_ip is decreased to 0, the event is a non-pebs which can > be created successfully. So you will still see the samples. I'm guessing this happened and the end-users need '-vvv' to notice the details. > >> >> The point at which PF_VCPU is set/cleared is not synchronised or even >> controllable with the time of perf_event being added and enabled. >> >> Going the way of PF_VCPU will bring more complexity. > > Seems we need a new flag to distinguish the KVM created event. I'd love to see you make this happen, and those perf_events callers from ebpf would appreciate the flag. > >> >>> >>>> But it's good to know that there is PF_VCPU and more things to play >>>> around with. >>>> >>>> The AMD IBS also takes the precise path, but it puts the recorded values >>>> on group of MSRs instead of the linear address-based memory. >>> >>> The check is in the intel_pmu_hw_config(). So AMD isn't impacted. >>> >>>> >>>> The root cause of this issue is the hardware limitation, just like what >>>> we did >>>> in the commit 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry"), >>>> a similar fix should also belong in the context of directly configuring >>>> VMX-switch related hardware configuration values. >>>> >>> >>> The above fix is different than this case. IIRC, It's caused by a ucode >>> issue. So once there is a guest, SW has to explicitly disable the PEBS. >>> Otherwise, the guest crashes even the host doesn't intend to profile the >>> guest. >>> >>> While the case in the patch is apparently a violation of the current >>> rules. I think it's better to detect it and error out early. >>> >>> Thanks, >>> Kan >> >> "The host does not intend to profile the guest" implies that at the time >> 26a4f3c08de4 was introduced, we didn't have the capability to use PEBS to >> profiling guest, and the error_out message was not prompted to the end-user >> when vPEBS is not supported, and the fix to prevent guest crashes appears >> in intel_guest_get_msrs(); >> >> Now vPEBS is supported and not enabled, but at this time, we're moving >> the same logic that fixes guest crashes from intel_guest_get_msrs() to >> generic intel_pmu_hw_config(). This might not be a good move. >> For me, it fixes the same issue in the same way (I should have fixed >> it in c59a1f106f5c). > > The problem is that the current KVM addicts to silently manipulating the > MSRs. It doesn't care if the end user is informed properly about the > changes. I don't think it's a proper way. I have to agree, KVM also has quirks that users are not aware of. > > Let's say if a user runs a system-wide PEBS event which tries to collect > both host and guest, they should only get samples from the host, right? > How do they know the reason why there is 0 samples from the guest? The same thing happened at pmu-unfunctional ctx like enclave and SMM. perf-report even has a parameter option that can change the denominator of all collected samples (it thinks that's the whole story for executed payload), while making the data appear distorted and conclusions absurd. > >> >> If we do more implementation to make PEBS on the host profiling guest, >> with shared memory and read/write ordering, we don't need to change >> intel_pmu_hw_config() any more, any of the required change is not out >> of the scope of intel_guest_get_msrs(). > > The intel_pmu_hw_config() is to check whether the feature is supported. > If we support the host profiling guest later, we should remove the check > in the intel_pmu_hw_config(). > >> >> To move forward, this fix won't prevent you from exploring more further >> fixes in the intel_pmu_hw_config() or elsewhere, could we agree on that ? > > I think we need a real fix (from error handling to MSR manipulation), > not just a workaround. > > I know the current error handling of perf is not perfect. Some KVM > failures may not be able to be explicitly passed to the end user, e.g., > if I recall correctly PEBS crossmapping. But it doesn't mean it's not > important. We should try our best to notify the end user when it's possible. It sounds like we need to add a little more detail to the header of the samples. I've also noticed that some users don't use perf/tool but develop directly based on syscall, but it's not that simple to abstract some generic prompt messages since '-Einval' is not enough for wider cases. > > I think we can patch both intel_pmu_hw_config() and > intel_guest_get_msrs() if necessary. Thank you and thanks for your comments. At the very least, KVM need this minor fix on intel_guest_get_msrs() to alleviate my guilt. And let's see what we will get on som thread around intel_pmu_hw_config(). > > Thanks, > Kan >> >>> >>>> I haven't find a better location than intel_guest_get_msrs(). >>>> >>>>> >>>>> >>>>>> I understand your motivation very well, but this is not the right move >>>>>> (based on my previous history of being sprayed by Peter). For >>>>>> perf/core, >>>>>> in-kernel perf_events should be treated equally, and the semantics of >>>>>> kvm_pmu should only be accounted when a perf/core API is only used for >>>>>> guest-only path. In this case for KVM perf_events, >>>>>> intel_guest_get_msrs() >>>>>> and x86_pmu_handle_guest_pebs() have this context. >>>>>> >>>>>>> >>>>>>> diff --git a/arch/x86/events/intel/core.c >>>>>>> b/arch/x86/events/intel/core.c >>>>>>> index a968708ed1fb..c93a2aaff7c3 100644 >>>>>>> --- a/arch/x86/events/intel/core.c >>>>>>> +++ b/arch/x86/events/intel/core.c >>>>>>> @@ -3958,6 +3958,16 @@ static int intel_pmu_hw_config(struct >>>>>>> perf_event >>>>>>> *event) >>>>>>>             if ((event->attr.config & INTEL_ARCH_EVENT_MASK) == >>>>>>> INTEL_FIXED_VLBR_EVENT) >>>>>>>                 return -EINVAL; >>>>>>> >>>>>>> +        /* >>>>>>> +         * Profiling guest via PEBS-DS buffer on host is not >>>>>>> supported. >>>>>>> +         * The event->overflow_handler_context is to distinguish >>>>>>> a KVM >>>>>>> +         * event and a normal host event. >>>>>>> +         */ >>>>>>> +        if (event->attr.exclude_host && >>>>>>> +            !is_pebs_pt(event) && >>>>>>> +            !event->overflow_handler_context) >>>>>>> +            return -EINVAL; >>>>>>> + >>>>>>>             if (!(event->attr.freq || (event->attr.wakeup_events && >>>>>>> !event->attr.watermark))) { >>>>>>>                 event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD; >>>>>>>                 if (!(event->attr.sample_type & >>>>>>> >>>>>>>> >>>>>>>> The status of the guest can only be queried in the NMI handler and >>>>>>>> the func >>>>>>>> intel_guest_get_msrs() in the perf/core context, where it's easier >>>>>>>> and more >>>>>>>> centrally to review this part of changes that affects vPMU for >>>>>>>> corner >>>>>>>> cases. >>>>>>>> >>>>>>>> Maybe adding print info on the perf-tool side would help. >>>>>>>> >>>>>>>> For perf-tool users, it will get 0 number of sample for >>>>>>>> "cpu-cycles:GP" >>>>>>>> events, >>>>>>>> just like other uncounted perf-tool events. >>>>>>> >>>>>>> perf-tool would never know such details, e.g., whether the platform >>>>>>> supports PEBS-DS or other PEBS method. It's hard to tell if the 0 is >>>>>>> because of an unsupported hardware or nothing sampled in the guest. >>>>>> >>>>>> It kind of looks like this use case to me: >>>>>> >>>>>>       perf record -e cycles -e cpu/event=0xf4,umask=0x10/ ./workload # >>>>>> ICX >>>>>> >>>>>> # Total Lost Samples: 0 >>>>>> # >>>>>> # Samples: 0  of event 'cpu/event=0xf4,umask=0x10/' >>>>>> # Event count (approx.): 0 >>>>>> >>>>>> A end-user has to check if the event-umask combination is supported >>>>>> or not, >>>>>> or nothing sampled for the workload. Is there any room for >>>>>> improvement in >>>>>> perf-tool to reduce the pain of this part ? If any, the same thing >>>>>> could >>>>>> be applied >>>>>> to cpu-cycles:GP, isn't it ? >>>>> >>>>> I don't think we can expect the end user knows such details. Most of >>>>> them may even don't know what's PEBS-via-DS. >>>> >>>> Maybe the following generic reminder helps: >>>> >>>> # Total Lost Samples: 0 >>>> # Note: the event is not counted or unsupported. >>>> # >>>> # Samples: 0  of event 'cpu/event=0xf4,umask=0x10/' >>>> # Event count (approx.): 0 >>>> >>>>> >>>>> Thanks, >>>>> Kan >>>>> >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Kan >>>>>>>> >>>>>>>>>              if (!(event->attr.freq || >>>>>>>>> (event->attr.wakeup_events && >>>>>>>>> !event->attr.watermark))) { >>>>>>>>>                  event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD; >>>>>>>>>                  if (!(event->attr.sample_type & >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Kan >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Cc: Peter Zijlstra (Intel) >>>>>>>>>> Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR >>>>>>>>>> emulation for extended PEBS") >>>>>>>>>> Signed-off-by: Like Xu >>>>>>>>>> --- >>>>>>>>>>      arch/x86/events/intel/core.c | 8 +++++++- >>>>>>>>>>      1 file changed, 7 insertions(+), 1 deletion(-) >>>>>>>>>> >>>>>>>>>> diff --git a/arch/x86/events/intel/core.c >>>>>>>>>> b/arch/x86/events/intel/core.c >>>>>>>>>> index a08f794a0e79..17afd504c35b 100644 >>>>>>>>>> --- a/arch/x86/events/intel/core.c >>>>>>>>>> +++ b/arch/x86/events/intel/core.c >>>>>>>>>> @@ -4103,13 +4103,19 @@ static struct perf_guest_switch_msr >>>>>>>>>> *intel_guest_get_msrs(int *nr, void *data) >>>>>>>>>>              .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask, >>>>>>>>>>          }; >>>>>>>>>>      +    /* In any case, clear guest PEBS bits first. */ >>>>>>>>>> +    arr[global_ctrl].guest &= ~arr[pebs_enable].guest; >>>>>>>>>> + >>>>>>>>>>          if (arr[pebs_enable].host) { >>>>>>>>>>              /* Disable guest PEBS if host PEBS is enabled. */ >>>>>>>>>>              arr[pebs_enable].guest = 0; >>>>>>>>>>          } else { >>>>>>>>>>              /* Disable guest PEBS thoroughly for cross-mapped >>>>>>>>>> PEBS >>>>>>>>>> counters. */ >>>>>>>>>>              arr[pebs_enable].guest &= >>>>>>>>>> ~kvm_pmu->host_cross_mapped_mask; >>>>>>>>>> -        arr[global_ctrl].guest &= >>>>>>>>>> ~kvm_pmu->host_cross_mapped_mask; >>>>>>>>>> + >>>>>>>>>> +        /* Prevent any host user from enabling PEBS for profiling >>>>>>>>>> guest. */ >>>>>>>>>> +        arr[pebs_enable].guest &= (kvm_pmu->pebs_enable & >>>>>>>>>> kvm_pmu->global_ctrl); >>>>>>>>>> + >>>>>>>>>>              /* Set hw GLOBAL_CTRL bits for PEBS counter when >>>>>>>>>> it runs >>>>>>>>>> for guest */ >>>>>>>>>>              arr[global_ctrl].guest |= arr[pebs_enable].guest; >>>>>>>>>>          } >>>>>>>>>> >>>>>>>>>> base-commit: 6803fb00772cc50cd59a66bd8caaee5c84b13fcf >>>>