Received: by 2002:a05:6359:6284:b0:131:369:b2a3 with SMTP id se4csp5381276rwb; Wed, 9 Aug 2023 03:22:40 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFftpt0Kjlb3iS8vFZajln1yjSjYUL6vaiOFaDQEXEQvJQveiUjHbui65VlC6zti/Wu0EJK X-Received: by 2002:a05:6358:3a18:b0:134:d78f:67bc with SMTP id g24-20020a0563583a1800b00134d78f67bcmr2197136rwe.14.1691576560512; Wed, 09 Aug 2023 03:22:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691576560; cv=none; d=google.com; s=arc-20160816; b=yfdihWfIsfZDnB4f1UENqElGm/GOaZcOTILggB4zjz0kC2aUC7TshkXNjXcbvT3z4h SelRV27KBjRA7MmhfW22LhMrzGUtK1HQv7DKG6dcP2+t4ZtgFmvDCe9RwIwehQ/aNoLy y8+8sltlIrXuzvKbWSjfuftFlciHpooDN2B5u3d70PgWQw/72WrPtSeTl584UNpOo5Ss TLhR53pZYqXXK/fQT8l4Fl60mxkCM+XKzqkMOEokl1DCZckOme8lBqyQNG1lXfPaHpr5 UdomTbksKWf9DV7XyC9V0OzfbplYhXtqXYkel7INRZLxVUt6uuv5a7evOFKfwaiJe6Uu rzPw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent:references:in-reply-to :subject:cc:to:from:message-id:date:dkim-signature; bh=hlvJLLQ/U4d4HVzTajbBbHDUE0ihBy3Y+MyLgc0/VsE=; fh=Uu7fJl7XLLFrY9Lc87vyDK2rGBtm463FpFJmGzkSOVQ=; b=kwL/NmmzU1Mi/QcqOBSFXXxstkWYHz8wzch4xWGsKZnpMdIO0wVyrbPD0Bfo7U7Uu9 whwtNzuQQOdCkCXUoeZ55KnphCpUBpStfMcnsiI3qFJ/jndjvQsyniEsNHNfXpDm+364 9IbDBqMyryCSVbs5u7NojCmRgLYKZmMY587PC4QeL4vw00Qcgzkmvn0fBDCzUdIQoNC9 RYy09IpTe6HbtqeSGbGwEhaqDoVYDqr0MpbKXHkUlSkrucQSBLYaGLKVi8Ryjli/kyPy qh8PiGd0KYrYmdKUVlWt8S62/MgXB58hgN09U3Nng3R7pbnmQtgvfTXOFYVu+grAaDGN YKvg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=NjGqEFNN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h17-20020a633851000000b00564233887a3si8551107pgn.528.2023.08.09.03.22.28; Wed, 09 Aug 2023 03:22:40 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=NjGqEFNN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230339AbjHIIsk (ORCPT + 99 others); Wed, 9 Aug 2023 04:48:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229886AbjHIIse (ORCPT ); Wed, 9 Aug 2023 04:48:34 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E4E6D1BF7; Wed, 9 Aug 2023 01:48:33 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 71D5A6306B; Wed, 9 Aug 2023 08:48:33 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B132AC433C7; Wed, 9 Aug 2023 08:48:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1691570912; bh=ojkU8AFfZNCCP9rY6hwjRyQ/V8po/Qa/5foMGY7Gd+w=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=NjGqEFNNVBMFNto9YJHQdpY6WLBPWmrhhAVjoie/I9dMzGm2JsB7zEsr1YY24zuuT xq6i38ti/9tBZjKqVi3DP6pUQUgdFT/p/ZSSV2N7DE689f+V+nKzfHyztDRU0420YY 3ooIXXED5obvLX7TjH7ntN2zAbpylMCyrb3L0vrdwqCcscejWRf7vydMxf0c6kjEub J+hLyD1GiSGHefpIc/MVs8X6kw9+cxbImXLwS5XLR+Q+eGHYYwfTOKrHH/lQjYrzKe fUYPm532bonM3EeEbdt4J9+14iPIbTxz+pQeqytjRO/g5Yg8P7NDwBV/IsVgXchM7Y CoaYWNjDejuAw== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1qTerm-003QuH-8W; Wed, 09 Aug 2023 09:48:30 +0100 Date: Wed, 09 Aug 2023 09:48:29 +0100 Message-ID: <864jl8ha8y.wl-maz@kernel.org> From: Marc Zyngier To: Huang Shijie Cc: oliver.upton@linux.dev, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org, pbonzini@redhat.com, peterz@infradead.org, ingo@redhat.com, acme@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, namhyung@kernel.org, irogers@google.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, patches@amperecomputing.com, zwang@amperecomputing.com Subject: Re: [PATCH] perf/core: fix the bug in the event multiplexing In-Reply-To: <20230809013953.7692-1-shijie@os.amperecomputing.com> References: <20230809013953.7692-1-shijie@os.amperecomputing.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: shijie@os.amperecomputing.com, oliver.upton@linux.dev, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org, pbonzini@redhat.com, peterz@infradead.org, ingo@redhat.com, acme@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, namhyung@kernel.org, irogers@google.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, patches@amperecomputing.com, zwang@amperecomputing.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 09 Aug 2023 02:39:53 +0100, Huang Shijie wrote: For a start, please provide a sensible subject line for your patch. "fix the bug" is not exactly descriptive, and I'd argue that if there was only one bug left, I'd have taken an early retirement by now. > > 1.) Background. > 1.1) In arm64, run a virtual guest with Qemu, and bind the guest Is that with QEMU in system emulation mode? Or QEMU as a VMM for KVM? > to core 33 and run program "a" in guest. Is core 33 significant? Is the program itself significant? > The code of "a" shows below: > ---------------------------------------------------------- > #include > > int main() > { > unsigned long i = 0; > > for (;;) { > i++; > } > > printf("i:%ld\n", i); > return 0; > } > ---------------------------------------------------------- > > 1.2) Use the following perf command in host: > #perf stat -e cycles:G,cycles:H -C 33 -I 1000 sleep 1 > # time counts unit events > 1.000817400 3,299,471,572 cycles:G > 1.000817400 3,240,586 cycles:H > > This result is correct, my cpu's frequency is 3.3G. > > 1.3) Use the following perf command in host: > #perf stat -e cycles:G,cycles:H -C 33 -d -d -I 1000 sleep 1 > time counts unit events > 1.000831480 153,634,097 cycles:G (70.03%) > 1.000831480 3,147,940,599 cycles:H (70.03%) > 1.000831480 1,143,598,527 L1-dcache-loads (70.03%) > 1.000831480 9,986 L1-dcache-load-misses # 0.00% of all L1-dcache accesses (70.03%) > 1.000831480 LLC-loads > 1.000831480 LLC-load-misses > 1.000831480 580,887,696 L1-icache-loads (70.03%) > 1.000831480 77,855 L1-icache-load-misses # 0.01% of all L1-icache accesses (70.03%) > 1.000831480 6,112,224,612 dTLB-loads (70.03%) > 1.000831480 16,222 dTLB-load-misses # 0.00% of all dTLB cache accesses (69.94%) > 1.000831480 590,015,996 iTLB-loads (59.95%) > 1.000831480 505 iTLB-load-misses # 0.00% of all iTLB cache accesses (59.95%) > > This result is wrong. The "cycle:G" should be nearly 3.3G. > > 2.) Root cause. > There is only 7 counters in my arm64 platform: > (one cycle counter) + (6 normal counters) > > In 1.3 above, we will use 10 event counters. > Since we only have 7 counters, the perf core will trigger > event multiplexing in hrtimer: > merge_sched_in() -->perf_mux_hrtimer_restart() --> > perf_rotate_context(). > > In the perf_rotate_context(), it does not restore some PMU registers > as context_switch() does. In context_switch(): > kvm_sched_in() --> kvm_vcpu_pmu_restore_guest() > kvm_sched_out() --> kvm_vcpu_pmu_restore_host() > > So we got wrong result. > > 3.) About this patch. > 3.1) Add arch_perf_rotate_pmu_set() > 3.2) Add is_guest(). > Check the context for hrtimer. > 3.3) In arm64's arch_perf_rotate_pmu_set(), > set the PMU registers by the context. > > 4.) Test result of this patch: > #perf stat -e cycles:G,cycles:H -C 33 -d -d -I 1000 sleep 1 > time counts unit events > 1.000817360 3,297,898,244 cycles:G (70.03%) > 1.000817360 2,719,941 cycles:H (70.03%) > 1.000817360 883,764 L1-dcache-loads (70.03%) > 1.000817360 17,517 L1-dcache-load-misses # 1.98% of all L1-dcache accesses (70.03%) > 1.000817360 LLC-loads > 1.000817360 LLC-load-misses > 1.000817360 1,033,816 L1-icache-loads (70.03%) > 1.000817360 103,839 L1-icache-load-misses # 10.04% of all L1-icache accesses (70.03%) > 1.000817360 982,401 dTLB-loads (70.03%) > 1.000817360 28,272 dTLB-load-misses # 2.88% of all dTLB cache accesses (69.94%) > 1.000817360 972,072 iTLB-loads (59.95%) > 1.000817360 772 iTLB-load-misses # 0.08% of all iTLB cache accesses (59.95%) > > The result is correct. The "cycle:G" is nearly 3.3G now. > > Signed-off-by: Huang Shijie > --- > arch/arm64/kvm/pmu.c | 8 ++++++++ > include/linux/kvm_host.h | 1 + > kernel/events/core.c | 5 +++++ > virt/kvm/kvm_main.c | 9 +++++++++ > 4 files changed, 23 insertions(+) > > diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c > index 121f1a14c829..a6815c3f0c4e 100644 > --- a/arch/arm64/kvm/pmu.c > +++ b/arch/arm64/kvm/pmu.c > @@ -210,6 +210,14 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) > kvm_vcpu_pmu_disable_el0(events_guest); > } > > +void arch_perf_rotate_pmu_set(void) > +{ > + if (is_guest()) > + kvm_vcpu_pmu_restore_guest(NULL); > + else > + kvm_vcpu_pmu_restore_host(NULL); > +} So we're now randomly poking at the counters even when no guest is running, based on whatever is stashed in internal KVM data structures? I'm sure this is going to work really well. Hint: even if these functions don't directly look at the vcpu pointer, passing NULL is a really bad idea. It is a sure sign that you don't have the context on which to perform the action you're trying to do. This really shouldn't do *anything* when the rotation process is not preempting a guest. > + > /* > * With VHE, keep track of the PMUSERENR_EL0 value for the host EL0 on the pCPU > * where PMUSERENR_EL0 for the guest is loaded, since PMUSERENR_EL0 is switched > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 9d3ac7720da9..e350cbc8190f 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -931,6 +931,7 @@ void kvm_destroy_vcpus(struct kvm *kvm); > > void vcpu_load(struct kvm_vcpu *vcpu); > void vcpu_put(struct kvm_vcpu *vcpu); > +bool is_guest(void); Why do we need this (not to mention the poor choice of name)? We already have kvm_get_running_vcpu(), which does everything you need (and gives you the actual context). > > #ifdef __KVM_HAVE_IOAPIC > void kvm_arch_post_irq_ack_notifier_list_update(struct kvm *kvm); > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 6fd9272eec6e..fe78f9d17eba 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -4229,6 +4229,10 @@ ctx_event_to_rotate(struct perf_event_pmu_context *pmu_ctx) > return event; > } > > +void __weak arch_perf_rotate_pmu_set(void) > +{ > +} > + > static bool perf_rotate_context(struct perf_cpu_pmu_context *cpc) > { > struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); > @@ -4282,6 +4286,7 @@ static bool perf_rotate_context(struct perf_cpu_pmu_context *cpc) > if (task_event || (task_epc && cpu_event)) > __pmu_ctx_sched_in(task_epc->ctx, pmu); > > + arch_perf_rotate_pmu_set(); KVM already supports hooking into the perf core using the perf_guest_info_callbacks structure. Why should we need a separate mechanism? > perf_pmu_enable(pmu); > perf_ctx_unlock(cpuctx, cpuctx->task_ctx); > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index dfbaafbe3a00..a77d336552be 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -218,6 +218,15 @@ void vcpu_load(struct kvm_vcpu *vcpu) > } > EXPORT_SYMBOL_GPL(vcpu_load); > > +/* Do we in the guest? */ > +bool is_guest(void) > +{ > + struct kvm_vcpu *vcpu; > + > + vcpu = __this_cpu_read(kvm_running_vcpu); > + return !!vcpu; > +} > + > void vcpu_put(struct kvm_vcpu *vcpu) > { > preempt_disable(); It looks like you've identified an actual issue. However, I'm highly sceptical of the implementation. This really needs some more work. Another question is how the same thing is handled on x86? Maybe they don't suffer from this problem thanks to specific architectural features, but it'd be good to find out, as this may guide the implementation in a different way. Thanks, M. -- Without deviation from the norm, progress is not possible.