Date: Mon, 20 Nov 2023 23:23:59 +0000
From: Mingwei Zhang
To: Ian Rogers
Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Mark Rutland,
    Alexander Shishkin, Arnaldo Carvalho de Melo, LKML, Kan Liang
Subject: Re: [PATCH 1/3] perf/core: Update perf_adjust_freq_unthr_context()
References: <20231120221932.213710-1-namhyung@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 20, 2023, Ian Rogers wrote:
> On Mon, Nov 20, 2023 at 2:19 PM Namhyung Kim wrote:
> >
> > It was unnecessarily disabling and enabling PMUs for each event. It
> > should be done at PMU level. Add pmu_ctx->nr_freq counter to check it
> > at each PMU. As pmu context has separate active lists for pinned group
> > and flexible group, factor out a new function to do the job.
> >
> > Another minor optimization is that it can skip PMUs w/ CAP_NO_INTERRUPT
> > even if it needs to unthrottle sampling events.
> >
> > Signed-off-by: Namhyung Kim
>
> Series:
> Reviewed-by: Ian Rogers
>
> Thanks,
> Ian
>

Can we have "Cc: stable@vger.kernel.org" for the whole series?
This series should give a big performance improvement for all VMs that
run perf sampling events without specifying a period. The key point is
that disabling/enabling the PMU in a virtualized environment is super
heavyweight and can take up to 50% of the CPU time, i.e., when
multiplexing is used in the VM, a vCPU on a pCPU can only use 50% of the
resource; the other half is entirely wasted in host PMU code
enabling/disabling the PMU.

Thanks.
-Mingwei

> > ---
> >  include/linux/perf_event.h |  1 +
> >  kernel/events/core.c       | 68 +++++++++++++++++++++++---------------
> >  2 files changed, 43 insertions(+), 26 deletions(-)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 0367d748fae0..3eb17dc89f5e 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -879,6 +879,7 @@ struct perf_event_pmu_context {
> >
> >  	unsigned int nr_events;
> >  	unsigned int nr_cgroups;
> > +	unsigned int nr_freq;
> >
> >  	atomic_t refcount; /* event <-> epc */
> >  	struct rcu_head rcu_head;
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 3eb26c2c6e65..53e2ad73102d 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -2275,8 +2275,10 @@ event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
> >
> >  	if (!is_software_event(event))
> >  		cpc->active_oncpu--;
> > -	if (event->attr.freq && event->attr.sample_freq)
> > +	if (event->attr.freq && event->attr.sample_freq) {
> >  		ctx->nr_freq--;
> > +		epc->nr_freq--;
> > +	}
> >  	if (event->attr.exclusive || !cpc->active_oncpu)
> >  		cpc->exclusive = 0;
> >
> > @@ -2531,9 +2533,10 @@ event_sched_in(struct perf_event *event, struct perf_event_context *ctx)
> >
> >  	if (!is_software_event(event))
> >  		cpc->active_oncpu++;
> > -	if (event->attr.freq && event->attr.sample_freq)
> > +	if (event->attr.freq && event->attr.sample_freq) {
> >  		ctx->nr_freq++;
> > -
> > +		epc->nr_freq++;
> > +	}
> >  	if (event->attr.exclusive)
> >  		cpc->exclusive = 1;
> >
> > @@ -4096,30 +4099,14 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
> >  	}
> >  }
> >
> > -/*
> > - * combine freq adjustment with unthrottling to avoid two passes over the
> > - * events. At the same time, make sure, having freq events does not change
> > - * the rate of unthrottling as that would introduce bias.
> > - */
> > -static void
> > -perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> > +static void perf_adjust_freq_unthr_events(struct list_head *event_list)
> >  {
> >  	struct perf_event *event;
> >  	struct hw_perf_event *hwc;
> >  	u64 now, period = TICK_NSEC;
> >  	s64 delta;
> >
> > -	/*
> > -	 * only need to iterate over all events iff:
> > -	 * - context have events in frequency mode (needs freq adjust)
> > -	 * - there are events to unthrottle on this cpu
> > -	 */
> > -	if (!(ctx->nr_freq || unthrottle))
> > -		return;
> > -
> > -	raw_spin_lock(&ctx->lock);
> > -
> > -	list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
> > +	list_for_each_entry(event, event_list, active_list) {
> >  		if (event->state != PERF_EVENT_STATE_ACTIVE)
> >  			continue;
> >
> > @@ -4127,8 +4114,6 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> >  		if (!event_filter_match(event))
> >  			continue;
> >
> > -		perf_pmu_disable(event->pmu);
> > -
> >  		hwc = &event->hw;
> >
> >  		if (hwc->interrupts == MAX_INTERRUPTS) {
> > @@ -4138,7 +4123,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> >  		}
> >
> >  		if (!event->attr.freq || !event->attr.sample_freq)
> > -			goto next;
> > +			continue;
> >
> >  		/*
> >  		 * stop the event and update event->count
> > @@ -4160,8 +4145,39 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> >  			perf_adjust_period(event, period, delta, false);
> >
> >  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
> > -	next:
> > -		perf_pmu_enable(event->pmu);
> > +	}
> > +}
> > +
> > +/*
> > + * combine freq adjustment with unthrottling to avoid two passes over the
> > + * events. At the same time, make sure, having freq events does not change
> > + * the rate of unthrottling as that would introduce bias.
> > + */
> > +static void
> > +perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> > +{
> > +	struct perf_event_pmu_context *pmu_ctx;
> > +
> > +	/*
> > +	 * only need to iterate over all events iff:
> > +	 * - context have events in frequency mode (needs freq adjust)
> > +	 * - there are events to unthrottle on this cpu
> > +	 */
> > +	if (!(ctx->nr_freq || unthrottle))
> > +		return;
> > +
> > +	raw_spin_lock(&ctx->lock);
> > +
> > +	list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> > +		if (!(pmu_ctx->nr_freq || unthrottle))
> > +			continue;
> > +		if (pmu_ctx->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT)
> > +			continue;
> > +
> > +		perf_pmu_disable(pmu_ctx->pmu);
> > +		perf_adjust_freq_unthr_events(&pmu_ctx->pinned_active);
> > +		perf_adjust_freq_unthr_events(&pmu_ctx->flexible_active);
> > +		perf_pmu_enable(pmu_ctx->pmu);
> >  	}
> >
> >  	raw_spin_unlock(&ctx->lock);
> > --
> > 2.43.0.rc1.413.gea7ed67945-goog
> >
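
For a rough sense of why moving the disable/enable pair from per-event
to per-PMU matters, here is a minimal user-space sketch in C. It is not
kernel code: struct pmu_model, toggles_per_event() and toggles_per_pmu()
are hypothetical names made up for illustration. They only count how
many PMU disable/enable pairs each scheme would issue in one tick,
assuming every active event passes the state and filter checks.

```c
#include <stddef.h>

/* Hypothetical model of one PMU context; not a kernel structure. */
struct pmu_model {
	unsigned int nr_events;	/* active events on this PMU */
	unsigned int nr_freq;	/* of those, events in frequency mode */
	int no_interrupt;	/* models PERF_PMU_CAP_NO_INTERRUPT */
};

/*
 * Old scheme: after the context-wide early return, every active event
 * gets its own disable/enable pair, whatever PMU it belongs to.
 */
unsigned long toggles_per_event(const struct pmu_model *pmus, size_t n,
				int unthrottle)
{
	unsigned long total_freq = 0, pairs = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total_freq += pmus[i].nr_freq;

	/* Mirrors the check on ctx->nr_freq: nothing to do at all. */
	if (!(total_freq || unthrottle))
		return 0;

	for (i = 0; i < n; i++)
		pairs += pmus[i].nr_events;
	return pairs;
}

/*
 * New scheme: at most one pair per PMU. A PMU is skipped when it has no
 * frequency-mode events and nothing to unthrottle, or when it cannot
 * raise interrupts.
 */
unsigned long toggles_per_pmu(const struct pmu_model *pmus, size_t n,
			      int unthrottle)
{
	unsigned long pairs = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(pmus[i].nr_freq || unthrottle))
			continue;
		if (pmus[i].no_interrupt)
			continue;
		pairs++;
	}
	return pairs;
}
```

With three modeled PMUs holding 8, 4 and 3 active events, where only the
first has frequency-mode events and can raise interrupts, the per-event
count is 15 pairs per tick and the per-PMU count is 1. The numbers are
made up; the point is only that the new count no longer scales with the
number of events.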