From: Wen Yang <wenyang@linux.alibaba.com>
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
    Thomas Gleixner
Cc: Wen Yang, Mark Rutland, Jiri Olsa, Namhyung Kim, Borislav Petkov,
    x86@kernel.org,
Peter Anvin" , linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH] perf/x86: improve the event scheduling to avoid unnecessary pmu_stop/start Date: Thu, 10 Feb 2022 12:39:30 +0800 Message-Id: <20220210043930.34311-1-simon.wy@alibaba-inc.com> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY,USER_IN_DEF_SPF_WL autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This issue has been there for a long time, we could reproduce it as follows: 1, run a script that periodically collects perf data, eg: while true do perf stat -e cache-misses,cache-misses,cache-misses -C 1 sleep 2 perf stat -e cache-misses -C 1 sleep 2 sleep 1 done 2, run another one to capture the IPC, eg: perf stat -e cycles:D,instructions:D -C 1 -I 1000 Then we could observe that the counter used by cycles:D changes frequently: crash> struct cpu_hw_events.n_events,assign,event_list,events ffff88bf7f44f420 n_events = 3 assign = {33, 1, 32, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} event_list = {0xffff88bf77b85000, 0xffff88b72db82000, 0xffff88b72db85800, 0xffff88ff6cfcb000, 0xffff88ff609f1800, 0xffff88ff609f1800, 0xffff88ff5f46a800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} events = {0x0, 0xffff88b72db82000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff88b72db85800, 0xffff88bf77b85000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} crash> struct cpu_hw_events.n_events,assign,event_list,events ffff88bf7f44f420 n_events = 6 assign = {33, 3, 32, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} event_list = {0xffff88bf77b85000, 0xffff88b72db82000, 0xffff88b72db85800, 0xffff88bf46c34000, 0xffff88bf46c35000, 0xffff88bf46c30000, 0xffff88ff5f46a800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} events = {0xffff88bf46c34000, 0xffff88bf46c35000, 0xffff88bf46c30000, 0xffff88b72db82000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff88b72db85800, 0xffff88bf77b85000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} The reason is that NMI watchdog permanently consumes one FP, so cycles can only use one GP, and its 
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: Peter Zijlstra (Intel)
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Thomas Gleixner
Cc: Borislav Petkov
Cc: x86@kernel.org
Cc: "H. Peter Anvin"
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/events/core.c         | 40 +++++++++++++++++++++++++++++++---------
 arch/x86/events/intel/uncore.c |  2 +-
 arch/x86/events/perf_event.h   |  3 ++-
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e686c5e..1a47e31 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -796,6 +796,8 @@ struct perf_sched {
         int                     max_events;
         int                     max_gp;
         int                     saved_states;
+        u64                     cnt_mask;
+        u64                     evt_mask;
         struct event_constraint **constraints;
         struct sched_state      state;
         struct sched_state      saved[SCHED_STATES_MAX];
@@ -805,7 +807,7 @@ struct perf_sched {
  * Initialize iterator that runs through all events and counters.
  */
 static void perf_sched_init(struct perf_sched *sched, struct event_constraint **constraints,
-                            int num, int wmin, int wmax, int gpmax)
+                            int num, int wmin, int wmax, int gpmax, u64 cnt_mask, u64 evt_mask)
 {
         int idx;
 
@@ -814,6 +816,8 @@ static void perf_sched_init(struct perf_sched *sched, struct event_constraint **
         sched->max_weight       = wmax;
         sched->max_gp           = gpmax;
         sched->constraints      = constraints;
+        sched->cnt_mask         = cnt_mask;
+        sched->evt_mask         = evt_mask;
 
         for (idx = 0; idx < num; idx++) {
                 if (constraints[idx]->weight == wmin)
@@ -822,7 +826,10 @@ static void perf_sched_init(struct perf_sched *sched, struct event_constraint **
 
         sched->state.event      = idx;          /* start with min weight */
         sched->state.weight     = wmin;
-        sched->state.unassigned = num;
+        sched->state.unassigned = num - hweight_long(evt_mask);
+
+        while (sched->evt_mask & BIT_ULL(sched->state.event))
+                sched->state.event++;
 }
 
 static void perf_sched_save_state(struct perf_sched *sched)
@@ -874,6 +881,9 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)
                 for_each_set_bit_from(idx, c->idxmsk, X86_PMC_IDX_MAX) {
                         u64 mask = BIT_ULL(idx);
 
+                        if (sched->cnt_mask & mask)
+                                continue;
+
                         if (sched->state.used & mask)
                                 continue;
 
@@ -890,6 +900,9 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)
                 if (c->flags & PERF_X86_EVENT_PAIR)
                         mask |= mask << 1;
 
+                if (sched->cnt_mask & mask)
+                        continue;
+
                 if (sched->state.used & mask)
                         continue;
 
@@ -934,7 +947,10 @@ static bool perf_sched_next_event(struct perf_sched *sched)
 
         do {
                 /* next event */
-                sched->state.event++;
+                do {
+                        sched->state.event++;
+                } while (sched->evt_mask & BIT_ULL(sched->state.event));
+
                 if (sched->state.event >= sched->max_events) {
                         /* next weight */
                         sched->state.event = 0;
@@ -954,11 +970,11 @@ static bool perf_sched_next_event(struct perf_sched *sched)
  * Assign a counter for each event.
  */
 int perf_assign_events(struct event_constraint **constraints, int n,
-                        int wmin, int wmax, int gpmax, int *assign)
+                        int wmin, int wmax, int gpmax, u64 cnt_mask, u64 evt_mask, int *assign)
 {
         struct perf_sched sched;
 
-        perf_sched_init(&sched, constraints, n, wmin, wmax, gpmax);
+        perf_sched_init(&sched, constraints, n, wmin, wmax, gpmax, cnt_mask, evt_mask);
 
         do {
                 if (!perf_sched_find_counter(&sched))
@@ -978,7 +994,8 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
         struct perf_event *e;
         int n0, i, wmin, wmax, unsched = 0;
         struct hw_perf_event *hwc;
-        u64 used_mask = 0;
+        u64 cnt_mask = 0;
+        u64 evt_mask = 0;
 
         /*
          * Compute the number of events already present; see x86_pmu_add(),
@@ -1038,10 +1055,11 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
                         mask |= mask << 1;
 
                 /* not already used */
-                if (used_mask & mask)
+                if (cnt_mask & mask)
                         break;
 
-                used_mask |= mask;
+                cnt_mask |= mask;
+                evt_mask |= BIT_ULL(i);
 
                 if (assign)
                         assign[i] = hwc->idx;
@@ -1075,7 +1093,11 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
                 }
 
                 unsched = perf_assign_events(cpuc->event_constraint, n, wmin,
-                                             wmax, gpmax, assign);
+                                             wmax, gpmax, cnt_mask, evt_mask, assign);
+                if (unsched) {
+                        unsched = perf_assign_events(cpuc->event_constraint, n, wmin,
+                                                     wmax, gpmax, 0, 0, assign);
+                }
         }
 
         /*
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index e497da9..8afff7a 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -480,7 +480,7 @@ static int uncore_assign_events(struct intel_uncore_box *box, int assign[], int n)
         /* slow path */
         if (i != n)
                 ret = perf_assign_events(box->event_constraint, n,
-                                         wmin, wmax, n, assign);
+                                         wmin, wmax, n, 0, 0, assign);
 
         if (!assign || ret) {
                 for (i = 0; i < n; i++)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 150261d..2ae1e98 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1131,7 +1131,8 @@ static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 void x86_pmu_enable_all(int added);
 
 int perf_assign_events(struct event_constraint **constraints, int n,
-                        int wmin, int wmax, int gpmax, int *assign);
+                        int wmin, int wmax, int gpmax, u64 cnt_mask, u64 evt_mask,
+                        int *assign);
 int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign);
 
 void x86_pmu_stop(struct perf_event *event, int flags);
-- 
1.8.3.1