Received: by 2002:a05:7412:251c:b0:e2:908c:2ebd with SMTP id w28csp2312104rda; Tue, 24 Oct 2023 21:17:10 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG0lhnWsJ+ncemWPVpR15vj4WggOLlm4K0NQuHNWsZ5bG8aN9XcHQtHVg1znh+clgrtIJY9 X-Received: by 2002:a05:690c:16:b0:5a7:b973:db3c with SMTP id bc22-20020a05690c001600b005a7b973db3cmr15344000ywb.34.1698207429717; Tue, 24 Oct 2023 21:17:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698207429; cv=none; d=google.com; s=arc-20160816; b=nY9qKID5xuSlB0JCn7sHp83jEM8J9wvKLpKug0IGZ4BRkmBaOHYkBEBuRlujktKsiK xm62ftdcKFhJMbd118QXwdMboJk5mNf89OzuAuWwbLtDYUz/sic/LHeq5VOA09XH88XM bKm8T74rfFHDYOW20oooqnO8gew5CTJyzzha1lKuoyB93hbuCOByRquzGjKnL6TZf0xV O2PnVjXBFXqz7Oh53SW9NtlYZSkCUWRBF8ntrUnY84PCafVns22j24g7OTQFUIUDcPUd xMu831jJxAzVVb5Dg2tGMifsJSrNJEeG2aIwl1/YyXQEALQu2Hfr5DNiWyVh1PSPlv2O iyFQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=ctRYFn6QwBh2oAHRlx6TERi+j/lB4Gy7TQ6m/2T2hwo=; fh=vXy/zKLYS8CIT5S93PT42H9kmQ6ohgipR/F8r41ABBw=; b=Z1cokayB1X4qnSEiFqXR0MCzX/JeE/8iAa9LuQbIGlzswphCf/wcE/7l1tzYsAmIwk 0F4AoZO8XOVXWkueDQgpcsAhJwFpRIbcJNw5NWSpBfeQbkVZ+pLKhulpEDvzY7F8Cj0p JNn5hz56KQVWDXkprj1BsiWWb5NaH+cGzk30t3cuT/09L5szPgymuZS+os4NbZYzmN0b tgo9VspQpqIc20BGz93JwqbgEM5nBNQomgX96GB4fk/f7MjApEC0eUQXa1KDSxrI8mmM IYxSnFpLAP4y+yNAuwMxJ4z2+ak9GndczWoRJWqeiaR2+KOSHg1hcc/4z6ZAYAAdXaMB pGkg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b="Wwcq/2z8"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id w82-20020a817b55000000b0059f64bb885bsi10095090ywc.42.2023.10.24.21.17.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Oct 2023 21:17:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b="Wwcq/2z8"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 8A5D28077598; Tue, 24 Oct 2023 21:15:45 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232386AbjJYEPn (ORCPT + 99 others); Wed, 25 Oct 2023 00:15:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33218 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230093AbjJYEPl (ORCPT ); Wed, 25 Oct 2023 00:15:41 -0400 Received: from mail-lf1-x12b.google.com (mail-lf1-x12b.google.com [IPv6:2a00:1450:4864:20::12b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F0D0123 for ; Tue, 24 Oct 2023 21:15:39 -0700 (PDT) Received: by mail-lf1-x12b.google.com with SMTP id 2adb3069b0e04-507a085efa7so1624e87.0 for ; Tue, 24 Oct 2023 21:15:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1698207337; x=1698812137; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ctRYFn6QwBh2oAHRlx6TERi+j/lB4Gy7TQ6m/2T2hwo=; b=Wwcq/2z8vxeuWOGH5j0gxQHgg1bLRTUueejt0qRdIFDmGhEhS8znl7npY+yI0Hbyhm eEYsiPeXjZr5bWZXsJlSbDuQHq2MdEiwLvDGHTsNoZZqcxumQbVr6NhL4E6mq3YkgK+O HhZyCdkoZ8V6MrPyfCCcLRifoDyjpM0sectp/Cb1i0M69MmRCq88i4H3E1l6cADZ8ZF9 SPw39B7XJf2pG1qdaWJ5EpPh3jXhjEC/mYLSbQttEXwsv2chMbgHqfLM5AhVpHS3spxH l43a3BQUSaPGfH9ZzYp0upuu721VVo6vUw7LatjoF9idXIsZ/NHFAcrALj5EOjEwSw5k MYGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698207337; x=1698812137; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ctRYFn6QwBh2oAHRlx6TERi+j/lB4Gy7TQ6m/2T2hwo=; b=a/qKwnH99TyTQ6XIOolbotpkcBJABHHE3zGSUZhpHjQvax4f1HjQa5/xNUCy+aIzJJ i9VYIcxYxPMAR5W7FLRzfwBiVFTNsqJv8Rmq/I7/UoJAKeiTr6tkrq4ii4QOH9YR0caY OgvAcfMoeTD/wJt3Gho6wgs6B6KfMTGTBbv2AokRkGPNhccPLBXzGWy4dE35CYb3keVc w/MBWYoW7dn09XVWnn6c8u4HsAF/b9IzksyRbLhH/3BAhsm5FFze7DV1+xRwvc/R3evL A64nqfAwI4bqraVdZjatUjHFhm8rJCg2IXUeVCOXtm69QtseCHA3Vln4vbodiT4NPAAT kHtg== X-Gm-Message-State: AOJu0YwosVMR0lFZdQPIp2edrKcM1FUvSpNDFNXH6ruJekQvBV5LuWvg NbEqPk7n4C/kQFvIsogufBrzoMearmdDOpC9+VBfAw== X-Received: by 2002:a05:6512:1187:b0:505:6e12:9e70 with SMTP id g7-20020a056512118700b005056e129e70mr27744lfr.6.1698207337078; Tue, 24 Oct 2023 21:15:37 -0700 (PDT) MIME-Version: 1.0 References: <20231020204741.1869520-1-namhyung@kernel.org> <20231020204741.1869520-3-namhyung@kernel.org> In-Reply-To: <20231020204741.1869520-3-namhyung@kernel.org> From: Ian Rogers Date: Tue, 24 Oct 2023 21:15:25 -0700 Message-ID: Subject: Re: [PATCH v3 3/3] perf lock contention: Use per-cpu array map for spinlocks To: Namhyung Kim Cc: Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Peter Zijlstra , Ingo Molnar , LKML , linux-perf-users@vger.kernel.org, Song Liu , Hao Luo , bpf@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 24 Oct 2023 21:15:45 -0700 (PDT) On Fri, Oct 20, 2023 at 1:47=E2=80=AFPM Namhyung Kim = wrote: > > Currently lock contention timestamp is maintained in a hash map keyed by > pid. That means it needs to get and release a map element (which is > proctected by spinlock!) on each contention begin and end pair. This > can impact on performance if there are a lot of contention (usually from > spinlocks). > > It used to go with task local storage but it had an issue on memory > allocation in some critical paths. Although it's addressed in recent > kernels IIUC, the tool should support old kernels too. So it cannot > simply switch to the task local storage at least for now. > > As spinlocks create lots of contention and they disabled preemption > during the spinning, it can use per-cpu array to keep the timestamp to > avoid overhead in hashmap update and delete. > > In contention_begin, it's easy to check the lock types since it can see > the flags. But contention_end cannot see it. So let's try to per-cpu > array first (unconditionally) if it has an active element (lock !=3D 0). > Then it should be used and per-task tstamp map should not be used until > the per-cpu array element is cleared which means nested spinlock > contention (if any) was finished and it nows see (the outer) lock. > > Signed-off-by: Namhyung Kim Acked-by: Ian Rogers Thanks, Ian > --- > .../perf/util/bpf_skel/lock_contention.bpf.c | 89 +++++++++++++++---- > 1 file changed, 72 insertions(+), 17 deletions(-) > > diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/= util/bpf_skel/lock_contention.bpf.c > index 69d31fd77cd0..95cd8414f6ef 100644 > --- a/tools/perf/util/bpf_skel/lock_contention.bpf.c > +++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c > @@ -42,6 +42,14 @@ struct { > __uint(max_entries, MAX_ENTRIES); > } tstamp SEC(".maps"); > > +/* maintain per-CPU timestamp at the beginning of contention */ > +struct { > + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); > + __uint(key_size, sizeof(__u32)); > + __uint(value_size, sizeof(struct tstamp_data)); > + __uint(max_entries, 1); > +} tstamp_cpu SEC(".maps"); > + > /* actual lock contention statistics */ > struct { > __uint(type, BPF_MAP_TYPE_HASH); > @@ -311,34 +319,57 @@ static inline __u32 check_lock_type(__u64 lock, __u= 32 flags) > return 0; > } > > -SEC("tp_btf/contention_begin") > -int contention_begin(u64 *ctx) > +static inline struct tstamp_data *get_tstamp_elem(__u32 flags) > { > __u32 pid; > struct tstamp_data *pelem; > > - if (!enabled || !can_record(ctx)) > - return 0; > + /* Use per-cpu array map for spinlock and rwlock */ > + if (flags =3D=3D (LCB_F_SPIN | LCB_F_READ) || flags =3D=3D LCB_F_= SPIN || > + flags =3D=3D (LCB_F_SPIN | LCB_F_WRITE)) { > + __u32 idx =3D 0; > + > + pelem =3D bpf_map_lookup_elem(&tstamp_cpu, &idx); > + /* Do not update the element for nested locks */ > + if (pelem && pelem->lock) > + pelem =3D NULL; > + return pelem; > + } > > pid =3D bpf_get_current_pid_tgid(); > pelem =3D bpf_map_lookup_elem(&tstamp, &pid); > + /* Do not update the element for nested locks */ > if (pelem && pelem->lock) > - return 0; > + return NULL; > > if (pelem =3D=3D NULL) { > struct tstamp_data zero =3D {}; > > if (bpf_map_update_elem(&tstamp, &pid, &zero, BPF_NOEXIST= ) < 0) { > __sync_fetch_and_add(&task_fail, 1); > - return 0; > + return NULL; > } > > pelem =3D bpf_map_lookup_elem(&tstamp, &pid); > if (pelem =3D=3D NULL) { > __sync_fetch_and_add(&task_fail, 1); > - return 0; > + return NULL; > } > } > + return pelem; > +} > + > +SEC("tp_btf/contention_begin") > +int contention_begin(u64 *ctx) > +{ > + struct tstamp_data *pelem; > + > + if (!enabled || !can_record(ctx)) > + return 0; > + > + pelem =3D get_tstamp_elem(ctx[1]); > + if (pelem =3D=3D NULL) > + return 0; > > pelem->timestamp =3D bpf_ktime_get_ns(); > pelem->lock =3D (__u64)ctx[0]; > @@ -377,24 +408,42 @@ int contention_begin(u64 *ctx) > SEC("tp_btf/contention_end") > int contention_end(u64 *ctx) > { > - __u32 pid; > + __u32 pid =3D 0, idx =3D 0; > struct tstamp_data *pelem; > struct contention_key key =3D {}; > struct contention_data *data; > __u64 duration; > + bool need_delete =3D false; > > if (!enabled) > return 0; > > - pid =3D bpf_get_current_pid_tgid(); > - pelem =3D bpf_map_lookup_elem(&tstamp, &pid); > - if (!pelem || pelem->lock !=3D ctx[0]) > - return 0; > + /* > + * For spinlock and rwlock, it needs to get the timestamp for the > + * per-cpu map. However, contention_end does not have the flags > + * so it cannot know whether it reads percpu or hash map. > + * > + * Try per-cpu map first and check if there's active contention. > + * If it is, do not read hash map because it cannot go to sleepin= g > + * locks before releasing the spinning locks. > + */ > + pelem =3D bpf_map_lookup_elem(&tstamp_cpu, &idx); > + if (pelem && pelem->lock) { > + if (pelem->lock !=3D ctx[0]) > + return 0; > + } else { > + pid =3D bpf_get_current_pid_tgid(); > + pelem =3D bpf_map_lookup_elem(&tstamp, &pid); > + if (!pelem || pelem->lock !=3D ctx[0]) > + return 0; > + need_delete =3D true; > + } > > duration =3D bpf_ktime_get_ns() - pelem->timestamp; > if ((__s64)duration < 0) { > pelem->lock =3D 0; > - bpf_map_delete_elem(&tstamp, &pid); > + if (need_delete) > + bpf_map_delete_elem(&tstamp, &pid); > __sync_fetch_and_add(&time_fail, 1); > return 0; > } > @@ -406,8 +455,11 @@ int contention_end(u64 *ctx) > case LOCK_AGGR_TASK: > if (lock_owner) > key.pid =3D pelem->flags; > - else > + else { > + if (!need_delete) > + pid =3D bpf_get_current_pid_tgid(); > key.pid =3D pid; > + } > if (needs_callstack) > key.stack_id =3D pelem->stack_id; > break; > @@ -428,7 +480,8 @@ int contention_end(u64 *ctx) > if (!data) { > if (data_map_full) { > pelem->lock =3D 0; > - bpf_map_delete_elem(&tstamp, &pid); > + if (need_delete) > + bpf_map_delete_elem(&tstamp, &pid); > __sync_fetch_and_add(&data_fail, 1); > return 0; > } > @@ -452,7 +505,8 @@ int contention_end(u64 *ctx) > __sync_fetch_and_add(&data_fail, 1); > } > pelem->lock =3D 0; > - bpf_map_delete_elem(&tstamp, &pid); > + if (need_delete) > + bpf_map_delete_elem(&tstamp, &pid); > return 0; > } > > @@ -466,7 +520,8 @@ int contention_end(u64 *ctx) > data->min_time =3D duration; > > pelem->lock =3D 0; > - bpf_map_delete_elem(&tstamp, &pid); > + if (need_delete) > + bpf_map_delete_elem(&tstamp, &pid); > return 0; > } > > -- > 2.42.0.655.g421f12c284-goog >