Message-ID: <0bef11aa061a425f276a539a47b786ec6b661987.camel@gmail.com>
Subject: Re: [RFC PATCH v2] bpf: Using binary search to improve the performance of btf_find_by_name_kind
From: Eduard Zingerman
To: Alan Maguire, pengdonglin, Alexei Starovoitov
Cc: Martin KaFai Lau, Alexei Starovoitov, Song Liu, Yonghong Song,
 Steven Rostedt, Masami Hiramatsu, dinghui@sangfor.com.cn,
 huangcun@sangfor.com.cn, bpf, LKML
Date: Wed, 13 Sep 2023 16:45:58 +0300
References: <20230909091646.420163-1-pengdonglin@sangfor.com.cn>
 <20ef8441084c9d5fd54f84987afa77eed7fe148e.camel@gmail.com>
 <035ab912d7d6bd11c54c038464795da01dbed2de.camel@gmail.com>
 <5f8d82c3-838e-4d75-bb25-7d98a6d0a37c@sangfor.com.cn>

On Wed, 2023-09-13 at 14:34 +0100, Alan Maguire wrote:
> On 13/09/2023 11:32, pengdonglin wrote:
> > On 2023/9/13 2:46, Alexei Starovoitov wrote:
> > > On Tue, Sep 12, 2023 at 10:03 AM Eduard Zingerman wrote:
> > > >
> > > > On Tue, 2023-09-12 at 09:40 -0700, Alexei Starovoitov wrote:
> > > > > On Tue, Sep 12, 2023 at 7:19 AM Eduard Zingerman wrote:
> > > > > >
> > > > > > On Tue, 2023-09-12 at 16:51 +0300, Eduard Zingerman wrote:
> > > > > > > On Sat, 2023-09-09 at 02:16 -0700, Donglin Peng wrote:
> > > > > > > > Currently, we only use linear search to find a type id by
> > > > > > > > name, which has a time complexity of O(n). This change sorts
> > > > > > > > the names of btf types in ascending order and uses binary
> > > > > > > > search, which has a time complexity of O(log(n)). The idea
> > > > > > > > was inspired by the following patch:
> > > > > > > >
> > > > > > > > 60443c88f3a8 ("kallsyms: Improve the performance of
> > > > > > > > kallsyms_lookup_name()").
> > > > > > > >
> > > > > > > > At present, this improvement only applies to searches in
> > > > > > > > vmlinux's and modules' BTFs, and the kind must be
> > > > > > > > BTF_KIND_FUNC or BTF_KIND_STRUCT.
> > > > > > > >
> > > > > > > > Another change is the search direction: we search the BTF
> > > > > > > > first and then its base, and the type id of the first
> > > > > > > > matching btf_type is returned.
> > > > > > > >
> > > > > > > > Here is a timing result for finding the type ids of all
> > > > > > > > 67,819 kernel functions in vmlinux's BTF by name:
> > > > > > > >
> > > > > > > > Before: 17000 ms
> > > > > > > > After:     10 ms
> > > > > > > >
> > > > > > > > The average lookup performance improved by about 1700x in
> > > > > > > > the above scenario.
> > > > > > > >
> > > > > > > > However, this change consumes more memory; for example,
> > > > > > > > 67,819 kernel functions require about 530KB.
> > > > > > >
> > > > > > > Hi Donglin,
> > > > > > >
> > > > > > > I think this is a good improvement. However, I wonder, why did
> > > > > > > you choose to have a separate name map for each BTF kind?
> > > > > > >
> > > > > > > I did some analysis for my local testing kernel config and got
> > > > > > > these numbers:
> > > > > > > - total number of BTF objects: 97350
> > > > > > > - number of FUNC and STRUCT objects: 51597
> > > > > > > - number of FUNC, STRUCT, UNION, ENUM, ENUM64, TYPEDEF, DATASEC
> > > > > > >   objects: 56817
> > > > > > >   (these are all kinds for which lookup by name might make sense)
> > > > > > > - number of named objects: 54246
> > > > > > > - number of name collisions:
> > > > > > >   - unique names: 53985 counts
> > > > > > >   - 2 objects with the same name: 129 counts
> > > > > > >   - 3 objects with the same name: 3 counts
> > > > > > >
> > > > > > > So it appears that having a single map for all named objects
> > > > > > > makes sense and would also simplify the implementation. What do
> > > > > > > you think?
> > > > > >
> > > > > > Some more numbers for my config:
> > > > > > - 13241 types (struct, union, typedef, enum), log2 13241 = 13.7
> > > > > > - 43575 funcs, log2 43575 = 15.4
> > > > > > Thus, having a separate map for types vs functions might save ~1.7
> > > > > > search iterations. Is this a significant slowdown in practice?
> > > > >
> > > > > What do you propose to do in case of duplicates?
> > > > > A func and a struct can have the same name, but they will have two
> > > > > different btf_ids. How do we store them?
> > > > > Also, we might add global vars to BTF. Such a request came up
> > > > > several times. So we need to make sure our search approach scales
> > > > > to func, struct, vars. I don't recall whether we search any other
> > > > > kinds. Separate arrays for different kinds seems ok.
> > > > > It's a bit of code complexity, but it's not an increase in memory.
> > > >
> > > > Binary search gives, say, the lowest index of a thing with name A;
> > > > then increment the index while the name remains A, looking for the
> > > > correct kind. Given the name-collision info above, 99% of the time
> > > > there would be no need to iterate, and in the few remaining cases
> > > > there would be a couple of iterations.
> > > >
> > > > The same logic would be necessary with the current approach if
> > > > different BTF kinds were allowed in BTF_ID_NAME_* cohorts. I figured
> > > > that these cohorts are mainly a way to split the tree for faster
> > > > lookups, but maybe that is not the main intent.
> > > >
> > > > > With 13k structs and 43k funcs it's 56k * (4 + 4), that's 0.5 Mbyte
> > > > > of extra memory. That's quite a bit. Anything we can do to compress
> > > > > it?
> > > >
> > > > That's an interesting question. From the top of my head: pre-sort in
> > > > pahole (re-assign IDs so that an increasing ID also means an
> > > > "increasing" name); shouldn't be that difficult.
> > >
> > > That sounds great. kallsyms are pre-sorted at build time.
> > > We should do the same with BTF.
> > > I think GCC can emit BTF directly now, and LLVM emits it for bpf progs
> > > too, but since vmlinux and kernel module BTFs will keep being processed
> > > through pahole, we don't have to make gcc/llvm sort things right away.
> > > pahole will be enough. The kernel might do an 'is it sorted' check
> > > during BTF validation and then use binary search, falling back to
> > > linear search when not sorted == old pahole.
> > >
> >
> > Yeah, I agree and will attempt to modify pahole and perform a test.
> > Do we need to introduce a new macro to control the behavior when the BTF
> > is not sorted? If it is not sorted, we can use the method mentioned in
> > this patch or use linear search.
>
> One challenge with pahole is that it often runs in parallel mode, so I
> suspect any sorting would have to be done after merging across threads.
> Perhaps BTF deduplication time might be a useful time to re-sort by
> name? BTF dedup happens after BTF has been merged, and a new "sorted"
> btf_dedup_opts option could be added and controlled by a pahole
> option. However, dedup is pretty complicated already...

Hi Alan,

libbpf might be the right place to do this; however, I think it is also
doable in pahole's btf_encoder__encode(), e.g. as follows (roughly
sketched below):
- after a call to btf__dedup():
  - create an ordering of IDs sorted by name;
  - create a new BTF object;
  - add records to the new BTF according to the sorted ordering;
  - remap id references while adding;
- use the new BTF object instead of the old one to write the BTF output.

I assume that the implementation would be similar regardless of whether
it is done in pahole or in libbpf.
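To make that a bit more concrete, here is an untested sketch of the first
three steps using only public libbpf calls (btf__type_cnt, btf__type_by_id,
btf__name_by_offset, btf__add_type). The helper name btf_sort_by_name is
made up, and the ID-reference remapping pass is only indicated by a
comment:

/*
 * Hypothetical helper, not part of libbpf or pahole: given a deduplicated
 * source BTF, build a new BTF whose type IDs increase with the type name.
 */
#include <stdlib.h>
#include <string.h>
#include <bpf/btf.h>

struct name_id {
        const char *name;
        __u32 id;
};

static int cmp_name_id(const void *a, const void *b)
{
        const struct name_id *x = a, *y = b;
        int c = strcmp(x->name, y->name);

        /* tie-break on the old ID to keep the ordering deterministic */
        return c ? c : (int)x->id - (int)y->id;
}

static struct btf *btf_sort_by_name(const struct btf *src)
{
        __u32 i, n = btf__type_cnt(src);        /* valid IDs: 1 .. n-1 */
        struct name_id *order;
        struct btf *dst = NULL;

        order = calloc(n, sizeof(*order));
        if (!order)
                return NULL;

        for (i = 1; i < n; i++) {
                const struct btf_type *t = btf__type_by_id(src, i);

                /* anonymous types have name_off 0 and sort first */
                order[i - 1].name = btf__name_by_offset(src, t->name_off);
                order[i - 1].id = i;
        }
        qsort(order, n - 1, sizeof(*order), cmp_name_id);

        dst = btf__new_empty();
        if (!dst)
                goto out;

        for (i = 0; i < n - 1; i++) {
                /* old ID order[i].id becomes new ID i + 1; record this in
                 * an old->new table so that all type-ID references
                 * (t->type, member/param types, ...) can be rewritten in a
                 * second pass -- that remapping pass is omitted here.
                 */
                if (btf__add_type(dst, src,
                                  btf__type_by_id(src, order[i].id)) < 0) {
                        btf__free(dst);
                        dst = NULL;
                        goto out;
                }
        }
out:
        free(order);
        return dst;
}

The old->new ID table falls out of the sorted ordering itself, so
rewriting references afterwards is a single extra pass over the new types.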
Thanks,
Eduard

> One thing we should weigh up, though, is whether there are benefits to
> the way BTF is currently laid out. It tends to start with base types,
> and often-encountered types end up being located towards the start of
> the BTF data. For example:
>
> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
> [2] CONST '(anon)' type_id=1
> [3] VOLATILE '(anon)' type_id=1
> [4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
> [5] PTR '(anon)' type_id=8
> [6] CONST '(anon)' type_id=5
> [7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
> [8] CONST '(anon)' type_id=7
> [9] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
> [10] CONST '(anon)' type_id=9
> [11] TYPEDEF '__s8' type_id=12
> [12] INT 'signed char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
> [13] TYPEDEF '__u8' type_id=14
>
> So often-used types will be found quickly, even under linear search
> conditions.
>
> When we look at how many lookups are by id (which are O(1), since they
> are done via the btf->types[] array) versus by name, we see:
>
> $ grep btf_type_by_id kernel/bpf/*.c|wc -l
> 120
> $ grep btf_find_by_nam kernel/bpf/*.c|wc -l
> 15
>
> I don't see a huge number of name-based lookups, and I think most are
> outside of the hotter codepaths, unless I'm missing some. All of which
> is to say it would be a good idea to have a clear sense of what will get
> faster with sorted-by-name BTF. Thanks!
>
> Alan
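P.S. For reference, a minimal, untested sketch of the kernel-side lookup
being discussed: binary search over an index of type IDs sorted by name,
followed by a short forward scan to pick the requested kind (which also
handles func/struct name duplicates). The name_idx/name_idx_cnt fields are
hypothetical and do not exist in the current struct btf; this is not the
code from the patch.

/*
 * Illustration only -- assumes struct btf carries a hypothetical array
 * name_idx[] of type IDs sorted by the name they point to, plus its
 * length name_idx_cnt. Mirrors btf_find_by_name_kind()'s return contract.
 */
static s32 btf_find_by_name_kind_sorted(const struct btf *btf,
                                        const char *name, u8 kind)
{
        u32 lo = 0, hi = btf->name_idx_cnt;

        /* lower bound: first slot whose name compares >= the target */
        while (lo < hi) {
                u32 mid = lo + (hi - lo) / 2;
                const struct btf_type *t =
                        btf_type_by_id(btf, btf->name_idx[mid]);

                if (strcmp(btf_name_by_offset(btf, t->name_off), name) < 0)
                        lo = mid + 1;
                else
                        hi = mid;
        }

        /* duplicates sit next to each other: scan forward while the name
         * still matches, looking for the requested kind
         */
        for (; lo < btf->name_idx_cnt; lo++) {
                const struct btf_type *t =
                        btf_type_by_id(btf, btf->name_idx[lo]);

                if (strcmp(btf_name_by_offset(btf, t->name_off), name))
                        break;
                if (BTF_INFO_KIND(t->info) == kind)
                        return btf->name_idx[lo];
        }

        return -ENOENT;
}

With the name-collision numbers quoted above, the forward scan almost
never iterates more than once.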