Received: by 2002:a05:7412:31a9:b0:e2:908c:2ebd with SMTP id et41csp3872606rdb; Thu, 14 Sep 2023 05:31:59 -0700 (PDT) X-Google-Smtp-Source: AGHT+IElsSv5WKcgN/px1B6EiZytF1Y0b0Ks6A1F6bocaikgaT7kmysbSvaaq9mC04fLHWb+/Ogv X-Received: by 2002:a17:90b:1003:b0:26f:4685:5b69 with SMTP id gm3-20020a17090b100300b0026f46855b69mr4847304pjb.7.1694694718346; Thu, 14 Sep 2023 05:31:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694694718; cv=none; d=google.com; s=arc-20160816; b=dobMeU9kvyfa76K4hP2Zr90OFs5EclMu0EKnMSLtukugHJFKwazMlu7KoXzm731bJU Z/gGefjFqjuR2NzE26rNyL5dRWQnhlzRvaX18uZKKtRacrQT9LW3udxfHWcvBA/JIxpn +N5xxRZe1V096ofR6MIO6lriwafZB+lYFD2lJOa6sF31yVbZvFMXovzJrODt9xSSiSDr 1PqRZEx+a5FJ678cuPjRGdjLT2B0C0WbiZIJci5OBrXlwBMxQ4Tc368bdDB+IHLzL4PT Q01UpDomnQFopuxOCMXzhXZPRt8O1BjQH2YlYrYV+nuQLf75jqNRo0w6DLZM5zVTiM1G 78kQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id; bh=DhS4f4GMagoJSnz/1ZWSvL5CNP8VqY6OCaQuP1mf864=; fh=sm1/fmu9q4WvK1+26074x6u3a7SWVuXS5XjEUG6X+CI=; b=oCK5px9uy4lXpdu49R6MOn7BjM40e49nflWVhM9GkTa//9NnjdkQglnLV+L5qs7Vp3 NdKtMYfAgWb/CelzLzyFbYqqaQsIaonbl9d00sIImPxY3wj3s+Hx0vte91GPwLUkXDcd GP+Oy4BSQwvnKKMPe4akBFr3mcO5a/JBeXhb17Um1kww9fU8J31PVdTQgtqklj/Ly5hw a90+p7tI0+d2nDwoC29uj9t85hJV4wYZ3YIXHFEWxHnWqen4PuTuURuwvKgOqNnT2CAt laLgJwIJ49QYWz0+DlnE9a6bg3aw9tFEbkR0EnJ2IvCggbx0wnQIqBBl0XuEe8ol190D CLnQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sangfor.com.cn Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id u1-20020a170902bf4100b001b89613687bsi1510618pls.439.2023.09.14.05.31.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Sep 2023 05:31:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sangfor.com.cn Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 6BA218060CAE; Thu, 14 Sep 2023 05:21:25 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237341AbjINMVN (ORCPT + 99 others); Thu, 14 Sep 2023 08:21:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234522AbjINMVN (ORCPT ); Thu, 14 Sep 2023 08:21:13 -0400 X-Greylist: delayed 594 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Thu, 14 Sep 2023 05:21:08 PDT Received: from mail-m11877.qiye.163.com (mail-m11877.qiye.163.com [115.236.118.77]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 325921BF0; Thu, 14 Sep 2023 05:21:07 -0700 (PDT) Received: from [192.168.0.100] (unknown [113.87.233.110]) by mail-m12769.qiye.163.com (Hmail) with ESMTPA id E69B42803EC; Thu, 14 Sep 2023 20:10:28 +0800 (CST) Message-ID: <11c01e2e-5e6a-442a-868e-b55f73ebcd7e@sangfor.com.cn> Date: Thu, 14 Sep 2023 20:10:28 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v2] bpf: Using binary search to improve the performance of btf_find_by_name_kind To: Eduard Zingerman , Alan Maguire , Alexei Starovoitov Cc: Martin KaFai Lau , Alexei Starovoitov , Song Liu , Yonghong Song , Steven Rostedt , Masami Hiramatsu , dinghui@sangfor.com.cn, huangcun@sangfor.com.cn, bpf , LKML References: <20230909091646.420163-1-pengdonglin@sangfor.com.cn> <20ef8441084c9d5fd54f84987afa77eed7fe148e.camel@gmail.com> <035ab912d7d6bd11c54c038464795da01dbed2de.camel@gmail.com> <5f8d82c3-838e-4d75-bb25-7d98a6d0a37c@sangfor.com.cn> <0bef11aa061a425f276a539a47b786ec6b661987.camel@gmail.com> From: pengdonglin In-Reply-To: <0bef11aa061a425f276a539a47b786ec6b661987.camel@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-HM-Spam-Status: e1kfGhgUHx5ZQUpXWQgPGg8OCBgUHx5ZQUlOS1dZFg8aDwILHllBWSg2Ly tZV1koWUFITzdXWS1ZQUlXWQ8JGhUIEh9ZQVlCGUpMVkgZGh1MSRhMGk4dTVUTARMWGhIXJBQOD1 lXWRgSC1lBWUpKSFVDTFVJSEhVSkpLWVdZFhoPEhUdFFlBWU9LSFVKSktISkNVSktLVUtZBg++ X-HM-Tid: 0a8a9399908db243kuuue69b42803ec X-HM-MType: 1 X-HM-Sender-Digest: e1kMHhlZQR0aFwgeV1kSHx4VD1lBWUc6Mj46Pxw4Tj1KMwxWDwJPNxMT Gj4aCT9VSlVKTUJPTUJIT0lCQklJVTMWGhIXVQseFRwfFBUcFxIVOwgaFRwdFAlVGBQWVRgVRVlX WRILWUFZSkpIVUNMVUlISFVKSktZV1kIAVlBSkpKTUk3Bg++ Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Thu, 14 Sep 2023 05:21:25 -0700 (PDT) On 2023/9/13 21:45, Eduard Zingerman wrote: > On Wed, 2023-09-13 at 14:34 +0100, Alan Maguire wrote: >> On 13/09/2023 11:32, pengdonglin wrote: >>> On 2023/9/13 2:46, Alexei Starovoitov wrote: >>>> On Tue, Sep 12, 2023 at 10:03 AM Eduard Zingerman >>>> wrote: >>>>> >>>>> On Tue, 2023-09-12 at 09:40 -0700, Alexei Starovoitov wrote: >>>>>> On Tue, Sep 12, 2023 at 7:19 AM Eduard Zingerman >>>>>> wrote: >>>>>>> >>>>>>> On Tue, 2023-09-12 at 16:51 +0300, Eduard Zingerman wrote: >>>>>>>> On Sat, 2023-09-09 at 02:16 -0700, Donglin Peng wrote: >>>>>>>>> Currently, we are only using the linear search method to find the >>>>>>>>> type id >>>>>>>>> by the name, which has a time complexity of O(n). This change >>>>>>>>> involves >>>>>>>>> sorting the names of btf types in ascending order and using >>>>>>>>> binary search, >>>>>>>>> which has a time complexity of O(log(n)). This idea was inspired >>>>>>>>> by the >>>>>>>>> following patch: >>>>>>>>> >>>>>>>>> 60443c88f3a8 ("kallsyms: Improve the performance of >>>>>>>>> kallsyms_lookup_name()"). >>>>>>>>> >>>>>>>>> At present, this improvement is only for searching in vmlinux's and >>>>>>>>> module's BTFs, and the kind should only be BTF_KIND_FUNC or >>>>>>>>> BTF_KIND_STRUCT. >>>>>>>>> >>>>>>>>> Another change is the search direction, where we search the BTF >>>>>>>>> first and >>>>>>>>> then its base, the type id of the first matched btf_type will be >>>>>>>>> returned. >>>>>>>>> >>>>>>>>> Here is a time-consuming result that finding all the type ids of >>>>>>>>> 67,819 kernel >>>>>>>>> functions in vmlinux's BTF by their names: >>>>>>>>> >>>>>>>>> Before: 17000 ms >>>>>>>>> After:     10 ms >>>>>>>>> >>>>>>>>> The average lookup performance has improved about 1700x at the >>>>>>>>> above scenario. >>>>>>>>> >>>>>>>>> However, this change will consume more memory, for example, >>>>>>>>> 67,819 kernel >>>>>>>>> functions will allocate about 530KB memory. >>>>>>>> >>>>>>>> Hi Donglin, >>>>>>>> >>>>>>>> I think this is a good improvement. However, I wonder, why did you >>>>>>>> choose to have a separate name map for each BTF kind? >>>>>>>> >>>>>>>> I did some analysis for my local testing kernel config and got >>>>>>>> such numbers: >>>>>>>> - total number of BTF objects: 97350 >>>>>>>> - number of FUNC and STRUCT objects: 51597 >>>>>>>> - number of FUNC, STRUCT, UNION, ENUM, ENUM64, TYPEDEF, DATASEC >>>>>>>> objects: 56817 >>>>>>>>    (these are all kinds for which lookup by name might make sense) >>>>>>>> - number of named objects: 54246 >>>>>>>> - number of name collisions: >>>>>>>>    - unique names: 53985 counts >>>>>>>>    - 2 objects with the same name: 129 counts >>>>>>>>    - 3 objects with the same name: 3 counts >>>>>>>> >>>>>>>> So, it appears that having a single map for all named objects makes >>>>>>>> sense and would also simplify the implementation, what do you think? >>>>>>> >>>>>>> Some more numbers for my config: >>>>>>> - 13241 types (struct, union, typedef, enum), log2 13241 = 13.7 >>>>>>> - 43575 funcs, log2 43575 = 15.4 >>>>>>> Thus, having separate map for types vs functions might save ~1.7 >>>>>>> search iterations. Is this a significant slowdown in practice? >>>>>> >>>>>> What do you propose to do in case of duplicates ? >>>>>> func and struct can have the same name, but they will have two >>>>>> different >>>>>> btf_ids. How do we store them ? >>>>>> Also we might add global vars to BTF. Such request came up several >>>>>> times. >>>>>> So we need to make sure our search approach scales to >>>>>> func, struct, vars. I don't recall whether we search any other kinds. >>>>>> Separate arrays for different kinds seems ok. >>>>>> It's a bit of code complexity, but it's not an increase in memory. >>>>> >>>>> Binary search gives, say, lowest index of a thing with name A, then >>>>> increment index while name remains A looking for correct kind. >>>>> Given the name conflicts info from above, 99% of times there would be >>>>> no need to iterate and in very few cases there would a couple of >>>>> iterations. >>>>> >>>>> Same logic would be necessary with current approach if different BTF >>>>> kinds would be allowed in BTF_ID_NAME_* cohorts. I figured that these >>>>> cohorts are mainly a way to split the tree for faster lookups, but >>>>> maybe that is not the main intent. >>>>> >>>>>> With 13k structs and 43k funcs it's 56k * (4 + 4) that's 0.5 Mbyte >>>>>> extra memory. That's quite a bit. Anything we can do to compress it? >>>>> >>>>> That's an interesting question, from the top of my head: >>>>> pre-sort in pahole (re-assign IDs so that increasing ID also would >>>>> mean "increasing" name), shouldn't be that difficult. >>>> >>>> That sounds great. kallsyms are pre-sorted at build time. >>>> We should do the same with BTF. >>>> I think GCC can emit BTF directly now and LLVM emits it for bpf progs >>>> too, >>>> but since vmlinux and kernel module BTFs will keep being processed >>>> through pahole we don't have to make gcc/llvm sort things right away. >>>> pahole will be enough. The kernel might do 'is it sorted' check >>>> during BTF validation and then use binary search or fall back to linear >>>> when not-sorted == old pahole. >>>> >>> >>> Yeah, I agree and will attempt to modify the pahole and perform a test. >>> Do we need >>> to introduce a new macro to control the behavior when the BTF is not >>> sorted? If >>> it is not sorted, we can use the method mentioned in this patch or use >>> linear >>> search. >>> >>> >> >> One challenge with pahole is that it often runs in parallel mode, so I >> suspect any sorting would have to be done after merging across threads. >> Perhaps BTF deduplication time might be a useful time to re-sort by >> name? BTF dedup happens after BTF has been merged, and a new "sorted" >> btf_dedup_opts option could be added and controlled by a pahole >> option. However dedup is pretty complicated already.. > > Hi Alan, > > libbpf might be the right place to do this, however, I think that it is > also doable in pahole's btf_encoder__encode(), e.g. as follows: > - after a call to btf__dedup(): > - create a sorted by name IDs ordering; > - create a new BTF object; > - add records to the new BTF according to the sorted ordering; > - remap id references while adding; > - use the new BTF object instead of old one to write BTF output. > > I assume that implementation would be similar regardless whether it is > done in pahole or in libbpf. Sounds good ;). We can place named objects at the start and may also need a variable to keep trace of the number. > > Thanks, > Eduard > >> One thing we should weigh up though is if there are benefits to the >> way BTF is currently laid out. It tends to start with base types, >> and often-encountered types end up being located towards the start >> of the BTF data. For example >> >> >> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none) >> [2] CONST '(anon)' type_id=1 >> [3] VOLATILE '(anon)' type_id=1 >> [4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2 >> [5] PTR '(anon)' type_id=8 >> [6] CONST '(anon)' type_id=5 >> [7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED >> [8] CONST '(anon)' type_id=7 >> [9] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none) >> [10] CONST '(anon)' type_id=9 >> [11] TYPEDEF '__s8' type_id=12 >> [12] INT 'signed char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED >> [13] TYPEDEF '__u8' type_id=14 >> >> So often-used types will be found quickly, even under linear search >> conditions. >> >> When we look at how many lookups by id (which are O(1), since they are >> done via the btf->types[] array) versus by name, we see: >> >> $ grep btf_type_by_id kernel/bpf/*.c|wc -l >> 120 >> $ grep btf_find_by_nam kernel/bpf/*.c|wc -l >> 15 >> >> I don't see a huge number of name-based lookups, and I think most are >> outside of the hotter codepaths, unless I'm missing some. All of which >> is to say it would be a good idea to have a clear sense of what will get >> faster with sorted-by-name BTF. Thanks! >> >> Alan >