Received: by 2002:a05:7412:31a9:b0:e2:908c:2ebd with SMTP id et41csp3057198rdb; Wed, 13 Sep 2023 00:00:27 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH5j4S4ZbjuF4zIUdTC+F8tVpq/XfP5SbcOkBzljrLFkOpP7DYl3Kzl58K/kuozf5o82JKh X-Received: by 2002:a17:90a:8b0d:b0:269:621e:a673 with SMTP id y13-20020a17090a8b0d00b00269621ea673mr1426835pjn.1.1694588427230; Wed, 13 Sep 2023 00:00:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694588427; cv=none; d=google.com; s=arc-20160816; b=tCbNCuVK4EiepFD8ETaS/Dqeay3GrDmGBPWNsc1anLYfL9OE3frPsLx29NwN1Wteea FMmHtY1okcEKHDn/DNuqy9Hhb6s6YnuSoK9Y9Xz5ra/e1xSDJRK1TlTM+EJCfqEG76/7 MrzcE3q/3zb8tHlQt2Jy8rYGDbEmd1zqanhwfZ7mL94ZX9kAKXgo2wmCJfsINGIW/m7E IkHyUQ7FXIOXewyXbvkiN5BBfWOJJu7Uks+TTez89pgK3AgP2selAnafCz3vCi03ldU3 1ZTx2BNw7hoTBKLy/QD8rcVUoOkc7l2tKG1Ifb6S25mPEZNYLfTiy/pCOhcWzO0TCjbr 53aw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=IkHo4j2Je41GsmDMW8omOLUe6wtjQHu4mDM0qIbuMqs=; fh=c3UaASw6aCBPxaklyTrdF/dO9yo87Lb/mnamDaLMz1w=; b=vwtH1UjYwOC3HiNz/564CzkJXaWdg4IiUN9pxQNwq2IZllzLDmI4fehNrvSfF8AY2+ zLZqIIk8ndQmjLtNVL3kQ4315ukcrKv2ne1tUknrOCPp074Y2YGImdHvchM1xiaJ8r2E vjIzEasf3KaO9gmt4DFFXzqxmTsWlXDlnGs4hQcsQThvHsAOlVQ8j6a3ZXMvSqU608Qf d/Yqu/reI7NWgvk9qsPXrWhCVp9LFDoD9YVzazjvvKQO5ALlZx8tFJXD+22t/drCArHI VcgU0kq06NaSetczI+FjpsKXuzFdlsty49XS6ExfdPTJf81fUdyLbjib8fJpkaC3o1oU b/Jg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=SedgNbGJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id e1-20020a17090ab38100b0026b0c3f1506si935007pjr.155.2023.09.13.00.00.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Sep 2023 00:00:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=SedgNbGJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 8BF518082A4D; Tue, 12 Sep 2023 09:40:40 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236981AbjILQkh (ORCPT + 99 others); Tue, 12 Sep 2023 12:40:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233132AbjILQkg (ORCPT ); Tue, 12 Sep 2023 12:40:36 -0400 Received: from mail-lj1-x22f.google.com (mail-lj1-x22f.google.com [IPv6:2a00:1450:4864:20::22f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B246E115; Tue, 12 Sep 2023 09:40:32 -0700 (PDT) Received: by mail-lj1-x22f.google.com with SMTP id 38308e7fff4ca-2bcb89b4767so96476011fa.3; Tue, 12 Sep 2023 09:40:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1694536831; x=1695141631; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=IkHo4j2Je41GsmDMW8omOLUe6wtjQHu4mDM0qIbuMqs=; b=SedgNbGJUdpouI6Fp0Ew6HIzOhRpclZ1t536iJrJ5p/027mIgoS46IDyhwxxRxpCiQ 05uI34EFsmLuUkumngG1PHfQZhzqsuMMLnqPRMq/5eBy79dcwX4x+W6GsDgs4fyrjqUT g0WgcM5oqqwLbOrOSxdzhnMdbQP3tXwloUA/9WAWu2UcoheNC5+NgQgL72Eon9JPy4za 0EQ/bH9YIlYwGpW0q9MrVEVES89zgqhqzCKu6E49bmCPYxHEaAipPRPpg8RDhaVo78lA dbJnOjruDuifdMB6qcWeUrTaOGmFFpjm3Z9yIAlv/TceOYg+A8Z9ozAeq/4l32Pf087m iBlA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694536831; x=1695141631; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IkHo4j2Je41GsmDMW8omOLUe6wtjQHu4mDM0qIbuMqs=; b=Ic9yuDlyvFeQSwyryjW0O3FUqOpukwkDdN8UjA4vlI3HmTB7ijsCVpbvaJP8kiBcmg k/YvbtV6OXJsaXd8lSnkh+sTJ5/ybTk2/A/lfOCzdP6T4PpdLcp6lO0b1dG8t7jFxyua 1MtNxmlW7Xi2TbIAPDPu8LG5eevoM9kp1oQlEOH5I6sMwym2Pv3fMilqoEbkOqf4U+bw iXEMtiUCiJ+RP3RgjGJPW0KSmTsCAs0KLtWoorxYYPYdkKWyiAE8Ebn8olVgGvHI+mJU K+vpbJcI3Ih1uomzRlaZCMPdkvcDSsSO81oQVecEMg+B/LIKEWBc6CDHpbhnhpRCb2rC Xm5g== X-Gm-Message-State: AOJu0Yw9Kc3/Fz9Z2OJKuRXHim9SU2ghfIna+7NILVMn6Z7WwzOkFPTd Vw9kylxCSDvXImkak8vAnv+RyCvLnwb/C0T2ofQ= X-Received: by 2002:a05:6512:220d:b0:502:d639:22ed with SMTP id h13-20020a056512220d00b00502d63922edmr17122lfu.48.1694536830618; Tue, 12 Sep 2023 09:40:30 -0700 (PDT) MIME-Version: 1.0 References: <20230909091646.420163-1-pengdonglin@sangfor.com.cn> <20ef8441084c9d5fd54f84987afa77eed7fe148e.camel@gmail.com> In-Reply-To: From: Alexei Starovoitov Date: Tue, 12 Sep 2023 09:40:19 -0700 Message-ID: Subject: Re: [RFC PATCH v2] bpf: Using binary search to improve the performance of btf_find_by_name_kind To: Eduard Zingerman Cc: Donglin Peng , Martin KaFai Lau , Alexei Starovoitov , Song Liu , Yonghong Song , Steven Rostedt , Masami Hiramatsu , dinghui@sangfor.com.cn, huangcun@sangfor.com.cn, bpf , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Tue, 12 Sep 2023 09:40:40 -0700 (PDT) X-Spam-Status: No, score=-0.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email On Tue, Sep 12, 2023 at 7:19=E2=80=AFAM Eduard Zingerman wrote: > > On Tue, 2023-09-12 at 16:51 +0300, Eduard Zingerman wrote: > > On Sat, 2023-09-09 at 02:16 -0700, Donglin Peng wrote: > > > Currently, we are only using the linear search method to find the typ= e id > > > by the name, which has a time complexity of O(n). This change involve= s > > > sorting the names of btf types in ascending order and using binary se= arch, > > > which has a time complexity of O(log(n)). This idea was inspired by t= he > > > following patch: > > > > > > 60443c88f3a8 ("kallsyms: Improve the performance of kallsyms_lookup_n= ame()"). > > > > > > At present, this improvement is only for searching in vmlinux's and > > > module's BTFs, and the kind should only be BTF_KIND_FUNC or BTF_KIND_= STRUCT. > > > > > > Another change is the search direction, where we search the BTF first= and > > > then its base, the type id of the first matched btf_type will be retu= rned. > > > > > > Here is a time-consuming result that finding all the type ids of 67,8= 19 kernel > > > functions in vmlinux's BTF by their names: > > > > > > Before: 17000 ms > > > After: 10 ms > > > > > > The average lookup performance has improved about 1700x at the above = scenario. > > > > > > However, this change will consume more memory, for example, 67,819 ke= rnel > > > functions will allocate about 530KB memory. > > > > Hi Donglin, > > > > I think this is a good improvement. However, I wonder, why did you > > choose to have a separate name map for each BTF kind? > > > > I did some analysis for my local testing kernel config and got such num= bers: > > - total number of BTF objects: 97350 > > - number of FUNC and STRUCT objects: 51597 > > - number of FUNC, STRUCT, UNION, ENUM, ENUM64, TYPEDEF, DATASEC objects= : 56817 > > (these are all kinds for which lookup by name might make sense) > > - number of named objects: 54246 > > - number of name collisions: > > - unique names: 53985 counts > > - 2 objects with the same name: 129 counts > > - 3 objects with the same name: 3 counts > > > > So, it appears that having a single map for all named objects makes > > sense and would also simplify the implementation, what do you think? > > Some more numbers for my config: > - 13241 types (struct, union, typedef, enum), log2 13241 =3D 13.7 > - 43575 funcs, log2 43575 =3D 15.4 > Thus, having separate map for types vs functions might save ~1.7 > search iterations. Is this a significant slowdown in practice? What do you propose to do in case of duplicates ? func and struct can have the same name, but they will have two different btf_ids. How do we store them ? Also we might add global vars to BTF. Such request came up several times. So we need to make sure our search approach scales to func, struct, vars. I don't recall whether we search any other kinds. Separate arrays for different kinds seems ok. It's a bit of code complexity, but it's not an increase in memory. With 13k structs and 43k funcs it's 56k * (4 + 4) that's 0.5 Mbyte extra memory. That's quite a bit. Anything we can do to compress it? Folks requested vmlinux BTF to be a module, so it's loaded on demand. BTF memory consumption is a concern to many. I think before we add these per-kind search arrays we better make BTF optional as a module.