From: Zhen Lei
To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
    Masahiro Yamada, Alexei Starovoitov, Jiri Olsa, Kees Cook, Andrew Morton,
    "Luis Chamberlain", "Steven Rostedt", Ingo Molnar
CC: Zhen Lei
Subject: [PATCH v7 04/11] kallsyms: Add helper kallsyms_compress_symbol_name()
Date: Mon, 17 Oct 2022 14:49:43 +0800
Message-ID: <20221017064950.2038-5-thunder.leizhen@huawei.com>
X-Mailer: git-send-email 2.37.3.windows.1
In-Reply-To: <20221017064950.2038-1-thunder.leizhen@huawei.com>
References: <20221017064950.2038-1-thunder.leizhen@huawei.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7BIT
Content-Type: text/plain; charset=US-ASCII
X-Mailing-List: linux-kernel@vger.kernel.org

To speed up symbol lookup in the kernel, it is better to compress the
searched symbol first and then make a quick comparison based on the
compressed length and content. But because the tokens in
kallsyms_token_table[] have already been expanded, compressing a symbol
requires a more complex process. So generate kallsyms_best_token_table[],
which lets the kernel compress a symbol using a process similar to
compress_symbols().

The implementation of kallsyms_compress_symbol_name() is almost the same
as that of compress_symbols() in scripts/kallsyms.c. Some minor changes
have been made to reduce memory usage and improve compression performance.

1. Some entries in best_table[] are single characters, and most of them
   are clustered together, such as a-z, A-Z and 0-9. These individual
   characters are not used in the process of compressing a symbol.
   Let kallsyms_best_token_table[i][0] = 0x00 and [i][1] = number of
   consecutive single characters (for example, a-z is 26).
   When [i][0] = 0x00 is encountered, we can skip to the next token with
   two elements.
2. Because ARRAY_SIZE(kallsyms_best_token_table) is no longer fixed, the
   content of best_table[] is stored in kallsyms_best_token_table[] in
   reverse order. That is, the higher the frequency, the lower the index.

The modifier '__maybe_unused' of kallsyms_compress_symbol_name() is
temporary and will be removed in the next patch.

Signed-off-by: Zhen Lei
---
 kernel/kallsyms.c          | 80 ++++++++++++++++++++++++++++++++++++++
 kernel/kallsyms_internal.h |  1 +
 scripts/kallsyms.c         | 18 +++++++++
 3 files changed, 99 insertions(+)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 60c20f301a6ba2c..f1fe404af184047 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -95,6 +95,86 @@ static unsigned int kallsyms_expand_symbol(unsigned int off,
 	return off;
 }
 
+static unsigned char *find_token(unsigned char *str, int len,
+				 const unsigned char *token)
+{
+	int i;
+
+	for (i = 0; i < len - 1; i++) {
+		if (str[i] == token[0] && str[i+1] == token[1])
+			return &str[i];
+	}
+	return NULL;
+}
+
+static int __maybe_unused kallsyms_compress_symbol_name(const char *name, char *buf, size_t size)
+{
+	int i, j, n, len;
+	unsigned char *p1, *p2;
+	const unsigned char *token;
+
+	len = strscpy(buf, name, size);
+	if (WARN_ON_ONCE(len <= 0))
+		return 0;
+
+	/*
+	 * For each entry in kallsyms_best_token_table[], the storage
+	 * format is:
+	 * 1. For tokens that cannot be used to compress characters, the value
+	 *    at [j] is 0, and the value at [j+1] is the number of consecutive
+	 *    tokens with this feature.
+	 * 2. For each token: the larger the token value, the higher the
+	 *    frequency, and the lower the index.
+	 *
+	 *  -------------------------------
+	 * |  j  |   [j]  [j+1]  |  token  |
+	 *  -----|---------------|---------|
+	 * |  0  |   ??    ??    |   255   |
+	 * |  2  |   ??    ??    |   254   |
+	 * | ... |   ??    ??    |   ...   |
+	 * | n-2 |   ??    ??    |    x    |
+	 * |  n  |   00    len   |   x-1   |
+	 * | n+2 |   ??    ??    | x-1-len |
+	 * above '??' is non-zero
+	 */
+	for (i = 255, j = 0; i >= 0; i--, j += 2) {
+		if (!kallsyms_best_token_table[j]) {
+			i -= kallsyms_best_token_table[j + 1];
+			if (i < 0)
+				break;
+			j += 2;
+		}
+		token = &kallsyms_best_token_table[j];
+
+		p1 = buf;
+
+		/* find the token on the symbol */
+		p2 = find_token(p1, len, token);
+		if (!p2)
+			continue;
+
+		n = len;
+
+		do {
+			*p2 = i;
+			p2++;
+			n -= (p2 - p1);
+			memmove(p2, p2 + 1, n);
+			p1 = p2;
+			len--;
+
+			if (n < 2)
+				break;
+
+			/* find the token on the symbol */
+			p2 = find_token(p1, n, token);
+
+		} while (p2);
+	}
+
+	return len;
+}
+
 /*
  * Get symbol type information. This is encoded as a single char at the
  * beginning of the symbol name.
diff --git a/kernel/kallsyms_internal.h b/kernel/kallsyms_internal.h
index 2d0c6f2f0243a28..d9672ede8cfc215 100644
--- a/kernel/kallsyms_internal.h
+++ b/kernel/kallsyms_internal.h
@@ -26,5 +26,6 @@ extern const char kallsyms_token_table[] __weak;
 extern const u16 kallsyms_token_index[] __weak;
 
 extern const unsigned int kallsyms_markers[] __weak;
+extern const unsigned char kallsyms_best_token_table[] __weak;
 
 #endif // LINUX_KALLSYMS_INTERNAL_H_
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 60686094f665164..9864ce5e6c5bfc1 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -548,6 +548,24 @@ static void write_src(void)
 	for (i = 0; i < 256; i++)
 		printf("\t.short\t%d\n", best_idx[i]);
 	printf("\n");
+
+	output_label("kallsyms_best_token_table");
+	for (i = 255, k = 0; (int)i >= 0; i--) {
+		if (best_table_len[i] <= 1) {
+			k++;
+			continue;
+		}
+
+		if (k) {
+			printf("\t.byte 0x00, 0x%02x\n", k);
+			k = 0;
+		}
+
+		printf("\t.byte 0x%02x, 0x%02x\n", best_table[i][0], best_table[i][1]);
+	}
+	if (k)
+		printf("\t.byte 0x00, 0x%02x\n", k);
+	printf("\n");
 }
 
 
-- 
2.25.1
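
For readers who want to try the table format outside the kernel, here is
a minimal userspace sketch of both sides of the scheme described above:
encoding the table with run-length "skip" entries, and compressing a name
against it. It is an illustration under stated assumptions, not the
patch's code. The names encode_table(), find_pair() and compress_name()
are hypothetical, the flat pairs[]/lens[] arrays stand in for
best_table[]/best_table_len[], and plain strlen()/memcpy() replace
strscpy().

```c
#include <stddef.h>
#include <string.h>

/*
 * Build-time side (mirrors the write_src() hunk): walk the 256 token slots
 * from 255 down to 0 and emit two bytes per 2-byte token. A run of slots
 * whose length is <= 1 collapses into the pair {0x00, run length}.
 * pairs[2*i], pairs[2*i+1] and lens[i] are hypothetical stand-ins for
 * best_table[i][] and best_table_len[i]. out needs room for 512 bytes.
 * Assumes at least one slot holds a 2-byte token, so a run fits in a byte.
 */
static size_t encode_table(const unsigned char *pairs,
			   const unsigned char *lens, unsigned char *out)
{
	size_t n = 0;
	unsigned int run = 0;
	int i;

	for (i = 255; i >= 0; i--) {
		if (lens[i] <= 1) {
			run++;
			continue;
		}
		if (run) {
			out[n++] = 0x00;
			out[n++] = (unsigned char)run;
			run = 0;
		}
		out[n++] = pairs[2 * i];
		out[n++] = pairs[2 * i + 1];
	}
	if (run) {
		out[n++] = 0x00;
		out[n++] = (unsigned char)run;
	}
	return n;
}

/* Return the first occurrence of the 2-byte token in s[0..len), or NULL. */
static unsigned char *find_pair(unsigned char *s, size_t len,
				const unsigned char *token)
{
	size_t i;

	for (i = 0; i + 1 < len; i++) {
		if (s[i] == token[0] && s[i + 1] == token[1])
			return &s[i];
	}
	return NULL;
}

/*
 * Run-time side: copy name into buf, then replace every occurrence of each
 * 2-byte token with its one-byte token value, most frequent token first.
 * Returns the compressed length, or 0 if name is empty or does not fit.
 */
static size_t compress_name(const unsigned char *table, const char *name,
			    unsigned char *buf, size_t size)
{
	size_t len = strlen(name);
	size_t j = 0;
	int i;

	if (len == 0 || len >= size)
		return 0;
	memcpy(buf, name, len + 1);

	for (i = 255; i >= 0; i--, j += 2) {
		unsigned char *p = buf;

		if (table[j] == 0x00) {
			i -= table[j + 1];	/* skip the run of unusable slots */
			if (i < 0)
				break;
			j += 2;
		}
		while ((p = find_pair(p, len - (size_t)(p - buf), &table[j]))) {
			*p++ = (unsigned char)i;
			/* Drop the token's second byte; the NUL moves along. */
			memmove(p, p + 1, len - (size_t)(p - buf));
			len--;
		}
	}
	return len;
}
```

With made-up tokens 255 = "ab", 254 = "cd" and 250 = "ef", and every other
slot a single character, encode_table() produces 10 bytes:
'a','b', 'c','d', 0x00,0x03, 'e','f', 0x00,0xfa. Compressing "abcdef"
against that table yields the three bytes 0xff 0xfe 0xfa. Unlike the
patch, the sketch recomputes the remaining length from len on every
search instead of carrying a separate counter n.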