Date: Mon, 5 Jul 2021 06:35:32 +0200
From: Willy Tarreau
To: Gary Guo
Cc: Matthew Wilcox, ojeda@kernel.org, Linus Torvalds, Greg Kroah-Hartman,
    rust-for-linux@vger.kernel.org, linux-kbuild@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Alex Gaynor,
    Geoffrey Thomas, Finn Behrens, Adam Bratschi-Kaye, Wedson Almeida Filho
Subject: Re: [PATCH 01/17] kallsyms: support big kernel symbols (2-byte lengths)
Message-ID: <20210705043532.GA30964@1wt.eu>
References: <20210704202756.29107-1-ojeda@kernel.org>
    <20210704202756.29107-2-ojeda@kernel.org>
    <20210704222043.000026b3@garyguo.net>
In-Reply-To: <20210704222043.000026b3@garyguo.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jul 04, 2021 at 10:20:43PM +0100, Gary Guo wrote:
> On Sun, 4 Jul 2021 22:04:49 +0100
> Matthew Wilcox wrote:
>
> > On Sun, Jul 04, 2021 at 10:27:40PM +0200, ojeda@kernel.org wrote:
> > > From: Miguel Ojeda
> > >
> > > Rust symbols can become quite long due to namespacing introduced
> > > by modules, types, traits, generics, etc.
> > >
> > > Increasing to 255 is not enough in some cases, and therefore
> > > we need to introduce 2-byte lengths to the symbol table. We call
> > > these "big" symbols.
> > >
> > > In order to avoid increasing all lengths to 2 bytes (since most
> > > of them only require 1 byte, including many Rust ones), we use
> > > length zero to mark "big" symbols in the table.
> >
> > What happened to my suggestion from last time of encoding symbols <
> > 128 as 0-127 and symbols larger than that as (data[0] - 128) * 256 +
> > data[1]) ?
>
> Yeah, I agree ULEB128 or similar encoding scheme would be better than
> using 0 as an escape byte. If ULEB128 is used and we restrict number of
> bytes to 2, it will encode numbers up to 2**14 instead of 2**16 like the
> current scheme, but that should be sufficient anyway.

Actually plenty of variants of such encodings exist. You can split the
first byte on 192 to keep 6 upper bits, 224 for 5, 240 for 4, etc. It
all depends how long the maximum string is expected to be and how often
we expect to see large strings. For example when splitting around 240,
all sizes from 0 to 239 take one byte, and sizes from 240 to 4335 take
two bytes. But if strings >128 are already extremely rare we don't
really care about the extra byte needed to encode them.

Willy
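
For illustration only, here is a minimal sketch of the "split around 240" variant described above. It is not taken from the kallsyms patch; the names SPLIT, MAX_LEN, encode_len() and decode_len() are hypothetical. It assumes the two-byte form is biased by the threshold, which is what gives the 240 to 4335 range (16 remaining first-byte values times 256).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: first bytes below SPLIT are literal lengths. */
#define SPLIT   240u
/* Largest encodable length: 240 + 16 * 256 - 1 = 4335 for SPLIT = 240. */
#define MAX_LEN (SPLIT + (256u - SPLIT) * 256u - 1u)

/* Encode len into out[]; returns bytes written (1 or 2), 0 if len is too large. */
static inline size_t encode_len(unsigned int len, uint8_t out[2])
{
	if (len < SPLIT) {
		out[0] = (uint8_t)len;
		return 1;
	}
	if (len > MAX_LEN)
		return 0;

	len -= SPLIT;					/* bias so 240 maps to {240, 0} */
	out[0] = (uint8_t)(SPLIT + (len >> 8));		/* upper 4 bits in first byte */
	out[1] = (uint8_t)(len & 0xff);			/* lower 8 bits in second byte */
	return 2;
}

/* Decode a length from in[]; returns the number of bytes consumed (1 or 2). */
static inline size_t decode_len(const uint8_t *in, unsigned int *len)
{
	if (in[0] < SPLIT) {
		*len = in[0];
		return 1;
	}
	*len = SPLIT + ((unsigned int)(in[0] - SPLIT) << 8) + in[1];
	return 2;
}
```

Changing SPLIT to 224 or 192 gives the 5-bit and 6-bit variants with the same code. The (data[0] - 128) * 256 + data[1] form quoted earlier differs in that it does not add the threshold back, so its two-byte range starts at 0 rather than at the split point.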