Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp628450pxk; Wed, 16 Sep 2020 12:41:10 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyM0XuqdEz3mjzWLDh/sC9idj+g3TZ5KUINWWQZ4rmLtbCCFVbR7O7j8j071T91wfVVcqsd X-Received: by 2002:aa7:dd8d:: with SMTP id g13mr29974606edv.324.1600285270143; Wed, 16 Sep 2020 12:41:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600285270; cv=none; d=google.com; s=arc-20160816; b=bKZF+hDx25CN3txh192tII81uSv2W4WGSAFXegp3JOMkTFJ77GbZCwWRB8HS4ZjaWv dcJRPvLiwfehJCExMeLD5Zj2fKOQEWkuCWdBbbA1jq8Cmhz+dnf0+TYIKc3U9T2D6yJh BkUM86xcBA5UJF+++Zpp4cyVcqnN2uqnMUccbwLJAv2WEdSenN4Eut+inPUpaMH6D7Iq A+zebFHVoOKq4TbpbA781R3Ogj/z3LSj5gLplDAVX6BBMnhwJcxtN5PO8Cuqg2MeE+XU bKbDsC49LyqybIEVRIKsGTaVgVpIpdPCJjafktKVSy4Pxv8J1WZJ/D65IYQs2pdWY04S A6Hg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=EZp5toX9T1gQJ7KRJFQSQsUwLhYGKZg5A/ENVGfokLk=; b=srM0O5rSiq5aAVp4Hph319u2NUUTr4ppXYeC29yKtysyUAlp4ZWgrab6ov+Mi/GsPV EoU//umKxYPOhVcexdYoglVwz0a6NEcGNmqLQtQgmwlB0IIbhfaZL3uKNiDO3rCveG/j qPgqe07soMAX166w2SPMpk7HAiQdie++XNGtyhI/BUBm5BvvFGfY9ANcxvRxmfiKw++L pl1GjJdUkHQ6Cv72c1PgZ0bWzFsZnnBsjmISU6aTZI3MyuZDjxCwpeGKBsJWDTh3A5AZ 04/N1NDDZazEV8hO0+q4Spmovanuvn+grSad9B3myQL0Uzu2T9eY3bH76CqKrCexTxp9 HSxA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rasmusvillemoes.dk header.s=google header.b=detMbyNf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f2si12284201ejq.262.2020.09.16.12.40.47; Wed, 16 Sep 2020 12:41:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@rasmusvillemoes.dk header.s=google header.b=detMbyNf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727525AbgIPTfg (ORCPT + 99 others); Wed, 16 Sep 2020 15:35:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727287AbgIPTby (ORCPT ); Wed, 16 Sep 2020 15:31:54 -0400 Received: from mail-ej1-x643.google.com (mail-ej1-x643.google.com [IPv6:2a00:1450:4864:20::643]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8BDD4C06174A for ; Wed, 16 Sep 2020 12:31:13 -0700 (PDT) Received: by mail-ej1-x643.google.com with SMTP id gr14so12436264ejb.1 for ; Wed, 16 Sep 2020 12:31:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rasmusvillemoes.dk; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=EZp5toX9T1gQJ7KRJFQSQsUwLhYGKZg5A/ENVGfokLk=; b=detMbyNfda+sMtEgJ8wbC4HrMzYk3BcgRMdbhpVeAGepbF+kat9yNAUGo/3Y5oYxAx JcIhsomGq6FmSan401dR/dMwAyJEwXSOZ8yvdg4oon4aO0l550Fkx0JPETQjkQtJ/vDj bAcprtRQGkkCRh9W7AnYhAjPcDLnvGJlcNA2w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=EZp5toX9T1gQJ7KRJFQSQsUwLhYGKZg5A/ENVGfokLk=; b=S7QQB0gFukWqd5QZKD1/HdjSkjoKk2xOWbiKw3UaTl/48kWVTs3RnpAuc0Hc/Ef0k/ zQc2jG6ZVVmOI5zWgk6gXUg4pAFVVTVV3ej8nniK0xpaU58Qc60lrZeO7SmEirjALMxk JaLBapQ8rdnhbtwWVkiqOO/tFto732yiz+WFrJaXR3BWIHXWuDN7O8acB7/itGE0Q8z4 Su9aC62ZXrXc1jru9ILC04f6CSXewTyBHZ51OS714how2fCFzm9XlnwB3JtXTkOIFKs3 ZCXvu1X+Tb7oCJtmYATR4INq4vkfWiFcG8eAVdeDnJdhaUmQeyNxZ3zAbDc0pvlY+ZlP WioQ== X-Gm-Message-State: AOAM533AI/m32VGrkXkfD1Xwp+m8mf04JKBCL/O10zCcMTDp/ZrscuwY 9wMqt9OXxwpSkM8Cjmz5h4Xnh2K7SM6jDXVQ X-Received: by 2002:a17:906:8258:: with SMTP id f24mr26292491ejx.551.1600284671213; Wed, 16 Sep 2020 12:31:11 -0700 (PDT) Received: from [192.168.1.149] (5.186.115.188.cgn.fibianet.dk. [5.186.115.188]) by smtp.gmail.com with ESMTPSA id r16sm14987362edc.57.2020.09.16.12.31.10 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 16 Sep 2020 12:31:10 -0700 (PDT) Subject: Re: [PATCH] scripts/setlocalversion: make git describe output more reliable To: Masahiro Yamada Cc: Brian Norris , Randy Dunlap , Guenter Roeck , Bhaskar Chowdhury , Linux Kernel Mailing List References: <20200910112701.13853-1-linux@rasmusvillemoes.dk> From: Rasmus Villemoes Message-ID: <0873059d-2912-3ef2-bab1-9db0cbe904a0@rasmusvillemoes.dk> Date: Wed, 16 Sep 2020 21:31:09 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 16/09/2020 20.01, Masahiro Yamada wrote: > On Thu, Sep 17, 2020 at 12:23 AM Rasmus Villemoes > wrote: >> >> On 16/09/2020 16.28, Masahiro Yamada wrote: >>> On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes >>> wrote: >>>> >>> Why do we care about the uniqueness of the abbreviated >>> hash in the whole git history? >> >> Because when I ask a customer "what kernel are you running", and they >> tell me "4.19.45-rt67-00567-123abc8", I prefer that 123abc8 uniquely >> identifies the right commit, instead of me having to dig around each of >> the commits starting with that prefix and see which one of them matches >> "is exactly 567 commits from 4.19.45-rt67". 7 hexchars is nowhere near >> enough for that today, which is why Linus himself did that "auto-tune >> abbrev based on repo size" patch for git.git. > > I like: > > git rev-parse --verify HEAD 2>/dev/null | cut -c1-15 > > better than: > > git rev-parse --verify --short=15 HEAD 2>/dev/null > > The former produces the deterministic kernelrelease string. > > > But, let's reconsider if we need as long as 15-digits. > > There are a couple of kinds of objects in git: commit, tree, blob. > > I think you came up with 15-digits to ensure the uniqueness > in _all_ kinds of objects. > > But, when your customer says "4.19.45-rt67-00567-123abc8", > 123abc8 is apparently a commit instead of a tree or a blob. > > > > In the context of the kernel version, we can exclude > tree and blob objects. > > In other words, I think "grows ~60000 objects per release" > is somewhat over-estimating. > > We have ~15000 commits per release. So, the difference > is just 4x factor, though... > > > > BTW, I did some experiments, and I noticed > "git log" accepts a shorter hash > than "git show" does presumably because > "git log" only takes a commit. > > > > For example, "git show 06a0d" fails, but > "git log 06a0d" works. > > > masahiro@oscar:~/ref/linux$ git show 06a0d > error: short SHA1 06a0d is ambiguous > hint: The candidates are: > hint: 06a0df4d1b8b commit 2020-09-08 - fbcon: remove now unusued > 'softback_lines' cursor() argument > hint: 06a0d81911b3 tree > hint: 06a0dc5a84d2 tree > hint: 06a0d1947c77 blob > hint: 06a0df434249 blob > fatal: ambiguous argument '06a0d': unknown revision or path not in the > working tree. > Use '--' to separate paths from revisions, like this: > 'git [...] -- [...]' > masahiro@oscar:~/ref/linux$ git log --oneline -1 06a0d > 06a0df4d1b8b fbcon: remove now unusued 'softback_lines' cursor() argument > > > > > What is interesting is, if you prepend -- > to the abbreviated hash, "git show" accepts as short as a commit > "git log" does. > > masahiro@oscar:~/ref/linux$ git show v5.9-rc5-00002-g06a0d | head -1 > commit 06a0df4d1b8b13b551668e47b11fd7629033b7df > > > I guess git cleverly understands it refers to a commit object > since "v5.9-rc5-00002-" prefix only makes sense > in the commit context. > Well, yes, but unfortunately only so far as applying the same "the user means a commit object" logic; git doesn't do anything to actually use the prefix to disambiguate. In fact, looking at a semi-random commit in 4.19-stable v4.19.114-7-g236c445eb352: $ git show 236c44 error: short SHA1 236c44 is ambiguous hint: The candidates are: hint: 236c445eb352 commit 2020-03-13 - drm/bochs: downgrade pci_request_region failure from error to warning hint: 236c448cb6e7 commit 2011-09-08 - usbmon vs. tcpdump: fix dropped packet count hint: 236c445b1930 blob Now you're right that we get $ git show v4.19.114-7-g236c445 | head -n1 commit 236c445eb3529aa7c976f8812513c3cb26d77e27 so it knows we're not looking at that blob. But "git show v4.19.114-7-g236c44" still fails because there are two commits. Adding to the fun is that "git show v4.19.114-7-g236c448" actually shows the usbmon commit, which is old v3.2 stuff and certainly not a descendant of v4.19.114. I didn't actually know that git would even accept those prefixed strings as possible refspecs, and for a moment you had me hoping that git would actually do the disambiguation using that prefix, which would certainly make 7 hex chars enough; unfortunately that's not the case. > Keeping those above in mind, I believe 15-digits is too long. Well, as you indicated, commits are probably ~1/4 of all objects, i.e. half a hexchar worth of bits. So the commit/object distinction shouldn't matter very much for the choice of suitable fixed length. > So, I propose two options. > > > [1] 7 digits > > The abbreviated hash part may not uniquely identify > the commit. In that case, you need some extra git > operations to find out which one is it. > > As for the kernel build, > --<7-digits> is enough > to provide the unique kernelrelease string. > > > > [2] 12 digits > > This matches to the Fixes: tag policy specified in > Documentation/process/submitting-patches.rst > > The abbreviated hash part is very likely unique > in the commit object space. > (Of course, it is impossible to guarantee the uniqueness) I can live with 12. It would be extremely rare that I would have to do some extra git yoga to figure out which commit they actually mean. I think we can still use git describe to do the bulk of the work (the 'git rev-parse | cut ... is easy, but it's somewhat tedious to find the closest tag, then compute the #commits-on-top part), then just have the awk script used for pretty-printing (the %05d of the #commits-on-top) can probably also be used to chop the abbreviated sha1 at 12 chars, should git have needed to make it longer. Please see below for an updated patch+commit log. >> Alternatively, would you consider a Kconfig knob, LOCALVERSION_ABBREV? > > > No, I do not think LOCALVERSION_ABBREV is a good idea. Neither do I; I considered it before sending the patch but decided that yes: > We should eliminate this problem > irrespective of the kernel configuration. Rasmus ==== something like this, then? scripts/setlocalversion: make git describe output more reliable When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails. The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by (1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo (2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage) (3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, make sure the abbreviated sha1 always consists of exactly 12 hex characters. That is consistent with the current rule for -stable patches, and is almost always enough to identify the head commit unambigously - in the few cases where it does not, the v5.4.3-00021- prefix would certainly nail it down. diff --git a/scripts/setlocalversion b/scripts/setlocalversion index 20f2efd57b11..b51072167df1 100755 --- a/scripts/setlocalversion +++ b/scripts/setlocalversion @@ -45,7 +45,7 @@ scm_version() # Check for git and a git repo. if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && - head=$(git rev-parse --verify --short HEAD 2>/dev/null); then + head=$(git rev-parse --verify HEAD 2>/dev/null); then # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore # it, because this version is defined in the top level Makefile. @@ -59,11 +59,22 @@ scm_version() fi # If we are past a tagged commit (like # "v2.6.30-rc5-302-g72357d5"), we pretty print it. - if atag="$(git describe 2>/dev/null)"; then - echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' - - # If we don't have a tag at all we print -g{commitish}. + # + # Ensure the abbreviated sha1 has exactly 12 + # hex characters, to make the output + # independent of git version, local + # core.abbrev settings and/or total number of + # objects in the current repository - passing + # --abbrev=12 ensures a minimum of 12, and the + # awk substr() then picks the 'g' and first 12 + # hex chars. + if atag="$(git describe --abbrev=12 2>/dev/null)"; then + echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}' + + # If we don't have a tag at all we print -g{commitish}, + # again using exactly 12 hex chars. else + head="$(echo $head | cut -c1-12)" printf '%s%s' -g $head fi fi